
How to tame CI?

Message ID 87fs5aho6e.fsf@secure.mitica
State New
Series How to tame CI?

Commit Message

Juan Quintela July 26, 2023, 12:06 p.m. UTC
Hi

Now a note on CI, which has been really bad.  After too many problems
with the last PULLs, I decided to learn how to use the QEMU CI.  On one
hand, it is not that difficult; even I can use it O:-)

On the other hand, the number of problems I got is immense.  Some of
them disappear when I rerun the checks, but I never know whether the
cause is my PULL request, the CI system, or the tests themselves.

So it ends up going something like this:

while true; do
    git pull
    git rebase
    git push ci blah, blah
    # Next day comes, too many errors, so rebase again
done

The last step takes more time than expected, and it is not always
trivial to know what caused the failure.

This (last) patch is not part of the PULL request, but I have found
that it _always_ makes gcov fail.  I had to use bisect to find where
the problem was.

https://gitlab.com/juan.quintela/qemu/-/jobs/4571878922

To make things easier, this is the part that shows how it breaks (this is
the gcov test):

357/423 qemu:block / io-qcow2-copy-before-write                            ERROR           6.38s   exit status 1
>>> PYTHON=/builds/juan.quintela/qemu/build/pyvenv/bin/python3 MALLOC_PERTURB_=44 /builds/juan.quintela/qemu/build/pyvenv/bin/python3 /builds/juan.quintela/qemu/build/../tests/qemu-iotests/check -tap -qcow2 copy-before-write --source-dir /builds/juan.quintela/qemu/tests/qemu-iotests --build-dir /builds/juan.quintela/qemu/build/tests/qemu-iotests
――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
stderr:
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

If anyone can help me understand how a change in
tests/qtest/migration-test.c can break block layer tests, I am all ears.

This is the commit:

https://gitlab.com/juan.quintela/qemu/-/commit/7455ee794c01662b5efa1ee67396d85943663ded

Yes, I tried several times.  It always fails on that patch.  The
previous commit passes CI with flying colors.

Later, Juan.

Comments

Peter Maydell July 26, 2023, 1 p.m. UTC | #1
On Wed, 26 Jul 2023 at 13:06, Juan Quintela <quintela@redhat.com> wrote:
> To make things easier, this is the part that show how it breaks (this is
> the gcov test):
>
> 357/423 qemu:block / io-qcow2-copy-before-write                            ERROR           6.38s   exit status 1
> >>> PYTHON=/builds/juan.quintela/qemu/build/pyvenv/bin/python3 MALLOC_PERTURB_=44 /builds/juan.quintela/qemu/build/pyvenv/bin/python3 /builds/juan.quintela/qemu/build/../tests/qemu-iotests/check -tap -qcow2 copy-before-write --source-dir /builds/juan.quintela/qemu/tests/qemu-iotests --build-dir /builds/juan.quintela/qemu/build/tests/qemu-iotests
> ――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
> stderr:
> --- /builds/juan.quintela/qemu/tests/qemu-iotests/tests/copy-before-write.out
> +++ /builds/juan.quintela/qemu/build/scratch/qcow2-file-copy-before-write/copy-before-write.out.bad
> @@ -1,5 +1,21 @@
> -....
> +...F
> +======================================================================
> +FAIL: test_timeout_break_snapshot (__main__.TestCbwError)
> +----------------------------------------------------------------------
> +Traceback (most recent call last):
> +  File "/builds/juan.quintela/qemu/tests/qemu-iotests/tests/copy-before-write", line 210, in test_timeout_break_snapshot
> +    self.assertEqual(log, """\
> +AssertionError: 'wrot[195 chars]read 1048576/1048576 bytes at offset 0\n1 MiB,[46 chars]c)\n' != 'wrot[195 chars]read failed: Permission denied\n'
> +  wrote 524288/524288 bytes at offset 0
> +  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +  wrote 524288/524288 bytes at offset 524288
> +  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> ++ read failed: Permission denied
> +- read 1048576/1048576 bytes at offset 0
> +- 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +

This iotest failing is an intermittent that I've seen running
pullreqs on master. I tend to see it on the s390 host. I
suspect a race condition somewhere where it fails if the host
is heavily loaded.

-- PMM
Thomas Huth July 26, 2023, 1:32 p.m. UTC | #2
On 26/07/2023 15.00, Peter Maydell wrote:
> On Wed, 26 Jul 2023 at 13:06, Juan Quintela <quintela@redhat.com> wrote:
>> [...]
> 
> This iotest failing is an intermittent that I've seen running
> pullreqs on master. I tend to see it on the s390 host. I
> suspect a race condition somewhere where it fails if the host
> is heavily loaded.

It's obviously a failure in an iotest, so let's CC: the corresponding people 
(done now).

  Thomas
Daniel P. Berrangé July 26, 2023, 2:17 p.m. UTC | #3
On Wed, Jul 26, 2023 at 02:00:03PM +0100, Peter Maydell wrote:
> On Wed, 26 Jul 2023 at 13:06, Juan Quintela <quintela@redhat.com> wrote:
> > [...]
> 
> This iotest failing is an intermittent that I've seen running
> pullreqs on master. I tend to see it on the s390 host. I
> suspect a race condition somewhere where it fails if the host
> is heavily loaded.

Since it is known to be flaky, we should just commit this change:

--- a/tests/qemu-iotests/tests/copy-before-write
+++ b/tests/qemu-iotests/tests/copy-before-write
@@ -1,5 +1,5 @@
 #!/usr/bin/env python3
-# group: auto backup
+# group: backup
 #
 # Copyright (c) 2022 Virtuozzo International GmbH
 #


and if someone wants to re-enable it, they get the job of fixing its
reliability first.
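The proposal works through the `# group:` header line that each iotest carries. Below is a hypothetical sketch of reading that header, assuming (as the proposal implies) that the default CI run selects tests from the `auto` group. It is not QEMU's own test-selection code; `declared_groups` and the sample headers are made up for illustration.

```python
import re

# Matches a header line such as "# group: auto backup" (format taken from the diff above).
GROUP_RE = re.compile(r"^# group: (.*)$", re.MULTILINE)

# Hypothetical test headers before and after the proposed change.
BEFORE = "#!/usr/bin/env python3\n# group: auto backup\n#\n"
AFTER = "#!/usr/bin/env python3\n# group: backup\n#\n"


def declared_groups(source: str) -> set[str]:
    """Return the set of groups named on a test's '# group:' line (empty if none)."""
    match = GROUP_RE.search(source)
    return set(match.group(1).split()) if match else set()


if __name__ == "__main__":
    # Dropping "auto" takes the test out of the default run but keeps it in "backup".
    print("auto" in declared_groups(BEFORE), "auto" in declared_groups(AFTER))
```

The test stays runnable on demand through its remaining `backup` group. Only the automatic selection changes.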

With regards,
Daniel
Juan Quintela July 26, 2023, 2:36 p.m. UTC | #4
Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Wed, Jul 26, 2023 at 02:00:03PM +0100, Peter Maydell wrote:
>> On Wed, 26 Jul 2023 at 13:06, Juan Quintela <quintela@redhat.com> wrote:
>> > [...]
>> 
>> This iotest failing is an intermittent that I've seen running
>> pullreqs on master. I tend to see it on the s390 host. I
>> suspect a race condition somewhere where it fails if the host
>> is heavily loaded.

What is weird to me is that I was unable to reproduce it on the previous
commit, but with this one it happened every time.  No, I have no clue why,
and as I said, it makes zero sense: the change is to a binary that is not
used by the block tests.

Later, Juan.

Daniel P. Berrangé July 26, 2023, 2:43 p.m. UTC | #5
On Wed, Jul 26, 2023 at 04:36:32PM +0200, Juan Quintela wrote:
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> > On Wed, Jul 26, 2023 at 02:00:03PM +0100, Peter Maydell wrote:
> >> On Wed, 26 Jul 2023 at 13:06, Juan Quintela <quintela@redhat.com> wrote:
> >> > [...]
> >> 
> >> This iotest failing is an intermittent that I've seen running
> >> pullreqs on master. I tend to see it on the s390 host. I
> >> suspect a race condition somewhere where it fails if the host
> >> is heavily loaded.
> 
> What is weird to me is that I was unable to reproduce it on the previous
> commit, but with this one it happened every time.  No, I have no clue why,
> and as I said, it makes zero sense: the change is to a binary that is not
> used by the block tests.

Your commit changes the migration test, which could change the overall
test running time, and thus which tests end up running in parallel.
This could be enough to trigger the race more reliably.
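This also means a single green or red run says little about a given commit. Below is a minimal sketch for estimating how often a test fails by rerunning it many times. `count_failures` is a hypothetical helper, not part of QEMU's tooling. The commented `check` invocation is only modelled on the CI log above, and its paths are assumptions you would adjust to your own tree.

```python
import subprocess
import sys


def count_failures(cmd, runs, cwd=None):
    """Run `cmd` `runs` times in sequence and return how many runs exited non-zero."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(cmd, cwd=cwd,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            failures += 1
    return failures


if __name__ == "__main__":
    # Hypothetical usage from a QEMU build directory (paths are assumptions):
    #   count_failures([sys.executable, "../tests/qemu-iotests/check", "-qcow2",
    #                   "copy-before-write", "--source-dir", "../tests/qemu-iotests",
    #                   "--build-dir", "tests/qemu-iotests"], runs=50, cwd="build")
    # Self-contained demo with a command that always succeeds:
    print(count_failures([sys.executable, "-c", "pass"], runs=3))
```

Comparing failure counts for a commit and its parent over many runs, ideally on a loaded host, is more informative than one bisect step. As this thread shows, an intermittent failure can make an unrelated commit look guilty.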

With regards,
Daniel
Vladimir Sementsov-Ogievskiy Oct. 5, 2023, 12:35 p.m. UTC | #6
On 26.07.23 16:32, Thomas Huth wrote:
> On 26/07/2023 15.00, Peter Maydell wrote:
>> On Wed, 26 Jul 2023 at 13:06, Juan Quintela <quintela@redhat.com> wrote:
>>> [...]
>>
>> This iotest failing is an intermittent that I've seen running
>> pullreqs on master. I tend to see it on the s390 host. I
>> suspect a race condition somewhere where it fails if the host
>> is heavily loaded.
> 
> It's obviously a failure in an iotest, so let's CC: the corresponding people (done now).
> 

Sorry for the long delay.

Does it still fail?

In the test, we expect the copy-before-write operation to fail (because of throttling and the timeout), so the snapshot is broken and the next read from the snapshot should fail.

But most probably the copy-before-write operation succeeded in this case for some reason. I don't think throttling and timeouts in the block layer can guarantee determinism, but usually it works.

We use throttling with bps-write = 300 * 1024, i.e. 300 KiB per second, and cbw-timeout is set to 1 second.

Then we write 512K.

Then, as the comment says:
# We need second write to trigger throttling

we write another 512K.

The first 512K are written, and the second write should have to wait 512/300 ≈ 1.7 seconds from the _start_ of the first write. But if the first write was slow, less than a second may remain between the end of the first write and the start of the second one. Then the timeout will not fire.
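The timing argument can be checked with a back-of-the-envelope model, written in Python because the iotest itself is Python. It encodes only the simplified reasoning above: the throttle lets the second write through 512/300 s after the first write *started*. It is not QEMU's throttling code, and the function names are made up for illustration.

```python
KiB = 1024


def remaining_wait(first_write_bytes: int, bps_write: float,
                   first_write_duration: float) -> float:
    """Seconds the second write must still wait once the first write has finished.

    Simplified model: the next request is released first_write_bytes / bps_write
    seconds after the start of the first write.
    """
    release_time = first_write_bytes / bps_write
    return max(0.0, release_time - first_write_duration)


def timeout_fires(first_write_bytes: int, bps_write: float,
                  first_write_duration: float, cbw_timeout: float = 1.0) -> bool:
    """True if the remaining wait exceeds cbw-timeout, i.e. the snapshot gets broken."""
    return remaining_wait(first_write_bytes, bps_write, first_write_duration) > cbw_timeout


def slack(first_write_bytes: int, bps_write: float, cbw_timeout: float = 1.0) -> float:
    """The first write must finish faster than this many seconds for the timeout to fire."""
    return first_write_bytes / bps_write - cbw_timeout


if __name__ == "__main__":
    print(f"300 KiB/s: slack {slack(512 * KiB, 300 * KiB):.2f} s")
    print(f"200 KiB/s: slack {slack(512 * KiB, 200 * KiB):.2f} s")
    for duration in (0.0, 0.5, 0.8):
        print(duration, timeout_fires(512 * KiB, 300 * KiB, duration))
```

Under this model, at 300 KiB/s the timeout fires only if the first write completes in under about 0.71 s. If the "200" in option 1 below is read as 200 * 1024 bytes per second (an assumption, by analogy with the current setting), that margin grows to about 1.56 s. This is why lowering bps-write should make the test more tolerant of slow I/O.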

====

I see two possible ways to fix that:

1. Decrease bps-write a bit, for example to 200 BPS.

2. Rework the test to use null-co instead of real images. This way we will not suffer from unstable I/O duration.


So, does the problem still show up sometimes?
Juan Quintela Oct. 5, 2023, 2:36 p.m. UTC | #7
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> wrote:
> [...]
>
> So, does the problem still show up sometimes?

For me it is random.  When it starts happening, it keeps happening.
Then it stops, and doesn't happen for a while.

It is not happening for me now.

Later, Juan.

Patch

--- /builds/juan.quintela/qemu/tests/qemu-iotests/tests/copy-before-write.out
+++ /builds/juan.quintela/qemu/build/scratch/qcow2-file-copy-before-write/copy-before-write.out.bad
@@ -1,5 +1,21 @@ 
-....
+...F
+======================================================================
+FAIL: test_timeout_break_snapshot (__main__.TestCbwError)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "/builds/juan.quintela/qemu/tests/qemu-iotests/tests/copy-before-write", line 210, in test_timeout_break_snapshot
+    self.assertEqual(log, """\
+AssertionError: 'wrot[195 chars]read 1048576/1048576 bytes at offset 0\n1 MiB,[46 chars]c)\n' != 'wrot[195 chars]read failed: Permission denied\n'
+  wrote 524288/524288 bytes at offset 0
+  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+  wrote 524288/524288 bytes at offset 524288
+  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
++ read failed: Permission denied
+- read 1048576/1048576 bytes at offset 0
+- 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+
 ----------------------------------------------------------------------
 Ran 4 tests
-OK
+FAILED (failures=1)
(test program exited with status code 1)