
tests: migration-test: Allow test to run without uffd

Message ID 20220707184600.24164-1-peterx@redhat.com
State New
Series tests: migration-test: Allow test to run without uffd

Commit Message

Peter Xu July 7, 2022, 6:46 p.m. UTC
We used to stop running all tests if uffd is not detected. However,
logically that's only needed for postcopy, not for the rest of the tests.

Keep running the rest when still possible.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tests/qtest/migration-test.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)
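A minimal sketch of the idea in plain C, assuming a hypothetical in-memory registry in place of the qtest harness's registration call, and illustrative test paths rather than the file's exact list: only the postcopy cases are gated on userfaultfd (uffd) availability, while the precopy tests are registered unconditionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the qtest harness's test registration. */
#define MAX_TESTS 16

struct test_registry {
    const char *paths[MAX_TESTS];
    size_t count;
};

static void register_test(struct test_registry *reg, const char *path)
{
    assert(reg->count < MAX_TESTS); /* the sketch's registry is tiny */
    reg->paths[reg->count++] = path;
}

static bool has_test(const struct test_registry *reg, const char *path)
{
    for (size_t i = 0; i < reg->count; i++) {
        if (strcmp(reg->paths[i], path) == 0) {
            return true;
        }
    }
    return false;
}

/*
 * Old behaviour: bail out before registering anything when uffd is missing.
 * New behaviour (sketched): only the postcopy cases depend on uffd.
 * The paths below are illustrative.
 */
static void register_migration_tests(struct test_registry *reg, bool has_uffd)
{
    if (has_uffd) {
        /* Postcopy relies on userfaultfd to fault pages in on the target. */
        register_test(reg, "/migration/postcopy/unix");
        register_test(reg, "/migration/postcopy/recovery");
    }

    /* Precopy-only tests never touch userfaultfd: always register them. */
    register_test(reg, "/migration/validate_uuid_src_not_set");
    register_test(reg, "/migration/validate_uuid_dst_not_set");
    register_test(reg, "/migration/auto_converge");
}
```

Calling register_migration_tests(&reg, false) leaves the three precopy paths registered and skips the two postcopy ones; under the old early-return behaviour the same host would have registered nothing.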

Comments

Daniel P. Berrangé July 8, 2022, 9:57 a.m. UTC | #1
On Thu, Jul 07, 2022 at 02:46:00PM -0400, Peter Xu wrote:
> We used to stop running all tests if uffd is not detected.  However
> logically that's only needed for postcopy not the rest of tests.
> 
> Keep running the rest when still possible.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  tests/qtest/migration-test.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


With regards,
Daniel
Thomas Huth July 18, 2022, 6:23 p.m. UTC | #2
On 07/07/2022 20.46, Peter Xu wrote:
> We used to stop running all tests if uffd is not detected.  However
> logically that's only needed for postcopy not the rest of tests.
> 
> Keep running the rest when still possible.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>   tests/qtest/migration-test.c | 11 +++++------
>   1 file changed, 5 insertions(+), 6 deletions(-)

Did you test your patch in the gitlab-CI? I just added it to my testing-next 
branch and the test is failing reproducibly on macOS here:

  https://gitlab.com/thuth/qemu/-/jobs/2736260861#L6275
  https://gitlab.com/thuth/qemu/-/jobs/2736623914#L6275

(without your patch the whole test is skipped instead)

  Thomas
Peter Xu July 18, 2022, 7:14 p.m. UTC | #3
Hi, Thomas,

On Mon, Jul 18, 2022 at 08:23:26PM +0200, Thomas Huth wrote:
> On 07/07/2022 20.46, Peter Xu wrote:
> > We used to stop running all tests if uffd is not detected.  However
> > logically that's only needed for postcopy not the rest of tests.
> > 
> > Keep running the rest when still possible.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >   tests/qtest/migration-test.c | 11 +++++------
> >   1 file changed, 5 insertions(+), 6 deletions(-)
> 
> Did you test your patch in the gitlab-CI? I just added it to my testing-next
> branch and the test is failing reproducibly on macOS here:
> 
>  https://gitlab.com/thuth/qemu/-/jobs/2736260861#L6275
>  https://gitlab.com/thuth/qemu/-/jobs/2736623914#L6275
> 
> (without your patch the whole test is skipped instead)

Thanks for reporting this.

Is it easy to figure out which test was failing on your side?  I cannot
easily reproduce this here on a MacOS with M1.

Or any hint on how I could kick the same CI as you do would help too.  I
remember I used to kick the tests after any push with .gitlab-ci.yml, but
it doesn't seem to be triggering for some reason here.
Daniel P. Berrangé July 19, 2022, 8:32 a.m. UTC | #4
On Mon, Jul 18, 2022 at 03:14:37PM -0400, Peter Xu wrote:
> ...
> 
> Or any hint on how I could kick the same CI as you do would help too.  I
> remember I used to kick the tests after any push with .gitlab-ci.yml, but
> it doesn't seem to be triggering for some reason here.

It is now opt-in with GitLab: use 'git push -o ci.variable=QEMU_CI=1' to
create the pipeline, then manually start the jobs you wish to run in the
UI, or use QEMU_CI=2 to auto-run everything.

Note that for macOS you'll need to configure the Cirrus CI integration
first, per .gitlab-ci.d/cirrus/README.rst


With regards,
Daniel
Thomas Huth July 19, 2022, 10:28 a.m. UTC | #5
On 18/07/2022 21.14, Peter Xu wrote:
> ...
> 
> Thanks for reporting this.
> 
> Is it easy to figure out which test was failing on your side?  I cannot
> easily reproduce this here on a MacOS with M1.

I've modified the yml file to only run the migration test in verbose mode 
and got this:

...
ok 5 /x86_64/migration/validate_uuid_src_not_set
# starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-58011.sock 
-qtest-log /dev/null -chardev socket,path=/tmp/qtest-58011.qmp,id=char0 -mon 
chardev=char0,mode=control -display none -accel kvm -accel tcg -name 
source,debug-threads=on -m 150M -serial 
file:/tmp/migration-test-ef2fMr/src_serial -drive 
file=/tmp/migration-test-ef2fMr/bootsect,format=raw  -uuid 
11111111-1111-1111-1111-111111111111 2>/dev/null -accel qtest
# starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-58011.sock 
-qtest-log /dev/null -chardev socket,path=/tmp/qtest-58011.qmp,id=char0 -mon 
chardev=char0,mode=control -display none -accel kvm -accel tcg -name 
target,debug-threads=on -m 150M -serial 
file:/tmp/migration-test-ef2fMr/dest_serial -incoming 
unix:/tmp/migration-test-ef2fMr/migsocket -drive 
file=/tmp/migration-test-ef2fMr/bootsect,format=raw   2>/dev/null -accel qtest
ok 6 /x86_64/migration/validate_uuid_dst_not_set
# starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-58011.sock 
-qtest-log /dev/null -chardev socket,path=/tmp/qtest-58011.qmp,id=char0 -mon 
chardev=char0,mode=control -display none -accel kvm -accel tcg -name 
source,debug-threads=on -m 150M -serial 
file:/tmp/migration-test-ef2fMr/src_serial -drive 
file=/tmp/migration-test-ef2fMr/bootsect,format=raw    -accel qtest
# starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-58011.sock 
-qtest-log /dev/null -chardev socket,path=/tmp/qtest-58011.qmp,id=char0 -mon 
chardev=char0,mode=control -display none -accel kvm -accel tcg -name 
target,debug-threads=on -m 150M -serial 
file:/tmp/migration-test-ef2fMr/dest_serial -incoming 
unix:/tmp/migration-test-ef2fMr/migsocket -drive 
file=/tmp/migration-test-ef2fMr/bootsect,format=raw    -accel qtest
**
ERROR:../tests/qtest/migration-helpers.c:181:wait_for_migration_status: 
assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT)
Bail out! 
ERROR:../tests/qtest/migration-helpers.c:181:wait_for_migration_status: 
assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT)
qemu-system-x86_64: failed to save SaveStateEntry with id(name): 2(ram): -5
qemu-system-x86_64: Unable to write to socket: Broken pipe
/var/folders/tn/f_9sf1xx5t14qm_6f83q3b840000gn/T/scripts81855ad8681d0d86d1e91e00167939cb.sh: 
line 9: 58011 Abort trap: 6           QTEST_QEMU_BINARY=./qemu-system-x86_64 
tests/qtest/migration-test

(see: https://cirrus-ci.com/task/5719789887815680?logs=build#L7205 )

So it seems like validate_uuid_dst_not_set was the last successful test, and
that it's likely failing in test_migrate_auto_converge?
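This deduction follows from the TAP format the test prints: each passing case emits an "ok N <path>" line, so the last such line before "Bail out!" names the last test that passed, and the failure lies in whichever test runs next. A small sketch of that scan in C, written for this illustration rather than taken from any QEMU or GLib tooling (simplified: "# SKIP" directives are not stripped):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Find the name of the last passing test in TAP output. Copies it into
 * out (truncating if needed) and returns 1, or returns 0 if no "ok" line
 * was seen. "not ok" lines never match because they don't start with "ok ".
 */
static int last_passed_test(const char *tap, char *out, size_t outlen)
{
    assert(tap != NULL && out != NULL);
    int found = 0;
    const char *line = tap;

    while (*line != '\0') {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t)(nl - line) : strlen(line);
        int num = 0;
        int name_at = 0;

        /* Match "ok <number> <name>"; %n records where <name> starts. */
        if (strncmp(line, "ok ", 3) == 0 &&
            sscanf(line, "ok %d %n", &num, &name_at) == 1 &&
            name_at > 0 && (size_t)name_at < len && outlen > 0) {
            size_t name_len = len - (size_t)name_at;
            if (name_len >= outlen) {
                name_len = outlen - 1;
            }
            memcpy(out, line + name_at, name_len);
            out[name_len] = '\0';
            found = 1;
        }
        line = nl ? nl + 1 : line + len;
    }
    return found;
}
```

Fed the verbose log above, this would report /x86_64/migration/validate_uuid_dst_not_set; identifying the *next* test still requires knowing the registration order in migration-test.c.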

> Or any hint on how I could kick the same CI as you do would help too.  I
> remember I used to kick the tests after any push with .gitlab-ci.yml, but
> it doesn't seem to be triggering for some reason here.

As Daniel already said, you need to set up Cirrus-CI according to 
.gitlab-ci.d/cirrus/README.rst to get the macOS jobs in your CI.

  Thomas
Daniel P. Berrangé July 19, 2022, 10:37 a.m. UTC | #6
On Tue, Jul 19, 2022 at 12:28:24PM +0200, Thomas Huth wrote:
> ...
> 
> I've modified the yml file to only run the migration test in verbose mode
> and got this:
> 
> ...
> **
> ERROR:../tests/qtest/migration-helpers.c:181:wait_for_migration_status:
> assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT)
> Bail out!
> ERROR:../tests/qtest/migration-helpers.c:181:wait_for_migration_status:
> assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT)

This is the safety net we put in to catch cases where the test has
got stuck. It is set at 2 minutes.

There's a chance that is too short, so one first step might be to
increase it to 10 minutes and see if the tests pass. If it still fails,
then it's likely a genuine bug.
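The assertion is a deadline on a polling loop, not a check of migration correctness: if a migration is merely slow, raising the limit lets it pass, whereas a hang fails at any limit. A minimal sketch of such a loop, assuming a hypothetical status callback and simulated time in place of QEMU's actual wait_for_migration_status() and g_test_timer_elapsed():

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical status source: the migration state after t seconds. */
typedef const char *(*status_at_fn)(double t, void *ctx);

/*
 * Poll until `wanted` is reported or `timeout_s` has elapsed. Time is
 * simulated by stepping `poll_s` per iteration instead of sleeping, so the
 * deadline logic can be exercised without a real QEMU.
 */
static bool wait_for_status(const char *wanted, double timeout_s,
                            double poll_s, status_at_fn status_at, void *ctx)
{
    assert(poll_s > 0.0); /* otherwise the loop would never advance */
    for (double t = 0.0; t < timeout_s; t += poll_s) {
        if (strcmp(status_at(t, ctx), wanted) == 0) {
            return true;
        }
    }
    /* The real helper asserts at this point; the sketch reports failure. */
    return false;
}

/* Fake migration that completes after *(const double *)ctx seconds. */
static const char *fake_status(double t, void *ctx)
{
    const double completes_after = *(const double *)ctx;
    return t >= completes_after ? "completed" : "active";
}
```

For example, a fake run that needs 400 s fails with a 120 s or 300 s limit and passes with 600 s, whereas a run that never completes fails no matter how large the limit is.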

> ...
> 
> So it seems like validate_uuid_dst_not_set was the last successful test, and
> it's likely failing with test_migrate_auto_converge ?

Agreed, it looks like the auto_converge test, which is the first test that
actually tries to run a migration to completion.

With regards,
Daniel
Peter Xu July 19, 2022, 7:53 p.m. UTC | #7
On Tue, Jul 19, 2022 at 11:37:55AM +0100, Daniel P. Berrangé wrote:
> ...
> 
> This is the safety net we put in to catch cases where the test has
> got stuck. It is set at 2 minutes.
> 
> There's a chance that is too short, so one first step might be to
> increase it to 10 minutes and see if the tests pass. If it still fails,
> then it's likely a genuine bug.

Agreed, it's worth another try.

Thanks to both of you for your answers on CI.  I wanted to go through the
setup of Cirrus CI and kick it myself, but I got stuck at the step of
generating the API token for Cirrus.

It seems the button to generate the API token just didn't respond for me
until I refreshed the page (then I could see that a token had been
generated); however, I still haven't figured out a way to see the initial
6 letters, since they are always masked out.  Changing browsers didn't
work for me either. :(
Thomas Huth July 20, 2022, 10:52 a.m. UTC | #8
On 19/07/2022 21.53, Peter Xu wrote:
...
> It seems the button to generate the API token just didn't respond for me
> until I refreshed the page (then I could see that a token had been
> generated); however, I still haven't figured out a way to see the initial
> 6 letters, since they are always masked out.  Changing browsers didn't
> work for me either. :(

I haven't tried in a while, but IIRC the token is indeed only shown at the
first access, and if that's not happening for you, then something is likely
broken. Are you using a plugin like uMatrix or similar? Maybe it helps to
switch that off?

  Thomas
Peter Xu July 20, 2022, 12:55 p.m. UTC | #9
On Wed, Jul 20, 2022 at 12:52:02PM +0200, Thomas Huth wrote:
> On 19/07/2022 21.53, Peter Xu wrote:
> ...
> > It seems the button to generate the API token just didn't respond for me
> > until I refreshed the page (then I could see that a token had been
> > generated); however, I still haven't figured out a way to see the initial
> > 6 letters, since they are always masked out.  Changing browsers didn't
> > work for me either. :(
> 
> I haven't tried in a while, but IIRC the token is indeed only shown at the
> first access, and if that's not happening for you, then something is likely
> broken. Are you using a plugin like uMatrix or similar? Maybe it helps to
> switch that off?

Sadly no. Besides the Chrome I commonly use, I also tried a fresh Firefox
and Safari on different hosts.  None of them worked for me.
Thomas Huth July 20, 2022, 2:11 p.m. UTC | #10
On 19/07/2022 12.37, Daniel P. Berrangé wrote:
> ...
> 
> This is the safety net we put in to catch cases where the test has
> got stuck. It is set at 2 minutes.
> 
> There's a chance that is too short, so one first step might be to
> increase it to 10 minutes and see if the tests pass. If it still fails,
> then it's likely a genuine bug.

I tried to increase it to 5 minutes first, but that did not help. In a 
second try, I increased it to 10 minutes, and then the test was passing, indeed:

https://cirrus-ci.com/task/5819072351830016?logs=build#L7208

Could it maybe be accelerated, e.g. by tweaking the downtime limit again?
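Why the downtime limit matters can be shown with a simplified model, which is an assumption rather than QEMU's exact code: precopy may enter its final stop-and-copy phase once the dirty data still pending can be sent within the downtime budget, i.e. pending ≤ bandwidth × downtime-limit, where the bandwidth is the rate actually achieved and max-bandwidth caps it. A sketch in C using the QEMU parameter units (bytes per second, milliseconds):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified convergence model (assumption): the final phase may begin
 * once pending dirty data fits in bandwidth * downtime.
 * bytes_per_sec: effective transfer rate (capped by max-bandwidth).
 * downtime_ms:   the downtime-limit parameter, in milliseconds.
 */
static uint64_t downtime_threshold_bytes(uint64_t bytes_per_sec,
                                         uint64_t downtime_ms)
{
    /* Guard the multiplication against 64-bit overflow. */
    assert(downtime_ms == 0 || bytes_per_sec <= UINT64_MAX / downtime_ms);
    return bytes_per_sec * downtime_ms / 1000;
}

static bool can_complete(uint64_t pending_bytes, uint64_t bytes_per_sec,
                         uint64_t downtime_ms)
{
    return pending_bytes <= downtime_threshold_bytes(bytes_per_sec,
                                                     downtime_ms);
}
```

With illustrative values of 3 MB/s and 1 ms, only 3,000 bytes may remain, less than one 4 KiB page, so a guest that keeps dirtying memory cannot finish; with 1 GB/s and 30 s, the threshold (30 GB) exceeds the whole 150M guest started in the log above.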

  Thomas
Daniel P. Berrangé July 20, 2022, 2:32 p.m. UTC | #11
On Wed, Jul 20, 2022 at 04:11:43PM +0200, Thomas Huth wrote:
> ...
> 
> I tried to increase it to 5 minutes first, but that did not help. In a
> second try, I increased it to 10 minutes, and then the test was passing,
> indeed:
> 
> https://cirrus-ci.com/task/5819072351830016?logs=build#L7208
> 
> Could it maybe be accelerated, e.g. by tweaking the downtime limit again?

Oh, when I tweaked the convergence tunables, I missed the auto-converge
case, as its code looks a bit different.

Possibly change this in test_migrate_auto_converge:

    /* Now, when we tested that throttling works, let it converge */
    migrate_set_parameter_int(from, "downtime-limit", downtime_limit);
    migrate_set_parameter_int(from, "max-bandwidth", max_bandwidth);

to

    migrate_ensure_converge(from);
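The point of the suggestion is that the "let it converge" values then live in one shared helper, so a later tuning change reaches every test, auto-converge included. A sketch of that shape, assuming a hypothetical parameter struct in place of migrate_set_parameter_int() on a QTestState; ensure_non_converge() is a hypothetical counterpart, and all numbers are illustrative rather than QEMU's actual values:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the source VM's migration parameters. */
struct mig_params {
    uint64_t max_bandwidth;  /* bytes per second */
    uint64_t downtime_limit; /* milliseconds */
};

/* Illustrative values, tight enough that precopy should not finish. */
static void ensure_non_converge(struct mig_params *p)
{
    assert(p != NULL);
    p->max_bandwidth = 3ULL * 1000 * 1000; /* 3 MB/s */
    p->downtime_limit = 1;                 /* 1 ms */
}

/* Illustrative values, generous enough that precopy finishes quickly. */
static void ensure_converge(struct mig_params *p)
{
    assert(p != NULL);
    p->max_bandwidth = 1000ULL * 1000 * 1000; /* 1 GB/s */
    p->downtime_limit = 30ULL * 1000;         /* 30 s */
}

/* Shape of the auto-converge test with the suggested change applied. */
static void auto_converge_flow(struct mig_params *p)
{
    ensure_non_converge(p);
    /* ... start migrating and check that throttling kicks in ... */

    /* Now that throttling has been tested, let it converge. */
    ensure_converge(p);
}
```

The design choice is the usual one for test tunables: per-test copies of "converge" values drift apart, while a single helper keeps every test in step when CI hosts turn out to be slower than expected.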


With regards,
Daniel
Peter Xu July 21, 2022, 6:24 p.m. UTC | #12
On Wed, Jul 20, 2022 at 03:32:20PM +0100, Daniel P. Berrangé wrote:
> On Wed, Jul 20, 2022 at 04:11:43PM +0200, Thomas Huth wrote:
> > On 19/07/2022 12.37, Daniel P. Berrangé wrote:
> > > On Tue, Jul 19, 2022 at 12:28:24PM +0200, Thomas Huth wrote:
> > > > On 18/07/2022 21.14, Peter Xu wrote:
> > > > > Hi, Thomas,
> > > > > 
> > > > > On Mon, Jul 18, 2022 at 08:23:26PM +0200, Thomas Huth wrote:
> > > > > > On 07/07/2022 20.46, Peter Xu wrote:
> > > > > > > We used to stop running all tests if uffd is not detected.  However
> > > > > > > logically that's only needed for postcopy not the rest of tests.
> > > > > > > 
> > > > > > > Keep running the rest when still possible.
> > > > > > > 
> > > > > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > > > > > ---
> > > > > > >     tests/qtest/migration-test.c | 11 +++++------
> > > > > > >     1 file changed, 5 insertions(+), 6 deletions(-)
> > > > > > 
> > > > > > Did you test your patch in the gitlab-CI? I just added it to my testing-next
> > > > > > branch and the the test is failing reproducibly on macOS here:
> > > > > > 
> > > > > >    https://gitlab.com/thuth/qemu/-/jobs/2736260861#L6275
> > > > > >    https://gitlab.com/thuth/qemu/-/jobs/2736623914#L6275
> > > > > > 
> > > > > > (without your patch the whole test is skipped instead)
> > > > > 
> > > > > Thanks for reporting this.
> > > > > 
> > > > > Is it easy to figure out which test was failing on your side?  I cannot
> > > > > easily reproduce this here on a MacOS with M1.
> > > > 
> > > > I've modified the yml file to only run the migration test in verbose mode
> > > > and got this:
> > > > 
> > > > ...
> > > > ok 5 /x86_64/migration/validate_uuid_src_not_set
> > > > # starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-58011.sock
> > > > -qtest-log /dev/null -chardev socket,path=/tmp/qtest-58011.qmp,id=char0 -mon
> > > > chardev=char0,mode=control -display none -accel kvm -accel tcg -name
> > > > source,debug-threads=on -m 150M -serial
> > > > file:/tmp/migration-test-ef2fMr/src_serial -drive
> > > > file=/tmp/migration-test-ef2fMr/bootsect,format=raw  -uuid
> > > > 11111111-1111-1111-1111-111111111111 2>/dev/null -accel qtest
> > > > # starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-58011.sock
> > > > -qtest-log /dev/null -chardev socket,path=/tmp/qtest-58011.qmp,id=char0 -mon
> > > > chardev=char0,mode=control -display none -accel kvm -accel tcg -name
> > > > target,debug-threads=on -m 150M -serial
> > > > file:/tmp/migration-test-ef2fMr/dest_serial -incoming
> > > > unix:/tmp/migration-test-ef2fMr/migsocket -drive
> > > > file=/tmp/migration-test-ef2fMr/bootsect,format=raw   2>/dev/null -accel
> > > > qtest
> > > > ok 6 /x86_64/migration/validate_uuid_dst_not_set
> > > > # starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-58011.sock
> > > > -qtest-log /dev/null -chardev socket,path=/tmp/qtest-58011.qmp,id=char0 -mon
> > > > chardev=char0,mode=control -display none -accel kvm -accel tcg -name
> > > > source,debug-threads=on -m 150M -serial
> > > > file:/tmp/migration-test-ef2fMr/src_serial -drive
> > > > file=/tmp/migration-test-ef2fMr/bootsect,format=raw    -accel qtest
> > > > # starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-58011.sock
> > > > -qtest-log /dev/null -chardev socket,path=/tmp/qtest-58011.qmp,id=char0 -mon
> > > > chardev=char0,mode=control -display none -accel kvm -accel tcg -name
> > > > target,debug-threads=on -m 150M -serial
> > > > file:/tmp/migration-test-ef2fMr/dest_serial -incoming
> > > > unix:/tmp/migration-test-ef2fMr/migsocket -drive
> > > > file=/tmp/migration-test-ef2fMr/bootsect,format=raw    -accel qtest
> > > > **
> > > > ERROR:../tests/qtest/migration-helpers.c:181:wait_for_migration_status:
> > > > assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT)
> > > > Bail out!
> > > > ERROR:../tests/qtest/migration-helpers.c:181:wait_for_migration_status:
> > > > assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT)
> > > 
> > > This is the safety net we put in to catch cases where the test has
> > > got stuck. It is set at 2 minutes.
> > > 
> > > There's a chance that is too short, so one first step might be to
> > > increase to 10 minutes and see if the tests pass. If it still fails,
> > > then it's likely a genuine bug.
> > 
> > I tried to increase it to 5 minutes first, but that did not help. In a
> > second try, I increased it to 10 minutes, and then the test was passing,
> > indeed:
> > 
> > https://cirrus-ci.com/task/5819072351830016?logs=build#L7208
> > 
> > Could it maybe be accelerated, e.g. by tweaking the downtime limit again?
> 
> Oh, when I tweaked the convergence tunables, I missed the auto-converge
> case, as its code looks a bit different.
> 
> Possibly change test_migrate_auto_converge
> 
>     /* Now, when we tested that throttling works, let it converge */
>     migrate_set_parameter_int(from, "downtime-limit", downtime_limit);
>     migrate_set_parameter_int(from, "max-bandwidth", max_bandwidth);
> 
> to
> 
>     migrate_ensure_converge(from);

Sounds good to me.

Thomas, would that work for you too?  I'm wondering whether you'd like to
post a patch for that.

I could repost both patches (including what Dan suggested), but I
still have no good way to trigger that macOS test, so I cannot verify them.  Let
me know if you want me to post them; I can do it (and test as much as I
can), but I may need some help triggering a test to verify them.

Thanks!
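For readers following the convergence discussion: a precopy migration can only switch over once the data still to be sent fits inside the allowed downtime at the available bandwidth. The auto-converge test first keeps the limits low so it can observe throttling, then (per the quoted comment "let it converge") raises them. The following is a minimal sketch of that condition under a simplifying assumption (it ignores the guest's dirty rate); it is not QEMU's actual code, and `precopy_can_converge()` and the numbers below are hypothetical illustrations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model (not QEMU's code): precopy may switch
 * over once the estimated time to send the remaining dirty data at the
 * current bandwidth fits within the downtime limit.
 */
static bool precopy_can_converge(uint64_t remaining_bytes,
                                 uint64_t bandwidth_bytes_per_sec,
                                 uint64_t downtime_limit_ms)
{
    if (bandwidth_bytes_per_sec == 0) {
        return false; /* nothing can be sent, so it never converges */
    }
    /* Guard the bytes -> milliseconds conversion against overflow. */
    assert(remaining_bytes <= UINT64_MAX / 1000);

    /* Estimated downtime if the guest were stopped right now. */
    uint64_t expected_downtime_ms =
        remaining_bytes * 1000 / bandwidth_bytes_per_sec;
    return expected_downtime_ms <= downtime_limit_ms;
}
```

Treating all of the 150 MiB guest RAM from the logs above as dirty, 3 MB/s with a 1 ms limit gives an estimated downtime of about 52 s, so migration keeps iterating; an illustrative 1 GB/s with a 30 s limit gives about 157 ms, so it can complete.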
Thomas Huth July 22, 2022, 8:14 a.m. UTC | #13
On 21/07/2022 20.24, Peter Xu wrote:
> On Wed, Jul 20, 2022 at 03:32:20PM +0100, Daniel P. Berrangé wrote:
>> On Wed, Jul 20, 2022 at 04:11:43PM +0200, Thomas Huth wrote:
>>> On 19/07/2022 12.37, Daniel P. Berrangé wrote:
>>>> On Tue, Jul 19, 2022 at 12:28:24PM +0200, Thomas Huth wrote:
>>>>> On 18/07/2022 21.14, Peter Xu wrote:
>>>>>> Hi, Thomas,
>>>>>>
>>>>>> On Mon, Jul 18, 2022 at 08:23:26PM +0200, Thomas Huth wrote:
>>>>>>> On 07/07/2022 20.46, Peter Xu wrote:
>>>>>>>> We used to stop running all tests if uffd is not detected.  However
>>>>>>>> logically that's only needed for postcopy not the rest of tests.
>>>>>>>>
>>>>>>>> Keep running the rest when still possible.
>>>>>>>>
>>>>>>>> Signed-off-by: Peter Xu <peterx@redhat.com>
>>>>>>>> ---
>>>>>>>>      tests/qtest/migration-test.c | 11 +++++------
>>>>>>>>      1 file changed, 5 insertions(+), 6 deletions(-)
>>>>>>>
>>>>>>> Did you test your patch in the gitlab-CI? I just added it to my testing-next
>>>>>>> branch and the test is failing reproducibly on macOS here:
>>>>>>>
>>>>>>>     https://gitlab.com/thuth/qemu/-/jobs/2736260861#L6275
>>>>>>>     https://gitlab.com/thuth/qemu/-/jobs/2736623914#L6275
>>>>>>>
>>>>>>> (without your patch the whole test is skipped instead)
>>>>>>
>>>>>> Thanks for reporting this.
>>>>>>
>>>>>> Is it easy to figure out which test was failing on your side?  I cannot
>>>>>> easily reproduce this here on macOS with an M1.
>>>>>
>>>>> I've modified the yml file to only run the migration test in verbose mode
>>>>> and got this:
>>>>>
>>>>> ...
>>>>> ok 5 /x86_64/migration/validate_uuid_src_not_set
>>>>> # starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-58011.sock
>>>>> -qtest-log /dev/null -chardev socket,path=/tmp/qtest-58011.qmp,id=char0 -mon
>>>>> chardev=char0,mode=control -display none -accel kvm -accel tcg -name
>>>>> source,debug-threads=on -m 150M -serial
>>>>> file:/tmp/migration-test-ef2fMr/src_serial -drive
>>>>> file=/tmp/migration-test-ef2fMr/bootsect,format=raw  -uuid
>>>>> 11111111-1111-1111-1111-111111111111 2>/dev/null -accel qtest
>>>>> # starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-58011.sock
>>>>> -qtest-log /dev/null -chardev socket,path=/tmp/qtest-58011.qmp,id=char0 -mon
>>>>> chardev=char0,mode=control -display none -accel kvm -accel tcg -name
>>>>> target,debug-threads=on -m 150M -serial
>>>>> file:/tmp/migration-test-ef2fMr/dest_serial -incoming
>>>>> unix:/tmp/migration-test-ef2fMr/migsocket -drive
>>>>> file=/tmp/migration-test-ef2fMr/bootsect,format=raw   2>/dev/null -accel
>>>>> qtest
>>>>> ok 6 /x86_64/migration/validate_uuid_dst_not_set
>>>>> # starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-58011.sock
>>>>> -qtest-log /dev/null -chardev socket,path=/tmp/qtest-58011.qmp,id=char0 -mon
>>>>> chardev=char0,mode=control -display none -accel kvm -accel tcg -name
>>>>> source,debug-threads=on -m 150M -serial
>>>>> file:/tmp/migration-test-ef2fMr/src_serial -drive
>>>>> file=/tmp/migration-test-ef2fMr/bootsect,format=raw    -accel qtest
>>>>> # starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-58011.sock
>>>>> -qtest-log /dev/null -chardev socket,path=/tmp/qtest-58011.qmp,id=char0 -mon
>>>>> chardev=char0,mode=control -display none -accel kvm -accel tcg -name
>>>>> target,debug-threads=on -m 150M -serial
>>>>> file:/tmp/migration-test-ef2fMr/dest_serial -incoming
>>>>> unix:/tmp/migration-test-ef2fMr/migsocket -drive
>>>>> file=/tmp/migration-test-ef2fMr/bootsect,format=raw    -accel qtest
>>>>> **
>>>>> ERROR:../tests/qtest/migration-helpers.c:181:wait_for_migration_status:
>>>>> assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT)
>>>>> Bail out!
>>>>> ERROR:../tests/qtest/migration-helpers.c:181:wait_for_migration_status:
>>>>> assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT)
>>>>
>>>> This is the safety net we put in to catch cases where the test has
>>>> got stuck. It is set at 2 minutes.
>>>>
>>>> There's a chance that is too short, so one first step might be to
>>>> increase to 10 minutes and see if the tests pass. If it still fails,
>>>> then it's likely a genuine bug.
>>>
>>> I tried to increase it to 5 minutes first, but that did not help. In a
>>> second try, I increased it to 10 minutes, and then the test was passing,
>>> indeed:
>>>
>>> https://cirrus-ci.com/task/5819072351830016?logs=build#L7208
>>>
>>> Could it maybe be accelerated, e.g. by tweaking the downtime limit again?
>>
>> Oh, when I tweaked the convergence tunables, I missed the auto-converge
>> case, as its code looks a bit different.
>>
>> Possibly change test_migrate_auto_converge
>>
>>      /* Now, when we tested that throttling works, let it converge */
>>      migrate_set_parameter_int(from, "downtime-limit", downtime_limit);
>>      migrate_set_parameter_int(from, "max-bandwidth", max_bandwidth);
>>
>> to
>>
>>      migrate_ensure_converge(from);
> 
> Sounds good to me.
> 
> Thomas, would that work for you too?  I'm wondering whether you'd like to
> post a patch for that.
> 
> I could repost both patches (including what Dan suggested), but I
> still have no good way to trigger that macOS test, so I cannot verify them.  Let
> me know if you want me to post them; I can do it (and test as much as I
> can), but I may need some help triggering a test to verify them.

Please go ahead and post the patches - I'll then try to provide a Tested-by 
as soon as possible.

  Thomas
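The safety net discussed in this thread is `wait_for_migration_status()` asserting that `g_test_timer_elapsed()` stays below `MIGRATION_STATUS_WAIT_TIMEOUT` (two minutes, per Daniel's reply). Below is a minimal sketch of that poll-with-deadline pattern in plain POSIX C. `wait_with_deadline()`, `countdown_reached()` and the callback type are hypothetical names, and unlike the real helper this version returns `false` on timeout instead of asserting.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical status callback: returns true once the target state is reached. */
typedef bool (*status_reached_fn)(void *opaque);

static double monotonic_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/*
 * Poll reached() until it returns true or timeout_s seconds elapse.
 * Returns false on timeout so a stuck test fails instead of hanging.
 */
static bool wait_with_deadline(status_reached_fn reached, void *opaque,
                               double timeout_s, long poll_interval_ms)
{
    assert(timeout_s > 0 && poll_interval_ms > 0);
    const double start = monotonic_seconds();
    const struct timespec pause = {
        .tv_sec = poll_interval_ms / 1000,
        .tv_nsec = (poll_interval_ms % 1000) * 1000000L,
    };

    while (!reached(opaque)) {
        if (monotonic_seconds() - start >= timeout_s) {
            return false; /* safety net tripped */
        }
        nanosleep(&pause, NULL);
    }
    return true;
}

/* Example callback: reports success once *opaque polls have gone by. */
static bool countdown_reached(void *opaque)
{
    int *remaining = opaque;
    return (*remaining)-- <= 0;
}
```

Raising the deadline, as Thomas tried with 5 and 10 minutes, only changes when the net trips. It does not make a slow migration any faster, which is why the follow-up focused on the convergence parameters instead.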
Patch

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 9e64125f02..55acf9612c 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2086,14 +2086,11 @@  int main(int argc, char **argv)
 {
     char template[] = "/tmp/migration-test-XXXXXX";
     const bool has_kvm = qtest_has_accel("kvm");
+    const bool has_uffd = ufd_version_check();
     int ret;
 
     g_test_init(&argc, &argv, NULL);
 
-    if (!ufd_version_check()) {
-        return g_test_run();
-    }
-
     /*
      * On ppc64, the test only works with kvm-hv, but not with kvm-pr and TCG
      * is touchy due to race conditions on dirty bits (especially on PPC for
@@ -2122,8 +2119,10 @@  int main(int argc, char **argv)
 
     module_call_init(MODULE_INIT_QOM);
 
-    qtest_add_func("/migration/postcopy/unix", test_postcopy);
-    qtest_add_func("/migration/postcopy/recovery", test_postcopy_recovery);
+    if (has_uffd) {
+        qtest_add_func("/migration/postcopy/unix", test_postcopy);
+        qtest_add_func("/migration/postcopy/recovery", test_postcopy_recovery);
+    }
     qtest_add_func("/migration/bad_dest", test_baddest);
     qtest_add_func("/migration/precopy/unix/plain", test_precopy_unix_plain);
     qtest_add_func("/migration/precopy/unix/xbzrle", test_precopy_unix_xbzrle);
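The patch replaces an early exit with a "probe once, gate only what depends on the probe" pattern. Before it, a failed `ufd_version_check()` made `main()` call `g_test_run()` before any test was registered, so the binary ran nothing. The sketch below models the new behaviour in plain C. `add_test()` is a hypothetical stand-in for `qtest_add_func()`, the test bodies are stubs, and the paths are taken from the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TESTS 16
typedef void (*test_fn)(void);

/* Hypothetical stand-in for qtest_add_func(): records test paths only. */
static const char *registered_paths[MAX_TESTS];
static size_t n_registered;

static void add_test(const char *path, test_fn fn)
{
    (void)fn; /* a real harness would store fn and run it later */
    assert(n_registered < MAX_TESTS);
    registered_paths[n_registered++] = path;
}

static void test_stub(void) { }

/*
 * Only the postcopy cases need userfaultfd, so only they are gated on
 * the probe result; everything else is registered unconditionally.
 */
static size_t register_migration_tests(bool has_uffd)
{
    n_registered = 0;
    if (has_uffd) {
        add_test("/migration/postcopy/unix", test_stub);
        add_test("/migration/postcopy/recovery", test_stub);
    }
    add_test("/migration/bad_dest", test_stub);
    add_test("/migration/precopy/unix/plain", test_stub);
    add_test("/migration/precopy/unix/xbzrle", test_stub);
    return n_registered;
}
```

On a host without userfaultfd, such as the macOS runners in this thread, the precopy cases now run instead of being silently skipped. That is also why the previously hidden auto-converge slowness surfaced.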