diff mbox series

[v4,6/8] iotests/300: avoid abnormal shutdown race condition

Message ID 20211013215748.2154908-7-jsnow@redhat.com
State New
Headers show
Series Switch iotests to using Async QMP | expand

Commit Message

John Snow Oct. 13, 2021, 9:57 p.m. UTC
Wait for the destination VM to close itself instead of racing to shut it
down first, which produces different error log messages from AQMP
depending on precisely when we tried to shut it down.

(For example: We may try to issue 'quit' immediately prior to the target
VM closing its QMP socket, which will cause an ECONNRESET error to be
logged. Waiting for the VM to exit itself avoids the race on shutdown
behavior.)

Reported-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
---
 tests/qemu-iotests/300 | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

Comments

Vladimir Sementsov-Ogievskiy Oct. 14, 2021, 5:06 p.m. UTC | #1
14.10.2021 00:57, John Snow wrote:
> Wait for the destination VM to close itself instead of racing to shut it
> down first, which produces different error log messages from AQMP
> depending on precisely when we tried to shut it down.
> 
> (For example: We may try to issue 'quit' immediately prior to the target
> VM closing its QMP socket, which will cause an ECONNRESET error to be
> logged. Waiting for the VM to exit itself avoids the race on shutdown
> behavior.)
> 
> Reported-by: Hanna Reitz<hreitz@redhat.com>
> Signed-off-by: John Snow<jsnow@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Hanna Czenczek Oct. 25, 2021, 12:10 p.m. UTC | #2
On 13.10.21 23:57, John Snow wrote:
> Wait for the destination VM to close itself instead of racing to shut it
> down first, which produces different error log messages from AQMP
> depending on precisely when we tried to shut it down.
>
> (For example: We may try to issue 'quit' immediately prior to the target
> VM closing its QMP socket, which will cause an ECONNRESET error to be
> logged. Waiting for the VM to exit itself avoids the race on shutdown
> behavior.)
>
> Reported-by: Hanna Reitz <hreitz@redhat.com>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>   tests/qemu-iotests/300 | 12 ++++--------
>   1 file changed, 4 insertions(+), 8 deletions(-)

Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Hanna Czenczek Oct. 25, 2021, 1:20 p.m. UTC | #3
On 13.10.21 23:57, John Snow wrote:
> Wait for the destination VM to close itself instead of racing to shut it
> down first, which produces different error log messages from AQMP
> depending on precisely when we tried to shut it down.
>
> (For example: We may try to issue 'quit' immediately prior to the target
> VM closing its QMP socket, which will cause an ECONNRESET error to be
> logged. Waiting for the VM to exit itself avoids the race on shutdown
> behavior.)
>
> Reported-by: Hanna Reitz <hreitz@redhat.com>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>   tests/qemu-iotests/300 | 12 ++++--------
>   1 file changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/tests/qemu-iotests/300 b/tests/qemu-iotests/300
> index 10f9f2a8da6..bbea7248005 100755
> --- a/tests/qemu-iotests/300
> +++ b/tests/qemu-iotests/300
> @@ -24,8 +24,6 @@ import random
>   import re
>   from typing import Dict, List, Optional
>   
> -from qemu.machine import machine
> -
>   import iotests
>   
>   
> @@ -461,12 +459,10 @@ class TestBlockBitmapMappingErrors(TestDirtyBitmapMigration):
>                         f"'{self.src_node_name}': Name is longer than 255 bytes",
>                         log)
>   
> -        # Expect abnormal shutdown of the destination VM because of
> -        # the failed migration
> -        try:
> -            self.vm_b.shutdown()
> -        except machine.AbnormalShutdown:
> -            pass
> +        # Destination VM will terminate w/ error of its own accord
> +        # due to the failed migration.
> +        self.vm_b.wait()
> +        assert self.vm_b.exitcode() > 0

Trying to test, I can see that this fails iotest 297, because 
`.exitcode()` is `Optional[int]`...

(I can’t believe how long it took me to figure this out – the message 
“300:465: Unsupported operand types for < ("int" and "None")” made me 
believe that it was 300 that was failing, because `exitcode()` was 
returning `None` for some inconceivable reason.  I couldn’t understand 
why my usual test setup failed on every run, but I couldn’t get 300 to 
fail manually...  Until I noticed that the message came below the “297” 
line, not the “300” line...)

Hanna
John Snow Oct. 26, 2021, 5:07 p.m. UTC | #4
On Mon, Oct 25, 2021 at 9:20 AM Hanna Reitz <hreitz@redhat.com> wrote:

> On 13.10.21 23:57, John Snow wrote:
> > Wait for the destination VM to close itself instead of racing to shut it
> > down first, which produces different error log messages from AQMP
> > depending on precisely when we tried to shut it down.
> >
> > (For example: We may try to issue 'quit' immediately prior to the target
> > VM closing its QMP socket, which will cause an ECONNRESET error to be
> > logged. Waiting for the VM to exit itself avoids the race on shutdown
> > behavior.)
> >
> > Reported-by: Hanna Reitz <hreitz@redhat.com>
> > Signed-off-by: John Snow <jsnow@redhat.com>
> > ---
> >   tests/qemu-iotests/300 | 12 ++++--------
> >   1 file changed, 4 insertions(+), 8 deletions(-)
> >
> > diff --git a/tests/qemu-iotests/300 b/tests/qemu-iotests/300
> > index 10f9f2a8da6..bbea7248005 100755
> > --- a/tests/qemu-iotests/300
> > +++ b/tests/qemu-iotests/300
> > @@ -24,8 +24,6 @@ import random
> >   import re
> >   from typing import Dict, List, Optional
> >
> > -from qemu.machine import machine
> > -
> >   import iotests
> >
> >
> > @@ -461,12 +459,10 @@ class
> TestBlockBitmapMappingErrors(TestDirtyBitmapMigration):
> >                         f"'{self.src_node_name}': Name is longer than
> 255 bytes",
> >                         log)
> >
> > -        # Expect abnormal shutdown of the destination VM because of
> > -        # the failed migration
> > -        try:
> > -            self.vm_b.shutdown()
> > -        except machine.AbnormalShutdown:
> > -            pass
> > +        # Destination VM will terminate w/ error of its own accord
> > +        # due to the failed migration.
> > +        self.vm_b.wait()
> > +        assert self.vm_b.exitcode() > 0
>
> Trying to test, I can see that this fails iotest 297, because
> `.exitcode()` is `Optional[int]`...
>
> (I can’t believe how long it took me to figure this out – the message
> “300:465: Unsupported operand types for < ("int" and "None")” made me
> believe that it was 300 that was failing, because `exitcode()` was
> returning `None` for some inconceivable reason.  I couldn’t understand
> why my usual test setup failed on every run, but I couldn’t get 300 to
> fail manually...  Until I noticed that the message came below the “297”
> line, not the “300” line...)
>
>
Oops. Is there anything we can do to improve the visual clarity there?


> Hanna
>
>
Embarrassing. I scrutinized the other series I sent out, but forgot to
apply the same tests to this one. :(
Fixed, sorry for the noise.

--js
Hanna Czenczek Oct. 27, 2021, 7:44 a.m. UTC | #5
On 26.10.21 19:07, John Snow wrote:
>
>
> On Mon, Oct 25, 2021 at 9:20 AM Hanna Reitz <hreitz@redhat.com> wrote:
>
>     On 13.10.21 23:57, John Snow wrote:
>     > Wait for the destination VM to close itself instead of racing to
>     shut it
>     > down first, which produces different error log messages from AQMP
>     > depending on precisely when we tried to shut it down.
>     >
>     > (For example: We may try to issue 'quit' immediately prior to
>     the target
>     > VM closing its QMP socket, which will cause an ECONNRESET error
>     to be
>     > logged. Waiting for the VM to exit itself avoids the race on
>     shutdown
>     > behavior.)
>     >
>     > Reported-by: Hanna Reitz <hreitz@redhat.com>
>     > Signed-off-by: John Snow <jsnow@redhat.com>
>     > ---
>     >   tests/qemu-iotests/300 | 12 ++++--------
>     >   1 file changed, 4 insertions(+), 8 deletions(-)
>     >
>     > diff --git a/tests/qemu-iotests/300 b/tests/qemu-iotests/300
>     > index 10f9f2a8da6..bbea7248005 100755
>     > --- a/tests/qemu-iotests/300
>     > +++ b/tests/qemu-iotests/300
>     > @@ -24,8 +24,6 @@ import random
>     >   import re
>     >   from typing import Dict, List, Optional
>     >
>     > -from qemu.machine import machine
>     > -
>     >   import iotests
>     >
>     >
>     > @@ -461,12 +459,10 @@ class
>     TestBlockBitmapMappingErrors(TestDirtyBitmapMigration):
>     >                         f"'{self.src_node_name}': Name is longer
>     than 255 bytes",
>     >                         log)
>     >
>     > -        # Expect abnormal shutdown of the destination VM because of
>     > -        # the failed migration
>     > -        try:
>     > -            self.vm_b.shutdown()
>     > -        except machine.AbnormalShutdown:
>     > -            pass
>     > +        # Destination VM will terminate w/ error of its own accord
>     > +        # due to the failed migration.
>     > +        self.vm_b.wait()
>     > +        assert self.vm_b.exitcode() > 0
>
>     Trying to test, I can see that this fails iotest 297, because
>     `.exitcode()` is `Optional[int]`...
>
>     (I can’t believe how long it took me to figure this out – the message
>     “300:465: Unsupported operand types for < ("int" and "None")” made me
>     believe that it was 300 that was failing, because `exitcode()` was
>     returning `None` for some inconceivable reason.  I couldn’t
>     understand
>     why my usual test setup failed on every run, but I couldn’t get
>     300 to
>     fail manually...  Until I noticed that the message came below the
>     “297”
>     line, not the “300” line...)
>
>
> Oops. Is there anything we can do to improve the visual clarity there?

You mean, like, separating iotests from the Python linting tests? :)

I think until then I’ll just need to pay more attention.

Hanna
diff mbox series

Patch

diff --git a/tests/qemu-iotests/300 b/tests/qemu-iotests/300
index 10f9f2a8da6..bbea7248005 100755
--- a/tests/qemu-iotests/300
+++ b/tests/qemu-iotests/300
@@ -24,8 +24,6 @@  import random
 import re
 from typing import Dict, List, Optional
 
-from qemu.machine import machine
-
 import iotests
 
 
@@ -461,12 +459,10 @@  class TestBlockBitmapMappingErrors(TestDirtyBitmapMigration):
                       f"'{self.src_node_name}': Name is longer than 255 bytes",
                       log)
 
-        # Expect abnormal shutdown of the destination VM because of
-        # the failed migration
-        try:
-            self.vm_b.shutdown()
-        except machine.AbnormalShutdown:
-            pass
+        # Destination VM will terminate w/ error of its own accord
+        # due to the failed migration.
+        self.vm_b.wait()
+        assert self.vm_b.exitcode() > 0
 
     def test_aliased_bitmap_name_too_long(self) -> None:
         # Longer than the maximum for bitmap names