block: Don't inactivate bs if it is already inactive

Message ID b6caaaf3bfa84e3d913c81361b32ddae@h3c.com
State New
Series block: Don't inactivate bs if it is already inactive

Commit Message

Tuguoyi Nov. 24, 2020, 10:04 a.m. UTC
The following steps cause a qemu assertion failure:
- pause the vm
- save a memory snapshot into a local file through fd migration
- repeat the previous step; the second attempt triggers the assertion failure

The backtrace looks like:
#0  0x00007fbf958c5c37 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fbf958c9028 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fbf958bebf6 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007fbf958beca2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x000055ca8decd39d in bdrv_inactivate_recurse (bs=0x55ca90c80400) at /build/qemu-5.0/block.c:5724
#5  0x000055ca8dece967 in bdrv_inactivate_all () at /build//qemu-5.0/block.c:5792
#6  0x000055ca8de5539d in qemu_savevm_state_complete_precopy_non_iterable (inactivate_disks=true, in_postcopy=false, f=0x55ca907044b0)
    at /build/qemu-5.0/migration/savevm.c:1401
#7  qemu_savevm_state_complete_precopy (f=0x55ca907044b0, iterable_only=iterable_only@entry=false, inactivate_disks=inactivate_disks@entry=true)
    at /build/qemu-5.0/migration/savevm.c:1453
#8  0x000055ca8de4f581 in migration_completion (s=0x55ca8f64d9f0) at /build/qemu-5.0/migration/migration.c:2941
#9  migration_iteration_run (s=0x55ca8f64d9f0) at /build/qemu-5.0/migration/migration.c:3295
#10 migration_thread (opaque=opaque@entry=0x55ca8f64d9f0) at /build/qemu-5.0/migration/migration.c:3459
#11 0x000055ca8dfc6716 in qemu_thread_start (args=<optimized out>) at /build/qemu-5.0/util/qemu-thread-posix.c:519
#12 0x00007fbf95c5f184 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#13 0x00007fbf9598cbed in clone () from /lib/x86_64-linux-gnu/libc.so.6

When the first migration completes, bdrv_inactivate_recurse() sets the
BDRV_O_INACTIVE flag in bs->open_flags. During the second migration,
bdrv_inactivate_recurse() asserts that BDRV_O_INACTIVE is not already set,
so the assertion fails and qemu aborts.
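
The failure can be modelled in isolation; in the sketch below, FakeBDS and
the flag value are simplified stand-ins for illustration, not QEMU's actual
definitions:

#include <assert.h>
#include <stdio.h>

#define BDRV_O_INACTIVE 0x0800   /* placeholder value, for illustration only */

typedef struct { int open_flags; } FakeBDS;

static void inactivate(FakeBDS *bs)
{
    /* Mirrors the failing check: a node must not be inactivated twice */
    assert(!(bs->open_flags & BDRV_O_INACTIVE));
    bs->open_flags |= BDRV_O_INACTIVE;
}

int main(void)
{
    FakeBDS bs = { .open_flags = 0 };
    inactivate(&bs);   /* first migration completes: flag gets set */
    printf("first inactivation: ok\n");
    inactivate(&bs);   /* second migration: assertion fails, as in the backtrace */
    return 0;
}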

This patch makes bdrv_inactivate_all() skip a bs that is already
inactive.

Signed-off-by: Tuguoyi <tu.guoyi@h3c.com>
---
 block.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Vladimir Sementsov-Ogievskiy Nov. 28, 2020, 8:48 a.m. UTC | #1
24.11.2020 13:04, Tuguoyi wrote:
> The following steps will cause qemu assertion failure:
> - pause vm
> - save memory snapshot into local file through fd migration
> - do the above operation again will cause qemu assertion failure

Hi!

Why do you need such a scenario? Isn't it more correct and safer to just error out on savevm if the disks are already inactive? Inactive disks are a sign that the vm may have been migrated to another host and may already be running there, so creating any kind of snapshot of this old state may be a bad idea. In other words, you are allowing a doubtful feature just to avoid an assertion. If you don't have strong reasons for the feature, it's better to turn the crash into a clean error.

As far as I remember, bdrv_inactivate_all() is the only source of inactivation, so it's really an "inactive" state of the whole vm, not just of some disks. Your change makes it look as if some disks may be inactive and some not, which would actually be unexpected behavior.
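
To illustrate what a clean error-out could look like, here is a standalone
sketch (the struct, the flag value and the -EBUSY return are stand-ins for
this example only, not an actual QEMU change):

#include <errno.h>
#include <stdio.h>

#define BDRV_O_INACTIVE 0x0800   /* placeholder value, for illustration only */

typedef struct { int open_flags; } FakeBDS;

/* Refuse to inactivate again instead of silently skipping nodes: the
 * caller (e.g. the savevm path) gets a clean error it can report. */
static int inactivate_all(FakeBDS *nodes, int n)
{
    for (int i = 0; i < n; i++) {
        if (nodes[i].open_flags & BDRV_O_INACTIVE) {
            return -EBUSY;
        }
    }
    for (int i = 0; i < n; i++) {
        nodes[i].open_flags |= BDRV_O_INACTIVE;
    }
    return 0;
}

int main(void)
{
    FakeBDS nodes[] = { { 0 }, { 0 } };
    printf("first run:  %d\n", inactivate_all(nodes, 2));  /* prints 0 */
    printf("second run: %d\n", inactivate_all(nodes, 2));  /* prints a negative errno (-EBUSY) */
    return 0;
}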

> 
> The backtrace looks like:
> #0  0x00007fbf958c5c37 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x00007fbf958c9028 in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #2  0x00007fbf958bebf6 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #3  0x00007fbf958beca2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
> #4  0x000055ca8decd39d in bdrv_inactivate_recurse (bs=0x55ca90c80400) at /build/qemu-5.0/block.c:5724
> #5  0x000055ca8dece967 in bdrv_inactivate_all () at /build//qemu-5.0/block.c:5792
> #6  0x000055ca8de5539d in qemu_savevm_state_complete_precopy_non_iterable (inactivate_disks=true, in_postcopy=false, f=0x55ca907044b0)
>      at /build/qemu-5.0/migration/savevm.c:1401
> #7  qemu_savevm_state_complete_precopy (f=0x55ca907044b0, iterable_only=iterable_only@entry=false, inactivate_disks=inactivate_disks@entry=true)
>      at /build/qemu-5.0/migration/savevm.c:1453
> #8  0x000055ca8de4f581 in migration_completion (s=0x55ca8f64d9f0) at /build/qemu-5.0/migration/migration.c:2941
> #9  migration_iteration_run (s=0x55ca8f64d9f0) at /build/qemu-5.0/migration/migration.c:3295
> #10 migration_thread (opaque=opaque@entry=0x55ca8f64d9f0) at /build/qemu-5.0/migration/migration.c:3459
> #11 0x000055ca8dfc6716 in qemu_thread_start (args=<optimized out>) at /build/qemu-5.0/util/qemu-thread-posix.c:519
> #12 0x00007fbf95c5f184 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
> #13 0x00007fbf9598cbed in clone () from /lib/x86_64-linux-gnu/libc.so.6
> 
> When the first migration completes, bs->open_flags will set BDRV_O_INACTIVE flag
> by bdrv_inactivate_recurse(), and during the second migration the
> bdrv_inactivate_recurse assert that the bs->open_flags is already BDRV_O_INACTIVE
> enabled which cause crash.
> 
> This patch just make the bdrv_inactivate_all() to don't inactivate the bs if it is
> already inactive
> 
> Signed-off-by: Tuguoyi <tu.guoyi@h3c.com>
> ---
>   block.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/block.c b/block.c
> index f1cedac..02361e1 100644
> --- a/block.c
> +++ b/block.c
> @@ -5938,6 +5938,11 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs)
>       return 0;
>   }
>   
> +static bool bdrv_is_inactive(BlockDriverState *bs)
> +{
> +    return bs->open_flags & BDRV_O_INACTIVE;
> +}
> +
>   int bdrv_inactivate_all(void)
>   {
>       BlockDriverState *bs = NULL;
> @@ -5958,7 +5963,7 @@ int bdrv_inactivate_all(void)
>           /* Nodes with BDS parents are covered by recursion from the last
>            * parent that gets inactivated. Don't inactivate them a second
>            * time if that has already happened. */
> -        if (bdrv_has_bds_parent(bs, false)) {
> +        if (bdrv_has_bds_parent(bs, false) || bdrv_is_inactive(bs)) {
>               continue;
>           }
>           ret = bdrv_inactivate_recurse(bs);
>
Tuguoyi Dec. 3, 2020, 9:22 a.m. UTC | #2
> -----Original Message-----
> From: Vladimir Sementsov-Ogievskiy [mailto:vsementsov@virtuozzo.com]
> Sent: Saturday, November 28, 2020 4:48 PM
> To: tuguoyi (Cloud) <tu.guoyi@h3c.com>; Kevin Wolf <kwolf@redhat.com>;
> Max Reitz <mreitz@redhat.com>; qemu-block@nongnu.org
> Cc: qemu-devel@nongnu.org; wangyong (Cloud) <wang.yongD@h3c.com>
> Subject: Re: [PATCH] block: Don't inactivate bs if it is aleady inactive
> 
> 24.11.2020 13:04, Tuguoyi wrote:
> > The following steps will cause qemu assertion failure:
> > - pause vm
> > - save memory snapshot into local file through fd migration
> > - do the above operation again will cause qemu assertion failure
> 
> Hi!
> 
> Why do you need such scenario? Isn't it more correct and safer to just
> error-out on savevm if disks are already inactive? Inactive disks are a sign,
> that vm may be migrated to other host and already running, so creating any
> kind of snapshots of this old state may be bad idea. I mean, you try to allow a
> doubtful feature to avoid an assertion. If you don't have strong reasons for the
> feature, it's better to turn a crash into clean error-out.

This scenario is triggered by an automated test tool whose snapshot policy periodically
creates external snapshots of disk and memory through virsh. During the test, the
virtual machine got paused due to a storage shortage, so the second external snapshot
taken after that point crashed qemu.

I agree that it's better to error out in that case.

> As far as I remember, bdrv_inactive_all() is the only source of inactivation, so
> actually, it's more like "inactive" state of the vm, not just some disks are
> inactive.. And you change the code in a way that it looks like that some disks
> may be inactive and some not, which would actually be unexpected behavior.

Thanks for your comments; I'll send a new patch.

> >
> > The backtrace looks like:
> > #0  0x00007fbf958c5c37 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> > #1  0x00007fbf958c9028 in abort () from /lib/x86_64-linux-gnu/libc.so.6
> > #2  0x00007fbf958bebf6 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> > #3  0x00007fbf958beca2 in __assert_fail () from
> /lib/x86_64-linux-gnu/libc.so.6
> > #4  0x000055ca8decd39d in bdrv_inactivate_recurse (bs=0x55ca90c80400)
> at /build/qemu-5.0/block.c:5724
> > #5  0x000055ca8dece967 in bdrv_inactivate_all () at
> /build//qemu-5.0/block.c:5792
> > #6  0x000055ca8de5539d in
> qemu_savevm_state_complete_precopy_non_iterable (inactivate_disks=true,
> in_postcopy=false, f=0x55ca907044b0)
> >      at /build/qemu-5.0/migration/savevm.c:1401
> > #7  qemu_savevm_state_complete_precopy (f=0x55ca907044b0,
> iterable_only=iterable_only@entry=false,
> inactivate_disks=inactivate_disks@entry=true)
> >      at /build/qemu-5.0/migration/savevm.c:1453
> > #8  0x000055ca8de4f581 in migration_completion (s=0x55ca8f64d9f0) at
> /build/qemu-5.0/migration/migration.c:2941
> > #9  migration_iteration_run (s=0x55ca8f64d9f0) at
> /build/qemu-5.0/migration/migration.c:3295
> > #10 migration_thread (opaque=opaque@entry=0x55ca8f64d9f0) at
> /build/qemu-5.0/migration/migration.c:3459
> > #11 0x000055ca8dfc6716 in qemu_thread_start (args=<optimized out>) at
> /build/qemu-5.0/util/qemu-thread-posix.c:519
> > #12 0x00007fbf95c5f184 in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> > #13 0x00007fbf9598cbed in clone () from /lib/x86_64-linux-gnu/libc.so.6
> >
> > When the first migration completes, bs->open_flags will set
> BDRV_O_INACTIVE flag
> > by bdrv_inactivate_recurse(), and during the second migration the
> > bdrv_inactivate_recurse assert that the bs->open_flags is already
> BDRV_O_INACTIVE
> > enabled which cause crash.
> >
> > This patch just make the bdrv_inactivate_all() to don't inactivate the bs if it is
> > already inactive
> >
> > Signed-off-by: Tuguoyi <tu.guoyi@h3c.com>
> > ---
> >   block.c | 7 ++++++-
> >   1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/block.c b/block.c
> > index f1cedac..02361e1 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -5938,6 +5938,11 @@ static int
> bdrv_inactivate_recurse(BlockDriverState *bs)
> >       return 0;
> >   }
> >
> > +static bool bdrv_is_inactive(BlockDriverState *bs)
> > +{
> > +    return bs->open_flags & BDRV_O_INACTIVE;
> > +}
> > +
> >   int bdrv_inactivate_all(void)
> >   {
> >       BlockDriverState *bs = NULL;
> > @@ -5958,7 +5963,7 @@ int bdrv_inactivate_all(void)
> >           /* Nodes with BDS parents are covered by recursion from the
> last
> >            * parent that gets inactivated. Don't inactivate them a second
> >            * time if that has already happened. */
> > -        if (bdrv_has_bds_parent(bs, false)) {
> > +        if (bdrv_has_bds_parent(bs, false) || bdrv_is_inactive(bs)) {
> >               continue;
> >           }
> >           ret = bdrv_inactivate_recurse(bs);
> >
> 
> 
> 
> --
> Best regards,
> Vladimir

Patch

diff --git a/block.c b/block.c
index f1cedac..02361e1 100644
--- a/block.c
+++ b/block.c
@@ -5938,6 +5938,11 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs)
     return 0;
 }
 
+static bool bdrv_is_inactive(BlockDriverState *bs)
+{
+    return bs->open_flags & BDRV_O_INACTIVE;
+}
+
 int bdrv_inactivate_all(void)
 {
     BlockDriverState *bs = NULL;
@@ -5958,7 +5963,7 @@ int bdrv_inactivate_all(void)
         /* Nodes with BDS parents are covered by recursion from the last
          * parent that gets inactivated. Don't inactivate them a second
          * time if that has already happened. */
-        if (bdrv_has_bds_parent(bs, false)) {
+        if (bdrv_has_bds_parent(bs, false) || bdrv_is_inactive(bs)) {
             continue;
         }
         ret = bdrv_inactivate_recurse(bs);