diff mbox series

[v2,2/3] qcow2: handle reopening bitmaps on bdrv_invalidate_cache

Message ID 20171212160450.17510-3-vsementsov@virtuozzo.com
State New
Headers show
Series fix bitmaps migration through shared storage | expand

Commit Message

Vladimir Sementsov-Ogievskiy Dec. 12, 2017, 4:04 p.m. UTC
Consider migration with shared storage. Persistent bitmaps are stored
on bdrv_inactivate. Then, on destination
process_incoming_migration_bh() calls bdrv_invalidate_cache_all() which
leads to qcow2_load_autoloading_dirty_bitmaps() which fails if bitmaps
are already loaded on destination start. In this case we should call
qcow2_reopen_bitmaps_rw instead.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
---
 block/qcow2.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Kevin Wolf Dec. 22, 2017, 1:39 p.m. UTC | #1
Am 12.12.2017 um 17:04 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Consider migration with shared storage. Persistent bitmaps are stored
> on bdrv_inactivate. Then, on destination
> process_incoming_migration_bh() calls bdrv_invalidate_cache_all() which
> leads to qcow2_load_autoloading_dirty_bitmaps() which fails if bitmaps
> are already loaded on destination start. In this case we should call
> qcow2_reopen_bitmaps_rw instead.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Reviewed-by: John Snow <jsnow@redhat.com>

qcow2_invalidate_cache() calls qcow2_close() first, so why are there
still any bitmaps loaded? Isn't this a bug? Do we leak bitmaps when a
qcow2 image is closed?

Kevin
Vladimir Sementsov-Ogievskiy Dec. 22, 2017, 2:25 p.m. UTC | #2
22.12.2017 16:39, Kevin Wolf wrote:
> Am 12.12.2017 um 17:04 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> Consider migration with shared storage. Persistent bitmaps are stored
>> on bdrv_inactivate. Then, on destination
>> process_incoming_migration_bh() calls bdrv_invalidate_cache_all() which
>> leads to qcow2_load_autoloading_dirty_bitmaps() which fails if bitmaps
>> are already loaded on destination start. In this case we should call
>> qcow2_reopen_bitmaps_rw instead.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> Reviewed-by: John Snow <jsnow@redhat.com>
> qcow2_invalidate_cache() calls qcow2_close() first, so why are there
> still any bitmaps loaded? Isn't this a bug? Do we leak bitmaps when a
> qcow2 image is closed?
>
> Kevin

Interesting point.

Now persistent dirty bitmaps are released at the end of 
bdrv_inactivate_recurse,
which is not called here. It was a separate patch

commit 615b5dcf2decbc5f0abb512d13d7e5db2385fa23
Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Date:   Wed Jun 28 15:05:30 2017 +0300

     block: release persistent bitmaps on inactivate


May be it is more correct to release them immediately after storing, in
qcow2_inactivete. But it will not fix the issue, because qcow2_close will
call qcow2_inactivate only if (!(s->flags & BDRV_O_INACTIVE)), but it is
not the case.

or we can do like this, it fixes the new test:
   static void qcow2_close(BlockDriverState *bs)
{
       BDRVQcow2State *s = bs->opaque;
qemu_vfree(s->l1_table);
       /* else pre-write overlap checks in cache_destroy may crash */
       s->l1_table = NULL;

       if (!(s->flags & BDRV_O_INACTIVE)) {
qcow2_inactivate(bs);
}
+     bdrv_release_persistent_dirty_bitmaps(bs);


What do you think?
Kevin Wolf Dec. 22, 2017, 3:43 p.m. UTC | #3
Am 22.12.2017 um 15:25 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 22.12.2017 16:39, Kevin Wolf wrote:
> > Am 12.12.2017 um 17:04 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > Consider migration with shared storage. Persistent bitmaps are stored
> > > on bdrv_inactivate. Then, on destination
> > > process_incoming_migration_bh() calls bdrv_invalidate_cache_all() which
> > > leads to qcow2_load_autoloading_dirty_bitmaps() which fails if bitmaps
> > > are already loaded on destination start. In this case we should call
> > > qcow2_reopen_bitmaps_rw instead.
> > > 
> > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > > Reviewed-by: John Snow <jsnow@redhat.com>
> > qcow2_invalidate_cache() calls qcow2_close() first, so why are there
> > still any bitmaps loaded? Isn't this a bug? Do we leak bitmaps when a
> > qcow2 image is closed?
> > 
> > Kevin
> 
> Interesting point.
> 
> Now persistent dirty bitmaps are released at the end of
> bdrv_inactivate_recurse,
> which is not called here. It was a separate patch
> 
> commit 615b5dcf2decbc5f0abb512d13d7e5db2385fa23
> Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Date:   Wed Jun 28 15:05:30 2017 +0300
> 
>     block: release persistent bitmaps on inactivate
> 
> May be it is more correct to release them immediately after storing, in
> qcow2_inactivete.

I chose the question form because I'm really not deep enough into the
bitmap code to have a solid opinion, but I have a feeling that releasing
the bitmaps from the block driver that provided them would be cleaner
indeed. I suppose the same is true for .bdrv_close.

> But it will not fix the issue, because qcow2_close will
> call qcow2_inactivate only if (!(s->flags & BDRV_O_INACTIVE)), but it is
> not the case.

Yes, good point.

Is there a reason why bitmaps are already loaded in qcow2_do_open() even
if the image is inactive? Can bitmaps be meaningfully used on inactive
images?

Otherwise, we could just make qcow2_load_autoloading_dirty_bitmaps()
conditional on cleared BDRV_O_INACTIVE.

> or we can do like this, it fixes the new test:
>   static void qcow2_close(BlockDriverState *bs)
> {
>       BDRVQcow2State *s = bs->opaque;
> qemu_vfree(s->l1_table);
>       /* else pre-write overlap checks in cache_destroy may crash */
>       s->l1_table = NULL;
> 
>       if (!(s->flags & BDRV_O_INACTIVE)) {
> qcow2_inactivate(bs);
> }
> +     bdrv_release_persistent_dirty_bitmaps(bs);
> 
> What do you think?

Hm, I think I don't like this much.

We just need to decide what the status of inactive images is supposed to
be. If they should have bitmaps, then your patch is probably right. But
if inactive images shouldn't have any, we need to change qcow2_do_open()
and qcow2_inactivate().

Kevin
Vladimir Sementsov-Ogievskiy Dec. 22, 2017, 4:12 p.m. UTC | #4
22.12.2017 18:43, Kevin Wolf wrote:
> Am 22.12.2017 um 15:25 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 22.12.2017 16:39, Kevin Wolf wrote:
>>> Am 12.12.2017 um 17:04 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> Consider migration with shared storage. Persistent bitmaps are stored
>>>> on bdrv_inactivate. Then, on destination
>>>> process_incoming_migration_bh() calls bdrv_invalidate_cache_all() which
>>>> leads to qcow2_load_autoloading_dirty_bitmaps() which fails if bitmaps
>>>> are already loaded on destination start. In this case we should call
>>>> qcow2_reopen_bitmaps_rw instead.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>> Reviewed-by: John Snow <jsnow@redhat.com>
>>> qcow2_invalidate_cache() calls qcow2_close() first, so why are there
>>> still any bitmaps loaded? Isn't this a bug? Do we leak bitmaps when a
>>> qcow2 image is closed?
>>>
>>> Kevin
>> Interesting point.
>>
>> Now persistent dirty bitmaps are released at the end of
>> bdrv_inactivate_recurse,
>> which is not called here. It was a separate patch
>>
>> commit 615b5dcf2decbc5f0abb512d13d7e5db2385fa23
>> Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> Date:   Wed Jun 28 15:05:30 2017 +0300
>>
>>      block: release persistent bitmaps on inactivate
>>
>> May be it is more correct to release them immediately after storing, in
>> qcow2_inactivete.
> I chose the question form because I'm really not deep enough into the
> bitmap code to have a solid opinion, but I have a feeling that releasing
> the bitmaps from the block driver that provided them would be cleaner
> indeed. I suppose the same is true for .bdrv_close.
>
>> But it will not fix the issue, because qcow2_close will
>> call qcow2_inactivate only if (!(s->flags & BDRV_O_INACTIVE)), but it is
>> not the case.
> Yes, good point.
>
> Is there a reason why bitmaps are already loaded in qcow2_do_open() even
> if the image is inactive? Can bitmaps be meaningfully used on inactive
> images?
>
> Otherwise, we could just make qcow2_load_autoloading_dirty_bitmaps()
> conditional on cleared BDRV_O_INACTIVE.
>
>> or we can do like this, it fixes the new test:
>>    static void qcow2_close(BlockDriverState *bs)
>> {
>>        BDRVQcow2State *s = bs->opaque;
>> qemu_vfree(s->l1_table);
>>        /* else pre-write overlap checks in cache_destroy may crash */
>>        s->l1_table = NULL;
>>
>>        if (!(s->flags & BDRV_O_INACTIVE)) {
>> qcow2_inactivate(bs);
>> }
>> +     bdrv_release_persistent_dirty_bitmaps(bs);
>>
>> What do you think?
> Hm, I think I don't like this much.
>
> We just need to decide what the status of inactive images is supposed to
> be. If they should have bitmaps, then your patch is probably right. But
> if inactive images shouldn't have any, we need to change qcow2_do_open()
> and qcow2_inactivate().
>
> Kevin

Does Qemu start in inactive mode when and only when it is incoming 
migration? In this case I don't
see any reason of early-load the bitmaps. Backup in inactive mode should 
not be allowed too, yes?

So, it looks like it's ok to just do not autoload bitmaps if we are in 
inactive mode. The difference
would be that user will not see these bitmaps during migration. And he 
even may create bitmaps
with same names, which will lead to fault. But it don't look like real 
problem.

So, if there will not be other thoughts, I'll make another patch.
Kevin Wolf Dec. 22, 2017, 4:28 p.m. UTC | #5
Am 22.12.2017 um 17:12 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 22.12.2017 18:43, Kevin Wolf wrote:
> > Am 22.12.2017 um 15:25 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > 22.12.2017 16:39, Kevin Wolf wrote:
> > > > Am 12.12.2017 um 17:04 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > > > Consider migration with shared storage. Persistent bitmaps are stored
> > > > > on bdrv_inactivate. Then, on destination
> > > > > process_incoming_migration_bh() calls bdrv_invalidate_cache_all() which
> > > > > leads to qcow2_load_autoloading_dirty_bitmaps() which fails if bitmaps
> > > > > are already loaded on destination start. In this case we should call
> > > > > qcow2_reopen_bitmaps_rw instead.
> > > > > 
> > > > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > > > > Reviewed-by: John Snow <jsnow@redhat.com>
> > > > qcow2_invalidate_cache() calls qcow2_close() first, so why are there
> > > > still any bitmaps loaded? Isn't this a bug? Do we leak bitmaps when a
> > > > qcow2 image is closed?
> > > > 
> > > > Kevin
> > > Interesting point.
> > > 
> > > Now persistent dirty bitmaps are released at the end of
> > > bdrv_inactivate_recurse,
> > > which is not called here. It was a separate patch
> > > 
> > > commit 615b5dcf2decbc5f0abb512d13d7e5db2385fa23
> > > Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > > Date:   Wed Jun 28 15:05:30 2017 +0300
> > > 
> > >      block: release persistent bitmaps on inactivate
> > > 
> > > May be it is more correct to release them immediately after storing, in
> > > qcow2_inactivete.
> > I chose the question form because I'm really not deep enough into the
> > bitmap code to have a solid opinion, but I have a feeling that releasing
> > the bitmaps from the block driver that provided them would be cleaner
> > indeed. I suppose the same is true for .bdrv_close.
> > 
> > > But it will not fix the issue, because qcow2_close will
> > > call qcow2_inactivate only if (!(s->flags & BDRV_O_INACTIVE)), but it is
> > > not the case.
> > Yes, good point.
> > 
> > Is there a reason why bitmaps are already loaded in qcow2_do_open() even
> > if the image is inactive? Can bitmaps be meaningfully used on inactive
> > images?
> > 
> > Otherwise, we could just make qcow2_load_autoloading_dirty_bitmaps()
> > conditional on cleared BDRV_O_INACTIVE.
> > 
> > > or we can do like this, it fixes the new test:
> > >    static void qcow2_close(BlockDriverState *bs)
> > > {
> > >        BDRVQcow2State *s = bs->opaque;
> > > qemu_vfree(s->l1_table);
> > >        /* else pre-write overlap checks in cache_destroy may crash */
> > >        s->l1_table = NULL;
> > > 
> > >        if (!(s->flags & BDRV_O_INACTIVE)) {
> > > qcow2_inactivate(bs);
> > > }
> > > +     bdrv_release_persistent_dirty_bitmaps(bs);
> > > 
> > > What do you think?
> > Hm, I think I don't like this much.
> > 
> > We just need to decide what the status of inactive images is supposed to
> > be. If they should have bitmaps, then your patch is probably right. But
> > if inactive images shouldn't have any, we need to change qcow2_do_open()
> > and qcow2_inactivate().
> > 
> > Kevin
> 
> Does Qemu start in inactive mode when and only when it is incoming
> migration? In this case I don't see any reason of early-load the
> bitmaps. Backup in inactive mode should not be allowed too, yes?

Yes, inactive images are only for incoming migration at the moment.

I would like to change that into opening all images as inactive and then
immediately activate them (just to unify two code paths, one of which
isn't tested as much as it should be), but that wouldn't be visible
externally, so it should be fine, too.

> So, it looks like it's ok to just do not autoload bitmaps if we are in
> inactive mode. The difference would be that user will not see these
> bitmaps during migration. And he even may create bitmaps with same
> names, which will lead to fault. But it don't look like real problem.

You can't create persistent bitmaps on an inactive image anyway, because
that would involve writes.

Conflicts with temporary bitmaps might be possible, though.

Kevin
Vladimir Sementsov-Ogievskiy Jan. 10, 2018, 12:52 p.m. UTC | #6
22.12.2017 19:28, Kevin Wolf wrote:
> Am 22.12.2017 um 17:12 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 22.12.2017 18:43, Kevin Wolf wrote:
>>> Am 22.12.2017 um 15:25 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> 22.12.2017 16:39, Kevin Wolf wrote:
>>>>> Am 12.12.2017 um 17:04 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>>>> Consider migration with shared storage. Persistent bitmaps are stored
>>>>>> on bdrv_inactivate. Then, on destination
>>>>>> process_incoming_migration_bh() calls bdrv_invalidate_cache_all() which
>>>>>> leads to qcow2_load_autoloading_dirty_bitmaps() which fails if bitmaps
>>>>>> are already loaded on destination start. In this case we should call
>>>>>> qcow2_reopen_bitmaps_rw instead.
>>>>>>
>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>> Reviewed-by: John Snow <jsnow@redhat.com>
>>>>> qcow2_invalidate_cache() calls qcow2_close() first, so why are there
>>>>> still any bitmaps loaded? Isn't this a bug? Do we leak bitmaps when a
>>>>> qcow2 image is closed?
>>>>>
>>>>> Kevin
>>>> Interesting point.
>>>>
>>>> Now persistent dirty bitmaps are released at the end of
>>>> bdrv_inactivate_recurse,
>>>> which is not called here. It was a separate patch
>>>>
>>>> commit 615b5dcf2decbc5f0abb512d13d7e5db2385fa23
>>>> Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>> Date:   Wed Jun 28 15:05:30 2017 +0300
>>>>
>>>>       block: release persistent bitmaps on inactivate
>>>>
>>>> May be it is more correct to release them immediately after storing, in
>>>> qcow2_inactivete.
>>> I chose the question form because I'm really not deep enough into the
>>> bitmap code to have a solid opinion, but I have a feeling that releasing
>>> the bitmaps from the block driver that provided them would be cleaner
>>> indeed. I suppose the same is true for .bdrv_close.
>>>
>>>> But it will not fix the issue, because qcow2_close will
>>>> call qcow2_inactivate only if (!(s->flags & BDRV_O_INACTIVE)), but it is
>>>> not the case.
>>> Yes, good point.
>>>
>>> Is there a reason why bitmaps are already loaded in qcow2_do_open() even
>>> if the image is inactive? Can bitmaps be meaningfully used on inactive
>>> images?
>>>
>>> Otherwise, we could just make qcow2_load_autoloading_dirty_bitmaps()
>>> conditional on cleared BDRV_O_INACTIVE.
>>>
>>>> or we can do like this, it fixes the new test:
>>>>     static void qcow2_close(BlockDriverState *bs)
>>>> {
>>>>         BDRVQcow2State *s = bs->opaque;
>>>> qemu_vfree(s->l1_table);
>>>>         /* else pre-write overlap checks in cache_destroy may crash */
>>>>         s->l1_table = NULL;
>>>>
>>>>         if (!(s->flags & BDRV_O_INACTIVE)) {
>>>> qcow2_inactivate(bs);
>>>> }
>>>> +     bdrv_release_persistent_dirty_bitmaps(bs);
>>>>
>>>> What do you think?
>>> Hm, I think I don't like this much.
>>>
>>> We just need to decide what the status of inactive images is supposed to
>>> be. If they should have bitmaps, then your patch is probably right. But
>>> if inactive images shouldn't have any, we need to change qcow2_do_open()
>>> and qcow2_inactivate().
>>>
>>> Kevin
>> Does Qemu start in inactive mode when and only when it is incoming
>> migration? In this case I don't see any reason of early-load the
>> bitmaps. Backup in inactive mode should not be allowed too, yes?
> Yes, inactive images are only for incoming migration at the moment.
>
> I would like to change that into opening all images as inactive and then
> immediately activate them (just to unify two code paths, one of which
> isn't tested as much as it should be), but that wouldn't be visible
> externally, so it should be fine, too.

similar approach led to a problem for us:

To make it possible to modify drives through nbd before start (a kind of 
external reatore), I've added simple
@@ -4727,6 +4727,8 @@ int main(int argc, char **argv, char **envp)
          }
      } else if (autostart) {
          vm_start();
+    } else {
+        bdrv_inactivate_all();
      }

and now, we have no dirty bitmaps between paused start and cont. And it 
is not comfortable, we can't query, can't
do something with these bitmaps before vm start.

>
>> So, it looks like it's ok to just do not autoload bitmaps if we are in
>> inactive mode. The difference would be that user will not see these
>> bitmaps during migration. And he even may create bitmaps with same
>> names, which will lead to fault. But it don't look like real problem.
> You can't create persistent bitmaps on an inactive image anyway, because
> that would involve writes.
>
> Conflicts with temporary bitmaps might be possible, though.
>
> Kevin
diff mbox series

Patch

diff --git a/block/qcow2.c b/block/qcow2.c
index 1914a940e5..f5f205e1b7 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1450,7 +1450,13 @@  static int qcow2_do_open(BlockDriverState *bs, QDict *options, int flags,
         s->autoclear_features &= QCOW2_AUTOCLEAR_MASK;
     }
 
-    if (qcow2_load_autoloading_dirty_bitmaps(bs, &local_err)) {
+    if (bdrv_has_readonly_bitmaps(bs)) {
+        if (!bdrv_is_read_only(bs) && !(bdrv_get_flags(bs) & BDRV_O_INACTIVE)) {
+            bool header_updated = false;
+            qcow2_reopen_bitmaps_rw_hint(bs, &header_updated, &local_err);
+            update_header = update_header && !header_updated;
+        }
+    } else if (qcow2_load_autoloading_dirty_bitmaps(bs, &local_err)) {
         update_header = false;
     }
     if (local_err != NULL) {