diff mbox

[v15,08/25] block: introduce auto-loading bitmaps

Message ID 1487153430-17260-9-git-send-email-vsementsov@virtuozzo.com
State New
Headers show

Commit Message

Vladimir Sementsov-Ogievskiy Feb. 15, 2017, 10:10 a.m. UTC
Auto loading bitmaps are bitmaps stored in the disk image, which should
be loaded when the image is opened and become BdrvDirtyBitmaps for the
corresponding drive.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block.c                   | 14 ++++++++++++++
 include/block/block.h     |  2 ++
 include/block/block_int.h |  3 +++
 3 files changed, 19 insertions(+)
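
For orientation, this is roughly the shape of the change the diffstat
describes: a new BlockDriver callback plus a call site in
bdrv_open_common(). The callback name follows the discussion below; the
exact signature and error handling are assumptions, not the patch itself.

    /* include/block/block_int.h: new optional callback (sketch) */
    struct BlockDriver {
        /* ... */
        void (*bdrv_load_autoloading_dirty_bitmaps)(BlockDriverState *bs,
                                                    Error **errp);
    };

    /* block.c, in bdrv_open_common(), right after .bdrv_open succeeded: */
    if (drv->bdrv_load_autoloading_dirty_bitmaps) {
        drv->bdrv_load_autoloading_dirty_bitmaps(bs, &local_err);
        if (local_err) {
            error_propagate(errp, local_err);
            ret = -EINVAL;
            goto free_and_fail;
        }
    }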

Comments

Kevin Wolf Feb. 16, 2017, 11:25 a.m. UTC | #1
Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Auto loading bitmaps are bitmaps stored in the disk image, which should
> be loaded when the image is opened and become BdrvDirtyBitmaps for the
> corresponding drive.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Reviewed-by: John Snow <jsnow@redhat.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>

Why do we need a new BlockDriver callback and special code for it in
bdrv_open_common()? The callback is only ever called immediately after
.bdrv_open/.bdrv_file_open, so can't the drivers just do this internally
in their .bdrv_open implementation? Even more so because qcow2 is the
only driver that supports this callback.
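
A minimal sketch of that alternative, assuming the
qcow2_load_autoloading_dirty_bitmaps() helper from this series (its exact
signature is assumed) and with the rest of the open logic elided:

    static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
                          Error **errp)
    {
        int ret;

        /* ... existing header, refcount and L1/L2 metadata loading ... */

        /* Load the autoload bitmaps as the last step of opening the image,
         * instead of via a separate callback run from bdrv_open_common(). */
        ret = qcow2_load_autoloading_dirty_bitmaps(bs, errp);
        if (ret < 0) {
            qcow2_close(bs);
            return ret;
        }

        return 0;
    }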

Kevin
Kevin Wolf Feb. 16, 2017, 11:49 a.m. UTC | #2
Am 16.02.2017 um 12:25 hat Kevin Wolf geschrieben:
> Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > Auto loading bitmaps are bitmaps stored in the disk image, which should
> > be loaded when the image is opened and become BdrvDirtyBitmaps for the
> > corresponding drive.
> > 
> > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > Reviewed-by: John Snow <jsnow@redhat.com>
> > Reviewed-by: Max Reitz <mreitz@redhat.com>
> 
> Why do we need a new BlockDriver callback and special code for it in
> bdrv_open_common()? The callback is only ever called immediately after
> .bdrv_open/.bdrv_file_open, so can't the drivers just do this internally
> in their .bdrv_open implementation? Even more so because qcow2 is the
> only driver that supports this callback.

Actually, don't we have to call this in qcow2_invalidate_cache()?
Currently, I think, after a migration, the autoload bitmaps aren't
loaded.

By moving the qcow2_load_autoloading_dirty_bitmaps() call to
qcow2_open(), this would be fixed.
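
For reference, a simplified sketch of how qcow2_invalidate_cache()
already re-runs the open path (details elided), which is why a load
placed in qcow2_open() would also cover the post-migration case:

    static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
    {
        BDRVQcow2State *s = bs->opaque;
        int flags = s->flags;
        QDict *options;
        Error *local_err = NULL;

        /* Drop all cached metadata by closing and re-opening the image */
        qcow2_close(bs);
        memset(s, 0, sizeof(BDRVQcow2State));

        options = qdict_clone_shallow(bs->options);
        flags &= ~BDRV_O_INACTIVE;
        qcow2_open(bs, options, flags, &local_err);  /* would reload bitmaps
                                                        once the load lives here */
        QDECREF(options);
        if (local_err) {
            error_propagate(errp, local_err);
        }
    }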

Kevin
Vladimir Sementsov-Ogievskiy Feb. 17, 2017, 11:46 a.m. UTC | #3
16.02.2017 14:49, Kevin Wolf wrote:
> Am 16.02.2017 um 12:25 hat Kevin Wolf geschrieben:
>> Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>> Auto loading bitmaps are bitmaps stored in the disk image, which should
>>> be loaded when the image is opened and become BdrvDirtyBitmaps for the
>>> corresponding drive.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> Reviewed-by: John Snow <jsnow@redhat.com>
>>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>> Why do we need a new BlockDriver callback and special code for it in
>> bdrv_open_common()? The callback is only ever called immediately after
>> .bdrv_open/.bdrv_file_open, so can't the drivers just do this internally
>> in their .bdrv_open implementation? Even more so because qcow2 is the
>> only driver that supports this callback.
> Actually, don't we have to call this in qcow2_invalidate_cache()?
> Currently, I think, after a migration, the autoload bitmaps aren't
> loaded.
>
> By moving the qcow2_load_autoloading_dirty_bitmaps() call to
> qcow2_open(), this would be fixed.
>
> Kevin

Bitmaps should not be reloaded on any intermediate qcow2 opens, reopens,
etc. They should be loaded once, on bdrv_open, to avoid extra collisions
(between the in-memory bitmap and its stored version). That was the
idea.

For bitmap migration there is a separate series; we shouldn't load a
bitmap from the file on migration, as its version in the file is
outdated.

Kevin Wolf Feb. 17, 2017, 12:09 p.m. UTC | #4
Am 17.02.2017 um 12:46 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 16.02.2017 14:49, Kevin Wolf wrote:
> >Am 16.02.2017 um 12:25 hat Kevin Wolf geschrieben:
> >>Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >>>Auto loading bitmaps are bitmaps stored in the disk image, which should
> >>>be loaded when the image is opened and become BdrvDirtyBitmaps for the
> >>>corresponding drive.
> >>>
> >>>Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> >>>Reviewed-by: John Snow <jsnow@redhat.com>
> >>>Reviewed-by: Max Reitz <mreitz@redhat.com>
> >>Why do we need a new BlockDriver callback and special code for it in
> >>bdrv_open_common()? The callback is only ever called immediately after
> >>.bdrv_open/.bdrv_file_open, so can't the drivers just do this internally
> >>in their .bdrv_open implementation? Even more so because qcow2 is the
> >>only driver that supports this callback.
> >Actually, don't we have to call this in qcow2_invalidate_cache()?
> >Currently, I think, after a migration, the autoload bitmaps aren't
> >loaded.
> >
> >By moving the qcow2_load_autoloading_dirty_bitmaps() call to
> >qcow2_open(), this would be fixed.
> >
> >Kevin
> 
> Bitmap should not be reloaded on any intermediate qcow2-open's,
> reopens, etc. It should be loaded once, on bdrv_open, to not create
> extra collisions (between in-memory bitmap and it's stored version).
> That was the idea.
> 
> For bitmaps migration there are separate series, we shouldn't load
> bitmap from file on migration, as it's version in the file is
> outdated.

That's not what your series is doing, though. It loads the bitmaps when
migration starts and doesn't reload them when migration completes, even
though they are stale. Migration with shared storage would just work
without an extra series if you did these things in the correct places.

As a reminder, this is how migration with shared storage works (or
should work with your series):

1. Start destination qemu instance. This calls bdrv_open() with
   BDRV_O_INACTIVE. We can read in some metadata, though we don't need
   much more than the image size at this point. Writing to the image is
   still impossible.

2. Start migration on the source, while the VM is still writing to the
   image, rendering the cached metadata from step 1 stale.

3. Migration completes:

    a. Stop the VM

    b. Inactivate all images in the source qemu. This is where all
       metadata needs to be written back to the image file, including
       bitmaps. No writes to the image are possible after this point
       because BDRV_O_INACTIVE is set.

    c. Invalidate the caches in the destination qemu, i.e. reload
       everything from the file that could have changed since step 1,
       including bitmaps. BDRV_O_INACTIVE is cleared, making the image
       ready for writes.

    d. Resume the VM on the destination

4. Exit the source qemu process, which involves bdrv_close(). Note that
   at this point, no writing to the image file is possible any more;
   it's the destination qemu process that owns the image file now.

Your series loads and stores bitmaps in steps 1 and 4. This means that
they are stale on the destination when migration completes, and that
bdrv_close() wants to write to an image file that it doesn't own any
more, which will cause an assertion failure. If you instead move things
to steps 3b and 3c, it will just work.
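
A rough sketch of where that would put the bitmap handling; the
.bdrv_inactivate hookup and the store helper name are assumptions here,
only qcow2_load_autoloading_dirty_bitmaps() is taken from this series:

    /* Step 3b: the source writes the bitmaps back while inactivating. */
    static int qcow2_inactivate(BlockDriverState *bs)
    {
        Error *local_err = NULL;

        /* hypothetical helper that flushes all persistent bitmaps into the
         * image's bitmap extension before BDRV_O_INACTIVE takes effect */
        qcow2_store_persistent_dirty_bitmaps(bs, &local_err);
        if (local_err) {
            error_report_err(local_err);
            return -EIO;
        }
        return 0;
    }

    /* Step 3c: the destination re-reads them while invalidating its caches. */
    static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
    {
        /* ... re-read all cached metadata from the image file ... */
        qcow2_load_autoloading_dirty_bitmaps(bs, errp);
    }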

Kevin
Vladimir Sementsov-Ogievskiy Feb. 17, 2017, 12:40 p.m. UTC | #5
17.02.2017 15:09, Kevin Wolf wrote:
> Am 17.02.2017 um 12:46 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 16.02.2017 14:49, Kevin Wolf wrote:
>>> Am 16.02.2017 um 12:25 hat Kevin Wolf geschrieben:
>>>> Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>>> Auto loading bitmaps are bitmaps stored in the disk image, which should
>>>>> be loaded when the image is opened and become BdrvDirtyBitmaps for the
>>>>> corresponding drive.
>>>>>
>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>> Reviewed-by: John Snow <jsnow@redhat.com>
>>>>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>>>> Why do we need a new BlockDriver callback and special code for it in
>>>> bdrv_open_common()? The callback is only ever called immediately after
>>>> .bdrv_open/.bdrv_file_open, so can't the drivers just do this internally
>>>> in their .bdrv_open implementation? Even more so because qcow2 is the
>>>> only driver that supports this callback.
>>> Actually, don't we have to call this in qcow2_invalidate_cache()?
>>> Currently, I think, after a migration, the autoload bitmaps aren't
>>> loaded.
>>>
>>> By moving the qcow2_load_autoloading_dirty_bitmaps() call to
>>> qcow2_open(), this would be fixed.
>>>
>>> Kevin
>> Bitmap should not be reloaded on any intermediate qcow2-open's,
>> reopens, etc. It should be loaded once, on bdrv_open, to not create
>> extra collisions (between in-memory bitmap and it's stored version).
>> That was the idea.
>>
>> For bitmaps migration there are separate series, we shouldn't load
>> bitmap from file on migration, as it's version in the file is
>> outdated.
> That's not what your series is doing, though. It loads the bitmaps when

Actually, they will not be loaded, as they will have the IN_USE flag set.

> migration starts and doesn't reload then when migration completes, even
> though they are stale. Migration with shared storage would just work
> without an extra series if you did these things in the correct places.
>
> As a reminder, this is how migration with shared storage works (or
> should work with your series):
>
> 1. Start destination qemu instance. This calls bdrv_open() with
>     BDRV_O_INACTIVE. We can read in some metadata, though we don't need
>     much more than the image size at this point. Writing to the image is
>     still impossible.
>
> 2. Start migration on the source, while the VM is still writing to the
>     image, rendering the cached metadata from step 1 stale.
>
> 3. Migration completes:
>
>      a. Stop the VM
>
>      b. Inactivate all images in the source qemu. This is where all
>         metadata needs to be written back to the image file, including
>         bitmaps. No writes to the image are possible after this point
>         because BDRV_O_INACTIVE is set.
>
>      c. Invalidate the caches in the destination qemu, i.e. reload
>         everything from the file that could have changed since step 1,
>         including bitmaps. BDRV_O_INACTIVE is cleared, making the image
>         ready for writes.
>
>      d. Resume the VM on the destination
>
> 4. Exit the source qemu process, which involves bdrv_close(). Note that
>     at this point, no writing to the image file is possible any more,
>     it's the destination qemu process that own the image file now.
>
> Your series loads and stores bitmaps in steps 1 and 4. This means that

Actually, no: in step 1 the bitmaps are "in use", in step 4 INACTIVE is
set (and it is checked), so nothing is stored.
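
In code terms, the two guards amount to something like this (a sketch of
the checks being described; BME_FLAG_IN_USE stands in for whatever the
on-disk in-use flag is called in the series):

    /* Step 1, while loading bitmaps from the image: bitmaps marked in use
     * belong to the still-running source and are skipped. */
    if (bitmap_flags & BME_FLAG_IN_USE) {
        continue;   /* no BdrvDirtyBitmap is created for this one */
    }

    /* Step 4, in the storing path: an inactive image is never written to. */
    if (bdrv_get_flags(bs) & BDRV_O_INACTIVE) {
        return;     /* nothing is stored from the source after inactivation */
    }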

> they are stale on the destination when migration completes, and that
> bdrv_close() wants to write to an image file that it doesn't own any
> more, which will cause an assertion failure. If you instead move things
> to steps 3b and 3c, it will just work.

Hmm, I understand the idea. But this will interfere with postcopy
bitmap migration, so if we really need this, there should be some
additional control flags or capabilities. The problem with your approach
is that the bitmap is actually migrated during the short window when
source and destination are both stopped; that may take time, as bitmaps
may be large.

>
> Kevin
Kevin Wolf Feb. 17, 2017, 12:48 p.m. UTC | #6
Am 17.02.2017 um 13:40 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 17.02.2017 15:09, Kevin Wolf wrote:
> >Am 17.02.2017 um 12:46 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >>16.02.2017 14:49, Kevin Wolf wrote:
> >>>Am 16.02.2017 um 12:25 hat Kevin Wolf geschrieben:
> >>>>Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >>>>>Auto loading bitmaps are bitmaps stored in the disk image, which should
> >>>>>be loaded when the image is opened and become BdrvDirtyBitmaps for the
> >>>>>corresponding drive.
> >>>>>
> >>>>>Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> >>>>>Reviewed-by: John Snow <jsnow@redhat.com>
> >>>>>Reviewed-by: Max Reitz <mreitz@redhat.com>
> >>>>Why do we need a new BlockDriver callback and special code for it in
> >>>>bdrv_open_common()? The callback is only ever called immediately after
> >>>>.bdrv_open/.bdrv_file_open, so can't the drivers just do this internally
> >>>>in their .bdrv_open implementation? Even more so because qcow2 is the
> >>>>only driver that supports this callback.
> >>>Actually, don't we have to call this in qcow2_invalidate_cache()?
> >>>Currently, I think, after a migration, the autoload bitmaps aren't
> >>>loaded.
> >>>
> >>>By moving the qcow2_load_autoloading_dirty_bitmaps() call to
> >>>qcow2_open(), this would be fixed.
> >>>
> >>>Kevin
> >>Bitmap should not be reloaded on any intermediate qcow2-open's,
> >>reopens, etc. It should be loaded once, on bdrv_open, to not create
> >>extra collisions (between in-memory bitmap and it's stored version).
> >>That was the idea.
> >>
> >>For bitmaps migration there are separate series, we shouldn't load
> >>bitmap from file on migration, as it's version in the file is
> >>outdated.
> >That's not what your series is doing, though. It loads the bitmaps when
> 
> Actually, they will not be loaded as they will have IN_USE flag.
> 
> >migration starts and doesn't reload then when migration completes, even
> >though they are stale. Migration with shared storage would just work
> >without an extra series if you did these things in the correct places.
> >
> >As a reminder, this is how migration with shared storage works (or
> >should work with your series):
> >
> >1. Start destination qemu instance. This calls bdrv_open() with
> >    BDRV_O_INACTIVE. We can read in some metadata, though we don't need
> >    much more than the image size at this point. Writing to the image is
> >    still impossible.
> >
> >2. Start migration on the source, while the VM is still writing to the
> >    image, rendering the cached metadata from step 1 stale.
> >
> >3. Migration completes:
> >
> >     a. Stop the VM
> >
> >     b. Inactivate all images in the source qemu. This is where all
> >        metadata needs to be written back to the image file, including
> >        bitmaps. No writes to the image are possible after this point
> >        because BDRV_O_INACTIVE is set.
> >
> >     c. Invalidate the caches in the destination qemu, i.e. reload
> >        everything from the file that could have changed since step 1,
> >        including bitmaps. BDRV_O_INACTIVE is cleared, making the image
> >        ready for writes.
> >
> >     d. Resume the VM on the destination
> >
> >4. Exit the source qemu process, which involves bdrv_close(). Note that
> >    at this point, no writing to the image file is possible any more,
> >    it's the destination qemu process that own the image file now.
> >
> >Your series loads and stores bitmaps in steps 1 and 4. This means that
> 
> Actually - not. in 1 bitmaps are "in use", in 4 INACTIVE is set (and
> it is checked), nothing is stored.
> 
> >they are stale on the destination when migration completes, and that
> >bdrv_close() wants to write to an image file that it doesn't own any
> >more, which will cause an assertion failure. If you instead move things
> >to steps 3b and 3c, it will just work.
> 
> Hmm, I understand the idea.. But this will interfere with postcopy
> bitmap migration. So if we really need this, there should be some
> additional control flags or capabilities.. The problem of your
> approach is that bitmap actually migrated in the short state when
> source and destination are stopped, it may take time, as bitmaps may
> be large.

You can always add optimisations, but this is the basic lifecycle
process of block devices in qemu, so it would be good to adhere to it.
So far there are no other pieces of information that are ignored in
bdrv_invalidate()/bdrv_inactivate() and instead only handled in
bdrv_open()/bdrv_close(). It's a matter of consistency, too.

And not having to add special cases for specific features in the generic
bdrv_open()/close() paths is a big plus for me anyway.

Kevin
Denis V. Lunev Feb. 17, 2017, 1:22 p.m. UTC | #7
On 02/17/2017 03:48 PM, Kevin Wolf wrote:
> Am 17.02.2017 um 13:40 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 17.02.2017 15:09, Kevin Wolf wrote:
>>> Am 17.02.2017 um 12:46 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> 16.02.2017 14:49, Kevin Wolf wrote:
>>>>> Am 16.02.2017 um 12:25 hat Kevin Wolf geschrieben:
>>>>>> Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>>>>> Auto loading bitmaps are bitmaps stored in the disk image, which should
>>>>>>> be loaded when the image is opened and become BdrvDirtyBitmaps for the
>>>>>>> corresponding drive.
>>>>>>>
>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>>> Reviewed-by: John Snow <jsnow@redhat.com>
>>>>>>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>>>>>> Why do we need a new BlockDriver callback and special code for it in
>>>>>> bdrv_open_common()? The callback is only ever called immediately after
>>>>>> .bdrv_open/.bdrv_file_open, so can't the drivers just do this internally
>>>>>> in their .bdrv_open implementation? Even more so because qcow2 is the
>>>>>> only driver that supports this callback.
>>>>> Actually, don't we have to call this in qcow2_invalidate_cache()?
>>>>> Currently, I think, after a migration, the autoload bitmaps aren't
>>>>> loaded.
>>>>>
>>>>> By moving the qcow2_load_autoloading_dirty_bitmaps() call to
>>>>> qcow2_open(), this would be fixed.
>>>>>
>>>>> Kevin
>>>> Bitmap should not be reloaded on any intermediate qcow2-open's,
>>>> reopens, etc. It should be loaded once, on bdrv_open, to not create
>>>> extra collisions (between in-memory bitmap and it's stored version).
>>>> That was the idea.
>>>>
>>>> For bitmaps migration there are separate series, we shouldn't load
>>>> bitmap from file on migration, as it's version in the file is
>>>> outdated.
>>> That's not what your series is doing, though. It loads the bitmaps when
>> Actually, they will not be loaded as they will have IN_USE flag.
>>
>>> migration starts and doesn't reload then when migration completes, even
>>> though they are stale. Migration with shared storage would just work
>>> without an extra series if you did these things in the correct places.
>>>
>>> As a reminder, this is how migration with shared storage works (or
>>> should work with your series):
>>>
>>> 1. Start destination qemu instance. This calls bdrv_open() with
>>>    BDRV_O_INACTIVE. We can read in some metadata, though we don't need
>>>    much more than the image size at this point. Writing to the image is
>>>    still impossible.
>>>
>>> 2. Start migration on the source, while the VM is still writing to the
>>>    image, rendering the cached metadata from step 1 stale.
>>>
>>> 3. Migration completes:
>>>
>>>     a. Stop the VM
>>>
>>>     b. Inactivate all images in the source qemu. This is where all
>>>        metadata needs to be written back to the image file, including
>>>        bitmaps. No writes to the image are possible after this point
>>>        because BDRV_O_INACTIVE is set.
>>>
>>>     c. Invalidate the caches in the destination qemu, i.e. reload
>>>        everything from the file that could have changed since step 1,
>>>        including bitmaps. BDRV_O_INACTIVE is cleared, making the image
>>>        ready for writes.
>>>
>>>     d. Resume the VM on the destination
>>>
>>> 4. Exit the source qemu process, which involves bdrv_close(). Note that
>>>    at this point, no writing to the image file is possible any more,
>>>    it's the destination qemu process that own the image file now.
>>>
>>> Your series loads and stores bitmaps in steps 1 and 4. This means that
>> Actually - not. in 1 bitmaps are "in use", in 4 INACTIVE is set (and
>> it is checked), nothing is stored.
>>
>>> they are stale on the destination when migration completes, and that
>>> bdrv_close() wants to write to an image file that it doesn't own any
>>> more, which will cause an assertion failure. If you instead move things
>>> to steps 3b and 3c, it will just work.
>> Hmm, I understand the idea.. But this will interfere with postcopy
>> bitmap migration. So if we really need this, there should be some
>> additional control flags or capabilities.. The problem of your
>> approach is that bitmap actually migrated in the short state when
>> source and destination are stopped, it may take time, as bitmaps may
>> be large.
> You can always add optimisations, but this is the basic lifecycle
> process of block devices in qemu, so it would be good to adhere to it.
> So far there are no other pieces of information that are ignored in
> bdrv_invalidate()/bdrv_inactivate() and instead only handled in
> bdrv_open()/bdrv_close(). It's a matter of consistency, too.
>
> And not having to add special cases for specific features in the generic
> bdrv_open()/close() paths is a big plus for me anyway.
>
> Kevin
But for sure this is bad from the downtime point of view. On migration
you will have to write to the image and re-read it again on the target.
This would be very slow, and it will not help for migration with a
non-shared disk either.

That is why we have specifically worked on a migration approach which
now does not influence downtime at all.

With a writeout we are issuing several write requests plus a sync. Our
measurements show that bdrv_drain could take around a second on an
averagely loaded conventional system, which seems like an unacceptable
addition to me.

Den
Kevin Wolf Feb. 17, 2017, 1:34 p.m. UTC | #8
Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben:
> On 02/17/2017 03:48 PM, Kevin Wolf wrote:
> > Am 17.02.2017 um 13:40 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >> 17.02.2017 15:09, Kevin Wolf wrote:
> >>> Am 17.02.2017 um 12:46 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >>>> 16.02.2017 14:49, Kevin Wolf wrote:
> >>>>> Am 16.02.2017 um 12:25 hat Kevin Wolf geschrieben:
> >>>>>> Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >>>>>>> Auto loading bitmaps are bitmaps stored in the disk image, which should
> >>>>>>> be loaded when the image is opened and become BdrvDirtyBitmaps for the
> >>>>>>> corresponding drive.
> >>>>>>>
> >>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> >>>>>>> Reviewed-by: John Snow <jsnow@redhat.com>
> >>>>>>> Reviewed-by: Max Reitz <mreitz@redhat.com>
> >>>>>> Why do we need a new BlockDriver callback and special code for it in
> >>>>>> bdrv_open_common()? The callback is only ever called immediately after
> >>>>>> .bdrv_open/.bdrv_file_open, so can't the drivers just do this internally
> >>>>>> in their .bdrv_open implementation? Even more so because qcow2 is the
> >>>>>> only driver that supports this callback.
> >>>>> Actually, don't we have to call this in qcow2_invalidate_cache()?
> >>>>> Currently, I think, after a migration, the autoload bitmaps aren't
> >>>>> loaded.
> >>>>>
> >>>>> By moving the qcow2_load_autoloading_dirty_bitmaps() call to
> >>>>> qcow2_open(), this would be fixed.
> >>>>>
> >>>>> Kevin
> >>>> Bitmap should not be reloaded on any intermediate qcow2-open's,
> >>>> reopens, etc. It should be loaded once, on bdrv_open, to not create
> >>>> extra collisions (between in-memory bitmap and it's stored version).
> >>>> That was the idea.
> >>>>
> >>>> For bitmaps migration there are separate series, we shouldn't load
> >>>> bitmap from file on migration, as it's version in the file is
> >>>> outdated.
> >>> That's not what your series is doing, though. It loads the bitmaps when
> >> Actually, they will not be loaded as they will have IN_USE flag.
> >>
> >>> migration starts and doesn't reload then when migration completes, even
> >>> though they are stale. Migration with shared storage would just work
> >>> without an extra series if you did these things in the correct places.
> >>>
> >>> As a reminder, this is how migration with shared storage works (or
> >>> should work with your series):
> >>>
> >>> 1. Start destination qemu instance. This calls bdrv_open() with
> >>>    BDRV_O_INACTIVE. We can read in some metadata, though we don't need
> >>>    much more than the image size at this point. Writing to the image is
> >>>    still impossible.
> >>>
> >>> 2. Start migration on the source, while the VM is still writing to the
> >>>    image, rendering the cached metadata from step 1 stale.
> >>>
> >>> 3. Migration completes:
> >>>
> >>>     a. Stop the VM
> >>>
> >>>     b. Inactivate all images in the source qemu. This is where all
> >>>        metadata needs to be written back to the image file, including
> >>>        bitmaps. No writes to the image are possible after this point
> >>>        because BDRV_O_INACTIVE is set.
> >>>
> >>>     c. Invalidate the caches in the destination qemu, i.e. reload
> >>>        everything from the file that could have changed since step 1,
> >>>        including bitmaps. BDRV_O_INACTIVE is cleared, making the image
> >>>        ready for writes.
> >>>
> >>>     d. Resume the VM on the destination
> >>>
> >>> 4. Exit the source qemu process, which involves bdrv_close(). Note that
> >>>    at this point, no writing to the image file is possible any more,
> >>>    it's the destination qemu process that own the image file now.
> >>>
> >>> Your series loads and stores bitmaps in steps 1 and 4. This means that
> >> Actually - not. in 1 bitmaps are "in use", in 4 INACTIVE is set (and
> >> it is checked), nothing is stored.
> >>
> >>> they are stale on the destination when migration completes, and that
> >>> bdrv_close() wants to write to an image file that it doesn't own any
> >>> more, which will cause an assertion failure. If you instead move things
> >>> to steps 3b and 3c, it will just work.
> >> Hmm, I understand the idea.. But this will interfere with postcopy
> >> bitmap migration. So if we really need this, there should be some
> >> additional control flags or capabilities.. The problem of your
> >> approach is that bitmap actually migrated in the short state when
> >> source and destination are stopped, it may take time, as bitmaps may
> >> be large.
> > You can always add optimisations, but this is the basic lifecycle
> > process of block devices in qemu, so it would be good to adhere to it.
> > So far there are no other pieces of information that are ignored in
> > bdrv_invalidate()/bdrv_inactivate() and instead only handled in
> > bdrv_open()/bdrv_close(). It's a matter of consistency, too.
> >
> > And not having to add special cases for specific features in the generic
> > bdrv_open()/close() paths is a big plus for me anyway.
> >
> > Kevin
> But for sure this is bad from the downtime point of view.
> On migrate you will have to write to the image and re-read
> it again on the target. This would be very slow. This will
> not help for the migration with non-shared disk too.
> 
> That is why we have specifically worked in a migration,
> which for a good does not influence downtime at all now.
> 
> With a write we are issuing several write requests + sync.
> Our measurements shows that bdrv_drain could take around
> a second on an averagely loaded conventional system, which
> seems unacceptable addition to me.

I'm not arguing against optimising migration, I fully agree with you. I
just think that we should start with a correct if slow base version and
then add optimisation to that, instead of starting with a broken base
version and adding to that.

Look, whether you do the expensive I/O on open/close and make that a
slow operation or whether you do it on invalidate_cache/inactivate
doesn't really make a difference in terms of slowness, because in
general both operations are called exactly once. But it does make a
difference in terms of correctness.

Once you do the optimisation, of course, you'll skip writing those
bitmaps that you transfer using a different channel, no matter whether
you skip it in bdrv_close() or in bdrv_inactivate().
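
As a sketch of that optimisation (every name here is hypothetical; the
point is only that the skip condition is the same wherever the writeout
lives):

    static void store_persistent_bitmaps(BlockDriverState *bs, Error **errp)
    {
        BdrvDirtyBitmap *bm;

        for (bm = first_persistent_bitmap(bs); bm != NULL;
             bm = next_persistent_bitmap(bs, bm)) {
            if (bitmap_sent_via_migration_stream(bm)) {
                continue;   /* already transferred through the other channel */
            }
            store_one_bitmap(bs, bm, errp);   /* write it into the image */
        }
    }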

Kevin
Denis V. Lunev Feb. 17, 2017, 1:48 p.m. UTC | #9
On 02/17/2017 04:34 PM, Kevin Wolf wrote:
> Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben:
>> On 02/17/2017 03:48 PM, Kevin Wolf wrote:
>>> Am 17.02.2017 um 13:40 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> 17.02.2017 15:09, Kevin Wolf wrote:
>>>>> Am 17.02.2017 um 12:46 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>>>> 16.02.2017 14:49, Kevin Wolf wrote:
>>>>>>> Am 16.02.2017 um 12:25 hat Kevin Wolf geschrieben:
>>>>>>>> Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>>>>>>> Auto loading bitmaps are bitmaps stored in the disk image, which should
>>>>>>>>> be loaded when the image is opened and become BdrvDirtyBitmaps for the
>>>>>>>>> corresponding drive.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>>>>> Reviewed-by: John Snow <jsnow@redhat.com>
>>>>>>>>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>>>>>>>> Why do we need a new BlockDriver callback and special code for it in
>>>>>>>> bdrv_open_common()? The callback is only ever called immediately after
>>>>>>>> .bdrv_open/.bdrv_file_open, so can't the drivers just do this internally
>>>>>>>> in their .bdrv_open implementation? Even more so because qcow2 is the
>>>>>>>> only driver that supports this callback.
>>>>>>> Actually, don't we have to call this in qcow2_invalidate_cache()?
>>>>>>> Currently, I think, after a migration, the autoload bitmaps aren't
>>>>>>> loaded.
>>>>>>>
>>>>>>> By moving the qcow2_load_autoloading_dirty_bitmaps() call to
>>>>>>> qcow2_open(), this would be fixed.
>>>>>>>
>>>>>>> Kevin
>>>>>> Bitmap should not be reloaded on any intermediate qcow2-open's,
>>>>>> reopens, etc. It should be loaded once, on bdrv_open, to not create
>>>>>> extra collisions (between in-memory bitmap and it's stored version).
>>>>>> That was the idea.
>>>>>>
>>>>>> For bitmaps migration there are separate series, we shouldn't load
>>>>>> bitmap from file on migration, as it's version in the file is
>>>>>> outdated.
>>>>> That's not what your series is doing, though. It loads the bitmaps when
>>>> Actually, they will not be loaded as they will have IN_USE flag.
>>>>
>>>>> migration starts and doesn't reload then when migration completes, even
>>>>> though they are stale. Migration with shared storage would just work
>>>>> without an extra series if you did these things in the correct places.
>>>>>
>>>>> As a reminder, this is how migration with shared storage works (or
>>>>> should work with your series):
>>>>>
>>>>> 1. Start destination qemu instance. This calls bdrv_open() with
>>>>>    BDRV_O_INACTIVE. We can read in some metadata, though we don't need
>>>>>    much more than the image size at this point. Writing to the image is
>>>>>    still impossible.
>>>>>
>>>>> 2. Start migration on the source, while the VM is still writing to the
>>>>>    image, rendering the cached metadata from step 1 stale.
>>>>>
>>>>> 3. Migration completes:
>>>>>
>>>>>     a. Stop the VM
>>>>>
>>>>>     b. Inactivate all images in the source qemu. This is where all
>>>>>        metadata needs to be written back to the image file, including
>>>>>        bitmaps. No writes to the image are possible after this point
>>>>>        because BDRV_O_INACTIVE is set.
>>>>>
>>>>>     c. Invalidate the caches in the destination qemu, i.e. reload
>>>>>        everything from the file that could have changed since step 1,
>>>>>        including bitmaps. BDRV_O_INACTIVE is cleared, making the image
>>>>>        ready for writes.
>>>>>
>>>>>     d. Resume the VM on the destination
>>>>>
>>>>> 4. Exit the source qemu process, which involves bdrv_close(). Note that
>>>>>    at this point, no writing to the image file is possible any more,
>>>>>    it's the destination qemu process that own the image file now.
>>>>>
>>>>> Your series loads and stores bitmaps in steps 1 and 4. This means that
>>>> Actually - not. in 1 bitmaps are "in use", in 4 INACTIVE is set (and
>>>> it is checked), nothing is stored.
>>>>
>>>>> they are stale on the destination when migration completes, and that
>>>>> bdrv_close() wants to write to an image file that it doesn't own any
>>>>> more, which will cause an assertion failure. If you instead move things
>>>>> to steps 3b and 3c, it will just work.
>>>> Hmm, I understand the idea.. But this will interfere with postcopy
>>>> bitmap migration. So if we really need this, there should be some
>>>> additional control flags or capabilities.. The problem of your
>>>> approach is that bitmap actually migrated in the short state when
>>>> source and destination are stopped, it may take time, as bitmaps may
>>>> be large.
>>> You can always add optimisations, but this is the basic lifecycle
>>> process of block devices in qemu, so it would be good to adhere to it.
>>> So far there are no other pieces of information that are ignored in
>>> bdrv_invalidate()/bdrv_inactivate() and instead only handled in
>>> bdrv_open()/bdrv_close(). It's a matter of consistency, too.
>>>
>>> And not having to add special cases for specific features in the generic
>>> bdrv_open()/close() paths is a big plus for me anyway.
>>>
>>> Kevin
>> But for sure this is bad from the downtime point of view.
>> On migrate you will have to write to the image and re-read
>> it again on the target. This would be very slow. This will
>> not help for the migration with non-shared disk too.
>>
>> That is why we have specifically worked in a migration,
>> which for a good does not influence downtime at all now.
>>
>> With a write we are issuing several write requests + sync.
>> Our measurements shows that bdrv_drain could take around
>> a second on an averagely loaded conventional system, which
>> seems unacceptable addition to me.
> I'm not arguing against optimising migration, I fully agree with you. I
> just think that we should start with a correct if slow base version and
> then add optimisation to that, instead of starting with a broken base
> version and adding to that.
>
> Look, whether you do the expensive I/O on open/close and make that a
> slow operation or whether you do it on invalidate_cache/inactivate
> doesn't really make a difference in term of slowness because in general
> both operations are called exactly once. But it does make a difference
> in terms of correctness.
>
> Once you do the optimisation, of course, you'll skip writing those
> bitmaps that you transfer using a different channel, no matter whether
> you skip it in bdrv_close() or in bdrv_inactivate().
>
> Kevin
I do not understand this point: in order to optimize this we will have
to create a specific code path or option from the migration code and
keep it as an ugly kludge forever.

Den
Kevin Wolf Feb. 17, 2017, 2:24 p.m. UTC | #10
Am 17.02.2017 um 14:48 hat Denis V. Lunev geschrieben:
> On 02/17/2017 04:34 PM, Kevin Wolf wrote:
> > Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben:
> >> But for sure this is bad from the downtime point of view.
> >> On migrate you will have to write to the image and re-read
> >> it again on the target. This would be very slow. This will
> >> not help for the migration with non-shared disk too.
> >>
> >> That is why we have specifically worked in a migration,
> >> which for a good does not influence downtime at all now.
> >>
> >> With a write we are issuing several write requests + sync.
> >> Our measurements shows that bdrv_drain could take around
> >> a second on an averagely loaded conventional system, which
> >> seems unacceptable addition to me.
> > I'm not arguing against optimising migration, I fully agree with you. I
> > just think that we should start with a correct if slow base version and
> > then add optimisation to that, instead of starting with a broken base
> > version and adding to that.
> >
> > Look, whether you do the expensive I/O on open/close and make that a
> > slow operation or whether you do it on invalidate_cache/inactivate
> > doesn't really make a difference in term of slowness because in general
> > both operations are called exactly once. But it does make a difference
> > in terms of correctness.
> >
> > Once you do the optimisation, of course, you'll skip writing those
> > bitmaps that you transfer using a different channel, no matter whether
> > you skip it in bdrv_close() or in bdrv_inactivate().
> >
> > Kevin
> I do not understand this point as in order to optimize this
> we will have to create specific code path or option from
> the migration code and keep this as an ugly kludge forever.

The point that I don't understand is why it makes any difference for the
follow-up migration series whether the writeout is in bdrv_close() or
bdrv_inactivate(). I don't really see the difference between the two
from a migration POV; both need to be skipped if we transfer the bitmap
using a different channel.

Maybe I would see the reason if I could find the time to look at the
migration patches first, but unfortunately I don't have this time at the
moment.

My point is just that generally we want to have a correctly working qemu
after every single patch, and even more importantly after every series.
As the migration series is separate from this, I don't think it's a good
excuse for doing worse than we could easily do here.

Kevin
Vladimir Sementsov-Ogievskiy Feb. 17, 2017, 2:54 p.m. UTC | #11
17.02.2017 17:24, Kevin Wolf wrote:
> Am 17.02.2017 um 14:48 hat Denis V. Lunev geschrieben:
>> On 02/17/2017 04:34 PM, Kevin Wolf wrote:
>>> Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben:
>>>> But for sure this is bad from the downtime point of view.
>>>> On migrate you will have to write to the image and re-read
>>>> it again on the target. This would be very slow. This will
>>>> not help for the migration with non-shared disk too.
>>>>
>>>> That is why we have specifically worked in a migration,
>>>> which for a good does not influence downtime at all now.
>>>>
>>>> With a write we are issuing several write requests + sync.
>>>> Our measurements shows that bdrv_drain could take around
>>>> a second on an averagely loaded conventional system, which
>>>> seems unacceptable addition to me.
>>> I'm not arguing against optimising migration, I fully agree with you. I
>>> just think that we should start with a correct if slow base version and
>>> then add optimisation to that, instead of starting with a broken base
>>> version and adding to that.
>>>
>>> Look, whether you do the expensive I/O on open/close and make that a
>>> slow operation or whether you do it on invalidate_cache/inactivate
>>> doesn't really make a difference in term of slowness because in general
>>> both operations are called exactly once. But it does make a difference
>>> in terms of correctness.
>>>
>>> Once you do the optimisation, of course, you'll skip writing those
>>> bitmaps that you transfer using a different channel, no matter whether
>>> you skip it in bdrv_close() or in bdrv_inactivate().
>>>
>>> Kevin
>> I do not understand this point as in order to optimize this
>> we will have to create specific code path or option from
>> the migration code and keep this as an ugly kludge forever.
> The point that I don't understand is why it makes any difference for the
> follow-up migration series whether the writeout is in bdrv_close() or
> bdrv_inactivate(). I don't really see the difference between the two
> from a migration POV; both need to be skipped if we transfer the bitmap
> using a different channel.
>
> Maybe I would see the reason if I could find the time to look at the
> migration patches first, but unfortunately I don't have this time at the
> moment.
>
> My point is just that generally we want to have a correctly working qemu
> after every single patch, and even more importantly after every series.
> As the migration series is separate from this, I don't think it's a good
> excuse for doing worse than we could easily do here.
>
> Kevin

With open/close everything is already OK: bitmaps will not be saved
because of BDRV_O_INACTIVE, and they will not be loaded because of the
IN_USE flag.
Denis V. Lunev Feb. 18, 2017, 10:54 a.m. UTC | #12
On 02/17/2017 03:54 PM, Vladimir Sementsov-Ogievskiy wrote:
> 17.02.2017 17:24, Kevin Wolf wrote:
>> Am 17.02.2017 um 14:48 hat Denis V. Lunev geschrieben:
>>> On 02/17/2017 04:34 PM, Kevin Wolf wrote:
>>>> Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben:
>>>>> But for sure this is bad from the downtime point of view.
>>>>> On migrate you will have to write to the image and re-read
>>>>> it again on the target. This would be very slow. This will
>>>>> not help for the migration with non-shared disk too.
>>>>>
>>>>> That is why we have specifically worked in a migration,
>>>>> which for a good does not influence downtime at all now.
>>>>>
>>>>> With a write we are issuing several write requests + sync.
>>>>> Our measurements shows that bdrv_drain could take around
>>>>> a second on an averagely loaded conventional system, which
>>>>> seems unacceptable addition to me.
>>>> I'm not arguing against optimising migration, I fully agree with
>>>> you. I
>>>> just think that we should start with a correct if slow base version
>>>> and
>>>> then add optimisation to that, instead of starting with a broken base
>>>> version and adding to that.
>>>>
>>>> Look, whether you do the expensive I/O on open/close and make that a
>>>> slow operation or whether you do it on invalidate_cache/inactivate
>>>> doesn't really make a difference in term of slowness because in
>>>> general
>>>> both operations are called exactly once. But it does make a difference
>>>> in terms of correctness.
>>>>
>>>> Once you do the optimisation, of course, you'll skip writing those
>>>> bitmaps that you transfer using a different channel, no matter whether
>>>> you skip it in bdrv_close() or in bdrv_inactivate().
>>>>
>>>> Kevin
>>> I do not understand this point as in order to optimize this
>>> we will have to create specific code path or option from
>>> the migration code and keep this as an ugly kludge forever.
>> The point that I don't understand is why it makes any difference for the
>> follow-up migration series whether the writeout is in bdrv_close() or
>> bdrv_inactivate(). I don't really see the difference between the two
>> from a migration POV; both need to be skipped if we transfer the bitmap
>> using a different channel.
>>
>> Maybe I would see the reason if I could find the time to look at the
>> migration patches first, but unfortunately I don't have this time at the
>> moment.
>>
>> My point is just that generally we want to have a correctly working qemu
>> after every single patch, and even more importantly after every series.
>> As the migration series is separate from this, I don't think it's a good
>> excuse for doing worse than we could easily do here.
>>
>> Kevin
>
> With open/close all is already ok - bitmaps will not be saved because
> of BDRV_O_INACTIVE  and will not be loaded because of IN_USE.
>
>
In any case, load/store happens outside of the VM downtime: the target
is running at the moment of close on the source, and the source is
running at the moment of open on the target.

Den
Vladimir Sementsov-Ogievskiy Feb. 20, 2017, 10:09 a.m. UTC | #13
17.02.2017 17:24, Kevin Wolf wrote:
> Am 17.02.2017 um 14:48 hat Denis V. Lunev geschrieben:
>> On 02/17/2017 04:34 PM, Kevin Wolf wrote:
>>> Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben:
>>>> But for sure this is bad from the downtime point of view.
>>>> On migrate you will have to write to the image and re-read
>>>> it again on the target. This would be very slow. This will
>>>> not help for the migration with non-shared disk too.
>>>>
>>>> That is why we have specifically worked in a migration,
>>>> which for a good does not influence downtime at all now.
>>>>
>>>> With a write we are issuing several write requests + sync.
>>>> Our measurements shows that bdrv_drain could take around
>>>> a second on an averagely loaded conventional system, which
>>>> seems unacceptable addition to me.
>>> I'm not arguing against optimising migration, I fully agree with you. I
>>> just think that we should start with a correct if slow base version and
>>> then add optimisation to that, instead of starting with a broken base
>>> version and adding to that.
>>>
>>> Look, whether you do the expensive I/O on open/close and make that a
>>> slow operation or whether you do it on invalidate_cache/inactivate
>>> doesn't really make a difference in term of slowness because in general
>>> both operations are called exactly once. But it does make a difference
>>> in terms of correctness.
>>>
>>> Once you do the optimisation, of course, you'll skip writing those
>>> bitmaps that you transfer using a different channel, no matter whether
>>> you skip it in bdrv_close() or in bdrv_inactivate().
>>>
>>> Kevin
>> I do not understand this point as in order to optimize this
>> we will have to create specific code path or option from
>> the migration code and keep this as an ugly kludge forever.
> The point that I don't understand is why it makes any difference for the
> follow-up migration series whether the writeout is in bdrv_close() or
> bdrv_inactivate(). I don't really see the difference between the two
> from a migration POV; both need to be skipped if we transfer the bitmap
> using a different channel.
>
> Maybe I would see the reason if I could find the time to look at the
> migration patches first, but unfortunately I don't have this time at the
> moment.
>
> My point is just that generally we want to have a correctly working qemu
> after every single patch, and even more importantly after every series.
> As the migration series is separate from this, I don't think it's a good
> excuse for doing worse than we could easily do here.
>
> Kevin
>
Bitmaps are not just qcow2 metadata. They belong to the generic block
layer, and qcow2's ability to store bitmaps is used to implement that
common interface. After a bitmap is loaded and turned into a
BdrvDirtyBitmap, it no longer belongs to qcow2, and the qcow2 layer
should not touch it on intermediate reopens.

We can introduce additional flags to control the loading of autoloading
bitmaps. Isn't it better to handle them in a common place for all
formats? (Yes, for now only qcow2 can store bitmaps, but the parallels
format defines this ability too, and in the future we may have some way
to load bitmaps for the raw format. Also, there is a bitmap-related
extension for the NBD protocol.)

bdrv_load_dirty_bitmap, defined in my series about NBD BLOCK_STATUS, is
definitely generic, as it should be callable from a QMP command. (Just
a note, not a concrete argument.)
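
A sketch of the kind of generic entry point meant here; the callback
name and signature are assumptions, not the definitions from the
BLOCK_STATUS series:

    /* block/dirty-bitmap.c (sketch): a format-agnostic entry point that any
     * driver able to store bitmaps (qcow2 today, perhaps parallels or NBD
     * later) can back with its own callback. */
    BdrvDirtyBitmap *bdrv_load_dirty_bitmap(BlockDriverState *bs,
                                            const char *name, Error **errp)
    {
        BlockDriver *drv = bs->drv;

        if (!drv || !drv->bdrv_load_dirty_bitmap) {   /* assumed callback */
            error_setg(errp, "Block driver '%s' cannot load dirty bitmaps",
                       drv ? drv->format_name : "(no driver)");
            return NULL;
        }

        return drv->bdrv_load_dirty_bitmap(bs, name, errp);
    }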
Kevin Wolf Feb. 20, 2017, 11:15 a.m. UTC | #14
Am 18.02.2017 um 11:54 hat Denis V. Lunev geschrieben:
> On 02/17/2017 03:54 PM, Vladimir Sementsov-Ogievskiy wrote:
> > 17.02.2017 17:24, Kevin Wolf wrote:
> >> Am 17.02.2017 um 14:48 hat Denis V. Lunev geschrieben:
> >>> On 02/17/2017 04:34 PM, Kevin Wolf wrote:
> >>>> Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben:
> >>>>> But for sure this is bad from the downtime point of view.
> >>>>> On migrate you will have to write to the image and re-read
> >>>>> it again on the target. This would be very slow. This will
> >>>>> not help for the migration with non-shared disk too.
> >>>>>
> >>>>> That is why we have specifically worked in a migration,
> >>>>> which for a good does not influence downtime at all now.
> >>>>>
> >>>>> With a write we are issuing several write requests + sync.
> >>>>> Our measurements shows that bdrv_drain could take around
> >>>>> a second on an averagely loaded conventional system, which
> >>>>> seems unacceptable addition to me.
> >>>> I'm not arguing against optimising migration, I fully agree with
> >>>> you. I
> >>>> just think that we should start with a correct if slow base version
> >>>> and
> >>>> then add optimisation to that, instead of starting with a broken base
> >>>> version and adding to that.
> >>>>
> >>>> Look, whether you do the expensive I/O on open/close and make that a
> >>>> slow operation or whether you do it on invalidate_cache/inactivate
> >>>> doesn't really make a difference in term of slowness because in
> >>>> general
> >>>> both operations are called exactly once. But it does make a difference
> >>>> in terms of correctness.
> >>>>
> >>>> Once you do the optimisation, of course, you'll skip writing those
> >>>> bitmaps that you transfer using a different channel, no matter whether
> >>>> you skip it in bdrv_close() or in bdrv_inactivate().
> >>>>
> >>>> Kevin
> >>> I do not understand this point as in order to optimize this
> >>> we will have to create specific code path or option from
> >>> the migration code and keep this as an ugly kludge forever.
> >> The point that I don't understand is why it makes any difference for the
> >> follow-up migration series whether the writeout is in bdrv_close() or
> >> bdrv_inactivate(). I don't really see the difference between the two
> >> from a migration POV; both need to be skipped if we transfer the bitmap
> >> using a different channel.
> >>
> >> Maybe I would see the reason if I could find the time to look at the
> >> migration patches first, but unfortunately I don't have this time at the
> >> moment.
> >>
> >> My point is just that generally we want to have a correctly working qemu
> >> after every single patch, and even more importantly after every series.
> >> As the migration series is separate from this, I don't think it's a good
> >> excuse for doing worse than we could easily do here.
> >>
> >> Kevin
> >
> > With open/close all is already ok - bitmaps will not be saved because
> > of BDRV_O_INACTIVE  and will not be loaded because of IN_USE.

If the contents of so-called persistent bitmaps are lost across
migration and stale after bdrv_invalidate_cache, that's not "all ok" in
my book. It's buggy.

> in any case load/store happens outside of VM downtime.
> The target is running at the moment of close on source,
> the source is running at the moment of open on target.

Which makes the operation in open/close meaningless: open reads data
which may still be changed, and close cannot write to the image any
more because ownership is already given up.

Anyway, all of this isn't really productive. Please, can't you just
answer that simple question that I asked you above: What problems would
you get if you used invalidate_cache/inactivate instead of open/close
for triggering these actions?

If you can bring up some "this would break X", "it would be very hard to
implement Y correctly then" or "in scenario Z, this would have unwanted
effects", then we can figure out what to do. But not putting things in
the proper place just because you don't feel like it isn't a very strong
argument.

Kevin
Denis V. Lunev Feb. 20, 2017, 11:21 a.m. UTC | #15
On 02/20/2017 12:15 PM, Kevin Wolf wrote:
> Am 18.02.2017 um 11:54 hat Denis V. Lunev geschrieben:
>> On 02/17/2017 03:54 PM, Vladimir Sementsov-Ogievskiy wrote:
>>> 17.02.2017 17:24, Kevin Wolf wrote:
>>>> Am 17.02.2017 um 14:48 hat Denis V. Lunev geschrieben:
>>>>> On 02/17/2017 04:34 PM, Kevin Wolf wrote:
>>>>>> Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben:
>>>>>>> But for sure this is bad from the downtime point of view.
>>>>>>> On migrate you will have to write to the image and re-read
>>>>>>> it again on the target. This would be very slow. This will
>>>>>>> not help for the migration with non-shared disk too.
>>>>>>>
>>>>>>> That is why we have specifically worked in a migration,
>>>>>>> which for a good does not influence downtime at all now.
>>>>>>>
>>>>>>> With a write we are issuing several write requests + sync.
>>>>>>> Our measurements shows that bdrv_drain could take around
>>>>>>> a second on an averagely loaded conventional system, which
>>>>>>> seems unacceptable addition to me.
>>>>>> I'm not arguing against optimising migration, I fully agree with
>>>>>> you. I
>>>>>> just think that we should start with a correct if slow base version
>>>>>> and
>>>>>> then add optimisation to that, instead of starting with a broken base
>>>>>> version and adding to that.
>>>>>>
>>>>>> Look, whether you do the expensive I/O on open/close and make that a
>>>>>> slow operation or whether you do it on invalidate_cache/inactivate
>>>>>> doesn't really make a difference in term of slowness because in
>>>>>> general
>>>>>> both operations are called exactly once. But it does make a difference
>>>>>> in terms of correctness.
>>>>>>
>>>>>> Once you do the optimisation, of course, you'll skip writing those
>>>>>> bitmaps that you transfer using a different channel, no matter whether
>>>>>> you skip it in bdrv_close() or in bdrv_inactivate().
>>>>>>
>>>>>> Kevin
>>>>> I do not understand this point as in order to optimize this
>>>>> we will have to create specific code path or option from
>>>>> the migration code and keep this as an ugly kludge forever.
>>>> The point that I don't understand is why it makes any difference for the
>>>> follow-up migration series whether the writeout is in bdrv_close() or
>>>> bdrv_inactivate(). I don't really see the difference between the two
>>>> from a migration POV; both need to be skipped if we transfer the bitmap
>>>> using a different channel.
>>>>
>>>> Maybe I would see the reason if I could find the time to look at the
>>>> migration patches first, but unfortunately I don't have this time at the
>>>> moment.
>>>>
>>>> My point is just that generally we want to have a correctly working qemu
>>>> after every single patch, and even more importantly after every series.
>>>> As the migration series is separate from this, I don't think it's a good
>>>> excuse for doing worse than we could easily do here.
>>>>
>>>> Kevin
>>> With open/close all is already ok - bitmaps will not be saved because
>>> of BDRV_O_INACTIVE  and will not be loaded because of IN_USE.
> If the contents of so-called persistent bitmaps is lost across
> migration and stale after bdrv_invalidate_cache, that's not "all ok" in
> my book. It's buggy.
>
>> in any case load/store happens outside of VM downtime.
>> The target is running at the moment of close on source,
>> the source is running at the moment of open on target.
> Which makes the operation in open/close meaningless: open reads data
> which may still by changed, and close cannot write to the image any more
> because ownership is already given up.
>
> Anyway, all of this isn't really productive. Please, can't you just
> answer that simple question that I asked you above: What problems would
> you get if you used invalidate_cache/inactivate instead of open/close
> for triggering these actions?
>
> If you can bring up some "this would break X", "it would be very hard to
> implement Y correctly then" or "in scenario Z, this would have unwanted
> effects", then we can figure out what to do. But not putting things in
> the proper place just because you don't feel like it isn't a very strong
> argument.
>
> Kevin
This will increase the downtime by ~0.5-1 second for the migrated VM,
or it will require a very intrusive interface from the migration code
to avoid saving the bitmaps on inactivation on that path.

No other arguments so far.

Den
Vladimir Sementsov-Ogievskiy Feb. 20, 2017, 12:06 p.m. UTC | #16
20.02.2017 14:21, Denis V. Lunev wrote:
> On 02/20/2017 12:15 PM, Kevin Wolf wrote:
>> Am 18.02.2017 um 11:54 hat Denis V. Lunev geschrieben:
>>> On 02/17/2017 03:54 PM, Vladimir Sementsov-Ogievskiy wrote:
>>>> 17.02.2017 17:24, Kevin Wolf wrote:
>>>>> Am 17.02.2017 um 14:48 hat Denis V. Lunev geschrieben:
>>>>>> On 02/17/2017 04:34 PM, Kevin Wolf wrote:
>>>>>>> Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben:
>>>>>>>> But for sure this is bad from the downtime point of view.
>>>>>>>> On migrate you will have to write to the image and re-read
>>>>>>>> it again on the target. This would be very slow. This will
>>>>>>>> not help for the migration with non-shared disk too.
>>>>>>>>
>>>>>>>> That is why we have specifically worked in a migration,
>>>>>>>> which for a good does not influence downtime at all now.
>>>>>>>>
>>>>>>>> With a write we are issuing several write requests + sync.
>>>>>>>> Our measurements shows that bdrv_drain could take around
>>>>>>>> a second on an averagely loaded conventional system, which
>>>>>>>> seems unacceptable addition to me.
>>>>>>> I'm not arguing against optimising migration, I fully agree with
>>>>>>> you. I
>>>>>>> just think that we should start with a correct if slow base version
>>>>>>> and
>>>>>>> then add optimisation to that, instead of starting with a broken base
>>>>>>> version and adding to that.
>>>>>>>
>>>>>>> Look, whether you do the expensive I/O on open/close and make that a
>>>>>>> slow operation or whether you do it on invalidate_cache/inactivate
>>>>>>> doesn't really make a difference in terms of slowness because in
>>>>>>> general
>>>>>>> both operations are called exactly once. But it does make a difference
>>>>>>> in terms of correctness.
>>>>>>>
>>>>>>> Once you do the optimisation, of course, you'll skip writing those
>>>>>>> bitmaps that you transfer using a different channel, no matter whether
>>>>>>> you skip it in bdrv_close() or in bdrv_inactivate().
>>>>>>>
>>>>>>> Kevin
>>>>>> I do not understand this point, as in order to optimize this
>>>>>> we will have to create a specific code path or option from
>>>>>> the migration code and keep it as an ugly kludge forever.
>>>>> The point that I don't understand is why it makes any difference for the
>>>>> follow-up migration series whether the writeout is in bdrv_close() or
>>>>> bdrv_inactivate(). I don't really see the difference between the two
>>>>> from a migration POV; both need to be skipped if we transfer the bitmap
>>>>> using a different channel.
>>>>>
>>>>> Maybe I would see the reason if I could find the time to look at the
>>>>> migration patches first, but unfortunately I don't have this time at the
>>>>> moment.
>>>>>
>>>>> My point is just that generally we want to have a correctly working qemu
>>>>> after every single patch, and even more importantly after every series.
>>>>> As the migration series is separate from this, I don't think it's a good
>>>>> excuse for doing worse than we could easily do here.
>>>>>
>>>>> Kevin
>>>> With open/close all is already ok - bitmaps will not be saved because
>>>> of BDRV_O_INACTIVE  and will not be loaded because of IN_USE.
>> If the contents of so-called persistent bitmaps are lost across
>> migration and stale after bdrv_invalidate_cache, that's not "all ok" in
>> my book. It's buggy.
>>
>>> In any case, load/store happens outside of VM downtime.
>>> The target is running at the moment of close on the source,
>>> and the source is running at the moment of open on the target.
>> Which makes the operation in open/close meaningless: open reads data
>> which may still be changed, and close cannot write to the image any more
>> because ownership is already given up.
>>
>> Anyway, all of this isn't really productive. Please, can't you just
>> answer that simple question that I asked you above: What problems would
>> you get if you used invalidate_cache/inactivate instead of open/close
>> for triggering these actions?
>>
>> If you can bring up some "this would break X", "it would be very hard to
>> implement Y correctly then" or "in scenario Z, this would have unwanted
>> effects", then we can figure out what to do. But not putting things in
>> the proper place just because you don't feel like it isn't a very strong
>> argument.
>>
>> Kevin
> This will increase the downtime by ~0.5-1 second for the migrated VM,
> or it will require a very intrusive interface from the migration code
> into this code to avoid saving bitmaps on inactivation on that path.
>
> No other arguments so far.
>
> Den

Ok. Finally, I'll make v16 with load/store in inactivate/invalidate. It 
will not be too hard to change this back if really needed.
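
As a rough sketch of that plan (illustration only, not the actual v16 code;
bdrv_store_persistent_dirty_bitmaps() below is an assumed helper that does
not exist in this series), the hooks would move roughly like this in block.c:

/* Destination side: reload persistent bitmaps only once this QEMU
 * really owns the image, i.e. when the cache is invalidated after
 * migration. */
void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
{
    Error *local_err = NULL;

    /* ... existing code: clear BDRV_O_INACTIVE, call
     * drv->bdrv_invalidate_cache(), refresh limits ... */

    bdrv_load_autoloading_dirty_bitmaps(bs, &local_err);
    if (local_err) {
        error_propagate(errp, local_err);
        return;
    }
}

/* Source side: write the bitmaps out while the image is still active,
 * instead of doing it in bdrv_close() where ownership is already
 * given up.  bdrv_store_persistent_dirty_bitmaps() is an assumed
 * helper, not part of this patch. */
static int bdrv_inactivate(BlockDriverState *bs)
{
    int ret = bdrv_store_persistent_dirty_bitmaps(bs);
    if (ret < 0) {
        return ret;
    }

    /* ... existing code: call drv->bdrv_inactivate(),
     * set BDRV_O_INACTIVE ... */
    return 0;
}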
Kevin Wolf Feb. 20, 2017, 12:23 p.m. UTC | #17
On 20.02.2017 12:21, Denis V. Lunev wrote:
> On 02/20/2017 12:15 PM, Kevin Wolf wrote:
> >> On 18.02.2017 11:54, Denis V. Lunev wrote:
> >> On 02/17/2017 03:54 PM, Vladimir Sementsov-Ogievskiy wrote:
> >>> 17.02.2017 17:24, Kevin Wolf wrote:
> >>>> On 17.02.2017 14:48, Denis V. Lunev wrote:
> >>>>> On 02/17/2017 04:34 PM, Kevin Wolf wrote:
> >>>>>> On 17.02.2017 14:22, Denis V. Lunev wrote:
> >>>>>>> But this is definitely bad from the downtime point of view.
> >>>>>>> On migration you will have to write to the image and re-read
> >>>>>>> it again on the target. This would be very slow. It will
> >>>>>>> not help for migration with a non-shared disk either.
> >>>>>>>
> >>>>>>> That is why we have specifically worked on the migration,
> >>>>>>> which, for the better, does not influence downtime at all now.
> >>>>>>>
> >>>>>>> With a write we are issuing several write requests + sync.
> >>>>>>> Our measurements show that bdrv_drain could take around
> >>>>>>> a second on an averagely loaded conventional system, which
> >>>>>>> seems like an unacceptable addition to me.
> >>>>>> I'm not arguing against optimising migration, I fully agree with
> >>>>>> you. I
> >>>>>> just think that we should start with a correct if slow base version
> >>>>>> and
> >>>>>> then add optimisation to that, instead of starting with a broken base
> >>>>>> version and adding to that.
> >>>>>>
> >>>>>> Look, whether you do the expensive I/O on open/close and make that a
> >>>>>> slow operation or whether you do it on invalidate_cache/inactivate
> >>>>>> doesn't really make a difference in terms of slowness because in
> >>>>>> general
> >>>>>> both operations are called exactly once. But it does make a difference
> >>>>>> in terms of correctness.
> >>>>>>
> >>>>>> Once you do the optimisation, of course, you'll skip writing those
> >>>>>> bitmaps that you transfer using a different channel, no matter whether
> >>>>>> you skip it in bdrv_close() or in bdrv_inactivate().
> >>>>>>
> >>>>>> Kevin
> >>>>> I do not understand this point, as in order to optimize this
> >>>>> we will have to create a specific code path or option from
> >>>>> the migration code and keep it as an ugly kludge forever.
> >>>> The point that I don't understand is why it makes any difference for the
> >>>> follow-up migration series whether the writeout is in bdrv_close() or
> >>>> bdrv_inactivate(). I don't really see the difference between the two
> >>>> from a migration POV; both need to be skipped if we transfer the bitmap
> >>>> using a different channel.
> >>>>
> >>>> Maybe I would see the reason if I could find the time to look at the
> >>>> migration patches first, but unfortunately I don't have this time at the
> >>>> moment.
> >>>>
> >>>> My point is just that generally we want to have a correctly working qemu
> >>>> after every single patch, and even more importantly after every series.
> >>>> As the migration series is separate from this, I don't think it's a good
> >>>> excuse for doing worse than we could easily do here.
> >>>>
> >>>> Kevin
> >>> With open/close all is already ok - bitmaps will not be saved because
> >>> of BDRV_O_INACTIVE  and will not be loaded because of IN_USE.
> > If the contents of so-called persistent bitmaps are lost across
> > migration and stale after bdrv_invalidate_cache, that's not "all ok" in
> > my book. It's buggy.
> >
> >> In any case, load/store happens outside of VM downtime.
> >> The target is running at the moment of close on the source,
> >> and the source is running at the moment of open on the target.
> > Which makes the operation in open/close meaningless: open reads data
> > which may still be changed, and close cannot write to the image any more
> > because ownership is already given up.
> >
> > Anyway, all of this isn't really productive. Please, can't you just
> > answer that simple question that I asked you above: What problems would
> > you get if you used invalidate_cache/inactivate instead of open/close
> > for triggering these actions?
> >
> > If you can bring up some "this would break X", "it would be very hard to
> > implement Y correctly then" or "in scenario Z, this would have unwanted
> > effects", then we can figure out what to do. But not putting things in
> > the proper place just because you don't feel like it isn't a very strong
> > argument.
> >
> > Kevin
> This will increase the downtime by ~0.5-1 second for the migrated VM,

True before the optimisation is applied. But correctness trumps
speed.

> or it will require a very intrusive interface from the migration code
> into this code to avoid saving bitmaps on inactivation on that path.

You already have a (very implicit) interface from migration code to
bdrv_open/close. It consists of what Vladimir mentioned above: The
BDRV_O_INACTIVE and IN_USE flags, which happen to be in the right state
during migration so that they can be abused for controlling the
handover.

If moving the code means that we end up getting a more explicit
interface, that's actually a plus for me.

Kevin
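
For illustration, the implicit interface in question boils down to two guards
along these lines (the function names and the bitmap flag constant below are
only stand-ins for the qcow2 side of the series, not code from this patch):

static int store_persistent_bitmaps_sketch(BlockDriverState *bs)
{
    /* Source side after inactivation: BDRV_O_INACTIVE suppresses any
     * writeout, because the destination already owns the image. */
    if (bs->open_flags & BDRV_O_INACTIVE) {
        return 0;
    }

    /* ... write the bitmap directory to the image ... */
    return 0;
}

static bool bitmap_can_autoload_sketch(uint32_t bitmap_flags)
{
    /* Destination side at load time: a bitmap whose on-disk "in use"
     * flag is still set belongs to the writer (the source), so
     * autoloading skips it and expects it to arrive through the
     * migration stream instead.  Flag name assumed. */
    return !(bitmap_flags & BME_FLAG_IN_USE);
}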

Patch

diff --git a/block.c b/block.c
index a0346c8..56f030c 100644
--- a/block.c
+++ b/block.c
@@ -1141,6 +1141,13 @@  static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
     assert(bdrv_min_mem_align(bs) != 0);
     assert(is_power_of_2(bs->bl.request_alignment));
 
+    bdrv_load_autoloading_dirty_bitmaps(bs, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        ret = -EINVAL;
+        goto free_and_fail;
+    }
+
     qemu_opts_del(opts);
     return 0;
 
@@ -4093,3 +4100,10 @@  void bdrv_del_child(BlockDriverState *parent_bs, BdrvChild *child, Error **errp)
 
     parent_bs->drv->bdrv_del_child(parent_bs, child, errp);
 }
+
+void bdrv_load_autoloading_dirty_bitmaps(BlockDriverState *bs, Error **errp)
+{
+    if (bs->drv && bs->drv->bdrv_load_autoloading_dirty_bitmaps) {
+        bs->drv->bdrv_load_autoloading_dirty_bitmaps(bs, errp);
+    }
+}
diff --git a/include/block/block.h b/include/block/block.h
index 4e81f20..154ac7f 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -551,4 +551,6 @@  void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child,
                     Error **errp);
 void bdrv_del_child(BlockDriverState *parent, BdrvChild *child, Error **errp);
 
+void bdrv_load_autoloading_dirty_bitmaps(BlockDriverState *bs, Error **errp);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 2d92d7e..6b2b50c 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -320,6 +320,9 @@  struct BlockDriver {
     void (*bdrv_del_child)(BlockDriverState *parent, BdrvChild *child,
                            Error **errp);
 
+    void (*bdrv_load_autoloading_dirty_bitmaps)(BlockDriverState *bs,
+                                                Error **errp);
+
     QLIST_ENTRY(BlockDriver) list;
 };
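
For completeness, a format driver would wire up the new callback roughly as
follows (sketch only; the qcow2 function body and the driver-table entry shown
here are assumptions, since the qcow2 implementation lives in other patches of
the series):

static void qcow2_load_autoloading_dirty_bitmaps(BlockDriverState *bs,
                                                 Error **errp)
{
    /* Sketch: read the bitmap directory from the image and create a
     * BdrvDirtyBitmap for every bitmap that is flagged for autoload
     * and not marked as in use. */
}

BlockDriver bdrv_qcow2 = {
    .format_name                         = "qcow2",
    /* ... other callbacks ... */
    .bdrv_load_autoloading_dirty_bitmaps = qcow2_load_autoloading_dirty_bitmaps,
};

With such a hookup in place, the generic bdrv_load_autoloading_dirty_bitmaps()
call added by this patch (or, per the discussion above, its counterpart on the
invalidate/inactivate path) is all the block layer needs.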