Message ID: 1487153430-17260-9-git-send-email-vsementsov@virtuozzo.com
State: New
On 15.02.2017 at 11:10, Vladimir Sementsov-Ogievskiy wrote:
> Auto loading bitmaps are bitmaps stored in the disk image, which should
> be loaded when the image is opened and become BdrvDirtyBitmaps for the
> corresponding drive.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Reviewed-by: John Snow <jsnow@redhat.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>

Why do we need a new BlockDriver callback and special code for it in
bdrv_open_common()? The callback is only ever called immediately after
.bdrv_open/.bdrv_file_open, so can't the drivers just do this internally
in their .bdrv_open implementation? Even more so because qcow2 is the
only driver that supports this callback.

Kevin
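Kevin's suggestion can be illustrated with a C-style pseudocode sketch. This is not the actual patch: qcow2_open() and qcow2_load_autoloading_dirty_bitmaps() are names taken from the discussion, while the surrounding structure is an assumption for illustration only.

```c
/* Sketch only: instead of a new BlockDriver callback invoked from
 * bdrv_open_common(), the driver loads its own bitmaps at the end of
 * its .bdrv_open implementation. */
static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
                      Error **errp)
{
    int ret;

    /* ... existing qcow2 header and metadata parsing ... */

    /* Load autoload bitmaps here, internally: .bdrv_open has just
     * finished, so no generic hook in bdrv_open_common() is needed. */
    ret = qcow2_load_autoloading_dirty_bitmaps(bs, errp);
    if (ret < 0) {
        return ret;
    }

    return 0;
}
```

Since qcow2 is the only driver implementing the callback, this keeps the feature entirely inside the driver and out of the generic open path.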
On 16.02.2017 at 12:25, Kevin Wolf wrote:
> On 15.02.2017 at 11:10, Vladimir Sementsov-Ogievskiy wrote:
> > Auto loading bitmaps are bitmaps stored in the disk image, which
> > should be loaded when the image is opened and become
> > BdrvDirtyBitmaps for the corresponding drive.
> [...]
> Why do we need a new BlockDriver callback and special code for it in
> bdrv_open_common()? The callback is only ever called immediately after
> .bdrv_open/.bdrv_file_open, so can't the drivers just do this
> internally in their .bdrv_open implementation? Even more so because
> qcow2 is the only driver that supports this callback.

Actually, don't we have to call this in qcow2_invalidate_cache()?
Currently, I think, after a migration, the autoload bitmaps aren't
loaded.

By moving the qcow2_load_autoloading_dirty_bitmaps() call to
qcow2_open(), this would be fixed.

Kevin
On 16.02.2017 at 14:49, Kevin Wolf wrote:
> [...]
> Actually, don't we have to call this in qcow2_invalidate_cache()?
> Currently, I think, after a migration, the autoload bitmaps aren't
> loaded.
>
> By moving the qcow2_load_autoloading_dirty_bitmaps() call to
> qcow2_open(), this would be fixed.
>
> Kevin

Bitmaps should not be reloaded on any intermediate qcow2 opens,
reopens, etc. They should be loaded once, on bdrv_open, so as not to
create extra collisions (between the in-memory bitmap and its stored
version). That was the idea.

For bitmaps migration there is a separate series; we shouldn't load a
bitmap from the file on migration, as its version in the file is
outdated.
On 17.02.2017 at 12:46, Vladimir Sementsov-Ogievskiy wrote:
> [...]
> Bitmaps should not be reloaded on any intermediate qcow2 opens,
> reopens, etc. They should be loaded once, on bdrv_open, so as not to
> create extra collisions (between the in-memory bitmap and its stored
> version). That was the idea.
>
> For bitmaps migration there is a separate series; we shouldn't load a
> bitmap from the file on migration, as its version in the file is
> outdated.

That's not what your series is doing, though. It loads the bitmaps when
migration starts and doesn't reload them when migration completes, even
though they are stale. Migration with shared storage would just work
without an extra series if you did these things in the correct places.

As a reminder, this is how migration with shared storage works (or
should work with your series):

1. Start destination qemu instance. This calls bdrv_open() with
   BDRV_O_INACTIVE. We can read in some metadata, though we don't need
   much more than the image size at this point. Writing to the image is
   still impossible.

2. Start migration on the source, while the VM is still writing to the
   image, rendering the cached metadata from step 1 stale.

3. Migration completes:

   a. Stop the VM.

   b. Inactivate all images in the source qemu. This is where all
      metadata needs to be written back to the image file, including
      bitmaps. No writes to the image are possible after this point
      because BDRV_O_INACTIVE is set.

   c. Invalidate the caches in the destination qemu, i.e. reload
      everything from the file that could have changed since step 1,
      including bitmaps. BDRV_O_INACTIVE is cleared, making the image
      ready for writes.

   d. Resume the VM on the destination.

4. Exit the source qemu process, which involves bdrv_close(). Note that
   at this point, no writing to the image file is possible any more;
   it's the destination qemu process that owns the image file now.

Your series loads and stores bitmaps in steps 1 and 4. This means that
they are stale on the destination when migration completes, and that
bdrv_close() wants to write to an image file that it doesn't own any
more, which will cause an assertion failure. If you instead move things
to steps 3b and 3c, it will just work.

Kevin
On 17.02.2017 at 15:09, Kevin Wolf wrote:
> [...]
> That's not what your series is doing, though. It loads the bitmaps
> when migration starts and doesn't reload them when migration
> completes, even though they are stale.

Actually, they will not be loaded, as they will have the IN_USE flag.

> Migration with shared storage would just work without an extra series
> if you did these things in the correct places.
>
> As a reminder, this is how migration with shared storage works (or
> should work with your series):
> [...]
> Your series loads and stores bitmaps in steps 1 and 4.

Actually, no: in step 1 the bitmaps are "in use", and in step 4
INACTIVE is set (and it is checked), so nothing is stored.

> This means that they are stale on the destination when migration
> completes, and that bdrv_close() wants to write to an image file that
> it doesn't own any more, which will cause an assertion failure. If you
> instead move things to steps 3b and 3c, it will just work.

Hmm, I understand the idea. But this will interfere with postcopy
bitmap migration. So if we really need this, there should be some
additional control flags or capabilities. The problem with your
approach is that the bitmap is actually migrated during the short
window when source and destination are stopped; this may take time, as
bitmaps may be large.

> Kevin
On 17.02.2017 at 13:40, Vladimir Sementsov-Ogievskiy wrote:
> [...]
> Hmm, I understand the idea. But this will interfere with postcopy
> bitmap migration. So if we really need this, there should be some
> additional control flags or capabilities. The problem with your
> approach is that the bitmap is actually migrated during the short
> window when source and destination are stopped; this may take time,
> as bitmaps may be large.

You can always add optimisations, but this is the basic lifecycle
process of block devices in qemu, so it would be good to adhere to it.
So far there are no other pieces of information that are ignored in
bdrv_invalidate()/bdrv_inactivate() and instead only handled in
bdrv_open()/bdrv_close(). It's a matter of consistency, too.

And not having to add special cases for specific features in the
generic bdrv_open()/close() paths is a big plus for me anyway.

Kevin
On 02/17/2017 03:48 PM, Kevin Wolf wrote:
> [...]
> You can always add optimisations, but this is the basic lifecycle
> process of block devices in qemu, so it would be good to adhere to it.
> So far there are no other pieces of information that are ignored in
> bdrv_invalidate()/bdrv_inactivate() and instead only handled in
> bdrv_open()/bdrv_close(). It's a matter of consistency, too.
>
> And not having to add special cases for specific features in the
> generic bdrv_open()/close() paths is a big plus for me anyway.
>
> Kevin

But for sure this is bad from the downtime point of view. On migration
you would have to write the bitmaps to the image and re-read them again
on the target. This would be very slow. It will not help for migration
with a non-shared disk either.

That is why we have specifically worked on a migration scheme that does
not influence downtime at all now.

With a write we are issuing several write requests + sync. Our
measurements show that bdrv_drain can take around a second on an
averagely loaded conventional system, which seems like an unacceptable
addition to me.

Den
On 17.02.2017 at 14:22, Denis V. Lunev wrote:
> [...]
> But for sure this is bad from the downtime point of view. On migration
> you would have to write the bitmaps to the image and re-read them
> again on the target. This would be very slow. It will not help for
> migration with a non-shared disk either.
>
> That is why we have specifically worked on a migration scheme that
> does not influence downtime at all now.
>
> With a write we are issuing several write requests + sync. Our
> measurements show that bdrv_drain can take around a second on an
> averagely loaded conventional system, which seems like an unacceptable
> addition to me.

I'm not arguing against optimising migration, I fully agree with you. I
just think that we should start with a correct if slow base version and
then add optimisation to that, instead of starting with a broken base
version and adding to that.

Look, whether you do the expensive I/O on open/close and make that a
slow operation, or whether you do it on invalidate_cache/inactivate,
doesn't really make a difference in terms of slowness, because in
general both operations are called exactly once. But it does make a
difference in terms of correctness.

Once you do the optimisation, of course, you'll skip writing those
bitmaps that you transfer using a different channel, no matter whether
you skip it in bdrv_close() or in bdrv_inactivate().

Kevin
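The division of labour Kevin describes can be sketched as C-style pseudocode. This is not QEMU code: bitmap_migrated_out_of_band(), store_bitmap_to_image(), and FOR_EACH_DIRTY_BITMAP are hypothetical names invented here to show where the skip condition would live.

```c
/* Sketch: write-back lives in the inactivate path, and the later
 * optimisation simply skips bitmaps carried over the migration stream.
 * The skip condition is identical whether this runs from bdrv_close()
 * or from bdrv_inactivate(). */
static int qcow2_inactivate_bitmaps(BlockDriverState *bs)
{
    BdrvDirtyBitmap *bm;

    FOR_EACH_DIRTY_BITMAP(bs, bm) {
        if (bitmap_migrated_out_of_band(bm)) {
            continue;  /* transferred through a different channel */
        }
        store_bitmap_to_image(bs, bm);  /* hypothetical helper */
    }
    return 0;
}
```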
On 02/17/2017 04:34 PM, Kevin Wolf wrote: > Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben: >> On 02/17/2017 03:48 PM, Kevin Wolf wrote: >>> Am 17.02.2017 um 13:40 hat Vladimir Sementsov-Ogievskiy geschrieben: >>>> 17.02.2017 15:09, Kevin Wolf wrote: >>>>> Am 17.02.2017 um 12:46 hat Vladimir Sementsov-Ogievskiy geschrieben: >>>>>> 16.02.2017 14:49, Kevin Wolf wrote: >>>>>>> Am 16.02.2017 um 12:25 hat Kevin Wolf geschrieben: >>>>>>>> Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben: >>>>>>>>> Auto loading bitmaps are bitmaps stored in the disk image, which should >>>>>>>>> be loaded when the image is opened and become BdrvDirtyBitmaps for the >>>>>>>>> corresponding drive. >>>>>>>>> >>>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> >>>>>>>>> Reviewed-by: John Snow <jsnow@redhat.com> >>>>>>>>> Reviewed-by: Max Reitz <mreitz@redhat.com> >>>>>>>> Why do we need a new BlockDriver callback and special code for it in >>>>>>>> bdrv_open_common()? The callback is only ever called immediately after >>>>>>>> .bdrv_open/.bdrv_file_open, so can't the drivers just do this internally >>>>>>>> in their .bdrv_open implementation? Even more so because qcow2 is the >>>>>>>> only driver that supports this callback. >>>>>>> Actually, don't we have to call this in qcow2_invalidate_cache()? >>>>>>> Currently, I think, after a migration, the autoload bitmaps aren't >>>>>>> loaded. >>>>>>> >>>>>>> By moving the qcow2_load_autoloading_dirty_bitmaps() call to >>>>>>> qcow2_open(), this would be fixed. >>>>>>> >>>>>>> Kevin >>>>>> Bitmap should not be reloaded on any intermediate qcow2-open's, >>>>>> reopens, etc. It should be loaded once, on bdrv_open, to not create >>>>>> extra collisions (between in-memory bitmap and it's stored version). >>>>>> That was the idea. >>>>>> >>>>>> For bitmaps migration there are separate series, we shouldn't load >>>>>> bitmap from file on migration, as it's version in the file is >>>>>> outdated. 
>>>>> That's not what your series is doing, though. It loads the bitmaps when >>>> Actually, they will not be loaded as they will have IN_USE flag. >>>> >>>>> migration starts and doesn't reload then when migration completes, even >>>>> though they are stale. Migration with shared storage would just work >>>>> without an extra series if you did these things in the correct places. >>>>> >>>>> As a reminder, this is how migration with shared storage works (or >>>>> should work with your series): >>>>> >>>>> 1. Start destination qemu instance. This calls bdrv_open() with >>>>> BDRV_O_INACTIVE. We can read in some metadata, though we don't need >>>>> much more than the image size at this point. Writing to the image is >>>>> still impossible. >>>>> >>>>> 2. Start migration on the source, while the VM is still writing to the >>>>> image, rendering the cached metadata from step 1 stale. >>>>> >>>>> 3. Migration completes: >>>>> >>>>> a. Stop the VM >>>>> >>>>> b. Inactivate all images in the source qemu. This is where all >>>>> metadata needs to be written back to the image file, including >>>>> bitmaps. No writes to the image are possible after this point >>>>> because BDRV_O_INACTIVE is set. >>>>> >>>>> c. Invalidate the caches in the destination qemu, i.e. reload >>>>> everything from the file that could have changed since step 1, >>>>> including bitmaps. BDRV_O_INACTIVE is cleared, making the image >>>>> ready for writes. >>>>> >>>>> d. Resume the VM on the destination >>>>> >>>>> 4. Exit the source qemu process, which involves bdrv_close(). Note that >>>>> at this point, no writing to the image file is possible any more, >>>>> it's the destination qemu process that own the image file now. >>>>> >>>>> Your series loads and stores bitmaps in steps 1 and 4. This means that >>>> Actually - not. in 1 bitmaps are "in use", in 4 INACTIVE is set (and >>>> it is checked), nothing is stored. 
>>>> >>>>> they are stale on the destination when migration completes, and that >>>>> bdrv_close() wants to write to an image file that it doesn't own any >>>>> more, which will cause an assertion failure. If you instead move things >>>>> to steps 3b and 3c, it will just work. >>>> Hmm, I understand the idea.. But this will interfere with postcopy >>>> bitmap migration. So if we really need this, there should be some >>>> additional control flags or capabilities.. The problem of your >>>> approach is that bitmap actually migrated in the short state when >>>> source and destination are stopped, it may take time, as bitmaps may >>>> be large. >>> You can always add optimisations, but this is the basic lifecycle >>> process of block devices in qemu, so it would be good to adhere to it. >>> So far there are no other pieces of information that are ignored in >>> bdrv_invalidate()/bdrv_inactivate() and instead only handled in >>> bdrv_open()/bdrv_close(). It's a matter of consistency, too. >>> >>> And not having to add special cases for specific features in the generic >>> bdrv_open()/close() paths is a big plus for me anyway. >>> >>> Kevin >> But for sure this is bad from the downtime point of view. >> On migrate you will have to write to the image and re-read >> it again on the target. This would be very slow. This will >> not help for the migration with non-shared disk too. >> >> That is why we have specifically worked in a migration, >> which for a good does not influence downtime at all now. >> >> With a write we are issuing several write requests + sync. >> Our measurements shows that bdrv_drain could take around >> a second on an averagely loaded conventional system, which >> seems unacceptable addition to me. > I'm not arguing against optimising migration, I fully agree with you. I > just think that we should start with a correct if slow base version and > then add optimisation to that, instead of starting with a broken base > version and adding to that. 
> > Look, whether you do the expensive I/O on open/close and make that a > slow operation or whether you do it on invalidate_cache/inactivate > doesn't really make a difference in term of slowness because in general > both operations are called exactly once. But it does make a difference > in terms of correctness. > > Once you do the optimisation, of course, you'll skip writing those > bitmaps that you transfer using a different channel, no matter whether > you skip it in bdrv_close() or in bdrv_inactivate(). > > Kevin I do not understand this point: in order to optimize this, we will have to create a specific code path or option in the migration code and keep it as an ugly kludge forever. Den
Am 17.02.2017 um 14:48 hat Denis V. Lunev geschrieben: > On 02/17/2017 04:34 PM, Kevin Wolf wrote: > > Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben: > >> But for sure this is bad from the downtime point of view. > >> On migrate you will have to write to the image and re-read > >> it again on the target. This would be very slow. This will > >> not help for the migration with non-shared disk too. > >> > >> That is why we have specifically worked in a migration, > >> which for a good does not influence downtime at all now. > >> > >> With a write we are issuing several write requests + sync. > >> Our measurements shows that bdrv_drain could take around > >> a second on an averagely loaded conventional system, which > >> seems unacceptable addition to me. > > I'm not arguing against optimising migration, I fully agree with you. I > > just think that we should start with a correct if slow base version and > > then add optimisation to that, instead of starting with a broken base > > version and adding to that. > > > > Look, whether you do the expensive I/O on open/close and make that a > > slow operation or whether you do it on invalidate_cache/inactivate > > doesn't really make a difference in term of slowness because in general > > both operations are called exactly once. But it does make a difference > > in terms of correctness. > > > > Once you do the optimisation, of course, you'll skip writing those > > bitmaps that you transfer using a different channel, no matter whether > > you skip it in bdrv_close() or in bdrv_inactivate(). > > > > Kevin > I do not understand this point as in order to optimize this > we will have to create specific code path or option from > the migration code and keep this as an ugly kludge forever. The point that I don't understand is why it makes any difference for the follow-up migration series whether the writeout is in bdrv_close() or bdrv_inactivate(). 
I don't really see the difference between the two from a migration POV; both need to be skipped if we transfer the bitmap using a different channel. Maybe I would see the reason if I could find the time to look at the migration patches first, but unfortunately I don't have this time at the moment. My point is just that generally we want to have a correctly working qemu after every single patch, and even more importantly after every series. As the migration series is separate from this, I don't think it's a good excuse for doing worse than we could easily do here. Kevin
17.02.2017 17:24, Kevin Wolf wrote: > Am 17.02.2017 um 14:48 hat Denis V. Lunev geschrieben: >> On 02/17/2017 04:34 PM, Kevin Wolf wrote: >>> Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben: >>>> But for sure this is bad from the downtime point of view. >>>> On migrate you will have to write to the image and re-read >>>> it again on the target. This would be very slow. This will >>>> not help for the migration with non-shared disk too. >>>> >>>> That is why we have specifically worked in a migration, >>>> which for a good does not influence downtime at all now. >>>> >>>> With a write we are issuing several write requests + sync. >>>> Our measurements shows that bdrv_drain could take around >>>> a second on an averagely loaded conventional system, which >>>> seems unacceptable addition to me. >>> I'm not arguing against optimising migration, I fully agree with you. I >>> just think that we should start with a correct if slow base version and >>> then add optimisation to that, instead of starting with a broken base >>> version and adding to that. >>> >>> Look, whether you do the expensive I/O on open/close and make that a >>> slow operation or whether you do it on invalidate_cache/inactivate >>> doesn't really make a difference in term of slowness because in general >>> both operations are called exactly once. But it does make a difference >>> in terms of correctness. >>> >>> Once you do the optimisation, of course, you'll skip writing those >>> bitmaps that you transfer using a different channel, no matter whether >>> you skip it in bdrv_close() or in bdrv_inactivate(). >>> >>> Kevin >> I do not understand this point as in order to optimize this >> we will have to create specific code path or option from >> the migration code and keep this as an ugly kludge forever. > The point that I don't understand is why it makes any difference for the > follow-up migration series whether the writeout is in bdrv_close() or > bdrv_inactivate(). 
I don't really see the difference between the two > from a migration POV; both need to be skipped if we transfer the bitmap > using a different channel. > > Maybe I would see the reason if I could find the time to look at the > migration patches first, but unfortunately I don't have this time at the > moment. > > My point is just that generally we want to have a correctly working qemu > after every single patch, and even more importantly after every series. > As the migration series is separate from this, I don't think it's a good > excuse for doing worse than we could easily do here. > > Kevin With open/close, everything is already OK: bitmaps will not be saved because of BDRV_O_INACTIVE, and they will not be loaded because of IN_USE.
On 02/17/2017 03:54 PM, Vladimir Sementsov-Ogievskiy wrote: > 17.02.2017 17:24, Kevin Wolf wrote: >> Am 17.02.2017 um 14:48 hat Denis V. Lunev geschrieben: >>> On 02/17/2017 04:34 PM, Kevin Wolf wrote: >>>> Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben: >>>>> But for sure this is bad from the downtime point of view. >>>>> On migrate you will have to write to the image and re-read >>>>> it again on the target. This would be very slow. This will >>>>> not help for the migration with non-shared disk too. >>>>> >>>>> That is why we have specifically worked in a migration, >>>>> which for a good does not influence downtime at all now. >>>>> >>>>> With a write we are issuing several write requests + sync. >>>>> Our measurements shows that bdrv_drain could take around >>>>> a second on an averagely loaded conventional system, which >>>>> seems unacceptable addition to me. >>>> I'm not arguing against optimising migration, I fully agree with >>>> you. I >>>> just think that we should start with a correct if slow base version >>>> and >>>> then add optimisation to that, instead of starting with a broken base >>>> version and adding to that. >>>> >>>> Look, whether you do the expensive I/O on open/close and make that a >>>> slow operation or whether you do it on invalidate_cache/inactivate >>>> doesn't really make a difference in term of slowness because in >>>> general >>>> both operations are called exactly once. But it does make a difference >>>> in terms of correctness. >>>> >>>> Once you do the optimisation, of course, you'll skip writing those >>>> bitmaps that you transfer using a different channel, no matter whether >>>> you skip it in bdrv_close() or in bdrv_inactivate(). >>>> >>>> Kevin >>> I do not understand this point as in order to optimize this >>> we will have to create specific code path or option from >>> the migration code and keep this as an ugly kludge forever. 
>> The point that I don't understand is why it makes any difference for the >> follow-up migration series whether the writeout is in bdrv_close() or >> bdrv_inactivate(). I don't really see the difference between the two >> from a migration POV; both need to be skipped if we transfer the bitmap >> using a different channel. >> >> Maybe I would see the reason if I could find the time to look at the >> migration patches first, but unfortunately I don't have this time at the >> moment. >> >> My point is just that generally we want to have a correctly working qemu >> after every single patch, and even more importantly after every series. >> As the migration series is separate from this, I don't think it's a good >> excuse for doing worse than we could easily do here. >> >> Kevin > > With open/close all is already ok - bitmaps will not be saved because > of BDRV_O_INACTIVE and will not be loaded because of IN_USE. > > In any case, load/store happens outside of VM downtime: the target is already running at the moment of close on the source, and the source is still running at the moment of open on the target. Den
17.02.2017 17:24, Kevin Wolf wrote: > Am 17.02.2017 um 14:48 hat Denis V. Lunev geschrieben: >> On 02/17/2017 04:34 PM, Kevin Wolf wrote: >>> Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben: >>>> But for sure this is bad from the downtime point of view. >>>> On migrate you will have to write to the image and re-read >>>> it again on the target. This would be very slow. This will >>>> not help for the migration with non-shared disk too. >>>> >>>> That is why we have specifically worked in a migration, >>>> which for a good does not influence downtime at all now. >>>> >>>> With a write we are issuing several write requests + sync. >>>> Our measurements shows that bdrv_drain could take around >>>> a second on an averagely loaded conventional system, which >>>> seems unacceptable addition to me. >>> I'm not arguing against optimising migration, I fully agree with you. I >>> just think that we should start with a correct if slow base version and >>> then add optimisation to that, instead of starting with a broken base >>> version and adding to that. >>> >>> Look, whether you do the expensive I/O on open/close and make that a >>> slow operation or whether you do it on invalidate_cache/inactivate >>> doesn't really make a difference in term of slowness because in general >>> both operations are called exactly once. But it does make a difference >>> in terms of correctness. >>> >>> Once you do the optimisation, of course, you'll skip writing those >>> bitmaps that you transfer using a different channel, no matter whether >>> you skip it in bdrv_close() or in bdrv_inactivate(). >>> >>> Kevin >> I do not understand this point as in order to optimize this >> we will have to create specific code path or option from >> the migration code and keep this as an ugly kludge forever. > The point that I don't understand is why it makes any difference for the > follow-up migration series whether the writeout is in bdrv_close() or > bdrv_inactivate(). 
I don't really see the difference between the two > from a migration POV; both need to be skipped if we transfer the bitmap > using a different channel. > > Maybe I would see the reason if I could find the time to look at the > migration patches first, but unfortunately I don't have this time at the > moment. > > My point is just that generally we want to have a correctly working qemu > after every single patch, and even more importantly after every series. > As the migration series is separate from this, I don't think it's a good > excuse for doing worse than we could easily do here. > > Kevin > Bitmaps are not just qcow2 metadata. They belong to the generic block layer, and qcow2's ability to store bitmaps is used to implement the common interface. After a bitmap is loaded and turned into a BdrvDirtyBitmap, it no longer belongs to qcow2, and the qcow2 layer should not touch it on intermediate reopens. We can introduce additional flags to control the loading of autoloading bitmaps. Isn't it better to handle them in a common place for all formats? (Yes, for now only qcow2 can store bitmaps, but the parallels format defines this ability too, and in the future we may have some way to load bitmaps for the raw format. Also, there is a bitmap-related extension for the NBD protocol.) bdrv_load_dirty_bitmap, defined in my series about NBD BLOCK_STATUS, is definitely generic, as it should be called from a QMP command. (Just a note, not a concrete argument.)
Am 18.02.2017 um 11:54 hat Denis V. Lunev geschrieben: > On 02/17/2017 03:54 PM, Vladimir Sementsov-Ogievskiy wrote: > > 17.02.2017 17:24, Kevin Wolf wrote: > >> Am 17.02.2017 um 14:48 hat Denis V. Lunev geschrieben: > >>> On 02/17/2017 04:34 PM, Kevin Wolf wrote: > >>>> Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben: > >>>>> But for sure this is bad from the downtime point of view. > >>>>> On migrate you will have to write to the image and re-read > >>>>> it again on the target. This would be very slow. This will > >>>>> not help for the migration with non-shared disk too. > >>>>> > >>>>> That is why we have specifically worked in a migration, > >>>>> which for a good does not influence downtime at all now. > >>>>> > >>>>> With a write we are issuing several write requests + sync. > >>>>> Our measurements shows that bdrv_drain could take around > >>>>> a second on an averagely loaded conventional system, which > >>>>> seems unacceptable addition to me. > >>>> I'm not arguing against optimising migration, I fully agree with > >>>> you. I > >>>> just think that we should start with a correct if slow base version > >>>> and > >>>> then add optimisation to that, instead of starting with a broken base > >>>> version and adding to that. > >>>> > >>>> Look, whether you do the expensive I/O on open/close and make that a > >>>> slow operation or whether you do it on invalidate_cache/inactivate > >>>> doesn't really make a difference in term of slowness because in > >>>> general > >>>> both operations are called exactly once. But it does make a difference > >>>> in terms of correctness. > >>>> > >>>> Once you do the optimisation, of course, you'll skip writing those > >>>> bitmaps that you transfer using a different channel, no matter whether > >>>> you skip it in bdrv_close() or in bdrv_inactivate(). 
> >>>> > >>>> Kevin > >>> I do not understand this point as in order to optimize this > >>> we will have to create specific code path or option from > >>> the migration code and keep this as an ugly kludge forever. > >> The point that I don't understand is why it makes any difference for the > >> follow-up migration series whether the writeout is in bdrv_close() or > >> bdrv_inactivate(). I don't really see the difference between the two > >> from a migration POV; both need to be skipped if we transfer the bitmap > >> using a different channel. > >> > >> Maybe I would see the reason if I could find the time to look at the > >> migration patches first, but unfortunately I don't have this time at the > >> moment. > >> > >> My point is just that generally we want to have a correctly working qemu > >> after every single patch, and even more importantly after every series. > >> As the migration series is separate from this, I don't think it's a good > >> excuse for doing worse than we could easily do here. > >> > >> Kevin > > > > With open/close all is already ok - bitmaps will not be saved because > > of BDRV_O_INACTIVE and will not be loaded because of IN_USE. If the contents of so-called persistent bitmaps are lost across migration and stale after bdrv_invalidate_cache, that's not "all ok" in my book. It's buggy. > in any case load/store happens outside of VM downtime. > The target is running at the moment of close on source, > the source is running at the moment of open on target. Which makes the operation in open/close meaningless: open reads data which may still be changed, and close cannot write to the image any more because ownership is already given up. Anyway, all of this isn't really productive. Please, can't you just answer that simple question that I asked you above: What problems would you get if you used invalidate_cache/inactivate instead of open/close for triggering these actions? 
If you can bring up some "this would break X", "it would be very hard to implement Y correctly then" or "in scenario Z, this would have unwanted effects", then we can figure out what to do. But not putting things in the proper place just because you don't feel like it isn't a very strong argument. Kevin
On 02/20/2017 12:15 PM, Kevin Wolf wrote: > Am 18.02.2017 um 11:54 hat Denis V. Lunev geschrieben: >> On 02/17/2017 03:54 PM, Vladimir Sementsov-Ogievskiy wrote: >>> 17.02.2017 17:24, Kevin Wolf wrote: >>>> Am 17.02.2017 um 14:48 hat Denis V. Lunev geschrieben: >>>>> On 02/17/2017 04:34 PM, Kevin Wolf wrote: >>>>>> Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben: >>>>>>> But for sure this is bad from the downtime point of view. >>>>>>> On migrate you will have to write to the image and re-read >>>>>>> it again on the target. This would be very slow. This will >>>>>>> not help for the migration with non-shared disk too. >>>>>>> >>>>>>> That is why we have specifically worked in a migration, >>>>>>> which for a good does not influence downtime at all now. >>>>>>> >>>>>>> With a write we are issuing several write requests + sync. >>>>>>> Our measurements shows that bdrv_drain could take around >>>>>>> a second on an averagely loaded conventional system, which >>>>>>> seems unacceptable addition to me. >>>>>> I'm not arguing against optimising migration, I fully agree with >>>>>> you. I >>>>>> just think that we should start with a correct if slow base version >>>>>> and >>>>>> then add optimisation to that, instead of starting with a broken base >>>>>> version and adding to that. >>>>>> >>>>>> Look, whether you do the expensive I/O on open/close and make that a >>>>>> slow operation or whether you do it on invalidate_cache/inactivate >>>>>> doesn't really make a difference in term of slowness because in >>>>>> general >>>>>> both operations are called exactly once. But it does make a difference >>>>>> in terms of correctness. >>>>>> >>>>>> Once you do the optimisation, of course, you'll skip writing those >>>>>> bitmaps that you transfer using a different channel, no matter whether >>>>>> you skip it in bdrv_close() or in bdrv_inactivate(). 
>>>>>> >>>>>> Kevin >>>>> I do not understand this point as in order to optimize this >>>>> we will have to create specific code path or option from >>>>> the migration code and keep this as an ugly kludge forever. >>>> The point that I don't understand is why it makes any difference for the >>>> follow-up migration series whether the writeout is in bdrv_close() or >>>> bdrv_inactivate(). I don't really see the difference between the two >>>> from a migration POV; both need to be skipped if we transfer the bitmap >>>> using a different channel. >>>> >>>> Maybe I would see the reason if I could find the time to look at the >>>> migration patches first, but unfortunately I don't have this time at the >>>> moment. >>>> >>>> My point is just that generally we want to have a correctly working qemu >>>> after every single patch, and even more importantly after every series. >>>> As the migration series is separate from this, I don't think it's a good >>>> excuse for doing worse than we could easily do here. >>>> >>>> Kevin >>> With open/close all is already ok - bitmaps will not be saved because >>> of BDRV_O_INACTIVE and will not be loaded because of IN_USE. > If the contents of so-called persistent bitmaps is lost across > migration and stale after bdrv_invalidate_cache, that's not "all ok" in > my book. It's buggy. > >> in any case load/store happens outside of VM downtime. >> The target is running at the moment of close on source, >> the source is running at the moment of open on target. > Which makes the operation in open/close meaningless: open reads data > which may still by changed, and close cannot write to the image any more > because ownership is already given up. > > Anyway, all of this isn't really productive. Please, can't you just > answer that simple question that I asked you above: What problems would > you get if you used invalidate_cache/inactivate instead of open/close > for triggering these actions? 
> > If you can bring up some "this would break X", "it would be very hard to > implement Y correctly then" or "in scenario Z, this would have unwanted > effects", then we can figure out what to do. But not putting things in > the proper place just because you don't feel like it isn't a very strong > argument. > > Kevin This will increase the downtime by ~0.5-1 second for the migrated VM, or it will require a very intrusive interface from the migration code to avoid the bitmap save on the inactivation path. No other arguments so far. Den
20.02.2017 14:21, Denis V. Lunev wrote: > On 02/20/2017 12:15 PM, Kevin Wolf wrote: >> Am 18.02.2017 um 11:54 hat Denis V. Lunev geschrieben: >>> On 02/17/2017 03:54 PM, Vladimir Sementsov-Ogievskiy wrote: >>>> 17.02.2017 17:24, Kevin Wolf wrote: >>>>> Am 17.02.2017 um 14:48 hat Denis V. Lunev geschrieben: >>>>>> On 02/17/2017 04:34 PM, Kevin Wolf wrote: >>>>>>> Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben: >>>>>>>> But for sure this is bad from the downtime point of view. >>>>>>>> On migrate you will have to write to the image and re-read >>>>>>>> it again on the target. This would be very slow. This will >>>>>>>> not help for the migration with non-shared disk too. >>>>>>>> >>>>>>>> That is why we have specifically worked in a migration, >>>>>>>> which for a good does not influence downtime at all now. >>>>>>>> >>>>>>>> With a write we are issuing several write requests + sync. >>>>>>>> Our measurements shows that bdrv_drain could take around >>>>>>>> a second on an averagely loaded conventional system, which >>>>>>>> seems unacceptable addition to me. >>>>>>> I'm not arguing against optimising migration, I fully agree with >>>>>>> you. I >>>>>>> just think that we should start with a correct if slow base version >>>>>>> and >>>>>>> then add optimisation to that, instead of starting with a broken base >>>>>>> version and adding to that. >>>>>>> >>>>>>> Look, whether you do the expensive I/O on open/close and make that a >>>>>>> slow operation or whether you do it on invalidate_cache/inactivate >>>>>>> doesn't really make a difference in term of slowness because in >>>>>>> general >>>>>>> both operations are called exactly once. But it does make a difference >>>>>>> in terms of correctness. >>>>>>> >>>>>>> Once you do the optimisation, of course, you'll skip writing those >>>>>>> bitmaps that you transfer using a different channel, no matter whether >>>>>>> you skip it in bdrv_close() or in bdrv_inactivate(). 
>>>>>>> >>>>>>> Kevin >>>>>> I do not understand this point as in order to optimize this >>>>>> we will have to create specific code path or option from >>>>>> the migration code and keep this as an ugly kludge forever. >>>>> The point that I don't understand is why it makes any difference for the >>>>> follow-up migration series whether the writeout is in bdrv_close() or >>>>> bdrv_inactivate(). I don't really see the difference between the two >>>>> from a migration POV; both need to be skipped if we transfer the bitmap >>>>> using a different channel. >>>>> >>>>> Maybe I would see the reason if I could find the time to look at the >>>>> migration patches first, but unfortunately I don't have this time at the >>>>> moment. >>>>> >>>>> My point is just that generally we want to have a correctly working qemu >>>>> after every single patch, and even more importantly after every series. >>>>> As the migration series is separate from this, I don't think it's a good >>>>> excuse for doing worse than we could easily do here. >>>>> >>>>> Kevin >>>> With open/close all is already ok - bitmaps will not be saved because >>>> of BDRV_O_INACTIVE and will not be loaded because of IN_USE. >> If the contents of so-called persistent bitmaps is lost across >> migration and stale after bdrv_invalidate_cache, that's not "all ok" in >> my book. It's buggy. >> >>> in any case load/store happens outside of VM downtime. >>> The target is running at the moment of close on source, >>> the source is running at the moment of open on target. >> Which makes the operation in open/close meaningless: open reads data >> which may still by changed, and close cannot write to the image any more >> because ownership is already given up. >> >> Anyway, all of this isn't really productive. Please, can't you just >> answer that simple question that I asked you above: What problems would >> you get if you used invalidate_cache/inactivate instead of open/close >> for triggering these actions? 
>> >> If you can bring up some "this would break X", "it would be very hard to >> implement Y correctly then" or "in scenario Z, this would have unwanted >> effects", then we can figure out what to do. But not putting things in >> the proper place just because you don't feel like it isn't a very strong >> argument. >> >> Kevin > This will increase the downtime ~0.5-1 second for migrated VM > or will require very intrusive interface from migration code to > here to avoid bitmap save for inactivation on that path. > > No other arguments so far. > > Den Ok. Finally, I'll make v16 with load/store in inactivate/invalidate. It will not be too hard to change this back if really needed.
Am 20.02.2017 um 12:21 hat Denis V. Lunev geschrieben: > On 02/20/2017 12:15 PM, Kevin Wolf wrote: > > Am 18.02.2017 um 11:54 hat Denis V. Lunev geschrieben: > >> On 02/17/2017 03:54 PM, Vladimir Sementsov-Ogievskiy wrote: > >>> 17.02.2017 17:24, Kevin Wolf wrote: > >>>> Am 17.02.2017 um 14:48 hat Denis V. Lunev geschrieben: > >>>>> On 02/17/2017 04:34 PM, Kevin Wolf wrote: > >>>>>> Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben: > >>>>>>> But for sure this is bad from the downtime point of view. > >>>>>>> On migrate you will have to write to the image and re-read > >>>>>>> it again on the target. This would be very slow. This will > >>>>>>> not help for the migration with non-shared disk too. > >>>>>>> > >>>>>>> That is why we have specifically worked in a migration, > >>>>>>> which for a good does not influence downtime at all now. > >>>>>>> > >>>>>>> With a write we are issuing several write requests + sync. > >>>>>>> Our measurements shows that bdrv_drain could take around > >>>>>>> a second on an averagely loaded conventional system, which > >>>>>>> seems unacceptable addition to me. > >>>>>> I'm not arguing against optimising migration, I fully agree with > >>>>>> you. I > >>>>>> just think that we should start with a correct if slow base version > >>>>>> and > >>>>>> then add optimisation to that, instead of starting with a broken base > >>>>>> version and adding to that. > >>>>>> > >>>>>> Look, whether you do the expensive I/O on open/close and make that a > >>>>>> slow operation or whether you do it on invalidate_cache/inactivate > >>>>>> doesn't really make a difference in term of slowness because in > >>>>>> general > >>>>>> both operations are called exactly once. But it does make a difference > >>>>>> in terms of correctness. 
> >>>>>> > >>>>>> Once you do the optimisation, of course, you'll skip writing those > >>>>>> bitmaps that you transfer using a different channel, no matter whether > >>>>>> you skip it in bdrv_close() or in bdrv_inactivate(). > >>>>>> > >>>>>> Kevin > >>>>> I do not understand this point as in order to optimize this > >>>>> we will have to create specific code path or option from > >>>>> the migration code and keep this as an ugly kludge forever. > >>>> The point that I don't understand is why it makes any difference for the > >>>> follow-up migration series whether the writeout is in bdrv_close() or > >>>> bdrv_inactivate(). I don't really see the difference between the two > >>>> from a migration POV; both need to be skipped if we transfer the bitmap > >>>> using a different channel. > >>>> > >>>> Maybe I would see the reason if I could find the time to look at the > >>>> migration patches first, but unfortunately I don't have this time at the > >>>> moment. > >>>> > >>>> My point is just that generally we want to have a correctly working qemu > >>>> after every single patch, and even more importantly after every series. > >>>> As the migration series is separate from this, I don't think it's a good > >>>> excuse for doing worse than we could easily do here. > >>>> > >>>> Kevin > >>> With open/close all is already ok - bitmaps will not be saved because > >>> of BDRV_O_INACTIVE and will not be loaded because of IN_USE. > > If the contents of so-called persistent bitmaps is lost across > > migration and stale after bdrv_invalidate_cache, that's not "all ok" in > > my book. It's buggy. > > > >> in any case load/store happens outside of VM downtime. > >> The target is running at the moment of close on source, > >> the source is running at the moment of open on target. > > Which makes the operation in open/close meaningless: open reads data > > which may still by changed, and close cannot write to the image any more > > because ownership is already given up. 
> > > > Anyway, all of this isn't really productive. Please, can't you just > > answer that simple question that I asked you above: What problems would > > you get if you used invalidate_cache/inactivate instead of open/close > > for triggering these actions? > > > > If you can bring up some "this would break X", "it would be very hard to > > implement Y correctly then" or "in scenario Z, this would have unwanted > > effects", then we can figure out what to do. But not putting things in > > the proper place just because you don't feel like it isn't a very strong > > argument. > > > > Kevin > This will increase the downtime ~0.5-1 second for migrated VM True before the optimisation is applied. But correctness trumps speed. > or will require very intrusive interface from migration code to > here to avoid bitmap save for inactivation on that path. You already have a (very implicit) interface from migration code to bdrv_open/close. It consists of what Vladimir mentioned above: The BDRV_O_INACTIVE and IN_USE flags, which happen to be in the right state during migration so that they can be abused for controlling the handover. If moving the code means that we end up getting a more explicit interface, that's actually a plus for me. Kevin
diff --git a/block.c b/block.c
index a0346c8..56f030c 100644
--- a/block.c
+++ b/block.c
@@ -1141,6 +1141,13 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
     assert(bdrv_min_mem_align(bs) != 0);
     assert(is_power_of_2(bs->bl.request_alignment));
 
+    bdrv_load_autoloading_dirty_bitmaps(bs, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        ret = -EINVAL;
+        goto free_and_fail;
+    }
+
     qemu_opts_del(opts);
     return 0;
 
@@ -4093,3 +4100,10 @@ void bdrv_del_child(BlockDriverState *parent_bs, BdrvChild *child, Error **errp)
 
     parent_bs->drv->bdrv_del_child(parent_bs, child, errp);
 }
+
+void bdrv_load_autoloading_dirty_bitmaps(BlockDriverState *bs, Error **errp)
+{
+    if (bs->drv && bs->drv->bdrv_load_autoloading_dirty_bitmaps) {
+        bs->drv->bdrv_load_autoloading_dirty_bitmaps(bs, errp);
+    }
+}
diff --git a/include/block/block.h b/include/block/block.h
index 4e81f20..154ac7f 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -551,4 +551,6 @@ void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child,
                     Error **errp);
 void bdrv_del_child(BlockDriverState *parent, BdrvChild *child, Error **errp);
 
+void bdrv_load_autoloading_dirty_bitmaps(BlockDriverState *bs, Error **errp);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 2d92d7e..6b2b50c 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -320,6 +320,9 @@ struct BlockDriver {
     void (*bdrv_del_child)(BlockDriverState *parent, BdrvChild *child,
                            Error **errp);
 
+    void (*bdrv_load_autoloading_dirty_bitmaps)(BlockDriverState *bs,
+                                                Error **errp);
+
     QLIST_ENTRY(BlockDriver) list;
 };