
bugfix:migrate with block-dirty-bitmap (disk size is big enough) can't be finished

Message ID 20220910063542.3036319-1-liuhaiwei9699@126.com
State New

Commit Message

liuhaiwei Sept. 10, 2022, 6:35 a.m. UTC
From: liuhaiwei <liuhaiwei@inspur.com>

Bug description: https://gitlab.com/qemu-project/qemu/-/issues/1203
Usually, we use precopy or postcopy mode to migrate block dirty bitmaps.
However, if the block dirty bitmap size exceeds the threshold size, migration_iteration_run() never enters migration_completion().
To solve this problem, when the pending size exceeds the threshold size, we can set the pending size to a fake value (threshold - 1 or 0) so that migration_iteration_run() enters migration_completion().

Signed-off-by: liuhaiwei <liuhaiwei9699@126.com>
Signed-off-by: liuhaiwei <liuhaiwei@inspur.com>
---
 migration/block-dirty-bitmap.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Vladimir Sementsov-Ogievskiy Sept. 10, 2022, 10:18 a.m. UTC | #1
On 9/10/22 09:35, liuhaiwei wrote:
> From: liuhaiwei <liuhaiwei@inspur.com>
> 
> bug description as  https://gitlab.com/qemu-project/qemu/-/issues/1203
> Usually,we use the precopy or postcopy mode to migrate block dirty bitmap.
> but if block-dirty-bitmap size more than threshold size,we cannot entry the migration_completion in migration_iteration_run function
> To solve this problem, we can setting  the pending size to a fake value(threshold-1 or 0) to tell  migration_iteration_run function to entry the migration_completion,if pending size > threshold size
> 


Actually, bitmaps migrate in postcopy. So, you should start postcopy for it to work (QMP command migrate-start-postcopy). This command simply sets a boolean variable, so that in migration_iteration_run() we'll move to postcopy when needed. So, you can issue this command immediately after the migrate command, or even before it, but after setting the "dirty-bitmaps" capability.

Faking the pending size is the wrong thing to do: it means the downtime will be larger than expected.
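
A minimal sketch of the decision described above, written as a standalone C model rather than the actual QEMU code. The enum, the function name and the parameter names are all hypothetical; only the general shape (compare pending data against the threshold, switch to postcopy only if the flag from migrate-start-postcopy is set) follows the description in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome names; they only loosely mirror QEMU's internals. */
typedef enum {
    ITER_CONTINUE_PRECOPY,   /* run another precopy iteration */
    ITER_SWITCH_TO_POSTCOPY, /* start the destination, keep sending */
    ITER_COMPLETE            /* stop the source and finish the migration */
} IterDecision;

/*
 * Simplified model of one pass through the migration loop.
 *
 * pend_precopy:   bytes that must be sent before the destination starts
 * pend_postcopy:  bytes that may be sent after it starts (e.g. bitmaps)
 * threshold:      bytes that fit into the allowed downtime
 * start_postcopy: the boolean set by migrate-start-postcopy (assumed name)
 */
IterDecision iteration_decision(uint64_t pend_precopy,
                                uint64_t pend_postcopy,
                                uint64_t threshold,
                                bool start_postcopy)
{
    uint64_t pending = pend_precopy + pend_postcopy;

    if (pending && pending >= threshold) {
        /* Too much data left to send it all during downtime. */
        if (pend_precopy <= threshold && start_postcopy) {
            return ITER_SWITCH_TO_POSTCOPY;
        }
        return ITER_CONTINUE_PRECOPY;
    }
    return ITER_COMPLETE;
}
```

In this model, a large bitmap with start_postcopy unset always yields ITER_CONTINUE_PRECOPY, which matches the reported hang. Setting the flag moves the loop to postcopy without misreporting how much data is left.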
liuhaiwei Sept. 15, 2022, 1:28 a.m. UTC | #2
Hi Vladimir,

Sometimes post-copy mode is not the best choice. For instance, suppose the migration process takes ten minutes and the network is interrupted during it. If that happens, the VM's memory data will be split into two parts and cannot be rolled back. This is a bad situation.

So migrate-start-postcopy will not be issued in a conservative scenario. In that case, a migration with block dirty bitmaps may never finish.

I think migration of block dirty bitmaps should not depend on post-copy or pre-copy mode.

Best regards
Vladimir Sementsov-Ogievskiy Sept. 15, 2022, 11:45 a.m. UTC | #3
Post-copy migration of dirty bitmaps doesn't mean post-copy migration of RAM.

To turn on post-copy migration of RAM, you should enable the postcopy-ram capability. If you don't enable it, RAM is migrated in pre-copy (i.e. before the VM starts on the target).

The migrate-start-postcopy command doesn't enable the postcopy-ram capability automatically, so don't be afraid of it.

On 9/15/22 04:28, liuhaiwei9699 wrote:
> Hi ,Vladimir
> sometimes ,post-copy mode is not the best choice. For instance, Supposing migrate process will take ten minutes,but network may be interruptted In this process .
> If it does happenthe , memory data of VM will be splitted into two parts, and will not be rollback.This is a bad situation

If you don't enable the postcopy-ram capability, memory data is already migrated _before_ the VM starts on the destination. So, the only thing we may lose in the worst case is the dirty bitmap itself, not RAM.

> 
> so  migrate-start-postcopy will not be setted in conservative scenario. In this case, the migration with block dirty bitmap may not be finished.

Again, the migrate-start-postcopy command doesn't enable postcopy of RAM. It only allows entering generic postcopy mode. If the dirty-bitmaps capability is enabled and postcopy-ram is not, the only thing that can be migrated in postcopy is the dirty bitmap.
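
To illustrate the point, here is a small C model of how the two capabilities combine with the start-postcopy request. The struct and function names are hypothetical and do not exist in QEMU; the sketch only encodes the rule stated above.

```c
#include <stdbool.h>

/* Hypothetical flags standing in for the migration capabilities. */
typedef struct {
    bool dirty_bitmaps; /* "dirty-bitmaps" capability */
    bool postcopy_ram;  /* "postcopy-ram" capability */
} Caps;

/* RAM is sent after the destination starts only if postcopy-ram is set. */
bool ram_uses_postcopy(Caps caps, bool start_postcopy_issued)
{
    return start_postcopy_issued && caps.postcopy_ram;
}

/* Bitmaps are sent after the destination starts if dirty-bitmaps is set. */
bool bitmaps_use_postcopy(Caps caps, bool start_postcopy_issued)
{
    return start_postcopy_issued && caps.dirty_bitmaps;
}
```

With dirty_bitmaps set and postcopy_ram clear, issuing migrate-start-postcopy makes only the bitmaps travel in postcopy, so an interrupted connection cannot leave RAM split between the two hosts.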

> 
> 
> I think  migration of block dirty bitmap should not dependent on post-copy or pre-copy mode.
> 

But dirty bitmap migration is implemented as postcopy in QEMU.

We can't migrate bitmaps during downtime in general, as the bitmaps may be large and the connection slow (your case). So, we have to migrate them either in pre-copy or in post-copy mode. Historically, the second method was chosen.
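
A rough back-of-the-envelope calculation shows why large bitmaps do not fit into downtime. The helpers below are a sketch with hypothetical names, assuming one bit per granularity-sized chunk and ignoring any serialization overhead. The disk size, granularity and bandwidth used afterwards are illustrative inputs, not values taken from the bug report.

```c
#include <stdint.h>

/* One bit per granularity-sized chunk of the disk, rounded up to bytes. */
uint64_t bitmap_bytes(uint64_t disk_bytes, uint64_t granularity)
{
    uint64_t bits = (disk_bytes + granularity - 1) / granularity;
    return (bits + 7) / 8;
}

/* Time in milliseconds to send `bytes` at `bytes_per_sec`, rounded up. */
uint64_t transfer_ms(uint64_t bytes, uint64_t bytes_per_sec)
{
    return (bytes * 1000 + bytes_per_sec - 1) / bytes_per_sec;
}
```

For example, a 16 TiB disk with an assumed 64 KiB granularity gives a 32 MiB bitmap. At an assumed 10 MiB/s that takes about 3200 ms to send, well above a downtime limit of a few hundred milliseconds.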


Patch

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 9aba7d9c22..5cbf365f46 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -782,6 +782,10 @@  static void dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
     }
 
     qemu_mutex_unlock_iothread();
+    /* Set a fake pending size when the dirty bitmap size exceeds max_size */
+    if (pending >= max_size && max_size != 0) {
+        pending = max_size - 1;
+    }
 
     trace_dirty_bitmap_save_pending(pending, max_size);