Message ID | 1434450415-11339-36-git-send-email-dgilbert@redhat.com |
---|---|
State | New |
Headers | show |
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote: > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > Once we're in postcopy the source processors are stopped and memory > shouldn't change any more, so there's no need to look at the dirty > map. > > There are two notes to this: > 1) If we do resync and a page had changed then the page would get > sent again, which the destination wouldn't allow (since it might > have also modified the page) > 2) Before disabling this I'd seen very rare cases where a page had been > marked dirtied although the memory contents are apparently identical > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Juan Quintela <quintela@redhat.com> But, in what patch do we sync the migratioon bitmap after changing to postcopy? > --- > migration/ram.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/migration/ram.c b/migration/ram.c > index 01a0ab4..5cff4d6 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -1643,7 +1643,9 @@ static int ram_save_complete(QEMUFile *f, void *opaque) > { > rcu_read_lock(); > > - migration_bitmap_sync(); > + if (!migration_postcopy_phase(migrate_get_current())) { > + migration_bitmap_sync(); > + } > > ram_control_before_iterate(f, RAM_CONTROL_FINISH); > > @@ -1678,7 +1680,8 @@ static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size, > > remaining_size = ram_save_remaining() * TARGET_PAGE_SIZE; > > - if (remaining_size < max_size) { > + if (!migration_postcopy_phase(migrate_get_current()) && > + remaining_size < max_size) { > qemu_mutex_lock_iothread(); > rcu_read_lock(); > migration_bitmap_sync();
* Juan Quintela (quintela@redhat.com) wrote: > "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote: > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > > > Once we're in postcopy the source processors are stopped and memory > > shouldn't change any more, so there's no need to look at the dirty > > map. > > > > There are two notes to this: > > 1) If we do resync and a page had changed then the page would get > > sent again, which the destination wouldn't allow (since it might > > have also modified the page) > > 2) Before disabling this I'd seen very rare cases where a page had been > > marked dirtied although the memory contents are apparently identical > > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > > Reviewed-by: David Gibson <david@gibson.dropbear.id.au> > > Reviewed-by: Juan Quintela <quintela@redhat.com> > > But, in what patch do we sync the migratioon bitmap after changing to postcopy? It's called one last time in ram_postcopy_send_discard_bitmap; see: v7-0025-Postcopy-Maintain-sentmap-and-calculate-discard.patch and that happens at the start of postcopy mode, when the CPU is stopped and won't be running on the source again. Dave > > > --- > > migration/ram.c | 7 +++++-- > > 1 file changed, 5 insertions(+), 2 deletions(-) > > > > diff --git a/migration/ram.c b/migration/ram.c > > index 01a0ab4..5cff4d6 100644 > > --- a/migration/ram.c > > +++ b/migration/ram.c > > @@ -1643,7 +1643,9 @@ static int ram_save_complete(QEMUFile *f, void *opaque) > > { > > rcu_read_lock(); > > > > - migration_bitmap_sync(); > > + if (!migration_postcopy_phase(migrate_get_current())) { > > + migration_bitmap_sync(); > > + } > > > > ram_control_before_iterate(f, RAM_CONTROL_FINISH); > > > > @@ -1678,7 +1680,8 @@ static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size, > > > > remaining_size = ram_save_remaining() * TARGET_PAGE_SIZE; > > > > - if (remaining_size < max_size) { > > + if (!migration_postcopy_phase(migrate_get_current()) && > > + remaining_size < max_size) { > > qemu_mutex_lock_iothread(); > > rcu_read_lock(); > > migration_bitmap_sync(); -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On (Tue) 16 Jun 2015 [11:26:48], Dr. David Alan Gilbert (git) wrote: > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > Once we're in postcopy the source processors are stopped and memory > shouldn't change any more, so there's no need to look at the dirty > map. > > There are two notes to this: > 1) If we do resync and a page had changed then the page would get > sent again, which the destination wouldn't allow (since it might > have also modified the page) > 2) Before disabling this I'd seen very rare cases where a page had been > marked dirtied although the memory contents are apparently identical I suppose we don't know why. Any way to send a message to the dest with this info, so the dest can print out something? That'll help in debugging. (I'm suggesting sending a message to the dest, because after a migration, we don't ever think of looking at messages on the src. And chances are the dest could blow up after a migration is successful because of such "corruption".) > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Amit Shah <amit.shah@redhat.com> Amit
* Amit Shah (amit.shah@redhat.com) wrote: > On (Tue) 16 Jun 2015 [11:26:48], Dr. David Alan Gilbert (git) wrote: > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > > > Once we're in postcopy the source processors are stopped and memory > > shouldn't change any more, so there's no need to look at the dirty > > map. > > > > There are two notes to this: > > 1) If we do resync and a page had changed then the page would get > > sent again, which the destination wouldn't allow (since it might > > have also modified the page) > > 2) Before disabling this I'd seen very rare cases where a page had been > > marked dirtied although the memory contents are apparently identical > > I suppose we don't know why. Any way to send a message to the dest > with this info, so the dest can print out something? That'll help in > debugging. (I'm suggesting sending a message to the dest, because > after a migration, we don't ever think of looking at messages on the > src. And chances are the dest could blow up after a migration is > successful because of such "corruption".) One way perhaps would be to do one more sync at the end, after migration is apparently finished, but before the socket was closed; that would detect these changes and you could send a message to the other end. However, given that (2) I say that where I'd seen it the page contents were identical, this could be a false alarm, so we'd need to be careful. It also doesn't help you find out *why* it happens, since tracing back from a bit in the migration bitmap to the area of memory and the thing that marked it dirty is very hard. The only way to do that, is to mark the memory as read-only and then get a backtrace to find out who tried to change it; but you don't want to do that on a normal build and cause the source to die. > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > > Reviewed-by: David Gibson <david@gibson.dropbear.id.au> > > Reviewed-by: Amit Shah <amit.shah@redhat.com> Thanks. Dave > > Amit -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On (Fri) 31 Jul 2015 [10:50:46], Dr. David Alan Gilbert wrote: > * Amit Shah (amit.shah@redhat.com) wrote: > > On (Tue) 16 Jun 2015 [11:26:48], Dr. David Alan Gilbert (git) wrote: > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > > > > > Once we're in postcopy the source processors are stopped and memory > > > shouldn't change any more, so there's no need to look at the dirty > > > map. > > > > > > There are two notes to this: > > > 1) If we do resync and a page had changed then the page would get > > > sent again, which the destination wouldn't allow (since it might > > > have also modified the page) > > > 2) Before disabling this I'd seen very rare cases where a page had been > > > marked dirtied although the memory contents are apparently identical > > > > I suppose we don't know why. Any way to send a message to the dest > > with this info, so the dest can print out something? That'll help in > > debugging. (I'm suggesting sending a message to the dest, because > > after a migration, we don't ever think of looking at messages on the > > src. And chances are the dest could blow up after a migration is > > successful because of such "corruption".) > > One way perhaps would be to do one more sync at the end, after migration > is apparently finished, but before the socket was closed; that would > detect these changes and you could send a message to the other end. However, > given that (2) I say that where I'd seen it the page contents were > identical, this could be a false alarm, so we'd need to be careful. > It also doesn't help you find out *why* it happens, since tracing > back from a bit in the migration bitmap to the area of memory > and the thing that marked it dirty is very hard. The only way to do > that, is to mark the memory as read-only and then get a backtrace > to find out who tried to change it; but you don't want to do > that on a normal build and cause the source to die. Agreed - but some notification that something might possibly be wrong is better than we not having such a clue, and fervently trying to debug an issue. In fact, a per-VM flag could be better since multiple migrations may mean such notifications could be lost in the logs of a previous host which we don't examine. Amit
diff --git a/migration/ram.c b/migration/ram.c index 01a0ab4..5cff4d6 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1643,7 +1643,9 @@ static int ram_save_complete(QEMUFile *f, void *opaque) { rcu_read_lock(); - migration_bitmap_sync(); + if (!migration_postcopy_phase(migrate_get_current())) { + migration_bitmap_sync(); + } ram_control_before_iterate(f, RAM_CONTROL_FINISH); @@ -1678,7 +1680,8 @@ static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size, remaining_size = ram_save_remaining() * TARGET_PAGE_SIZE; - if (remaining_size < max_size) { + if (!migration_postcopy_phase(migrate_get_current()) && + remaining_size < max_size) { qemu_mutex_lock_iothread(); rcu_read_lock(); migration_bitmap_sync();