diff mbox

[v7,35/42] Don't sync dirty bitmaps in postcopy

Message ID 1434450415-11339-36-git-send-email-dgilbert@redhat.com
State New
Headers show

Commit Message

Dr. David Alan Gilbert June 16, 2015, 10:26 a.m. UTC
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Once we're in postcopy the source processors are stopped and memory
shouldn't change any more, so there's no need to look at the dirty
map.

There are two notes to this:
  1) If we do resync and a page had changed then the page would get
     sent again, which the destination wouldn't allow (since it might
     have also modified the page)
  2) Before disabling this I'd seen very rare cases where a page had been
     marked dirtied although the memory contents are apparently identical

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 migration/ram.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Juan Quintela July 14, 2015, 12:36 p.m. UTC | #1
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Once we're in postcopy the source processors are stopped and memory
> shouldn't change any more, so there's no need to look at the dirty
> map.
>
> There are two notes to this:
>   1) If we do resync and a page had changed then the page would get
>      sent again, which the destination wouldn't allow (since it might
>      have also modified the page)
>   2) Before disabling this I'd seen very rare cases where a page had been
>      marked dirtied although the memory contents are apparently identical
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Reviewed-by: Juan Quintela <quintela@redhat.com>

But, in what patch do we sync the migratioon bitmap after changing to postcopy?

> ---
>  migration/ram.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index 01a0ab4..5cff4d6 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1643,7 +1643,9 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>  {
>      rcu_read_lock();
>  
> -    migration_bitmap_sync();
> +    if (!migration_postcopy_phase(migrate_get_current())) {
> +        migration_bitmap_sync();
> +    }
>  
>      ram_control_before_iterate(f, RAM_CONTROL_FINISH);
>  
> @@ -1678,7 +1680,8 @@ static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
>  
>      remaining_size = ram_save_remaining() * TARGET_PAGE_SIZE;
>  
> -    if (remaining_size < max_size) {
> +    if (!migration_postcopy_phase(migrate_get_current()) &&
> +        remaining_size < max_size) {
>          qemu_mutex_lock_iothread();
>          rcu_read_lock();
>          migration_bitmap_sync();
Dr. David Alan Gilbert July 14, 2015, 1:13 p.m. UTC | #2
* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Once we're in postcopy the source processors are stopped and memory
> > shouldn't change any more, so there's no need to look at the dirty
> > map.
> >
> > There are two notes to this:
> >   1) If we do resync and a page had changed then the page would get
> >      sent again, which the destination wouldn't allow (since it might
> >      have also modified the page)
> >   2) Before disabling this I'd seen very rare cases where a page had been
> >      marked dirtied although the memory contents are apparently identical
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> 
> But, in what patch do we sync the migratioon bitmap after changing to postcopy?

It's called one last time in ram_postcopy_send_discard_bitmap; see:
v7-0025-Postcopy-Maintain-sentmap-and-calculate-discard.patch

and that happens at the start of postcopy mode, when the CPU is stopped and won't
be running on the source again.

Dave

> 
> > ---
> >  migration/ram.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 01a0ab4..5cff4d6 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -1643,7 +1643,9 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
> >  {
> >      rcu_read_lock();
> >  
> > -    migration_bitmap_sync();
> > +    if (!migration_postcopy_phase(migrate_get_current())) {
> > +        migration_bitmap_sync();
> > +    }
> >  
> >      ram_control_before_iterate(f, RAM_CONTROL_FINISH);
> >  
> > @@ -1678,7 +1680,8 @@ static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
> >  
> >      remaining_size = ram_save_remaining() * TARGET_PAGE_SIZE;
> >  
> > -    if (remaining_size < max_size) {
> > +    if (!migration_postcopy_phase(migrate_get_current()) &&
> > +        remaining_size < max_size) {
> >          qemu_mutex_lock_iothread();
> >          rcu_read_lock();
> >          migration_bitmap_sync();
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Amit Shah July 27, 2015, 7:43 a.m. UTC | #3
On (Tue) 16 Jun 2015 [11:26:48], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Once we're in postcopy the source processors are stopped and memory
> shouldn't change any more, so there's no need to look at the dirty
> map.
> 
> There are two notes to this:
>   1) If we do resync and a page had changed then the page would get
>      sent again, which the destination wouldn't allow (since it might
>      have also modified the page)
>   2) Before disabling this I'd seen very rare cases where a page had been
>      marked dirtied although the memory contents are apparently identical

I suppose we don't know why.  Any way to send a message to the dest
with this info, so the dest can print out something?  That'll help in
debugging.  (I'm suggesting sending a message to the dest, because
after a migration, we don't ever think of looking at messages on the
src.  And chances are the dest could blow up after a migration is
successful because of such "corruption".)

> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit
Dr. David Alan Gilbert July 31, 2015, 9:50 a.m. UTC | #4
* Amit Shah (amit.shah@redhat.com) wrote:
> On (Tue) 16 Jun 2015 [11:26:48], Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Once we're in postcopy the source processors are stopped and memory
> > shouldn't change any more, so there's no need to look at the dirty
> > map.
> > 
> > There are two notes to this:
> >   1) If we do resync and a page had changed then the page would get
> >      sent again, which the destination wouldn't allow (since it might
> >      have also modified the page)
> >   2) Before disabling this I'd seen very rare cases where a page had been
> >      marked dirtied although the memory contents are apparently identical
> 
> I suppose we don't know why.  Any way to send a message to the dest
> with this info, so the dest can print out something?  That'll help in
> debugging.  (I'm suggesting sending a message to the dest, because
> after a migration, we don't ever think of looking at messages on the
> src.  And chances are the dest could blow up after a migration is
> successful because of such "corruption".)

One way perhaps would be to do one more sync at the end, after migration
is apparently finished, but before the socket was closed; that would
detect these changes and you could send a message to the other end.  However,
given that (2) I say that where I'd seen it the page contents were
identical, this could be a false alarm, so we'd need to be careful.
It also doesn't help you find out *why* it happens, since tracing
back from a bit in the migration bitmap to the area of memory
and the thing that marked it dirty is very hard.  The only way to do
that, is to mark the memory as read-only and then get a backtrace
to find out who tried to change it; but you don't want to do
that on a normal build and cause the source to die.

> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
> Reviewed-by: Amit Shah <amit.shah@redhat.com>

Thanks.

Dave
> 
> 		Amit
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Amit Shah Aug. 4, 2015, 5:46 a.m. UTC | #5
On (Fri) 31 Jul 2015 [10:50:46], Dr. David Alan Gilbert wrote:
> * Amit Shah (amit.shah@redhat.com) wrote:
> > On (Tue) 16 Jun 2015 [11:26:48], Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > Once we're in postcopy the source processors are stopped and memory
> > > shouldn't change any more, so there's no need to look at the dirty
> > > map.
> > > 
> > > There are two notes to this:
> > >   1) If we do resync and a page had changed then the page would get
> > >      sent again, which the destination wouldn't allow (since it might
> > >      have also modified the page)
> > >   2) Before disabling this I'd seen very rare cases where a page had been
> > >      marked dirtied although the memory contents are apparently identical
> > 
> > I suppose we don't know why.  Any way to send a message to the dest
> > with this info, so the dest can print out something?  That'll help in
> > debugging.  (I'm suggesting sending a message to the dest, because
> > after a migration, we don't ever think of looking at messages on the
> > src.  And chances are the dest could blow up after a migration is
> > successful because of such "corruption".)
> 
> One way perhaps would be to do one more sync at the end, after migration
> is apparently finished, but before the socket was closed; that would
> detect these changes and you could send a message to the other end.  However,
> given that (2) I say that where I'd seen it the page contents were
> identical, this could be a false alarm, so we'd need to be careful.
> It also doesn't help you find out *why* it happens, since tracing
> back from a bit in the migration bitmap to the area of memory
> and the thing that marked it dirty is very hard.  The only way to do
> that, is to mark the memory as read-only and then get a backtrace
> to find out who tried to change it; but you don't want to do
> that on a normal build and cause the source to die.

Agreed - but some notification that something might possibly be wrong
is better than we not having such a clue, and fervently trying to
debug an issue.  In fact, a per-VM flag could be better since multiple
migrations may mean such notifications could be lost in the logs of a
previous host which we don't examine.

		Amit
diff mbox

Patch

diff --git a/migration/ram.c b/migration/ram.c
index 01a0ab4..5cff4d6 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1643,7 +1643,9 @@  static int ram_save_complete(QEMUFile *f, void *opaque)
 {
     rcu_read_lock();
 
-    migration_bitmap_sync();
+    if (!migration_postcopy_phase(migrate_get_current())) {
+        migration_bitmap_sync();
+    }
 
     ram_control_before_iterate(f, RAM_CONTROL_FINISH);
 
@@ -1678,7 +1680,8 @@  static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
 
     remaining_size = ram_save_remaining() * TARGET_PAGE_SIZE;
 
-    if (remaining_size < max_size) {
+    if (!migration_postcopy_phase(migrate_get_current()) &&
+        remaining_size < max_size) {
         qemu_mutex_lock_iothread();
         rcu_read_lock();
         migration_bitmap_sync();