diff mbox

[v7,42/42] Inhibit ballooning during postcopy

Message ID 1434450415-11339-43-git-send-email-dgilbert@redhat.com
State New
Headers show

Commit Message

Dr. David Alan Gilbert June 16, 2015, 10:26 a.m. UTC
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Postcopy detects accesses to pages that haven't been transferred yet
using userfaultfd, and it causes exceptions on pages that are 'not
present'.
Ballooning also causes pages to be marked as 'not present' when the
guest inflates the balloon.
Potentially a balloon could be inflated to discard pages that are
currently inflight during postcopy and that may be arriving at about
the same time.

To avoid this confusion, disable ballooning during postcopy.

When disabled we drop balloon requests from the guest.  Since ballooning
is generally initiated by the host, the management system should avoid
initiating any balloon instructions to the guest during migration,
although it's not possible to know how long it would take a guest to
process a request made prior to the start of migration.

Queueing the requests until after migration would be nice, but is
non-trivial, since the set of inflate/deflate requests have to
be compared with the state of the page to know what the final
outcome is allowed to be.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 balloon.c                  | 11 +++++++++++
 hw/virtio/virtio-balloon.c |  4 +++-
 include/sysemu/balloon.h   |  2 ++
 migration/postcopy-ram.c   |  9 +++++++++
 4 files changed, 25 insertions(+), 1 deletion(-)

Comments

Juan Quintela July 14, 2015, 3:24 p.m. UTC | #1
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Postcopy detects accesses to pages that haven't been transferred yet
> using userfaultfd, and it causes exceptions on pages that are 'not
> present'.
> Ballooning also causes pages to be marked as 'not present' when the
> guest inflates the balloon.
> Potentially a balloon could be inflated to discard pages that are
> currently inflight during postcopy and that may be arriving at about
> the same time.
>
> To avoid this confusion, disable ballooning during postcopy.
>
> When disabled we drop balloon requests from the guest.  Since ballooning
> is generally initiated by the host, the management system should avoid
> initiating any balloon instructions to the guest during migration,
> although it's not possible to know how long it would take a guest to
> process a request made prior to the start of migration.
>
> Queueing the requests until after migration would be nice, but is
> non-trivial, since the set of inflate/deflate requests have to
> be compared with the state of the page to know what the final
> outcome is allowed to be.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>
Amit Shah July 28, 2015, 6:15 a.m. UTC | #2
On (Tue) 16 Jun 2015 [11:26:55], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Postcopy detects accesses to pages that haven't been transferred yet
> using userfaultfd, and it causes exceptions on pages that are 'not
> present'.
> Ballooning also causes pages to be marked as 'not present' when the
> guest inflates the balloon.
> Potentially a balloon could be inflated to discard pages that are
> currently inflight during postcopy and that may be arriving at about
> the same time.
> 
> To avoid this confusion, disable ballooning during postcopy.
> 
> When disabled we drop balloon requests from the guest.  Since ballooning
> is generally initiated by the host, the management system should avoid
> initiating any balloon instructions to the guest during migration,
> although it's not possible to know how long it would take a guest to
> process a request made prior to the start of migration.
> 
> Queueing the requests until after migration would be nice, but is
> non-trivial, since the set of inflate/deflate requests have to
> be compared with the state of the page to know what the final
> outcome is allowed to be.

I didn't track the previous discussion, but there were plans to have
guest-initiated balloon requests for cases where the guest wants to
co-operate with hosts and return any free mem available We don't
currently have guests that do this, but we also don't want to have a
dependency between the host and guest -- they should be independent.

This approach here seems the simplest possible, short of maintaining
another bitmap for the duration of postcopy which indicates
guest-freed memory pages which postcopy should not populate, after
receiving them at the dest (this sounds better to me than queuing up
guest requests).

The downside here is that the guest offered some memory back, and we
don't use it.  The guest also doesn't use it -- so it's a double loss,
of sorts.

Thoughts?  I don't have a problem with this current approach, but if
we could get something better, that'll be good too.

		Amit
Dr. David Alan Gilbert July 28, 2015, 9:08 a.m. UTC | #3
* Amit Shah (amit.shah@redhat.com) wrote:
> On (Tue) 16 Jun 2015 [11:26:55], Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Postcopy detects accesses to pages that haven't been transferred yet
> > using userfaultfd, and it causes exceptions on pages that are 'not
> > present'.
> > Ballooning also causes pages to be marked as 'not present' when the
> > guest inflates the balloon.
> > Potentially a balloon could be inflated to discard pages that are
> > currently inflight during postcopy and that may be arriving at about
> > the same time.
> > 
> > To avoid this confusion, disable ballooning during postcopy.
> > 
> > When disabled we drop balloon requests from the guest.  Since ballooning
> > is generally initiated by the host, the management system should avoid
> > initiating any balloon instructions to the guest during migration,
> > although it's not possible to know how long it would take a guest to
> > process a request made prior to the start of migration.
> > 
> > Queueing the requests until after migration would be nice, but is
> > non-trivial, since the set of inflate/deflate requests have to
> > be compared with the state of the page to know what the final
> > outcome is allowed to be.
> 
> I didn't track the previous discussion, but there were plans to have
> guest-initiated balloon requests for cases where the guest wants to
> co-operate with hosts and return any free mem available We don't
> currently have guests that do this, but we also don't want to have a
> dependency between the host and guest -- they should be independent.
> 
> This approach here seems the simplest possible, short of maintaining
> another bitmap for the duration of postcopy which indicates
> guest-freed memory pages which postcopy should not populate, after
> receiving them at the dest (this sounds better to me than queuing up
> guest requests).
> 
> The downside here is that the guest offered some memory back, and we
> don't use it.  The guest also doesn't use it -- so it's a double loss,
> of sorts.
> 
> Thoughts?  I don't have a problem with this current approach, but if
> we could get something better, that'll be good too.

It needs something like that bitmap, but it would take quite a bit
of care to manage the interaction between:
    a) The guest emitting balloon notifications
    b) Pages being received from the source
    c) Destination use of that page

  we also have to think what to do with a page that's been ballooned
after reception of the source page; the madvise(dontneed) that's used
normally would cause userfault to fire again, and we can't allow that.
(We could make it the same as receiving a zero page).   But then we would
also have to cope with  the source sending us a page after the destination
has ballooned it and make sure to discard that (I suspect there are further
ordering examples that have to also be considered).

Dave
> 
> 		Amit
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Amit Shah July 28, 2015, 10:01 a.m. UTC | #4
On (Tue) 28 Jul 2015 [10:08:15], Dr. David Alan Gilbert wrote:
> * Amit Shah (amit.shah@redhat.com) wrote:
> > On (Tue) 16 Jun 2015 [11:26:55], Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > Postcopy detects accesses to pages that haven't been transferred yet
> > > using userfaultfd, and it causes exceptions on pages that are 'not
> > > present'.
> > > Ballooning also causes pages to be marked as 'not present' when the
> > > guest inflates the balloon.
> > > Potentially a balloon could be inflated to discard pages that are
> > > currently inflight during postcopy and that may be arriving at about
> > > the same time.
> > > 
> > > To avoid this confusion, disable ballooning during postcopy.
> > > 
> > > When disabled we drop balloon requests from the guest.  Since ballooning
> > > is generally initiated by the host, the management system should avoid
> > > initiating any balloon instructions to the guest during migration,
> > > although it's not possible to know how long it would take a guest to
> > > process a request made prior to the start of migration.
> > > 
> > > Queueing the requests until after migration would be nice, but is
> > > non-trivial, since the set of inflate/deflate requests have to
> > > be compared with the state of the page to know what the final
> > > outcome is allowed to be.
> > 
> > I didn't track the previous discussion, but there were plans to have
> > guest-initiated balloon requests for cases where the guest wants to
> > co-operate with hosts and return any free mem available We don't
> > currently have guests that do this, but we also don't want to have a
> > dependency between the host and guest -- they should be independent.
> > 
> > This approach here seems the simplest possible, short of maintaining
> > another bitmap for the duration of postcopy which indicates
> > guest-freed memory pages which postcopy should not populate, after
> > receiving them at the dest (this sounds better to me than queuing up
> > guest requests).
> > 
> > The downside here is that the guest offered some memory back, and we
> > don't use it.  The guest also doesn't use it -- so it's a double loss,
> > of sorts.
> > 
> > Thoughts?  I don't have a problem with this current approach, but if
> > we could get something better, that'll be good too.
> 
> It needs something like that bitmap, but it would take quite a bit
> of care to manage the interaction between:
>     a) The guest emitting balloon notifications
>     b) Pages being received from the source
>     c) Destination use of that page
> 
>   we also have to think what to do with a page that's been ballooned
> after reception of the source page; the madvise(dontneed) that's used
> normally would cause userfault to fire again, and we can't allow that.
> (We could make it the same as receiving a zero page).   But then we would
> also have to cope with  the source sending us a page after the destination
> has ballooned it and make sure to discard that (I suspect there are further
> ordering examples that have to also be considered).

Yeah.  I'm fine with the current approach, with the downsides
mentioned.  Maybe in the commit message, make it explicit that the
guest may think it's given up ownership, but the host won't honour
this till postcopy isn't finished.

Anyway:

Reviewed-by: Amit Shah <amit.shah@redhat.com>


		Amit
Dr. David Alan Gilbert July 28, 2015, 11:16 a.m. UTC | #5
* Amit Shah (amit.shah@redhat.com) wrote:
> On (Tue) 28 Jul 2015 [10:08:15], Dr. David Alan Gilbert wrote:
> > * Amit Shah (amit.shah@redhat.com) wrote:
> > > On (Tue) 16 Jun 2015 [11:26:55], Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > 
> > > > Postcopy detects accesses to pages that haven't been transferred yet
> > > > using userfaultfd, and it causes exceptions on pages that are 'not
> > > > present'.
> > > > Ballooning also causes pages to be marked as 'not present' when the
> > > > guest inflates the balloon.
> > > > Potentially a balloon could be inflated to discard pages that are
> > > > currently inflight during postcopy and that may be arriving at about
> > > > the same time.
> > > > 
> > > > To avoid this confusion, disable ballooning during postcopy.
> > > > 
> > > > When disabled we drop balloon requests from the guest.  Since ballooning
> > > > is generally initiated by the host, the management system should avoid
> > > > initiating any balloon instructions to the guest during migration,
> > > > although it's not possible to know how long it would take a guest to
> > > > process a request made prior to the start of migration.
> > > > 
> > > > Queueing the requests until after migration would be nice, but is
> > > > non-trivial, since the set of inflate/deflate requests have to
> > > > be compared with the state of the page to know what the final
> > > > outcome is allowed to be.
> > > 
> > > I didn't track the previous discussion, but there were plans to have
> > > guest-initiated balloon requests for cases where the guest wants to
> > > co-operate with hosts and return any free mem available We don't
> > > currently have guests that do this, but we also don't want to have a
> > > dependency between the host and guest -- they should be independent.
> > > 
> > > This approach here seems the simplest possible, short of maintaining
> > > another bitmap for the duration of postcopy which indicates
> > > guest-freed memory pages which postcopy should not populate, after
> > > receiving them at the dest (this sounds better to me than queuing up
> > > guest requests).
> > > 
> > > The downside here is that the guest offered some memory back, and we
> > > don't use it.  The guest also doesn't use it -- so it's a double loss,
> > > of sorts.
> > > 
> > > Thoughts?  I don't have a problem with this current approach, but if
> > > we could get something better, that'll be good too.
> > 
> > It needs something like that bitmap, but it would take quite a bit
> > of care to manage the interaction between:
> >     a) The guest emitting balloon notifications
> >     b) Pages being received from the source
> >     c) Destination use of that page
> > 
> >   we also have to think what to do with a page that's been ballooned
> > after reception of the source page; the madvise(dontneed) that's used
> > normally would cause userfault to fire again, and we can't allow that.
> > (We could make it the same as receiving a zero page).   But then we would
> > also have to cope with  the source sending us a page after the destination
> > has ballooned it and make sure to discard that (I suspect there are further
> > ordering examples that have to also be considered).
> 
> Yeah.  I'm fine with the current approach, with the downsides
> mentioned.  Maybe in the commit message, make it explicit that the
> guest may think it's given up ownership, but the host won't honour
> this till postcopy isn't finished.

OK, I've added the text:
'Guest initiated ballooning will not know if it's really freed a page
of host memory or not.'

> Anyway:
> 
> Reviewed-by: Amit Shah <amit.shah@redhat.com>

Thanks.

Dave

> 
> 
> 		Amit
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
diff mbox

Patch

diff --git a/balloon.c b/balloon.c
index c7033e3..80a8280 100644
--- a/balloon.c
+++ b/balloon.c
@@ -35,6 +35,17 @@ 
 static QEMUBalloonEvent *balloon_event_fn;
 static QEMUBalloonStatus *balloon_stat_fn;
 static void *balloon_opaque;
+static bool balloon_inhibited;
+
+bool qemu_balloon_is_inhibited(void)
+{
+    return balloon_inhibited;
+}
+
+void qemu_balloon_inhibit(bool state)
+{
+    balloon_inhibited = state;
+}
 
 static bool have_balloon(Error **errp)
 {
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 78bc14f..7d7170a 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -37,9 +37,11 @@ 
 static void balloon_page(void *addr, int deflate)
 {
 #if defined(__linux__)
-    if (!kvm_enabled() || kvm_has_sync_mmu())
+    if (!qemu_balloon_is_inhibited() && (!kvm_enabled() ||
+                                         kvm_has_sync_mmu())) {
         qemu_madvise(addr, TARGET_PAGE_SIZE,
                 deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
+    }
 #endif
 }
 
diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h
index 0345e01..6851d99 100644
--- a/include/sysemu/balloon.h
+++ b/include/sysemu/balloon.h
@@ -23,5 +23,7 @@  typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info);
 int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
 			     QEMUBalloonStatus *stat_func, void *opaque);
 void qemu_remove_balloon_handler(void *opaque);
+bool qemu_balloon_is_inhibited(void);
+void qemu_balloon_inhibit(bool state);
 
 #endif
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index be7e5f2..4670cf0 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -24,6 +24,7 @@ 
 #include "migration/migration.h"
 #include "migration/postcopy-ram.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/balloon.h"
 #include "qemu/error-report.h"
 #include "trace.h"
 
@@ -317,6 +318,8 @@  int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
         mis->have_fault_thread = false;
     }
 
+    qemu_balloon_inhibit(false);
+
     if (enable_mlock) {
         if (os_mlock() < 0) {
             error_report("mlock: %s", strerror(errno));
@@ -505,6 +508,12 @@  int postcopy_ram_enable_notify(MigrationIncomingState *mis)
         return -1;
     }
 
+    /*
+     * Ballooning can mark pages as absent while we're postcopying
+     * that would cause false userfaults.
+     */
+    qemu_balloon_inhibit(true);
+
     trace_postcopy_ram_enable_notify();
 
     return 0;