Message ID | 1434450415-11339-43-git-send-email-dgilbert@redhat.com |
---|---|
State | New |
Headers | show |
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote: > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > Postcopy detects accesses to pages that haven't been transferred yet > using userfaultfd, and it causes exceptions on pages that are 'not > present'. > Ballooning also causes pages to be marked as 'not present' when the > guest inflates the balloon. > Potentially a balloon could be inflated to discard pages that are > currently inflight during postcopy and that may be arriving at about > the same time. > > To avoid this confusion, disable ballooning during postcopy. > > When disabled we drop balloon requests from the guest. Since ballooning > is generally initiated by the host, the management system should avoid > initiating any balloon instructions to the guest during migration, > although it's not possible to know how long it would take a guest to > process a request made prior to the start of migration. > > Queueing the requests until after migration would be nice, but is > non-trivial, since the set of inflate/deflate requests have to > be compared with the state of the page to know what the final > outcome is allowed to be. > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com>
On (Tue) 16 Jun 2015 [11:26:55], Dr. David Alan Gilbert (git) wrote: > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > Postcopy detects accesses to pages that haven't been transferred yet > using userfaultfd, and it causes exceptions on pages that are 'not > present'. > Ballooning also causes pages to be marked as 'not present' when the > guest inflates the balloon. > Potentially a balloon could be inflated to discard pages that are > currently inflight during postcopy and that may be arriving at about > the same time. > > To avoid this confusion, disable ballooning during postcopy. > > When disabled we drop balloon requests from the guest. Since ballooning > is generally initiated by the host, the management system should avoid > initiating any balloon instructions to the guest during migration, > although it's not possible to know how long it would take a guest to > process a request made prior to the start of migration. > > Queueing the requests until after migration would be nice, but is > non-trivial, since the set of inflate/deflate requests have to > be compared with the state of the page to know what the final > outcome is allowed to be. I didn't track the previous discussion, but there were plans to have guest-initiated balloon requests for cases where the guest wants to co-operate with hosts and return any free mem available We don't currently have guests that do this, but we also don't want to have a dependency between the host and guest -- they should be independent. This approach here seems the simplest possible, short of maintaining another bitmap for the duration of postcopy which indicates guest-freed memory pages which postcopy should not populate, after receiving them at the dest (this sounds better to me than queuing up guest requests). The downside here is that the guest offered some memory back, and we don't use it. The guest also doesn't use it -- so it's a double loss, of sorts. Thoughts? I don't have a problem with this current approach, but if we could get something better, that'll be good too. Amit
* Amit Shah (amit.shah@redhat.com) wrote: > On (Tue) 16 Jun 2015 [11:26:55], Dr. David Alan Gilbert (git) wrote: > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > > > Postcopy detects accesses to pages that haven't been transferred yet > > using userfaultfd, and it causes exceptions on pages that are 'not > > present'. > > Ballooning also causes pages to be marked as 'not present' when the > > guest inflates the balloon. > > Potentially a balloon could be inflated to discard pages that are > > currently inflight during postcopy and that may be arriving at about > > the same time. > > > > To avoid this confusion, disable ballooning during postcopy. > > > > When disabled we drop balloon requests from the guest. Since ballooning > > is generally initiated by the host, the management system should avoid > > initiating any balloon instructions to the guest during migration, > > although it's not possible to know how long it would take a guest to > > process a request made prior to the start of migration. > > > > Queueing the requests until after migration would be nice, but is > > non-trivial, since the set of inflate/deflate requests have to > > be compared with the state of the page to know what the final > > outcome is allowed to be. > > I didn't track the previous discussion, but there were plans to have > guest-initiated balloon requests for cases where the guest wants to > co-operate with hosts and return any free mem available We don't > currently have guests that do this, but we also don't want to have a > dependency between the host and guest -- they should be independent. > > This approach here seems the simplest possible, short of maintaining > another bitmap for the duration of postcopy which indicates > guest-freed memory pages which postcopy should not populate, after > receiving them at the dest (this sounds better to me than queuing up > guest requests). > > The downside here is that the guest offered some memory back, and we > don't use it. The guest also doesn't use it -- so it's a double loss, > of sorts. > > Thoughts? I don't have a problem with this current approach, but if > we could get something better, that'll be good too. It needs something like that bitmap, but it would take quite a bit of care to manage the interaction between: a) The guest emitting balloon notifications b) Pages being received from the source c) Destination use of that page we also have to think what to do with a page that's been ballooned after reception of the source page; the madvise(dontneed) that's used normally would cause userfault to fire again, and we can't allow that. (We could make it the same as receiving a zero page). But then we would also have to cope with the source sending us a page after the destination has ballooned it and make sure to discard that (I suspect there are further ordering examples that have to also be considered). Dave > > Amit -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On (Tue) 28 Jul 2015 [10:08:15], Dr. David Alan Gilbert wrote: > * Amit Shah (amit.shah@redhat.com) wrote: > > On (Tue) 16 Jun 2015 [11:26:55], Dr. David Alan Gilbert (git) wrote: > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > > > > > Postcopy detects accesses to pages that haven't been transferred yet > > > using userfaultfd, and it causes exceptions on pages that are 'not > > > present'. > > > Ballooning also causes pages to be marked as 'not present' when the > > > guest inflates the balloon. > > > Potentially a balloon could be inflated to discard pages that are > > > currently inflight during postcopy and that may be arriving at about > > > the same time. > > > > > > To avoid this confusion, disable ballooning during postcopy. > > > > > > When disabled we drop balloon requests from the guest. Since ballooning > > > is generally initiated by the host, the management system should avoid > > > initiating any balloon instructions to the guest during migration, > > > although it's not possible to know how long it would take a guest to > > > process a request made prior to the start of migration. > > > > > > Queueing the requests until after migration would be nice, but is > > > non-trivial, since the set of inflate/deflate requests have to > > > be compared with the state of the page to know what the final > > > outcome is allowed to be. > > > > I didn't track the previous discussion, but there were plans to have > > guest-initiated balloon requests for cases where the guest wants to > > co-operate with hosts and return any free mem available We don't > > currently have guests that do this, but we also don't want to have a > > dependency between the host and guest -- they should be independent. > > > > This approach here seems the simplest possible, short of maintaining > > another bitmap for the duration of postcopy which indicates > > guest-freed memory pages which postcopy should not populate, after > > receiving them at the dest (this sounds better to me than queuing up > > guest requests). > > > > The downside here is that the guest offered some memory back, and we > > don't use it. The guest also doesn't use it -- so it's a double loss, > > of sorts. > > > > Thoughts? I don't have a problem with this current approach, but if > > we could get something better, that'll be good too. > > It needs something like that bitmap, but it would take quite a bit > of care to manage the interaction between: > a) The guest emitting balloon notifications > b) Pages being received from the source > c) Destination use of that page > > we also have to think what to do with a page that's been ballooned > after reception of the source page; the madvise(dontneed) that's used > normally would cause userfault to fire again, and we can't allow that. > (We could make it the same as receiving a zero page). But then we would > also have to cope with the source sending us a page after the destination > has ballooned it and make sure to discard that (I suspect there are further > ordering examples that have to also be considered). Yeah. I'm fine with the current approach, with the downsides mentioned. Maybe in the commit message, make it explicit that the guest may think it's given up ownership, but the host won't honour this till postcopy isn't finished. Anyway: Reviewed-by: Amit Shah <amit.shah@redhat.com> Amit
* Amit Shah (amit.shah@redhat.com) wrote: > On (Tue) 28 Jul 2015 [10:08:15], Dr. David Alan Gilbert wrote: > > * Amit Shah (amit.shah@redhat.com) wrote: > > > On (Tue) 16 Jun 2015 [11:26:55], Dr. David Alan Gilbert (git) wrote: > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > > > > > > > Postcopy detects accesses to pages that haven't been transferred yet > > > > using userfaultfd, and it causes exceptions on pages that are 'not > > > > present'. > > > > Ballooning also causes pages to be marked as 'not present' when the > > > > guest inflates the balloon. > > > > Potentially a balloon could be inflated to discard pages that are > > > > currently inflight during postcopy and that may be arriving at about > > > > the same time. > > > > > > > > To avoid this confusion, disable ballooning during postcopy. > > > > > > > > When disabled we drop balloon requests from the guest. Since ballooning > > > > is generally initiated by the host, the management system should avoid > > > > initiating any balloon instructions to the guest during migration, > > > > although it's not possible to know how long it would take a guest to > > > > process a request made prior to the start of migration. > > > > > > > > Queueing the requests until after migration would be nice, but is > > > > non-trivial, since the set of inflate/deflate requests have to > > > > be compared with the state of the page to know what the final > > > > outcome is allowed to be. > > > > > > I didn't track the previous discussion, but there were plans to have > > > guest-initiated balloon requests for cases where the guest wants to > > > co-operate with hosts and return any free mem available We don't > > > currently have guests that do this, but we also don't want to have a > > > dependency between the host and guest -- they should be independent. > > > > > > This approach here seems the simplest possible, short of maintaining > > > another bitmap for the duration of postcopy which indicates > > > guest-freed memory pages which postcopy should not populate, after > > > receiving them at the dest (this sounds better to me than queuing up > > > guest requests). > > > > > > The downside here is that the guest offered some memory back, and we > > > don't use it. The guest also doesn't use it -- so it's a double loss, > > > of sorts. > > > > > > Thoughts? I don't have a problem with this current approach, but if > > > we could get something better, that'll be good too. > > > > It needs something like that bitmap, but it would take quite a bit > > of care to manage the interaction between: > > a) The guest emitting balloon notifications > > b) Pages being received from the source > > c) Destination use of that page > > > > we also have to think what to do with a page that's been ballooned > > after reception of the source page; the madvise(dontneed) that's used > > normally would cause userfault to fire again, and we can't allow that. > > (We could make it the same as receiving a zero page). But then we would > > also have to cope with the source sending us a page after the destination > > has ballooned it and make sure to discard that (I suspect there are further > > ordering examples that have to also be considered). > > Yeah. I'm fine with the current approach, with the downsides > mentioned. Maybe in the commit message, make it explicit that the > guest may think it's given up ownership, but the host won't honour > this till postcopy isn't finished. OK, I've added the text: 'Guest initiated ballooning will not know if it's really freed a page of host memory or not.' > Anyway: > > Reviewed-by: Amit Shah <amit.shah@redhat.com> Thanks. Dave > > > Amit -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
diff --git a/balloon.c b/balloon.c index c7033e3..80a8280 100644 --- a/balloon.c +++ b/balloon.c @@ -35,6 +35,17 @@ static QEMUBalloonEvent *balloon_event_fn; static QEMUBalloonStatus *balloon_stat_fn; static void *balloon_opaque; +static bool balloon_inhibited; + +bool qemu_balloon_is_inhibited(void) +{ + return balloon_inhibited; +} + +void qemu_balloon_inhibit(bool state) +{ + balloon_inhibited = state; +} static bool have_balloon(Error **errp) { diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c index 78bc14f..7d7170a 100644 --- a/hw/virtio/virtio-balloon.c +++ b/hw/virtio/virtio-balloon.c @@ -37,9 +37,11 @@ static void balloon_page(void *addr, int deflate) { #if defined(__linux__) - if (!kvm_enabled() || kvm_has_sync_mmu()) + if (!qemu_balloon_is_inhibited() && (!kvm_enabled() || + kvm_has_sync_mmu())) { qemu_madvise(addr, TARGET_PAGE_SIZE, deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED); + } #endif } diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h index 0345e01..6851d99 100644 --- a/include/sysemu/balloon.h +++ b/include/sysemu/balloon.h @@ -23,5 +23,7 @@ typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info); int qemu_add_balloon_handler(QEMUBalloonEvent *event_func, QEMUBalloonStatus *stat_func, void *opaque); void qemu_remove_balloon_handler(void *opaque); +bool qemu_balloon_is_inhibited(void); +void qemu_balloon_inhibit(bool state); #endif diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c index be7e5f2..4670cf0 100644 --- a/migration/postcopy-ram.c +++ b/migration/postcopy-ram.c @@ -24,6 +24,7 @@ #include "migration/migration.h" #include "migration/postcopy-ram.h" #include "sysemu/sysemu.h" +#include "sysemu/balloon.h" #include "qemu/error-report.h" #include "trace.h" @@ -317,6 +318,8 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis) mis->have_fault_thread = false; } + qemu_balloon_inhibit(false); + if (enable_mlock) { if (os_mlock() < 0) { error_report("mlock: %s", strerror(errno)); @@ -505,6 +508,12 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis) return -1; } + /* + * Ballooning can mark pages as absent while we're postcopying + * that would cause false userfaults. + */ + qemu_balloon_inhibit(true); + trace_postcopy_ram_enable_notify(); return 0;