Patchwork migration: Fix madvise breakage if host and guest have different page sizes

Submitter David Gibson
Date Nov. 21, 2012, 4:59 a.m.
Message ID <1353473965-30678-1-git-send-email-david@gibson.dropbear.id.au>
Permalink /patch/200549/
State New

Comments

David Gibson - Nov. 21, 2012, 4:59 a.m.
madvise(DONTNEED) will throw away the contents of the whole host page at the
given address, even if the given length is less than the page size.  One
can argue about whether that's the correct behaviour, but that's what Linux,
at least, has done for a long time.
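A minimal Linux-only sketch to observe this behaviour. It assumes a private anonymous mapping, where discarded pages read back as zeros: fill one host page, apply MADV_DONTNEED to only its first byte, then check the last byte. The function name is illustrative, not part of QEMU.

```c
#define _DEFAULT_SOURCE /* expose madvise(), MADV_DONTNEED, MAP_ANONYMOUS in glibc */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if madvise(MADV_DONTNEED) on a single byte at the start of a host
 * page discarded the whole page, 0 if the rest of the page survived, and -1 on
 * error.  Linux-specific: DONTNEED on private anonymous memory makes later
 * reads see zero-filled pages. */
static int dontneed_discards_whole_page(void)
{
    long psz = sysconf(_SC_PAGESIZE);
    assert(psz > 1);

    unsigned char *p = mmap(NULL, (size_t)psz, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xab, (size_t)psz);            /* fill the whole host page */
    if (madvise(p, 1, MADV_DONTNEED) != 0) { /* length is far below page size */
        munmap(p, (size_t)psz);
        return -1;
    }

    int result = p[psz - 1] == 0; /* last byte lies outside [p, p + 1) */
    munmap(p, (size_t)psz);
    return result;
}
```

On Linux the one-byte length is rounded up to a full page, so the function is expected to report that the whole page was discarded.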

That means that the madvise() in ram_load(), on a setup where
TARGET_PAGE_SIZE is smaller than the host page size, can throw away data
in guest pages adjacent to the one it's actually processing right now,
leading to guest memory corruption on an incoming migration.
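To put a number on "adjacent guest pages": when a guest page starts on a host page boundary, one madvise() covering it discards the whole host page, which holds host_page / target_page guest pages. The helper below is a hypothetical illustration of that arithmetic, assuming both sizes are powers of two. With, for example, 64 KiB host pages and 4 KiB target pages, up to 15 neighbouring guest pages are lost.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many *other* guest pages lose their contents when
 * madvise(DONTNEED) is applied to one guest page that begins on a host page
 * boundary.  Assumes both sizes are non-zero powers of two. */
static size_t neighbours_discarded(size_t host_page, size_t target_page)
{
    assert(host_page > 0 && target_page > 0);
    if (host_page <= target_page)
        return 0; /* the discarded range stays inside this guest page */
    return host_page / target_page - 1;
}
```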

This patch therefore disables the madvise() if the host page size is
larger than TARGET_PAGE_SIZE.  This means we lose the benefit of that
madvise() in this case, but a more complete fix is harder to
accomplish.  This at least fixes the guest memory corruption.
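The guard this patch adds to ram_load() can be restated as a standalone predicate. This is a sketch only: the parameters are hypothetical stand-ins for kvm_enabled(), kvm_has_sync_mmu(), getpagesize() and TARGET_PAGE_SIZE, so the logic can be exercised without QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical restatement of the condition around qemu_madvise() in
 * ram_load() after this patch.  kvm_on / kvm_sync_mmu stand in for
 * kvm_enabled() / kvm_has_sync_mmu(), host_page for getpagesize(),
 * target_page for TARGET_PAGE_SIZE. */
static bool may_discard_zero_page(int ch, bool kvm_on, bool kvm_sync_mmu,
                                  size_t host_page, size_t target_page)
{
    assert(host_page > 0 && target_page > 0);
    return ch == 0 &&
           (!kvm_on || kvm_sync_mmu) &&
           host_page <= target_page; /* new: never discard a larger host page */
}
```

Only the last clause is new; the first two mirror the condition already present before the patch.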

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch_init.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
David Gibson - Nov. 21, 2012, 5:04 a.m.
On Wed, Nov 21, 2012 at 03:59:25PM +1100, David Gibson wrote:
> This patch therefore disables the madvise() if the host page size is
> larger than TARGET_PAGE_SIZE.  This means we don't get the benefits of that
> madvise() in this case, but a more complete fix is more difficult to
> accomplish.  This at least fixes the guest memory corruption.

So, to discuss the more complete fix here: the first idea which
occurred to me was to instead madvise(DONTNEED) the entire memory
block in the RAM_SAVE_FLAG_MEM_SIZE phase, then skip the memset() in
the RAM_SAVE_FLAG_COMPRESS path if ch == 0, so that we don't force
unneeded zero pages back into host memory.
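A minimal model of this idea, under stated assumptions: a calloc()'d buffer stands in for a RAM block that was discarded wholesale with madvise(DONTNEED) (private anonymous memory reads back as zeros afterwards), and GUEST_PAGE stands in for TARGET_PAGE_SIZE. The function names are hypothetical, not QEMU code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GUEST_PAGE 4096 /* stand-in for TARGET_PAGE_SIZE */

/* Hypothetical RAM_SAVE_FLAG_MEM_SIZE step: model "madvise(DONTNEED) the
 * whole block" as getting zero-filled memory for every guest page. */
static unsigned char *ram_block_prepare(size_t npages)
{
    return calloc(npages, GUEST_PAGE);
}

/* Hypothetical RAM_SAVE_FLAG_COMPRESS handler under the proposed scheme:
 * a zero page is skipped entirely, so no host memory is faulted in. */
static void load_compressed_naive(unsigned char *block, size_t page, int ch)
{
    if (ch != 0)
        memset(block + page * GUEST_PAGE, ch, GUEST_PAGE);
    /* ch == 0: assume the page is still zero from the initial discard */
}
```

The assumption in the last comment is exactly what breaks if the same page arrives more than once during live migration.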

But that would be wrong in the case where a page is initially
non-zero, we migrate its contents, and then the page is zeroed on the
outgoing guest before the live migration completes: the final zero page
would be skipped, leaving the stale contents in place.  To handle that
we'd need some kind of dirty flag on the incoming side recording whether a
page has already been loaded once.  I don't know if anything
suitable exists.
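One way such a per-page flag could work, sketched as a self-contained model rather than QEMU code: the struct, the functions and GUEST_PAGE are hypothetical, and calloc() again stands in for a block discarded with madvise(DONTNEED). A zero page is skipped only while nothing has been written to that page yet. Full-page loads would presumably need to set the same flag. That step is left out here for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define GUEST_PAGE 4096 /* stand-in for TARGET_PAGE_SIZE */

/* Hypothetical incoming-side state: guest RAM that starts out zero, plus one
 * flag per guest page recording whether any data has been written to it. */
struct incoming_ram {
    unsigned char *mem;
    bool *loaded;
    size_t npages;
};

static bool incoming_ram_init(struct incoming_ram *r, size_t npages)
{
    r->mem = calloc(npages, GUEST_PAGE);
    r->loaded = calloc(npages, sizeof *r->loaded); /* all false */
    r->npages = npages;
    if (!r->mem || !r->loaded) {
        free(r->mem);
        free(r->loaded);
        return false;
    }
    return true;
}

static void incoming_ram_free(struct incoming_ram *r)
{
    free(r->mem);
    free(r->loaded);
}

/* Compressed page: a zero page may be skipped only if the page has never been
 * written; otherwise it must overwrite data from an earlier pass. */
static void load_compressed(struct incoming_ram *r, size_t page, int ch)
{
    assert(page < r->npages);
    if (ch == 0 && !r->loaded[page])
        return; /* still zero from the initial discard */
    memset(r->mem + page * GUEST_PAGE, ch, GUEST_PAGE);
    r->loaded[page] = true;
}
```

The cost is one flag per guest page on the destination. The benefit is that untouched zero pages never get faulted into host memory.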

Any thoughts on how to implement a more thorough fix than the one in
my patch?
David Gibson - Nov. 21, 2012, 5:42 a.m.
On Wed, Nov 21, 2012 at 03:59:25PM +1100, David Gibson wrote:
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Sorry, forgot to add:

Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Patch

diff --git a/arch_init.c b/arch_init.c
index b75a4c5..83dcc53 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -840,7 +840,8 @@  static int ram_load(QEMUFile *f, void *opaque, int version_id)
             memset(host, ch, TARGET_PAGE_SIZE);
 #ifndef _WIN32
             if (ch == 0 &&
-                (!kvm_enabled() || kvm_has_sync_mmu())) {
+                (!kvm_enabled() || kvm_has_sync_mmu()) &&
+                getpagesize() <= TARGET_PAGE_SIZE) {
                 qemu_madvise(host, TARGET_PAGE_SIZE, QEMU_MADV_DONTNEED);
             }
 #endif