migration: Fix madvise breakage if host and guest have different page sizes

Submitted by David Gibson on Nov. 21, 2012, 4:59 a.m.

Details

Message ID 1353473965-30678-1-git-send-email-david@gibson.dropbear.id.au
State New

Commit Message

David Gibson Nov. 21, 2012, 4:59 a.m.
madvise(DONTNEED) will throw away the contents of the whole page at the
given address, even if the given length is less than the page size.  One
can argue about whether that's the correct behaviour, but that's what it's
done for a long time in Linux at least.

That means that the madvise() in ram_load(), on a setup where
TARGET_PAGE_SIZE is smaller than the host page size, can throw away data
in guest pages adjacent to the one it's actually processing right now,
leading to guest memory corruption on an incoming migration.
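
The whole-page behaviour described above can be reproduced outside QEMU. The following is a minimal sketch, assuming a Linux host; the function name and the "guest page" size of a quarter host page are hypothetical stand-ins for TARGET_PAGE_SIZE, not QEMU code. It fills one anonymous host page, issues a sub-page madvise(MADV_DONTNEED), and checks whether the last byte of the host page was lost as well.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a sub-page madvise(MADV_DONTNEED) also discarded the tail
 * of the host page, 0 if the tail survived, -1 on setup failure.
 * Hypothetical helper for illustration only.
 */
int dontneed_discards_whole_page(void)
{
    size_t host_page = (size_t)sysconf(_SC_PAGESIZE);
    /* Assumption: pretend the guest page is a quarter of the host page. */
    size_t guest_page = host_page / 4;
    assert(guest_page > 0);

    unsigned char *p = mmap(NULL, host_page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }

    /* Populate every "guest page" inside this host page. */
    memset(p, 0xAB, host_page);

    /* Ask the kernel to drop only the first "guest page". */
    if (madvise(p, guest_page, MADV_DONTNEED) != 0) {
        munmap(p, host_page);
        return -1;
    }

    /* The last byte belongs to a different "guest page". */
    int tail_lost = (p[host_page - 1] == 0);

    munmap(p, host_page);
    return tail_lost;
}
```

On Linux the length is rounded up to a multiple of the host page size, so this helper should return 1: the data in the neighbouring "guest pages" is gone.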

This patch therefore disables the madvise() if the host page size is
larger than TARGET_PAGE_SIZE.  This means we don't get the benefits of that
madvise() in this case, but a more complete fix is more difficult to
accomplish.  This at least fixes the guest memory corruption.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch_init.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

David Gibson Nov. 21, 2012, 5:04 a.m.
On Wed, Nov 21, 2012 at 03:59:25PM +1100, David Gibson wrote:
> madvise(DONTNEED) will throw away the contents of the whole page at the
> given address, even if the given length is less than the page size.  One
> can argue about whether that's the correct behaviour, but that's what it's
> done for a long time in Linux at least.
> 
> That means that the madvise() in ram_load(), on a setup where
> TARGET_PAGE_SIZE is smaller than the host page size, can throw away data
> in guest pages adjacent to the one it's actually processing right now,
> leading to guest memory corruption on an incoming migration.
> 
> This patch therefore disables the madvise() if the host page size is
> larger than TARGET_PAGE_SIZE.  This means we don't get the benefits of that
> madvise() in this case, but a more complete fix is more difficult to
> accomplish.  This at least fixes the guest memory corruption.

So, discussing the more complete fix here.  The first idea which
occurred to me was to instead madvise(DONTNEED) the entire memory
block in the RAM_SAVE_FLAG_MEM_SIZE phase.  Then skip the memset() in
the RAM_SAVE_FLAG_COMPRESS path if ch == 0, so that we don't force
unneeded zero pages back into host memory.

But that would be a bug in the case where the page is initially
non-zero, we migrate its contents, but then the page is zeroed on the
outgoing guest before the live migration completes.  To handle that
we'd need some kind of dirty flag on the incoming side recording if a
page has already been loaded once or not.  I don't know if anything
suitable exists.
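
To make the idea concrete, here is a rough sketch of what such a flag could look like. Everything in it is hypothetical (the IncomingBlock struct, the function names, the bitmap layout); none of it is existing QEMU code. calloc() stands in for a block that has already been madvise(DONTNEED)ed and therefore reads as zero. A zero page that was never loaded is skipped, while a zero page that was loaded earlier is explicitly cleared.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical incoming-side state for one RAM block. */
typedef struct IncomingBlock {
    uint8_t *host;          /* block memory, initially all zero */
    size_t page_size;       /* guest page size */
    size_t npages;
    unsigned long *loaded;  /* one bit per guest page already loaded */
} IncomingBlock;

int incoming_block_init(IncomingBlock *b, size_t npages, size_t page_size)
{
    size_t words = (npages + BITS_PER_WORD - 1) / BITS_PER_WORD;

    b->page_size = page_size;
    b->npages = npages;
    b->host = calloc(npages, page_size);  /* stands in for DONTNEEDed RAM */
    b->loaded = calloc(words, sizeof(unsigned long));
    if (!b->host || !b->loaded) {
        free(b->host);
        free(b->loaded);
        return -1;
    }
    return 0;
}

void incoming_block_free(IncomingBlock *b)
{
    free(b->host);
    free(b->loaded);
}

static int test_and_set_loaded(IncomingBlock *b, size_t idx)
{
    unsigned long mask = 1UL << (idx % BITS_PER_WORD);
    unsigned long *word = &b->loaded[idx / BITS_PER_WORD];
    int was_set = (*word & mask) != 0;

    *word |= mask;
    return was_set;
}

/* Page with real contents: always copy, and remember that we did. */
void load_data_page(IncomingBlock *b, size_t idx, const uint8_t *data)
{
    assert(idx < b->npages);
    memcpy(b->host + idx * b->page_size, data, b->page_size);
    test_and_set_loaded(b, idx);
}

/*
 * Zero page: only touch memory if the page was loaded before.
 * Returns 1 if a memset() was needed, 0 if it could be skipped.
 */
int load_zero_page(IncomingBlock *b, size_t idx)
{
    assert(idx < b->npages);
    if (test_and_set_loaded(b, idx)) {
        memset(b->host + idx * b->page_size, 0, b->page_size);
        return 1;
    }
    return 0;
}
```

This only models the bookkeeping. It says nothing about where such a bitmap would live in the real migration code or what it would cost for large guests.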

Any thoughts on how to implement a more thorough fix than the one in
my patch?

David Gibson Nov. 21, 2012, 5:42 a.m.
On Wed, Nov 21, 2012 at 03:59:25PM +1100, David Gibson wrote:
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Sorry, forgot to add:

Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Patch

diff --git a/arch_init.c b/arch_init.c
index b75a4c5..83dcc53 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -840,7 +840,8 @@  static int ram_load(QEMUFile *f, void *opaque, int version_id)
             memset(host, ch, TARGET_PAGE_SIZE);
 #ifndef _WIN32
             if (ch == 0 &&
-                (!kvm_enabled() || kvm_has_sync_mmu())) {
+                (!kvm_enabled() || kvm_has_sync_mmu()) &&
+                getpagesize() <= TARGET_PAGE_SIZE) {
                 qemu_madvise(host, TARGET_PAGE_SIZE, QEMU_MADV_DONTNEED);
             }
 #endif
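
For a sense of scale, the condition added in the hunk above can be written out as plain arithmetic. This is a small sketch with hypothetical helper names, taking the page sizes as parameters instead of using getpagesize() and TARGET_PAGE_SIZE, so the 64 KiB and 4 KiB figures below are only example values.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of guest pages that share a single host page. This is 1 when
 * the host page is not larger than the guest page.
 */
size_t guest_pages_per_host_page(size_t host_page, size_t target_page)
{
    assert(host_page > 0 && target_page > 0);
    return host_page > target_page ? host_page / target_page : 1;
}

/*
 * Mirrors the extra check in the patch: a guest-page-sized
 * madvise(DONTNEED) is only safe when it cannot spill over into
 * neighbouring guest pages.
 */
int dontneed_is_safe(size_t host_page, size_t target_page)
{
    return host_page <= target_page;
}
```

With a 64 KiB host page and a 4 KiB guest page, guest_pages_per_host_page(65536, 4096) is 16, so one discard can affect up to 15 guest pages other than the one being loaded, and dontneed_is_safe(65536, 4096) is 0. With equal page sizes the check passes and the madvise() is kept.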