From patchwork Mon May 2 18:32:44 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marcelo Tosatti X-Patchwork-Id: 93696 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [140.186.70.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id C2184B6F62 for ; Tue, 3 May 2011 04:35:57 +1000 (EST) Received: from localhost ([::1]:37911 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QGxyN-0002sv-2H for incoming@patchwork.ozlabs.org; Mon, 02 May 2011 14:35:55 -0400 Received: from eggs.gnu.org ([140.186.70.92]:52330) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QGxwT-00009n-QT for qemu-devel@nongnu.org; Mon, 02 May 2011 14:33:58 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1QGxwS-00059C-Ki for qemu-devel@nongnu.org; Mon, 02 May 2011 14:33:57 -0400 Received: from mx1.redhat.com ([209.132.183.28]:30319) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QGxwS-00058l-C4 for qemu-devel@nongnu.org; Mon, 02 May 2011 14:33:56 -0400 Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id p42IXtbM028807 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Mon, 2 May 2011 14:33:55 -0400 Received: from ns3.rdu.redhat.com (ns3.rdu.redhat.com [10.11.255.199]) by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id p42IXsDj003631; Mon, 2 May 2011 14:33:54 -0400 Received: from amt.cnet (vpn-10-54.rdu.redhat.com [10.11.10.54]) by ns3.rdu.redhat.com (8.13.8/8.13.8) with ESMTP id p42IXqPI031497; Mon, 2 May 2011 14:33:52 -0400 Received: from amt.cnet (amt.cnet [127.0.0.1]) by amt.cnet (Postfix) with ESMTP id 6155068A110; Mon, 2 May 2011 15:33:14 -0300 (BRT) Received: (from marcelo@localhost) by amt.cnet (8.14.4/8.14.4/Submit) id p42IXBU7003559; Mon, 2 May 2011 15:33:11 -0300 From: Marcelo Tosatti To: Anthony Liguori Date: Mon, 2 May 2011 15:32:44 -0300 Message-Id: <51b0c6065aa6e47a47094d73e24be298a4a7f3a1.1304361165.git.mtosatti@redhat.com> In-Reply-To: References: X-Scanned-By: MIMEDefang 2.68 on 10.5.11.23 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 209.132.183.28 Cc: Avi Kivity , Michael Tokarev , qemu-devel@nongnu.org, kvm@vger.kernel.org, Bruce Rogers Subject: [Qemu-devel] [PATCH 8/9] fix crash in migration, 32-bit userspace on 64-bit host X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org From: Michael Tokarev This change fixes a long-standing immediate crash (memory corruption and abort in glibc malloc code) in migration on 32bits. The bug is present since this commit: commit 692d9aca97b865b0f7903565274a52606910f129 Author: Bruce Rogers Date: Wed Sep 23 16:13:18 2009 -0600 qemu-kvm: allocate correct size for dirty bitmap The dirty bitmap copied out to userspace is stored in a long array, and gets copied out to userspace accordingly. This patch accounts for that correctly. Currently I'm seeing kvm crashing due to writing beyond the end of the alloc'd dirty bitmap memory, because the buffer has the wrong size. Signed-off-by: Bruce Rogers Signed-off-by: Marcelo Tosatti --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ int kvm_get_dirty_pages_range(kvm_context_t kvm, unsigned long phys_addr, - buf = qemu_malloc((slots[i].len / 4096 + 7) / 8 + 2); + buf = qemu_malloc(BITMAP_SIZE(slots[i].len)); r = kvm_get_map(kvm, KVM_GET_DIRTY_LOG, i, buf); BITMAP_SIZE is now open-coded in that function, like this: size = ALIGN(((mem->memory_size) >> TARGET_PAGE_BITS), HOST_LONG_BITS) / 8; The problem is that HOST_LONG_BITS in 32bit userspace is 32 but it's 64 in 64bit kernel. So userspace aligns this to 32, and kernel to 64, but since no length is passed from userspace to kernel on ioctl, kernel uses its size calculation and copies 4 extra bytes to userspace, corrupting memory. Here's how it looks like during migrate execution: our=20, kern=24 our=4, kern=8 ... our=4, kern=8 our=4064, kern=4064 our=512, kern=512 our=4, kern=8 our=20, kern=24 our=4, kern=8 ... our=4, kern=8 our=4064, kern=4064 *** glibc detected *** ./x86_64-softmmu/qemu-system-x86_64: realloc(): invalid next size: 0x08f20528 *** (our is userspace size above, kern is the size as calculated by the kernel). Fix this by always aligning to 64 in a hope that no platform will have sizeof(long)>8 any time soon, and add a comment describing it all. It's a small price to pay for bad kernel design. Alternatively it's possible to fix that in the kernel by using different size calculation depending on the current process. But this becomes quite ugly. Special thanks goes to Stefan Hajnoczi for spotting the fundamental cause of the issue, and to Alexander Graf for his support in #qemu. Signed-off-by: Michael Tokarev CC: Bruce Rogers Signed-off-by: Avi Kivity --- kvm-all.c | 15 ++++++++++++++- 1 files changed, 14 insertions(+), 1 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index fd1fbfe..8b85029 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -373,7 +373,20 @@ static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr, break; } - size = ALIGN(((mem->memory_size) >> TARGET_PAGE_BITS), HOST_LONG_BITS) / 8; + /* XXX bad kernel interface alert + * For dirty bitmap, kernel allocates array of size aligned to + * bits-per-long. But for case when the kernel is 64bits and + * the userspace is 32bits, userspace can't align to the same + * bits-per-long, since sizeof(long) is different between kernel + * and user space. This way, userspace will provide buffer which + * may be 4 bytes less than the kernel will use, resulting in + * userspace memory corruption (which is not detectable by valgrind + * too, in most cases). + * So for now, let's align to 64 instead of HOST_LONG_BITS here, in + * a hope that sizeof(long) wont become >8 any time soon. + */ + size = ALIGN(((mem->memory_size) >> TARGET_PAGE_BITS), + /*HOST_LONG_BITS*/ 64) / 8; if (!d.dirty_bitmap) { d.dirty_bitmap = qemu_malloc(size); } else if (size > allocated_size) {