From patchwork Wed Feb 24 19:10:12 2010
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Anthony Liguori
X-Patchwork-Id: 46202
From: Anthony Liguori
To: qemu-devel@nongnu.org
Cc: Anthony Liguori
Date: Wed, 24 Feb 2010 13:10:12 -0600
Message-Id: <1267038612-21581-1-git-send-email-aliguori@us.ibm.com>
X-Mailer: git-send-email 1.6.5.2
Subject: [Qemu-devel] [PATCH] pc: madvise(MADV_DONTNEED) memory on reset

If you compare the RSS of a freshly booted guest with that of the same guest
after a reboot, the freshly booted guest will very likely have a much lower
RSS than the rebooted guest, because the previous run of the guest faulted in
all available memory.

This patch addresses the issue by calling madvise() during reset. It only
resets RAM areas, which means it has to be done in the machine.

I've only done this for the x86 target because I'm fairly confident that this
is allowed architecturally on x86, although I'm not sure this is universally
true.

This does not appear to have an observable cost with a large memory guest and
I can't really think of any downsides.

Reported-by: Karl Rister
Signed-off-by: Anthony Liguori
---
 hw/pc.c |   40 ++++++++++++++++++++++++++++++++++++++++
 1 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 4f6a522..10446ba 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -45,6 +45,11 @@
 #include "loader.h"
 #include "elf.h"
 #include "multiboot.h"
+#include "kvm.h"
+
+#ifndef _WIN32
+#include <sys/mman.h>
+#endif
 
 /* output Bochs bios info messages */
 //#define DEBUG_BIOS
@@ -63,10 +68,19 @@
 
 #define MAX_IDE_BUS 2
 
+#define MAX_MEMORY_ENTRIES 10
+
+typedef struct MemoryEntry {
+    ram_addr_t addr;
+    ram_addr_t size;
+} MemoryEntry;
+
 static FDCtrl *floppy_controller;
 static RTCState *rtc_state;
 static PITState *pit;
 static PCII440FXState *i440fx_state;
+static int num_memory_entries;
+static MemoryEntry memory_entries[MAX_MEMORY_ENTRIES];
 
 #define E820_NR_ENTRIES 16
 
@@ -782,6 +796,27 @@ static CPUState *pc_new_cpu(const char *cpu_model)
     return env;
 }
 
+static void add_mem_entry(ram_addr_t addr, ram_addr_t size)
+{
+    memory_entries[num_memory_entries].addr = addr;
+    memory_entries[num_memory_entries].size = size;
+    num_memory_entries++;
+}
+
+static void pc_reset_ram(void *opaque)
+{
+    int i;
+
+    for (i = 0; i < num_memory_entries; i++) {
+#ifndef _WIN32
+        if (!kvm_enabled() || kvm_has_sync_mmu()) {
+            madvise(qemu_get_ram_ptr(memory_entries[i].addr),
+                    memory_entries[i].size, MADV_DONTNEED);
+        }
+#endif
+    }
+}
+
 /* PC hardware initialisation */
 static void pc_init1(ram_addr_t ram_size,
                      const char *boot_device,
@@ -835,6 +870,7 @@ static void pc_init1(ram_addr_t ram_size,
     /* allocate RAM */
     ram_addr = qemu_ram_alloc(0xa0000);
     cpu_register_physical_memory(0, 0xa0000, ram_addr);
+    add_mem_entry(ram_addr, 0xa0000);
 
     /* Allocate, even though we won't register, so we don't break the
      * phys_ram_base + PA assumption. This range includes vga (0xa0000 - 0xc0000),
@@ -845,6 +881,7 @@ static void pc_init1(ram_addr_t ram_size,
     cpu_register_physical_memory(0x100000,
                  below_4g_mem_size - 0x100000,
                  ram_addr);
+    add_mem_entry(ram_addr, below_4g_mem_size - 0x100000);
 
     /* above 4giga memory allocation */
     if (above_4g_mem_size > 0) {
@@ -855,6 +892,7 @@ static void pc_init1(ram_addr_t ram_size,
         cpu_register_physical_memory(0x100000000ULL,
                                      above_4g_mem_size,
                                      ram_addr);
+        add_mem_entry(ram_addr, above_4g_mem_size);
 #endif
     }
 
@@ -1050,6 +1088,8 @@ static void pc_init1(ram_addr_t ram_size,
             pci_create_simple(pci_bus, -1, "lsi53c895a");
         }
     }
+
+    qemu_register_reset(pc_reset_ram, NULL);
 }
 
 static void pc_init_pci(ram_addr_t ram_size,
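
For readers unfamiliar with the call, the behaviour the commit message relies
on can be sketched outside QEMU. The standalone helper below is not part of
the patch: dontneed_demo() is a hypothetical name, and it assumes Linux and a
private anonymous mapping standing in for guest RAM (QEMU's real allocation
path may differ). It dirties every page, applies MADV_DONTNEED, and reads the
memory back. On Linux the dropped pages are refilled with zeroes on the next
access, which is why the old contents are lost and why reset looks like a
reasonable point to do this.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS and madvise() on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, not taken from hw/pc.c: map `len` bytes of private
 * anonymous memory, write to every byte, drop the pages with MADV_DONTNEED,
 * then report what the first and last bytes read back as.
 * Returns 0 on success, -1 if mmap() or madvise() fails.
 */
static int dontneed_demo(size_t len, unsigned char *first, unsigned char *last)
{
    unsigned char *ram;

    assert(len > 0);

    ram = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ram == MAP_FAILED) {
        return -1;
    }

    /* Fault in every page, as a running guest eventually would. */
    memset(ram, 0xAB, len);

    /* Tell the kernel the contents are no longer needed. */
    if (madvise(ram, len, MADV_DONTNEED) != 0) {
        munmap(ram, len);
        return -1;
    }

    /* Assumption: Linux semantics, where these reads fault in zero pages. */
    *first = ram[0];
    *last = ram[len - 1];

    munmap(ram, len);
    return 0;
}
```

Calling dontneed_demo(16 * 4096, &first, &last) from a small main() and
printing the two bytes is enough to see the effect. The zero-fill result is a
Linux property of private anonymous mappings, not something POSIX promises,
so other hosts may behave differently.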