From patchwork Thu Nov 8 04:45:22 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Gibson X-Patchwork-Id: 197754 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id 520872C0106 for ; Thu, 8 Nov 2012 15:44:08 +1100 (EST) Received: from localhost ([::1]:36995 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TWJyI-0003q5-Ap for incoming@patchwork.ozlabs.org; Wed, 07 Nov 2012 23:44:06 -0500 Received: from eggs.gnu.org ([208.118.235.92]:42552) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TWJyA-0003pq-8H for qemu-devel@nongnu.org; Wed, 07 Nov 2012 23:43:59 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1TWJy8-0006sS-Az for qemu-devel@nongnu.org; Wed, 07 Nov 2012 23:43:58 -0500 Received: from ozlabs.org ([203.10.76.45]:54465) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TWJy7-0006ri-Od for qemu-devel@nongnu.org; Wed, 07 Nov 2012 23:43:56 -0500 Received: by ozlabs.org (Postfix, from userid 1007) id A29D32C009F; Thu, 8 Nov 2012 15:43:49 +1100 (EST) Date: Thu, 8 Nov 2012 15:45:22 +1100 From: David Gibson To: qemu-devel@nongnu.org Message-ID: <20121108044522.GU23553@truffula.fritz.box> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 203.10.76.45 Subject: [Qemu-devel] RFC: Partial workaround for buggy guest virtio-balloon driver X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Linux kernel commits 1a87228f5f1d316002c7c161316f5524592be766 "virtio_balloon: Fix endian bug" and 3ccc9372ed0fab33d20f10be3c1efd5776ff5913 "virtio_balloon: fix handling of PAGE_SIZE != 4k" fixed two serious bugs in their (guest side) handling of the virtio balloon. In practice, these bugs only affected powerpc guests, which is big-endian and frequently configured for 64k base page size. Attempting to use the balloon with the buggy guest would usually result in an immediate guest crash. The bugs are fixed now, of course and the balloon works fine with kernels v3.4 and later, but unfortunately there are many distro releases still in use which still have buggy kernels. The nature of the page size bug makes it impossible to work around from the host side (there simply isn't enough information supplied to operate the balloon correctly). However, it *is* possible with some fiddling to safely detect the endian bug in the guest kernel, and disable the balloon in this case. The two fixes were applied to the mainline kernel almost consecutively, so there are no released kernels with one fix but not the other, meaning we can use the presence of the endian bug as a proxy for the presence of the page size bug. This patch implements such a test, working as follows. For a fixed guest kernel: 1. qemu sets a state variable to "TESTING" and the initial balloon target size to 16 (4k pages). 2. When the guest balloon driver starts, it sees the target, finds either 16 unused 4k pages or 1 unused 64k page (depending on guest config) and submits the 16 resulting 4k pfns to the host. qemu, in TESTING state, ignores the submitted pages for now. 4. The guest then updates the 'actual' field in the balloon config space to 16. qemu sees this and determines that the guest is not buggy, it moves to CLEANUP state, and sets the target balloon size back to 0. 5. The guest sees the target go back to 0, and reclaims its page(s) from the balloon. qemu continues to ignore the page addresses for now in CLEANUP state. 6. The guest updates the actual field to 0. qemu now considers cleanup complete and moves to GOOD state. The balloon now operates normally. For a buggy kernel: 1. qemu sets a state variable to "TESTING" and the initial balloon target size to 16 (4k pages). 2. When the guest balloon driver starts it sees the non-zero target, and misinterprets it as 268435456 (endian bug). It starts trying to find pages to free. 3. The guest is probably newly booted, so it almost certainly finds 256 pages easily and submits incorrect addresses for them (page size bug) to the host. qemu, in TESTING state ignores the (wrong) addresses for now. 4. The guest updates the actual field in config space to 256. qemu sees this, and determines that the guest is buggy. It moves to BUGGY state and sets the balloon target back to zero. 5. The guest, before attempting to find the next batch of pages to free, rechecks the target and discovers it is now zero. It reclaims the pages by submitting more wrong addresses, which qemu ignores. 6. The balloon is now disabled, if the user attempts to change the balloon size, qemu print an error message and otherwise ignore it. I'm aware that this patch needs a bunch more comments (largely based on the info above), but otherwise do people think this is a reasonable approach. It doesn't (and can't) fix the balloon for buggy kernels, but it does make the failure mode a lot less ugly. From dbc721f5e50a39ca3b40d81f060d8bb0e6312995 Mon Sep 17 00:00:00 2001 From: David Gibson Date: Thu, 8 Nov 2012 14:49:38 +1100 Subject: [PATCH] Detection of buggy guest balloon --- hw/virtio-balloon.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 72 insertions(+), 5 deletions(-) diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c index dd1a650..6a9cd3f 100644 --- a/hw/virtio-balloon.c +++ b/hw/virtio-balloon.c @@ -37,8 +37,16 @@ typedef struct VirtIOBalloon VirtQueueElement stats_vq_elem; size_t stats_vq_offset; DeviceState *qdev; + + int guest_bug_state; +#define GUEST_BUG_TESTING 0 +#define GUEST_BUG_CLEANUP 1 +#define GUEST_BUG_BUGGY 2 +#define GUEST_BUG_GOOD 3 } VirtIOBalloon; +#define GUEST_BUG_TARGET 16 + static VirtIOBalloon *to_virtio_balloon(VirtIODevice *vdev) { return (VirtIOBalloon *)vdev; @@ -84,6 +92,11 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq) pa = (ram_addr_t)ldl_p(&pfn) << VIRTIO_BALLOON_PFN_SHIFT; offset += 4; + if (s->guest_bug_state != GUEST_BUG_GOOD) { + /* Still bug testing, not ready to use balloon yet */ + continue; + } + /* FIXME: remove get_system_memory(), but how? */ section = memory_region_find(get_system_memory(), pa, 1); if (!section.size || !memory_region_is_ram(section.mr)) @@ -134,8 +147,23 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data) { VirtIOBalloon *dev = to_virtio_balloon(vdev); struct virtio_balloon_config config; + uint32_t num_pages; + + switch (dev->guest_bug_state) { + case GUEST_BUG_TESTING: + num_pages = GUEST_BUG_TARGET; + break; - config.num_pages = cpu_to_le32(dev->num_pages); + case GUEST_BUG_CLEANUP: + case GUEST_BUG_BUGGY: + num_pages = 0; + break; + + default: + num_pages = dev->num_pages; + } + + config.num_pages = cpu_to_le32(num_pages); config.actual = cpu_to_le32(dev->actual); memcpy(config_data, &config, 8); @@ -147,11 +175,41 @@ static void virtio_balloon_set_config(VirtIODevice *vdev, VirtIOBalloon *dev = to_virtio_balloon(vdev); struct virtio_balloon_config config; uint32_t oldactual = dev->actual; + memcpy(&config, config_data, 8); dev->actual = le32_to_cpu(config.actual); - if (dev->actual != oldactual) { - qemu_balloon_changed(ram_size - - (dev->actual << VIRTIO_BALLOON_PFN_SHIFT)); + + switch (dev->guest_bug_state) { + case GUEST_BUG_TESTING: + if (dev->actual == 0) { + /* Both buggy and non-buggy guests write a 0 before going + * on to write a meaningful value */ + break; + } + + if (dev->actual > GUEST_BUG_TARGET) { + fprintf(stderr, "virtio-balloon: Buggy guest detected, disabling balloon\n"); + dev->guest_bug_state = GUEST_BUG_BUGGY; + } else { + dev->guest_bug_state = GUEST_BUG_CLEANUP; + } + /* Changing bug state implicitly alters the config */ + virtio_notify_config(&dev->vdev); + break; + + case GUEST_BUG_CLEANUP: + if (dev->actual == 0) { + /* Cleanup completed, proceed with normal operation */ + dev->guest_bug_state = GUEST_BUG_GOOD; + virtio_notify_config(&dev->vdev); + } + break; + + default: + if (dev->actual != oldactual) { + qemu_balloon_changed(ram_size - + (dev->actual << VIRTIO_BALLOON_PFN_SHIFT)); + } } } @@ -194,12 +252,21 @@ static void virtio_balloon_to_target(void *opaque, ram_addr_t target) { VirtIOBalloon *dev = opaque; + if (dev->guest_bug_state == GUEST_BUG_BUGGY) { + fprintf(stderr, "Guest is buggy, cannot use balloon\n"); + } + if (target > ram_size) { target = ram_size; } if (target) { dev->num_pages = (ram_size - target) >> VIRTIO_BALLOON_PFN_SHIFT; - virtio_notify_config(&dev->vdev); + + /* If we're still testing for guest bugs, delay the change + * interrupt until we've finished that */ + if (dev->guest_bug_state == GUEST_BUG_GOOD) { + virtio_notify_config(&dev->vdev); + } } }