Patchwork RFC: Partial workaround for buggy guest virtio-balloon driver

login
register
mail settings
Submitter David Gibson
Date Nov. 8, 2012, 4:45 a.m.
Message ID <20121108044522.GU23553@truffula.fritz.box>
Download mbox | patch
Permalink /patch/197754/
State New
Headers show

Comments

David Gibson - Nov. 8, 2012, 4:45 a.m.
Linux kernel commits 1a87228f5f1d316002c7c161316f5524592be766
"virtio_balloon: Fix endian bug" and
3ccc9372ed0fab33d20f10be3c1efd5776ff5913 "virtio_balloon: fix handling
of PAGE_SIZE != 4k" fixed two serious bugs in their (guest side)
handling of the virtio balloon.  In practice, these bugs only affected
powerpc guests, which is big-endian and frequently configured for 64k
base page size.  Attempting to use the balloon with the buggy guest
would usually result in an immediate guest crash.

The bugs are fixed now, of course and the balloon works fine with
kernels v3.4 and later, but unfortunately there are many distro
releases still in use which still have buggy kernels.

The nature of the page size bug makes it impossible to work around
from the host side (there simply isn't enough information supplied to
operate the balloon correctly).  However, it *is* possible with some
fiddling to safely detect the endian bug in the guest kernel, and
disable the balloon in this case.  The two fixes were applied to the
mainline kernel almost consecutively, so there are no released kernels
with one fix but not the other, meaning we can use the presence of the
endian bug as a proxy for the presence of the page size bug.

This patch implements such a test, working as follows.

For a fixed guest kernel:
  1. qemu sets a state variable to "TESTING" and the initial balloon
     target size to 16 (4k pages).
  2. When the guest balloon driver starts, it sees the target, finds
     either 16 unused 4k pages or 1 unused 64k page (depending on
     guest config) and submits the 16 resulting 4k pfns to the
     host. qemu, in TESTING state, ignores the submitted pages for
     now.
  4. The guest then updates the 'actual' field in the balloon config
     space to 16.  qemu sees this and determines that the guest is not
     buggy, it moves to CLEANUP state, and sets the target balloon
     size back to 0.
  5. The guest sees the target go back to 0, and reclaims its page(s)
     from the balloon.  qemu continues to ignore the page addresses
     for now in CLEANUP state.
  6. The guest updates the actual field to 0.  qemu now considers
     cleanup complete and moves to GOOD state.  The balloon now
     operates normally.

For a buggy kernel:
  1. qemu sets a state variable to "TESTING" and the initial balloon
     target size to 16 (4k pages).
  2. When the guest balloon driver starts it sees the non-zero target,
     and misinterprets it as 268435456 (endian bug).  It starts trying
     to find pages to free.
  3. The guest is probably newly booted, so it almost certainly finds
     256 pages easily and submits incorrect addresses for them (page
     size bug) to the host. qemu, in TESTING state ignores the (wrong)
     addresses for now.
  4. The guest updates the actual field in config space to 256.  qemu
     sees this, and determines that the guest is buggy.  It moves to
     BUGGY state and sets the balloon target back to zero.
  5. The guest, before attempting to find the next batch of pages to
     free, rechecks the target and discovers it is now zero.  It
     reclaims the pages by submitting more wrong addresses, which qemu
     ignores.
  6. The balloon is now disabled, if the user attempts to change the
     balloon size, qemu print an error message and otherwise ignore
     it.


I'm aware that this patch needs a bunch more comments (largely based
on the info above), but otherwise do people think this is a reasonable
approach.  It doesn't (and can't) fix the balloon for buggy kernels,
but it does make the failure mode a lot less ugly.

From dbc721f5e50a39ca3b40d81f060d8bb0e6312995 Mon Sep 17 00:00:00 2001
From: David Gibson <david@gibson.dropbear.id.au>
Date: Thu, 8 Nov 2012 14:49:38 +1100
Subject: [PATCH] Detection of buggy guest balloon

---
 hw/virtio-balloon.c |   77 +++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 72 insertions(+), 5 deletions(-)
Anthony Liguori - Nov. 8, 2012, 1:11 p.m.
David Gibson <david@gibson.dropbear.id.au> writes:

> Linux kernel commits 1a87228f5f1d316002c7c161316f5524592be766
> "virtio_balloon: Fix endian bug" and
> 3ccc9372ed0fab33d20f10be3c1efd5776ff5913 "virtio_balloon: fix handling
> of PAGE_SIZE != 4k" fixed two serious bugs in their (guest side)
> handling of the virtio balloon.  In practice, these bugs only affected
> powerpc guests, which is big-endian and frequently configured for 64k
> base page size.  Attempting to use the balloon with the buggy guest
> would usually result in an immediate guest crash.

You should create a new feature VIRTIO_BALLOON_F_ENDIAN_SAFE,
advertise it in the host, and add a guest kernel patch to ack it in
newer kernels.

Older kernels won't ack this feature which gives you a safe way to to
disable the driver on a big endian host.

You won't get support for 3.4 kernels but it's much nicer to handle it
this way.

Regards,

Anthony Liguori

>
> The bugs are fixed now, of course and the balloon works fine with
> kernels v3.4 and later, but unfortunately there are many distro
> releases still in use which still have buggy kernels.
>
> The nature of the page size bug makes it impossible to work around
> from the host side (there simply isn't enough information supplied to
> operate the balloon correctly).  However, it *is* possible with some
> fiddling to safely detect the endian bug in the guest kernel, and
> disable the balloon in this case.  The two fixes were applied to the
> mainline kernel almost consecutively, so there are no released kernels
> with one fix but not the other, meaning we can use the presence of the
> endian bug as a proxy for the presence of the page size bug.
>
> This patch implements such a test, working as follows.
>
> For a fixed guest kernel:
>   1. qemu sets a state variable to "TESTING" and the initial balloon
>      target size to 16 (4k pages).
>   2. When the guest balloon driver starts, it sees the target, finds
>      either 16 unused 4k pages or 1 unused 64k page (depending on
>      guest config) and submits the 16 resulting 4k pfns to the
>      host. qemu, in TESTING state, ignores the submitted pages for
>      now.
>   4. The guest then updates the 'actual' field in the balloon config
>      space to 16.  qemu sees this and determines that the guest is not
>      buggy, it moves to CLEANUP state, and sets the target balloon
>      size back to 0.
>   5. The guest sees the target go back to 0, and reclaims its page(s)
>      from the balloon.  qemu continues to ignore the page addresses
>      for now in CLEANUP state.
>   6. The guest updates the actual field to 0.  qemu now considers
>      cleanup complete and moves to GOOD state.  The balloon now
>      operates normally.
>
> For a buggy kernel:
>   1. qemu sets a state variable to "TESTING" and the initial balloon
>      target size to 16 (4k pages).
>   2. When the guest balloon driver starts it sees the non-zero target,
>      and misinterprets it as 268435456 (endian bug).  It starts trying
>      to find pages to free.
>   3. The guest is probably newly booted, so it almost certainly finds
>      256 pages easily and submits incorrect addresses for them (page
>      size bug) to the host. qemu, in TESTING state ignores the (wrong)
>      addresses for now.
>   4. The guest updates the actual field in config space to 256.  qemu
>      sees this, and determines that the guest is buggy.  It moves to
>      BUGGY state and sets the balloon target back to zero.
>   5. The guest, before attempting to find the next batch of pages to
>      free, rechecks the target and discovers it is now zero.  It
>      reclaims the pages by submitting more wrong addresses, which qemu
>      ignores.
>   6. The balloon is now disabled, if the user attempts to change the
>      balloon size, qemu print an error message and otherwise ignore
>      it.
>
>
> I'm aware that this patch needs a bunch more comments (largely based
> on the info above), but otherwise do people think this is a reasonable
> approach.  It doesn't (and can't) fix the balloon for buggy kernels,
> but it does make the failure mode a lot less ugly.
>
> From dbc721f5e50a39ca3b40d81f060d8bb0e6312995 Mon Sep 17 00:00:00 2001
> From: David Gibson <david@gibson.dropbear.id.au>
> Date: Thu, 8 Nov 2012 14:49:38 +1100
> Subject: [PATCH] Detection of buggy guest balloon
>
> ---
>  hw/virtio-balloon.c |   77 +++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 72 insertions(+), 5 deletions(-)
>
> diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
> index dd1a650..6a9cd3f 100644
> --- a/hw/virtio-balloon.c
> +++ b/hw/virtio-balloon.c
> @@ -37,8 +37,16 @@ typedef struct VirtIOBalloon
>      VirtQueueElement stats_vq_elem;
>      size_t stats_vq_offset;
>      DeviceState *qdev;
> +
> +    int guest_bug_state;
> +#define GUEST_BUG_TESTING       0
> +#define GUEST_BUG_CLEANUP       1
> +#define GUEST_BUG_BUGGY         2
> +#define GUEST_BUG_GOOD          3
>  } VirtIOBalloon;
>  
> +#define GUEST_BUG_TARGET        16
> +
>  static VirtIOBalloon *to_virtio_balloon(VirtIODevice *vdev)
>  {
>      return (VirtIOBalloon *)vdev;
> @@ -84,6 +92,11 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>              pa = (ram_addr_t)ldl_p(&pfn) << VIRTIO_BALLOON_PFN_SHIFT;
>              offset += 4;
>  
> +            if (s->guest_bug_state != GUEST_BUG_GOOD) {
> +                /* Still bug testing, not ready to use balloon yet */
> +                continue;
> +            }
> +
>              /* FIXME: remove get_system_memory(), but how? */
>              section = memory_region_find(get_system_memory(), pa, 1);
>              if (!section.size || !memory_region_is_ram(section.mr))
> @@ -134,8 +147,23 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>  {
>      VirtIOBalloon *dev = to_virtio_balloon(vdev);
>      struct virtio_balloon_config config;
> +    uint32_t num_pages;
> +
> +    switch (dev->guest_bug_state) {
> +    case GUEST_BUG_TESTING:
> +        num_pages = GUEST_BUG_TARGET;
> +        break;
>  
> -    config.num_pages = cpu_to_le32(dev->num_pages);
> +    case GUEST_BUG_CLEANUP:
> +    case GUEST_BUG_BUGGY:
> +        num_pages = 0;
> +        break;
> +
> +    default:
> +        num_pages = dev->num_pages;
> +    }
> +
> +    config.num_pages = cpu_to_le32(num_pages);
>      config.actual = cpu_to_le32(dev->actual);
>  
>      memcpy(config_data, &config, 8);
> @@ -147,11 +175,41 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
>      VirtIOBalloon *dev = to_virtio_balloon(vdev);
>      struct virtio_balloon_config config;
>      uint32_t oldactual = dev->actual;
> +
>      memcpy(&config, config_data, 8);
>      dev->actual = le32_to_cpu(config.actual);
> -    if (dev->actual != oldactual) {
> -        qemu_balloon_changed(ram_size -
> -                             (dev->actual << VIRTIO_BALLOON_PFN_SHIFT));
> +
> +    switch (dev->guest_bug_state) {
> +    case GUEST_BUG_TESTING:
> +        if (dev->actual == 0) {
> +            /* Both buggy and non-buggy guests write a 0 before going
> +             * on to write a meaningful value */
> +            break;
> +        }
> +
> +        if (dev->actual > GUEST_BUG_TARGET) {
> +            fprintf(stderr, "virtio-balloon: Buggy guest detected, disabling balloon\n");
> +            dev->guest_bug_state = GUEST_BUG_BUGGY;
> +        } else {
> +            dev->guest_bug_state = GUEST_BUG_CLEANUP;
> +        }
> +        /* Changing bug state implicitly alters the config */
> +        virtio_notify_config(&dev->vdev);
> +        break;
> +
> +    case GUEST_BUG_CLEANUP:
> +        if (dev->actual == 0) {
> +            /* Cleanup completed, proceed with normal operation */
> +            dev->guest_bug_state = GUEST_BUG_GOOD;
> +            virtio_notify_config(&dev->vdev);
> +        }
> +        break;
> +
> +    default:
> +        if (dev->actual != oldactual) {
> +            qemu_balloon_changed(ram_size -
> +                                 (dev->actual << VIRTIO_BALLOON_PFN_SHIFT));
> +        }
>      }
>  }
>  
> @@ -194,12 +252,21 @@ static void virtio_balloon_to_target(void *opaque, ram_addr_t target)
>  {
>      VirtIOBalloon *dev = opaque;
>  
> +    if (dev->guest_bug_state == GUEST_BUG_BUGGY) {
> +        fprintf(stderr, "Guest is buggy, cannot use balloon\n");
> +    }
> +
>      if (target > ram_size) {
>          target = ram_size;
>      }
>      if (target) {
>          dev->num_pages = (ram_size - target) >> VIRTIO_BALLOON_PFN_SHIFT;
> -        virtio_notify_config(&dev->vdev);
> +
> +        /* If we're still testing for guest bugs, delay the change
> +         * interrupt until we've finished that */
> +        if (dev->guest_bug_state == GUEST_BUG_GOOD) {
> +            virtio_notify_config(&dev->vdev);
> +        }
>      }
>  }
>  
> -- 
> 1.7.10.4
>
>
>
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson
David Gibson - Nov. 9, 2012, 12:57 a.m.
On Thu, Nov 08, 2012 at 07:11:13AM -0600, Anthony Liguori wrote:
> David Gibson <david@gibson.dropbear.id.au> writes:
> 
> > Linux kernel commits 1a87228f5f1d316002c7c161316f5524592be766
> > "virtio_balloon: Fix endian bug" and
> > 3ccc9372ed0fab33d20f10be3c1efd5776ff5913 "virtio_balloon: fix handling
> > of PAGE_SIZE != 4k" fixed two serious bugs in their (guest side)
> > handling of the virtio balloon.  In practice, these bugs only affected
> > powerpc guests, which is big-endian and frequently configured for 64k
> > base page size.  Attempting to use the balloon with the buggy guest
> > would usually result in an immediate guest crash.
> 
> You should create a new feature VIRTIO_BALLOON_F_ENDIAN_SAFE,
> advertise it in the host, and add a guest kernel patch to ack it in
> newer kernels.
> 
> Older kernels won't ack this feature which gives you a safe way to to
> disable the driver on a big endian host.

Well, yes, we should have done this at the time we made the bugfixes.
Unfortunately, we didn't, so now we are where we are.

> You won't get support for 3.4 kernels but it's much nicer to handle it
> this way.

Since 3.4 and 3.5 kernels will be around in distros for some time now,
that is not a trivial drawback with adding a feature now.

Patch

diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
index dd1a650..6a9cd3f 100644
--- a/hw/virtio-balloon.c
+++ b/hw/virtio-balloon.c
@@ -37,8 +37,16 @@  typedef struct VirtIOBalloon
     VirtQueueElement stats_vq_elem;
     size_t stats_vq_offset;
     DeviceState *qdev;
+
+    int guest_bug_state;
+#define GUEST_BUG_TESTING       0
+#define GUEST_BUG_CLEANUP       1
+#define GUEST_BUG_BUGGY         2
+#define GUEST_BUG_GOOD          3
 } VirtIOBalloon;
 
+#define GUEST_BUG_TARGET        16
+
 static VirtIOBalloon *to_virtio_balloon(VirtIODevice *vdev)
 {
     return (VirtIOBalloon *)vdev;
@@ -84,6 +92,11 @@  static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
             pa = (ram_addr_t)ldl_p(&pfn) << VIRTIO_BALLOON_PFN_SHIFT;
             offset += 4;
 
+            if (s->guest_bug_state != GUEST_BUG_GOOD) {
+                /* Still bug testing, not ready to use balloon yet */
+                continue;
+            }
+
             /* FIXME: remove get_system_memory(), but how? */
             section = memory_region_find(get_system_memory(), pa, 1);
             if (!section.size || !memory_region_is_ram(section.mr))
@@ -134,8 +147,23 @@  static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
 {
     VirtIOBalloon *dev = to_virtio_balloon(vdev);
     struct virtio_balloon_config config;
+    uint32_t num_pages;
+
+    switch (dev->guest_bug_state) {
+    case GUEST_BUG_TESTING:
+        num_pages = GUEST_BUG_TARGET;
+        break;
 
-    config.num_pages = cpu_to_le32(dev->num_pages);
+    case GUEST_BUG_CLEANUP:
+    case GUEST_BUG_BUGGY:
+        num_pages = 0;
+        break;
+
+    default:
+        num_pages = dev->num_pages;
+    }
+
+    config.num_pages = cpu_to_le32(num_pages);
     config.actual = cpu_to_le32(dev->actual);
 
     memcpy(config_data, &config, 8);
@@ -147,11 +175,41 @@  static void virtio_balloon_set_config(VirtIODevice *vdev,
     VirtIOBalloon *dev = to_virtio_balloon(vdev);
     struct virtio_balloon_config config;
     uint32_t oldactual = dev->actual;
+
     memcpy(&config, config_data, 8);
     dev->actual = le32_to_cpu(config.actual);
-    if (dev->actual != oldactual) {
-        qemu_balloon_changed(ram_size -
-                             (dev->actual << VIRTIO_BALLOON_PFN_SHIFT));
+
+    switch (dev->guest_bug_state) {
+    case GUEST_BUG_TESTING:
+        if (dev->actual == 0) {
+            /* Both buggy and non-buggy guests write a 0 before going
+             * on to write a meaningful value */
+            break;
+        }
+
+        if (dev->actual > GUEST_BUG_TARGET) {
+            fprintf(stderr, "virtio-balloon: Buggy guest detected, disabling balloon\n");
+            dev->guest_bug_state = GUEST_BUG_BUGGY;
+        } else {
+            dev->guest_bug_state = GUEST_BUG_CLEANUP;
+        }
+        /* Changing bug state implicitly alters the config */
+        virtio_notify_config(&dev->vdev);
+        break;
+
+    case GUEST_BUG_CLEANUP:
+        if (dev->actual == 0) {
+            /* Cleanup completed, proceed with normal operation */
+            dev->guest_bug_state = GUEST_BUG_GOOD;
+            virtio_notify_config(&dev->vdev);
+        }
+        break;
+
+    default:
+        if (dev->actual != oldactual) {
+            qemu_balloon_changed(ram_size -
+                                 (dev->actual << VIRTIO_BALLOON_PFN_SHIFT));
+        }
     }
 }
 
@@ -194,12 +252,21 @@  static void virtio_balloon_to_target(void *opaque, ram_addr_t target)
 {
     VirtIOBalloon *dev = opaque;
 
+    if (dev->guest_bug_state == GUEST_BUG_BUGGY) {
+        fprintf(stderr, "Guest is buggy, cannot use balloon\n");
+    }
+
     if (target > ram_size) {
         target = ram_size;
     }
     if (target) {
         dev->num_pages = (ram_size - target) >> VIRTIO_BALLOON_PFN_SHIFT;
-        virtio_notify_config(&dev->vdev);
+
+        /* If we're still testing for guest bugs, delay the change
+         * interrupt until we've finished that */
+        if (dev->guest_bug_state == GUEST_BUG_GOOD) {
+            virtio_notify_config(&dev->vdev);
+        }
     }
 }