Patchwork [RFC] virtio: Report new guest memory statistics pertinent to memory ballooning (V2)

login
register
mail settings
Submitter Adam Litke
Date Nov. 9, 2009, 4:07 p.m.
Message ID <1257782838.2835.5.camel@aglitke>
Download mbox | patch
Permalink /patch/37974/
State New
Headers show

Comments

Adam Litke - Nov. 9, 2009, 4:07 p.m.
Changes since V1:
- In the monitor, print all stats on one line with less abbreviated names
- Coding style changes

When using ballooning to manage overcommitted memory on a host, a system for
guests to communicate their memory usage to the host can provide information
that will minimize the impact of ballooning on the guests.  The current method
employs a daemon running in each guest that communicates memory statistics to a
host daemon at a specified time interval.  The host daemon aggregates this
information and inflates and/or deflates balloons according to the level of
host memory pressure.  This approach is effective but overly complex since a
daemon must be installed inside each guest and coordinated to communicate with
the host.  A simpler approach is to collect memory statistics in the virtio
balloon driver and communicate them to the host via the device config space.

This patch implements the qemu side of the communication channel.  I will post
the kernel driver modifications in-reply to this message.

Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Cc: Avi Kivity <avi@redhat.com>
Jamie Lokier - Nov. 9, 2009, 7 p.m.
Adam Litke wrote:
> +        s->stats.pswapin = has_feature(dev, VIRTIO_BALLOON_F_RPT_SWAP_OUT) ?
> +                                       dev->stats.pswapin : -1;

(etc.)

Why not simply have the guest fill in the unused fields with -1, and
say that's how "no meaningful value" is represented in the ABI?

All guests have to know about all those fields anyway, for the
structure layout.  Is there any benefit to specifying feature bits in
advance over simply storing -1 there?

-- Jamie
Jamie Lokier - Nov. 9, 2009, 7:01 p.m.
Adam Litke wrote:
> +static inline void print_stat(Monitor *mon, uint32_t val, const char *label)
> +{
> +    if (val != -1) {
> +        monitor_printf(mon, ",%s=%u", label, val);
> +    }
> +}
> +
>  static void do_info_balloon(Monitor *mon)
>  {
> -    ram_addr_t actual;
> +    QEMUBalloonState s;
> +    int ret;
>  
> -    actual = qemu_balloon_status();
> -    if (kvm_enabled() && !kvm_has_sync_mmu())
> +    ret = qemu_balloon_status(&s);
> +    if (kvm_enabled() && !kvm_has_sync_mmu()) {
>          monitor_printf(mon, "Using KVM without synchronous MMU, "
>                         "ballooning disabled\n");
> -    else if (actual == 0)
> +    } else if (ret < 0) {
>          monitor_printf(mon, "Ballooning not activated in VM\n");
> -    else
> -        monitor_printf(mon, "balloon: actual=%d\n", (int)(actual >> 20));
> +    } else {
> +        monitor_printf(mon, "balloon: actual=%d", (int)(s.actual >> 20));
> +        print_stat(mon, s.stats.pswapin, "pages_swapped_in");
> +        print_stat(mon, s.stats.pswapout, "pages_swapped_out");
> +        print_stat(mon, s.stats.panon, "anon_pages");
> +        print_stat(mon, s.stats.pgmajfault, "major_page_faults");
> +        print_stat(mon, s.stats.pgminfault, "minor_page_faults");
> +        print_stat(mon, s.stats.memfree, "free_memory");
> +        print_stat(mon, s.stats.memtot, "total_memory");
> +        monitor_printf(mon, "\n");
> +    }
> +        
>  }

These days, would it make more sense to emit a QJSON object?

In this case the JSON object is quite human readable too.

-- Jamie
Adam Litke - Nov. 9, 2009, 7:16 p.m.
On Mon, 2009-11-09 at 19:00 +0000, Jamie Lokier wrote:
> Adam Litke wrote:
> > +        s->stats.pswapin = has_feature(dev, VIRTIO_BALLOON_F_RPT_SWAP_OUT) ?
> > +                                       dev->stats.pswapin : -1;
> 
> (etc.)
> 
> Why not simply have the guest fill in the unused fields with -1, and
> say that's how "no meaningful value" is represented in the ABI?
> 
> All guests have to know about all those fields anyway, for the
> structure layout.  Is there any benefit to specifying feature bits in
> advance over simply storing -1 there?

This holds true for fields that exist but might not be supported by a
guest, but what about the case where we add a new field?  In that case,
qemu and the guest may disagree on the structure layout.
Adam Litke - Nov. 9, 2009, 7:23 p.m.
On Mon, 2009-11-09 at 19:01 +0000, Jamie Lokier wrote:
> These days, would it make more sense to emit a QJSON object?
> 
> In this case the JSON object is quite human readable too.

I'm not very particular on the output format since it's main consumer is
likely another program.  Is XML output via the monitor a new precedent
in qemu?  I was unable to find any other monitor command that emits
XML.
Anthony Liguori - Nov. 9, 2009, 9:15 p.m.
Jamie Lokier wrote:
> Adam Litke wrote:
>   
>> +        s->stats.pswapin = has_feature(dev, VIRTIO_BALLOON_F_RPT_SWAP_OUT) ?
>> +                                       dev->stats.pswapin : -1;
>>     
>
> (etc.)
>
> Why not simply have the guest fill in the unused fields with -1, and
> say that's how "no meaningful value" is represented in the ABI?
>
> All guests have to know about all those fields anyway, for the
> structure layout.  Is there any benefit to specifying feature bits in
> advance over simply storing -1 there?
>   

Features are negotiated.  It lets a host advertise the support of a 
feature and it lets the guest acknowledge it's support of a feature.

Most importantly, why invent a new mechanism when we already have one?
Jamie Lokier - Nov. 10, 2009, 1:23 p.m.
Anthony Liguori wrote:
> Jamie Lokier wrote:
> >Adam Litke wrote:
> >  
> >>+        s->stats.pswapin = has_feature(dev, 
> >>VIRTIO_BALLOON_F_RPT_SWAP_OUT) ?
> >>+                                       dev->stats.pswapin : -1;
> >>    
> >
> >(etc.)
> >
> >Why not simply have the guest fill in the unused fields with -1, and
> >say that's how "no meaningful value" is represented in the ABI?
> >
> >All guests have to know about all those fields anyway, for the
> >structure layout.  Is there any benefit to specifying feature bits in
> >advance over simply storing -1 there?
> >  
> 
> Features are negotiated.  It lets a host advertise the support of a 
> feature and it lets the guest acknowledge it's support of a feature.
> 
> Most importantly, why invent a new mechanism when we already have one?

In this case, the guest already has to have those fields in the
structure.  It's not like something where the guest doesn't know about
a feature at all, and then a later driver adds new capabilities.  That
would require feature bits or an ABI version, I agree.

But the real motivation for my comment was seeing the bulk of the
patch being definitions and tests for lots of feature bits which do
something trivial that the guest can easily do.

The feature bits don't seem to simplify anything in this case.  As for
preparing to be consistent with future additions, presumably if
another 20 fields are added, there won't be sufficient bits in the
feature word anyway, so an "extended feature bits" feature will be
needed.

-- Jamie

Patch

diff --git a/balloon.h b/balloon.h
index 60b4a5d..4008d1b 100644
--- a/balloon.h
+++ b/balloon.h
@@ -14,14 +14,21 @@ 
 #ifndef _QEMU_BALLOON_H
 #define _QEMU_BALLOON_H
 
+#include "hw/virtio-balloon.h"
 #include "cpu-defs.h"
 
-typedef ram_addr_t (QEMUBalloonEvent)(void *opaque, ram_addr_t target);
+typedef struct QEMUBalloonState {
+    ram_addr_t actual;
+    virtio_balloon_stats stats;
+} QEMUBalloonState;
+
+typedef void (QEMUBalloonEvent)(void *opaque, ram_addr_t target,
+                                      QEMUBalloonState *state);
 
 void qemu_add_balloon_handler(QEMUBalloonEvent *func, void *opaque);
 
 void qemu_balloon(ram_addr_t target);
 
-ram_addr_t qemu_balloon_status(void);
+int qemu_balloon_status(QEMUBalloonState *s);
 
 #endif
diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
index cfd3b41..3d74e7f 100644
--- a/hw/virtio-balloon.c
+++ b/hw/virtio-balloon.c
@@ -30,6 +30,7 @@  typedef struct VirtIOBalloon
     VirtQueue *ivq, *dvq;
     uint32_t num_pages;
     uint32_t actual;
+    virtio_balloon_stats stats;
 } VirtIOBalloon;
 
 static VirtIOBalloon *to_virtio_balloon(VirtIODevice *vdev)
@@ -111,8 +112,15 @@  static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
 
     config.num_pages = cpu_to_le32(dev->num_pages);
     config.actual = cpu_to_le32(dev->actual);
-
-    memcpy(config_data, &config, 8);
+    config.stats.pswapin = cpu_to_le32(dev->stats.pswapin);
+    config.stats.pswapout = cpu_to_le32(dev->stats.pswapout);
+    config.stats.panon = cpu_to_le32(dev->stats.panon);
+    config.stats.pgmajfault = cpu_to_le32(dev->stats.pgmajfault);
+    config.stats.pgminfault = cpu_to_le32(dev->stats.pgminfault);
+    config.stats.memfree = cpu_to_le32(dev->stats.memfree);
+    config.stats.memtot = cpu_to_le32(dev->stats.memtot);
+
+    memcpy(config_data, &config, sizeof(config));
 }
 
 static void virtio_balloon_set_config(VirtIODevice *vdev,
@@ -120,16 +128,39 @@  static void virtio_balloon_set_config(VirtIODevice *vdev,
 {
     VirtIOBalloon *dev = to_virtio_balloon(vdev);
     struct virtio_balloon_config config;
-    memcpy(&config, config_data, 8);
+    memcpy(&config, config_data, sizeof(config));
     dev->actual = config.actual;
+    dev->stats.pswapin = config.stats.pswapin;
+    dev->stats.pswapout = config.stats.pswapout;
+    dev->stats.panon = config.stats.panon;
+    dev->stats.pgmajfault = config.stats.pgmajfault;
+    dev->stats.pgminfault = config.stats.pgminfault;
+    dev->stats.memfree = config.stats.memfree;
+    dev->stats.memtot = config.stats.memtot;
 }
 
 static uint32_t virtio_balloon_get_features(VirtIODevice *vdev)
 {
-    return 0;
+    uint32_t features = 0;
+
+    features |= (1 << VIRTIO_BALLOON_F_RPT_SWAP_IN)  |
+                (1 << VIRTIO_BALLOON_F_RPT_SWAP_OUT) |
+                (1 << VIRTIO_BALLOON_F_RPT_ANON)     |
+                (1 << VIRTIO_BALLOON_F_RPT_MAJFLT)   |
+                (1 << VIRTIO_BALLOON_F_RPT_MINFLT)   |
+                (1 << VIRTIO_BALLOON_F_RPT_MEMFREE)  |
+                (1 << VIRTIO_BALLOON_F_RPT_MEMTOT);
+		
+    return features;
+}
+
+static inline int has_feature(VirtIOBalloon *dev, int feature)
+{
+	return (dev->vdev.features & 1<<feature);
 }
 
-static ram_addr_t virtio_balloon_to_target(void *opaque, ram_addr_t target)
+static void virtio_balloon_to_target(void *opaque, ram_addr_t target,
+                                     QEMUBalloonState *s)
 {
     VirtIOBalloon *dev = opaque;
 
@@ -141,7 +172,23 @@  static ram_addr_t virtio_balloon_to_target(void *opaque, ram_addr_t target)
         virtio_notify_config(&dev->vdev);
     }
 
-    return ram_size - (dev->actual << VIRTIO_BALLOON_PFN_SHIFT);
+    if (s) {
+        s->actual = ram_size - (dev->actual << VIRTIO_BALLOON_PFN_SHIFT);
+        s->stats.pswapin = has_feature(dev, VIRTIO_BALLOON_F_RPT_SWAP_OUT) ?
+                                       dev->stats.pswapin : -1;
+        s->stats.pswapout = has_feature(dev, VIRTIO_BALLOON_F_RPT_SWAP_OUT) ?
+                                        dev->stats.pswapout : -1;  
+        s->stats.panon = has_feature(dev, VIRTIO_BALLOON_F_RPT_ANON) ?
+                                     dev->stats.panon : -1;
+        s->stats.pgmajfault = has_feature(dev, VIRTIO_BALLOON_F_RPT_MAJFLT) ?
+                                          dev->stats.pgmajfault : -1;
+        s->stats.pgminfault = has_feature(dev, VIRTIO_BALLOON_F_RPT_MINFLT) ?
+                                          dev->stats.pgminfault : -1;
+        s->stats.memfree = has_feature(dev, VIRTIO_BALLOON_F_RPT_MEMFREE) ?
+                                       dev->stats.memfree : -1;
+        s->stats.memtot = has_feature(dev, VIRTIO_BALLOON_F_RPT_MEMTOT) ?
+                                      dev->stats.memtot : -1;
+    }
 }
 
 static void virtio_balloon_save(QEMUFile *f, void *opaque)
@@ -174,8 +221,9 @@  VirtIODevice *virtio_balloon_init(DeviceState *dev)
     VirtIOBalloon *s;
 
     s = (VirtIOBalloon *)virtio_common_init("virtio-balloon",
-                                            VIRTIO_ID_BALLOON,
-                                            8, sizeof(VirtIOBalloon));
+                             VIRTIO_ID_BALLOON,
+                             sizeof(struct virtio_balloon_config),
+                             sizeof(VirtIOBalloon));
 
     s->vdev.get_config = virtio_balloon_get_config;
     s->vdev.set_config = virtio_balloon_set_config;
diff --git a/hw/virtio-balloon.h b/hw/virtio-balloon.h
index 9a0d119..7e635b7 100644
--- a/hw/virtio-balloon.h
+++ b/hw/virtio-balloon.h
@@ -25,16 +25,37 @@ 
 
 /* The feature bitmap for virtio balloon */
 #define VIRTIO_BALLOON_F_MUST_TELL_HOST 0 /* Tell before reclaiming pages */
+                                        /* Guest memory statistic reporting */
+#define VIRTIO_BALLOON_F_RPT_SWAP_IN  1   /* Number of pages swapped in */
+#define VIRTIO_BALLOON_F_RPT_SWAP_OUT 2   /* Number of pages swapped out */
+#define VIRTIO_BALLOON_F_RPT_ANON     3   /* Number of anonymous pages in use */
+#define VIRTIO_BALLOON_F_RPT_MAJFLT   4   /* Number of major faults */
+#define VIRTIO_BALLOON_F_RPT_MINFLT   5   /* Number of minor faults */
+#define VIRTIO_BALLOON_F_RPT_MEMFREE  6   /* Total amount of free memory */
+#define VIRTIO_BALLOON_F_RPT_MEMTOT   7   /* Total amount of memory */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
 
+typedef struct virtio_balloon_stats
+{
+    uint32_t pswapin;     /* pages swapped in */
+    uint32_t pswapout;    /* pages swapped out */
+    uint32_t panon;       /* anonymous pages in use */
+    uint32_t pgmajfault;  /* Major page faults */
+    uint32_t pgminfault;  /* Minor page faults */
+    uint32_t memfree;     /* Total amount of free memory (in kb) */
+    uint32_t memtot;      /* Total amount of memory (in kb) */
+} virtio_balloon_stats;
+
 struct virtio_balloon_config
 {
     /* Number of pages host wants Guest to give up. */
     uint32_t num_pages;
     /* Number of pages we've actually got in balloon. */
     uint32_t actual;
+    /* Memory statistics */
+    virtio_balloon_stats stats;
 };
 
 #endif
diff --git a/monitor.c b/monitor.c
index ed1ce6e..b071ee5 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1581,18 +1581,36 @@  static void do_balloon(Monitor *mon, int value)
     qemu_balloon(target << 20);
 }
 
+static inline void print_stat(Monitor *mon, uint32_t val, const char *label)
+{
+    if (val != -1) {
+        monitor_printf(mon, ",%s=%u", label, val);
+    }
+}
+
 static void do_info_balloon(Monitor *mon)
 {
-    ram_addr_t actual;
+    QEMUBalloonState s;
+    int ret;
 
-    actual = qemu_balloon_status();
-    if (kvm_enabled() && !kvm_has_sync_mmu())
+    ret = qemu_balloon_status(&s);
+    if (kvm_enabled() && !kvm_has_sync_mmu()) {
         monitor_printf(mon, "Using KVM without synchronous MMU, "
                        "ballooning disabled\n");
-    else if (actual == 0)
+    } else if (ret < 0) {
         monitor_printf(mon, "Ballooning not activated in VM\n");
-    else
-        monitor_printf(mon, "balloon: actual=%d\n", (int)(actual >> 20));
+    } else {
+        monitor_printf(mon, "balloon: actual=%d", (int)(s.actual >> 20));
+        print_stat(mon, s.stats.pswapin, "pages_swapped_in");
+        print_stat(mon, s.stats.pswapout, "pages_swapped_out");
+        print_stat(mon, s.stats.panon, "anon_pages");
+        print_stat(mon, s.stats.pgmajfault, "major_page_faults");
+        print_stat(mon, s.stats.pgminfault, "minor_page_faults");
+        print_stat(mon, s.stats.memfree, "free_memory");
+        print_stat(mon, s.stats.memtot, "total_memory");
+        monitor_printf(mon, "\n");
+    }
+        
 }
 
 static qemu_acl *find_acl(Monitor *mon, const char *name)
diff --git a/vl.c b/vl.c
index 710d52e..de3b4ae 100644
--- a/vl.c
+++ b/vl.c
@@ -334,13 +334,13 @@  void qemu_add_balloon_handler(QEMUBalloonEvent *func, void *opaque)
 void qemu_balloon(ram_addr_t target)
 {
     if (qemu_balloon_event)
-        qemu_balloon_event(qemu_balloon_event_opaque, target);
+        qemu_balloon_event(qemu_balloon_event_opaque, target, NULL);
 }
 
-ram_addr_t qemu_balloon_status(void)
+int qemu_balloon_status(struct QEMUBalloonState *s)
 {
     if (qemu_balloon_event)
-        return qemu_balloon_event(qemu_balloon_event_opaque, 0);
+        qemu_balloon_event(qemu_balloon_event_opaque, 0, s);
     return 0;
 }