
virtio: Add memory statistics reporting to the balloon driver

Message ID 1257784326.2835.16.camel@aglitke
State New

Commit Message

Adam Litke Nov. 9, 2009, 4:32 p.m. UTC
When using ballooning to manage overcommitted memory on a host, a system for
guests to communicate their memory usage to the host can provide information
that will minimize the impact of ballooning on the guests.  The current method
employs a daemon running in each guest that communicates memory statistics to a
host daemon at a specified time interval.  The host daemon aggregates this
information and inflates and/or deflates balloons according to the level of
host memory pressure.  This approach is effective but overly complex since a
daemon must be installed inside each guest and coordinated to communicate with
the host.  A simpler approach is to collect memory statistics in the virtio
balloon driver and communicate them to the host via the device config space.

This patch enables the guest-side support by adding stats collection and
reporting to the virtio balloon driver.

Comments?

Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Anthony Liguori <anthony@codemonkey.ws>
Cc: Avi Kivity <avi@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org

Comments

Rusty Russell Nov. 10, 2009, 2:42 a.m. UTC | #1
On Tue, 10 Nov 2009 03:02:06 am Adam Litke wrote:
> A simpler approach is to collect memory statistics in the virtio
> balloon driver and communicate them to the host via the device config space.

There are two issues I see with this.  First, there's an atomicity problem
since you can't tell when the stats are consistent.  Second, polling is
ugly.

A stats vq might solve this more cleanly?
Rusty.
Anthony Liguori Nov. 10, 2009, 2:36 p.m. UTC | #2
Rusty Russell wrote:
> On Tue, 10 Nov 2009 03:02:06 am Adam Litke wrote:
>   
>> A simpler approach is to collect memory statistics in the virtio
>> balloon driver and communicate them to the host via the device config space.
>>     
>
> There are two issues I see with this.  First, there's an atomicity problem
> since you can't tell when the stats are consistent.

Actually, config writes always require notification from the guest to 
the host.  This means the host knows when the config space is changed 
so atomicity isn't a problem.

In fact, if it were a problem, then the balloon driver would be 
fundamentally broken because target and actual are stored in the config 
space.

If you recall, we had this discussion originally wrt the balloon driver :-)

>   Second, polling is
> ugly.
>   

As opposed to?  The guest could set a timer and update the values 
periodically but that's even uglier because then the host cannot 
determine the update granularity.

> A stats vq might solve this more cleanly?
>   

actual and target are both really just stats.  Had we implemented those 
with a vq, I'd be inclined to agree with you but since they're 
implemented in the config space, it seems natural to extend the config 
space with other stats.

Regards,

Anthony Liguori
Avi Kivity Nov. 10, 2009, 2:43 p.m. UTC | #3
On 11/10/2009 04:36 PM, Anthony Liguori wrote:
>
>> A stats vq might solve this more cleanly?
>
> actual and target are both really just stats.  Had we implemented 
> those with a vq, I'd be inclined to agree with you but since they're 
> implemented in the config space, it seems natural to extend the config 
> space with other stats.
>

There is in fact a difference; actual and target are very rarely 
updated, while the stats are updated very often.  Using a vq means a 
constant number of exits per batch instead of one exit per statistic.  
If the vq is host-driven, it also allows the host to control the update 
frequency dynamically (i.e. stop polling when there is no memory pressure).
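
The exit-count argument can be made concrete with a small model.  This is
a minimal sketch in plain userspace C, not driver code.  It assumes, as
described above, that every config-space write costs one guest exit and
that a vq batch costs one notification no matter how many statistics it
carries.  The helpers exits_via_config() and exits_via_vq() are
hypothetical names used only for illustration.

```c
#include <assert.h>

/* Assumed cost model, not measured data: one guest exit per
 * config-space write, hence one exit per statistic per update. */
static unsigned long exits_via_config(unsigned nr_stats, unsigned long updates)
{
	return (unsigned long)nr_stats * updates;
}

/* Assumed cost model: one notification per batch, however many
 * statistics the batch holds. */
static unsigned long exits_via_vq(unsigned nr_stats, unsigned long updates)
{
	assert(nr_stats > 0);	/* an empty batch makes no sense */
	return updates;
}
```

With the seven statistics in this patch, ten updates cost
exits_via_config(7, 10) = 70 exits under this model, against
exits_via_vq(7, 10) = 10.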
Anthony Liguori Nov. 10, 2009, 2:58 p.m. UTC | #4
Avi Kivity wrote:
> On 11/10/2009 04:36 PM, Anthony Liguori wrote:
>>
>>> A stats vq might solve this more cleanly?
>>
>> actual and target are both really just stats.  Had we implemented 
>> those with a vq, I'd be inclined to agree with you but since they're 
>> implemented in the config space, it seems natural to extend the 
>> config space with other stats.
>>
>
> There is in fact a difference; actual and target are very rarely 
> updated, while the stats are updated very often.  Using a vq means a 
> constant number of exits per batch instead of one exit per statistic.  
> If the vq is host-driven, it also allows the host to control the 
> update frequency dynamically (i.e. stop polling when there is no 
> memory pressure).

I'm not terribly opposed to using a vq for this.  I would expect the 
stat update interval to be rather long (10s probably) but a vq works 
just as well.
Anthony Liguori Nov. 10, 2009, 9:52 p.m. UTC | #5
Rusty Russell wrote:
> On Tue, 10 Nov 2009 03:02:06 am Adam Litke wrote:
>   
>> A simpler approach is to collect memory statistics in the virtio
>> balloon driver and communicate them to the host via the device config space.
>>     
>
> There are two issues I see with this.  First, there's an atomicity problem
> since you can't tell when the stats are consistent.  Second, polling is
> ugly.
>
> A stats vq might solve this more cleanly?
>   

This turns out to not work so nicely.  You really need bidirectional 
communication.  You need to request that stats be collected and then you 
need to tell the hypervisor about the stats that were collected.  You 
don't need any real correlation between requests and stat reports either.

This really models how target/actual work and I think it suggests that 
we want to reuse that mechanism for the stats too.

> Rusty.
>
Rusty Russell Nov. 10, 2009, 11:59 p.m. UTC | #6
On Wed, 11 Nov 2009 01:06:14 am Anthony Liguori wrote:
> Rusty Russell wrote:
> > On Tue, 10 Nov 2009 03:02:06 am Adam Litke wrote:
> >   
> >> A simpler approach is to collect memory statistics in the virtio
> >> balloon driver and communicate them to the host via the device config space.
> >>     
> >
> > There are two issues I see with this.  First, there's an atomicity problem
> > since you can't tell when the stats are consistent.
> 
> Actually, config writes always require notification from the guest to 
> the host.  This means the host knows when the config space is changed 
> so atomicity isn't a problem.

I think you missed my point: the stats are inter-related, so they should be
served together.

> In fact, if it were a problem, then the balloon driver would be 
> fundamentally broken because target and actual are stored in the config 
> space.

No, one is written by the host, the other the guest.  Still works.

> If you recall, we had this discussion originally wrt the balloon driver :-)

And I never did get around to the lguest implementation, which would have
seen if this really is an issue.

> >   Second, polling is ugly.
> 
> As opposed to?

As opposed to giving the stats whenever asked by the host.

> > A stats vq might solve this more cleanly?
> >   
> 
> actual and target are both really just stats.  Had we implemented those 
> with a vq, I'd be inclined to agree with you but since they're 
> implemented in the config space, it seems natural to extend the config 
> space with other stats.

It does, *if* we don't need accuracy.  Otherwise, it seems like we need
something else.

Cheers,
Rusty.
Rusty Russell Nov. 11, 2009, 12:02 a.m. UTC | #7
On Wed, 11 Nov 2009 08:22:42 am Anthony Liguori wrote:
> Rusty Russell wrote:
> > On Tue, 10 Nov 2009 03:02:06 am Adam Litke wrote:
> >   
> >> A simpler approach is to collect memory statistics in the virtio
> >> balloon driver and communicate them to the host via the device config space.
> >>     
> >
> > There are two issues I see with this.  First, there's an atomicity problem
> > since you can't tell when the stats are consistent.  Second, polling is
> > ugly.
> >
> > A stats vq might solve this more cleanly?
> >   
> 
> This turns out to not work so nicely.  You really need bidirectional 
> communication.  You need to request that stats be collected and then you 
> need to tell the hypervisor about the stats that were collected.  You 
> don't need any real correlation between requests and stat reports either.

You register an outbuf at initialization time.  The host hands it back when
it wants you to refill it with stats.

> This really models how target/actual work and I think it suggests that 
> we want to reuse that mechanism for the stats too.

Sure, I want to.  You want to.  It's simple.

But the universe is remarkably indifferent to what we want.  Is it actually
sufficient or are we going to regret our laziness?

Cheers,
Rusty.
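
The hand-back scheme described above can be modelled as a single buffer
whose ownership alternates between guest and host.  This is a minimal
userspace sketch under stated assumptions, not virtqueue code: struct
stats_channel, guest_post() and host_take() are hypothetical names, and
only two statistics are carried to keep it short.

```c
#include <assert.h>
#include <stdint.h>

struct stats_buf {
	uint32_t memfree_kb;
	uint32_t memtot_kb;
};

enum owner { OWNER_GUEST, OWNER_HOST };

/* Hypothetical model of a one-buffer stats queue. */
struct stats_channel {
	struct stats_buf buf;
	enum owner owner;	/* who may touch buf right now */
	unsigned posts;		/* how many times the guest filled it */
};

/* Guest: fill the buffer and hand it to the host.  Done once at
 * initialization, then again each time the host hands it back. */
static void guest_post(struct stats_channel *ch,
		       uint32_t memfree_kb, uint32_t memtot_kb)
{
	assert(ch->owner == OWNER_GUEST);
	ch->buf.memfree_kb = memfree_kb;
	ch->buf.memtot_kb = memtot_kb;
	ch->owner = OWNER_HOST;
	ch->posts++;
}

/* Host: copy out one consistent snapshot and return the buffer.
 * Returning it doubles as the request for fresh statistics. */
static struct stats_buf host_take(struct stats_channel *ch)
{
	struct stats_buf snap;

	assert(ch->owner == OWNER_HOST);
	snap = ch->buf;
	ch->owner = OWNER_GUEST;
	return snap;
}
```

Because the host reads the whole buffer while it owns it, all values in a
snapshot come from the same guest_post() call, and the guest does no work
until the host asks.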
Anthony Liguori Nov. 11, 2009, 12:07 a.m. UTC | #8
Rusty Russell wrote:
> On Wed, 11 Nov 2009 08:22:42 am Anthony Liguori wrote:
>   
>> Rusty Russell wrote:
>>     
>>> On Tue, 10 Nov 2009 03:02:06 am Adam Litke wrote:
>>>   
>>>       
>>>> A simpler approach is to collect memory statistics in the virtio
>>>> balloon driver and communicate them to the host via the device config space.
>>>>     
>>>>         
>>> There are two issues I see with this.  First, there's an atomicity problem
>>> since you can't tell when the stats are consistent.  Second, polling is
>>> ugly.
>>>
>>> A stats vq might solve this more cleanly?
>>>   
>>>       
>> This turns out to not work so nicely.  You really need bidirectional 
>> communication.  You need to request that stats be collected and then you 
>> need to tell the hypervisor about the stats that were collected.  You 
>> don't need any real correlation between requests and stat reports either.
>>     
>
> You register an outbuf at initialization time.  The host hands it back when
> it wants you to refill it with stats.
>   

That's strangely backwards.  Guest sends a stat buffer that's filled out, 
host acks it when it wants another.  That doesn't seem bizarre to you?

>> This really models how target/actual work and I think it suggests that 
>> we want to reuse that mechanism for the stats too.
>>     
>
> Sure, I want to.  You want to.  It's simple.
>
> But the universe is remarkably indifferent to what we want.  Is it actually
> sufficient or are we going to regret our laziness?
>   

It's not laziness, it's consistency.  How is actual different than free 
memory or any other stat?

> Cheers,
> Rusty.
>
Rusty Russell Nov. 11, 2009, 2:43 a.m. UTC | #9
On Wed, 11 Nov 2009 10:37:56 am Anthony Liguori wrote:
> Rusty Russell wrote:
> > You register an outbuf at initialization time.  The host hands it back when
> > it wants you to refill it with stats.
> 
> That's strangely backwards.  Guest sends a stat buffer that's filled out, 
> host acks it when it wants another.  That doesn't seem bizarre to you?

Yep!  But that's a limitation of our brains, not the infrastructure ;)

Think of the stats as an infinite stream of data.  Read from it at your
leisure.  This is how, for example, console output works.

> > But the universe is remarkably indifferent to what we want.  Is it actually
> > sufficient or are we going to regret our laziness?
> 
> It's not laziness, it's consistency.  How is actual different than free 
> memory or any other stat?

Because it's a COLLECTION of stats.  For example, swap in should be < swap
out.  Now, the current Linux implementation of all_vm_events() is non-atomic
anyway, so maybe we can just document this as best-effort.  I'm saying that
if it *is* a problem, I think we need a vq.

But it raises the question: what stats are generally useful cross-OS?  Should
we be supplying numbers like "unused" (free), "instantly discardable" (i.e.
clean), "discardable to disk" (i.e. file-backed), "discardable to swap"
(i.e. swap-backed) and "unswappable" instead?

(I just made those up, of course, but it seems like that would give a fair
indication of real memory pressure in any OS).

Thanks,
Rusty.
Jamie Lokier Nov. 11, 2009, 9:24 a.m. UTC | #10
Anthony Liguori wrote:
> Avi Kivity wrote:
> >On 11/10/2009 04:36 PM, Anthony Liguori wrote:
> >>
> >>>A stats vq might solve this more cleanly?
> >>
> >>actual and target are both really just stats.  Had we implemented 
> >>those with a vq, I'd be inclined to agree with you but since they're 
> >>implemented in the config space, it seems natural to extend the 
> >>config space with other stats.
> >>
> >
> >There is in fact a difference; actual and target are very rarely 
> >updated, while the stats are updated very often.  Using a vq means a 
> >constant number of exits per batch instead of one exit per statistic.  
> >If the vq is host-driven, it also allows the host to control the 
> >update frequency dynamically (i.e. stop polling when there is no 
> >memory pressure).
> 
> I'm not terribly opposed to using a vq for this.  I would expect the 
> stat update interval to be rather long (10s probably) but a vq works 
> just as well.

If there's no memory pressure and no guest activity, you probably want
the stat update to be as rare as possible to avoid wakeups.  Save
power on laptops, that sort of thing.

If there's a host user interested in the state ("qemutop?"), you may
want updates more often than 10s.

-- Jamie
Daniel P. Berrangé Nov. 11, 2009, 10:12 a.m. UTC | #11
On Wed, Nov 11, 2009 at 09:24:09AM +0000, Jamie Lokier wrote:
> Anthony Liguori wrote:
> > Avi Kivity wrote:
> > >On 11/10/2009 04:36 PM, Anthony Liguori wrote:
> > >>
> > >>>A stats vq might solve this more cleanly?
> > >>
> > >>actual and target are both really just stats.  Had we implemented 
> > >>those with a vq, I'd be inclined to agree with you but since they're 
> > >>implemented in the config space, it seems natural to extend the 
> > >>config space with other stats.
> > >>
> > >
> > >There is in fact a difference; actual and target are very rarely 
> > >updated, while the stats are updated very often.  Using a vq means a 
> > >constant number of exits per batch instead of one exit per statistic.  
> > >If the vq is host-driven, it also allows the host to control the 
> > >update frequency dynamically (i.e. stop polling when there is no 
> > >memory pressure).
> > 
> > I'm not terribly opposed to using a vq for this.  I would expect the 
> > stat update interval to be rather long (10s probably) but a vq works 
> > just as well.
> 
> If there's no memory pressure and no guest activity, you probably want
> the stat update to be as rare as possible to avoid wakeups.  Save
> power on laptops, that sort of thing.
> 
> If there's a host user interested in the state ("qemutop?"), you may
> want updates more often than 10s.

This all suggests that we should only update the stats from the guest
when something on the host actually asks for them by issuing the QEMU
monitor command. We don't want any kind of continuous polling of stats
at any frequency, if nothing is using these stats on the host.

Daniel
Adam Litke Nov. 11, 2009, 1:26 p.m. UTC | #12
On Wed, 2009-11-11 at 10:12 +0000, Daniel P. Berrange wrote:
> This all suggests that we should only update the stats from the guest
> when something on the host actually asks for them by issuing the QEMU
> monitor command. We don't want any kind of continuous polling of stats
> at any frequency, if nothing is using these stats on the host.

Agreed.  The next version of the patch will remove the timer completely.
We'll wake up in response to config change notifications only.
Avi Kivity Nov. 11, 2009, 3 p.m. UTC | #13
On 11/11/2009 03:26 PM, Adam Litke wrote:
> On Wed, 2009-11-11 at 10:12 +0000, Daniel P. Berrange wrote:
>    
>> This all suggests that we should only update the stats from the guest
>> when something on the host actually asks for them by issuing the QEMU
>> monitor command. We don't want any kind of continuous polling of stats
>> at any frequency, if nothing is using these stats on the host.
>>      
> Agreed.  The next version of the patch will remove the timer completely.
> We'll wake up in response to config change notifications only.
>    

A vq with its own interrupt would be much nicer.
Adam Litke Nov. 11, 2009, 3:08 p.m. UTC | #14
On Wed, 2009-11-11 at 13:13 +1030, Rusty Russell wrote:
> > It's not laziness, it's consistency.  How is actual different than free 
> > memory or any other stat?
> 
> Because it's a COLLECTION of stats.  For example, swap in should be < swap
> out.  Now, the current Linux implementation of all_vm_events() is non-atomic
> anyway, so maybe we can just document this as best-effort.  I'm saying that
> if it *is* a problem, I think we need a vq.

I can't see why we would care about the atomicity of the collection of
statistics.  Best-effort is good enough.  Any variance within the stats
will be overshadowed by the latency of the host-side management daemon.

> But it raises the question: what stats are generally useful cross-OS?  Should
> we be supplying numbers like "unused" (free), "instantly discardable" (i.e.
> clean), "discardable to disk" (i.e. file-backed), "discardable to swap"
> (i.e. swap-backed) and "unswappable" instead?

While I see the virtue in presenting abstracted memory stats that seem
more digestible in a virtualization context, I think we should keep the
raw stats.  This concentrates the complexity in the host-side management
daemon, and allows the host daemon to make better decisions (i.e. by
reacting to trends in individual statistics).  Different OSes (or
different versions of the same OS) may also have different sets of
statistics that will provide the answers that a management daemon needs.
Rusty Russell Nov. 12, 2009, 2:29 a.m. UTC | #15
On Thu, 12 Nov 2009 01:38:34 am Adam Litke wrote:
> > But it raises the question: what stats are generally useful cross-OS?  Should
> > we be supplying numbers like "unused" (free), "instantly discardable" (i.e.
> > clean), "discardable to disk" (i.e. file-backed), "discardable to swap"
> > (i.e. swap-backed) and "unswappable" instead?
> 
> While I see the virtue in presenting abstracted memory stats that seem
> more digestible in a virtualization context, I think we should keep the
> raw stats.  This concentrates the complexity in the host-side management
> daemon, and allows the host daemon to make better decisions (i.e. by
> reacting to trends in individual statistics).  Different OSes (or
> different versions of the same OS) may also have different sets of
> statistics that will provide the answers that a management daemon needs.

OK, I see you made each one a separate feature bit, which does allow this
somewhat.  But you can't just change the meaning arbitrarily, all you can
do is refuse to supply some of them.  This is because virtio is an ABI,
but also it's plain sanity: run a new guest on an old host and get crazy
answers.

Two more questions:

I assume memtot should be equal to the initial memory granted to the guest
(perhaps reduced if the guest can't use all the memory for internal reasons)?

I'm not sure of the relevance to the host of the number of anonymous pages?
That's why I wondered if unswappable pages would be a better number to supply?

Thanks,
Rusty.
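
The "refuse to supply" rule follows from feature negotiation: a statistic
is meaningful only when both sides acknowledge its bit.  This is a minimal
userspace sketch, assuming the feature sets fit in one 32-bit word.  The
bit numbers are the ones proposed in this patch; stat_negotiated() is a
hypothetical helper, not a virtio API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers as proposed in this patch. */
#define VIRTIO_BALLOON_F_MUST_TELL_HOST	0
#define VIRTIO_BALLOON_F_RPT_MEMFREE	6
#define VIRTIO_BALLOON_F_RPT_MEMTOT	7

/* Hypothetical helper: true only if the bit was both offered by the
 * host and accepted by the guest. */
static bool stat_negotiated(uint32_t host_features, uint32_t guest_features,
			    unsigned bit)
{
	assert(bit < 32);
	return ((host_features & guest_features) >> bit) & 1u;
}
```

A new guest on an old host that offers only bit 0 gets false for every
statistic bit, so the host never interprets a field it does not know
about.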

Patch

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 200c22f..0c9a9a1 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -180,6 +180,41 @@  static void update_balloon_size(struct virtio_balloon *vb)
 			      &actual, sizeof(actual));
 }
 
+static inline void update_stat(struct virtio_device *vdev, int feature,
+				unsigned long value, unsigned offset)
+{
+	__le32 __v = cpu_to_le32(value);
+	if (virtio_has_feature(vdev, feature))
+		vdev->config->set(vdev, offset, &__v, sizeof(__v));
+}
+
+#define K(x) ((x) << (PAGE_SHIFT - 10))
+static void update_balloon_stats(struct virtio_balloon *vb)
+{
+	unsigned long events[NR_VM_EVENT_ITEMS];
+	struct sysinfo i;
+	unsigned off = offsetof(struct virtio_balloon_config, stats);
+
+	all_vm_events(events);
+
+	update_stat(vb->vdev, VIRTIO_BALLOON_F_RPT_SWAP_IN, events[PSWPIN],
+		    off + offsetof(struct virtio_balloon_stats, pswapin));
+	update_stat(vb->vdev, VIRTIO_BALLOON_F_RPT_SWAP_OUT, events[PSWPOUT],
+		    off + offsetof(struct virtio_balloon_stats, pswapout));
+	update_stat(vb->vdev, VIRTIO_BALLOON_F_RPT_MAJFLT, events[PGMAJFAULT],
+		    off + offsetof(struct virtio_balloon_stats, pgmajfault));
+	update_stat(vb->vdev, VIRTIO_BALLOON_F_RPT_MINFLT, events[PGFAULT],
+		    off + offsetof(struct virtio_balloon_stats, pgminfault));
+	update_stat(vb->vdev, VIRTIO_BALLOON_F_RPT_ANON,
+		    K(global_page_state(NR_ANON_PAGES)),
+		    off + offsetof(struct virtio_balloon_stats, panon));
+	si_meminfo(&i);
+	update_stat(vb->vdev, VIRTIO_BALLOON_F_RPT_MEMFREE, K(i.freeram),
+		    off + offsetof(struct virtio_balloon_stats, memfree));
+	update_stat(vb->vdev, VIRTIO_BALLOON_F_RPT_MEMTOT, K(i.totalram),
+		    off + offsetof(struct virtio_balloon_stats, memtot));
+}
+
 static int balloon(void *_vballoon)
 {
 	struct virtio_balloon *vb = _vballoon;
@@ -189,15 +224,17 @@  static int balloon(void *_vballoon)
 		s64 diff;
 
 		try_to_freeze();
-		wait_event_interruptible(vb->config_change,
+		wait_event_interruptible_timeout(vb->config_change,
 					 (diff = towards_target(vb)) != 0
 					 || kthread_should_stop()
-					 || freezing(current));
+					 || freezing(current),
+					 VIRTIO_BALLOON_TIMEOUT);
 		if (diff > 0)
 			fill_balloon(vb, diff);
 		else if (diff < 0)
 			leak_balloon(vb, -diff);
 		update_balloon_size(vb);
+		update_balloon_stats(vb);
 	}
 	return 0;
 }
@@ -265,7 +302,12 @@  static void virtballoon_remove(struct virtio_device *vdev)
 	kfree(vb);
 }
 
-static unsigned int features[] = { VIRTIO_BALLOON_F_MUST_TELL_HOST };
+static unsigned int features[] = {
+	VIRTIO_BALLOON_F_MUST_TELL_HOST, VIRTIO_BALLOON_F_RPT_SWAP_IN,
+	VIRTIO_BALLOON_F_RPT_SWAP_OUT, VIRTIO_BALLOON_F_RPT_ANON,
+	VIRTIO_BALLOON_F_RPT_MAJFLT, VIRTIO_BALLOON_F_RPT_MINFLT,
+	VIRTIO_BALLOON_F_RPT_MEMFREE, VIRTIO_BALLOON_F_RPT_MEMTOT,
+};
 
 static struct virtio_driver virtio_balloon = {
 	.feature_table = features,
diff --git a/include/linux/virtio_balloon.h b/include/linux/virtio_balloon.h
index 09d7300..0bff4b8 100644
--- a/include/linux/virtio_balloon.h
+++ b/include/linux/virtio_balloon.h
@@ -6,15 +6,39 @@ 
 
 /* The feature bitmap for virtio balloon */
 #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
+                                          /* Guest memory statistic reporting */
+#define VIRTIO_BALLOON_F_RPT_SWAP_IN  1   /* Number of pages swapped in */
+#define VIRTIO_BALLOON_F_RPT_SWAP_OUT 2   /* Number of pages swapped out */
+#define VIRTIO_BALLOON_F_RPT_ANON     3   /* Number of anonymous pages in use */
+#define VIRTIO_BALLOON_F_RPT_MAJFLT   4   /* Number of major faults */
+#define VIRTIO_BALLOON_F_RPT_MINFLT   5   /* Number of minor faults */
+#define VIRTIO_BALLOON_F_RPT_MEMFREE  6   /* Total amount of free memory */
+#define VIRTIO_BALLOON_F_RPT_MEMTOT   7   /* Total amount of memory */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
 
+struct virtio_balloon_stats
+{
+	__le32 pswapin;      /* pages swapped in */
+	__le32 pswapout;     /* pages swapped out */
+	__le32 panon;        /* anonymous pages in use (in kb) */
+	__le32 pgmajfault;   /* Major page faults */
+	__le32 pgminfault;   /* Minor page faults */
+	__le32 memfree;      /* Total amount of free memory (in kb) */
+	__le32 memtot;       /* Total amount of memory (in kb) */
+};
+
 struct virtio_balloon_config
 {
 	/* Number of pages host wants Guest to give up. */
 	__le32 num_pages;
 	/* Number of pages we've actually got in balloon. */
 	__le32 actual;
+	/* Memory statistics */
+	struct virtio_balloon_stats stats;
 };
+
+#define VIRTIO_BALLOON_TIMEOUT (30 * HZ)
+
 #endif /* _LINUX_VIRTIO_BALLOON_H */
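
The K() macro in the patch converts a page count to kB by shifting left
by (PAGE_SHIFT - 10).  This is a minimal userspace sketch of that
arithmetic, assuming 4 KiB pages (PAGE_SHIFT is per-architecture in the
kernel).  pages_to_kb32() is a hypothetical helper added here only to show
where a kB value stops fitting in the patch's 32-bit fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define K(x) ((x) << (PAGE_SHIFT - 10))	/* same shape as the patch's macro */

/* Hypothetical helper: convert pages to kB and store the result if it
 * fits in 32 bits.  Returns 0 on success, -1 if the value is too large. */
static int pages_to_kb32(uint64_t pages, uint32_t *out_kb)
{
	uint64_t kb;

	assert(out_kb != NULL);
	kb = K(pages);
	if (kb > UINT32_MAX)
		return -1;
	*out_kb = (uint32_t)kb;
	return 0;
}
```

Under this assumption 262144 pages (1 GiB) become 1048576 kB, and a count
of 2^30 pages (4 TiB) no longer fits in a 32-bit kB field.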