diff mbox

balloon: Fix failure of updating guest memory status

Message ID 1467271903-23812-1-git-send-email-liang.z.li@intel.com
State New
Headers show

Commit Message

Li, Liang Z June 30, 2016, 7:31 a.m. UTC
After live migration, 'guest-stats' can't get the expected memory
status in the guest. This issue is caused by commit 4eae2a657d.
The value of 's->stats_vq_elem' will be NULL after live migration,
and the check in the function 'balloon_stats_poll_cb()' will
prevent the 'virtio_notify()' from executing. So guest will not
update the memory status.

Signed-off-by: Liang Li <liang.z.li@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Ladi Prosek <lprosek@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/virtio/virtio-balloon.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Ladi Prosek June 30, 2016, 8:11 a.m. UTC | #1
On Thu, Jun 30, 2016 at 9:31 AM, Liang Li <liang.z.li@intel.com> wrote:
> After live migration, 'guest-stats' can't get the expected memory
> status in the guest. This issue is caused by commit 4eae2a657d.
> The value of 's->stats_vq_elem' will be NULL after live migration,
> and the check in the function 'balloon_stats_poll_cb()' will
> prevent the 'virtio_notify()' from executing. So guest will not
> update the memory status.
>
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Ladi Prosek <lprosek@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/virtio/virtio-balloon.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index 557d3f9..cc6947f 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -98,13 +98,19 @@ static void balloon_stats_poll_cb(void *opaque)
>  {
>      VirtIOBalloon *s = opaque;
>      VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +    VirtQueueElement elem = {0};
>
> -    if (s->stats_vq_elem == NULL || !balloon_stats_supported(s)) {
> +    if (!balloon_stats_supported(s)) {
>          /* re-schedule */
>          balloon_stats_change_timer(s, s->stats_poll_interval);
>          return;
>      }
>
> +    if (s->stats_vq_elem == NULL) {
> +        virtqueue_push(s->svq, &elem, 0);
> +        virtio_notify(vdev, s->svq);
> +        return;
> +    }

What if we are not in the just-after-live-migration situation though?
If the guest simply didn't add a buffer to the queue for some reason,
wouldn't this newly added push/notify break the balloon protocol?

Thanks!
Ladi

>      virtqueue_push(s->svq, s->stats_vq_elem, s->stats_vq_offset);
>      virtio_notify(vdev, s->svq);
>      g_free(s->stats_vq_elem);
> --
> 1.9.1
>
Li, Liang Z June 30, 2016, 8:39 a.m. UTC | #2
> On Thu, Jun 30, 2016 at 9:31 AM, Liang Li <liang.z.li@intel.com> wrote:

> > After live migration, 'guest-stats' can't get the expected memory

> > status in the guest. This issue is caused by commit 4eae2a657d.

> > The value of 's->stats_vq_elem' will be NULL after live migration, and

> > the check in the function 'balloon_stats_poll_cb()' will prevent the

> > 'virtio_notify()' from executing. So guest will not update the memory

> > status.

> >

> > Signed-off-by: Liang Li <liang.z.li@intel.com>

> > Cc: Michael S. Tsirkin <mst@redhat.com>

> > Cc: Ladi Prosek <lprosek@redhat.com>

> > Cc: Paolo Bonzini <pbonzini@redhat.com>

> > ---

> >  hw/virtio/virtio-balloon.c | 8 +++++++-

> >  1 file changed, 7 insertions(+), 1 deletion(-)

> >

> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c

> > index 557d3f9..cc6947f 100644

> > --- a/hw/virtio/virtio-balloon.c

> > +++ b/hw/virtio/virtio-balloon.c

> > @@ -98,13 +98,19 @@ static void balloon_stats_poll_cb(void *opaque)  {

> >      VirtIOBalloon *s = opaque;

> >      VirtIODevice *vdev = VIRTIO_DEVICE(s);

> > +    VirtQueueElement elem = {0};

> >

> > -    if (s->stats_vq_elem == NULL || !balloon_stats_supported(s)) {

> > +    if (!balloon_stats_supported(s)) {

> >          /* re-schedule */

> >          balloon_stats_change_timer(s, s->stats_poll_interval);

> >          return;

> >      }

> >

> > +    if (s->stats_vq_elem == NULL) {

> > +        virtqueue_push(s->svq, &elem, 0);

> > +        virtio_notify(vdev, s->svq);

> > +        return;

> > +    }

> 

> What if we are not in the just-after-live-migration situation though?

> If the guest simply didn't add a buffer to the queue for some reason,

> wouldn't this newly added push/notify break the balloon protocol?

> 

Could you elaborate how it happens?  
The added code only works for the situation just after live migration. right?

Thanks!
Liang
Ladi Prosek June 30, 2016, 9:09 a.m. UTC | #3
On Thu, Jun 30, 2016 at 10:39 AM, Li, Liang Z <liang.z.li@intel.com> wrote:
>> On Thu, Jun 30, 2016 at 9:31 AM, Liang Li <liang.z.li@intel.com> wrote:
>> > After live migration, 'guest-stats' can't get the expected memory
>> > status in the guest. This issue is caused by commit 4eae2a657d.
>> > The value of 's->stats_vq_elem' will be NULL after live migration, and
>> > the check in the function 'balloon_stats_poll_cb()' will prevent the
>> > 'virtio_notify()' from executing. So guest will not update the memory
>> > status.
>> >
>> > Signed-off-by: Liang Li <liang.z.li@intel.com>
>> > Cc: Michael S. Tsirkin <mst@redhat.com>
>> > Cc: Ladi Prosek <lprosek@redhat.com>
>> > Cc: Paolo Bonzini <pbonzini@redhat.com>
>> > ---
>> >  hw/virtio/virtio-balloon.c | 8 +++++++-
>> >  1 file changed, 7 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
>> > index 557d3f9..cc6947f 100644
>> > --- a/hw/virtio/virtio-balloon.c
>> > +++ b/hw/virtio/virtio-balloon.c
>> > @@ -98,13 +98,19 @@ static void balloon_stats_poll_cb(void *opaque)  {
>> >      VirtIOBalloon *s = opaque;
>> >      VirtIODevice *vdev = VIRTIO_DEVICE(s);
>> > +    VirtQueueElement elem = {0};
>> >
>> > -    if (s->stats_vq_elem == NULL || !balloon_stats_supported(s)) {
>> > +    if (!balloon_stats_supported(s)) {
>> >          /* re-schedule */
>> >          balloon_stats_change_timer(s, s->stats_poll_interval);
>> >          return;
>> >      }
>> >
>> > +    if (s->stats_vq_elem == NULL) {
>> > +        virtqueue_push(s->svq, &elem, 0);
>> > +        virtio_notify(vdev, s->svq);
>> > +        return;
>> > +    }
>>
>> What if we are not in the just-after-live-migration situation though?
>> If the guest simply didn't add a buffer to the queue for some reason,
>> wouldn't this newly added push/notify break the balloon protocol?
>>
> Could you elaborate how it happens?
> The added code only works for the situation just after live migration. right?

I don't believe so. As eloquently stated in the spec (5.5.6.3 Memory
Statistics), the stats queue is "atypical". If everything goes well,
the stats_vq_elem field is indeed expected to be non-NULL here in
balloon_stats_poll_cb. But it may as well be NULL, for example if step
3. "The driver collects memory statistics and writes them into a new
buffer." takes a long time and the driver doesn't make it by the time
the callback fires.

Basically, the driver and device play ping-pong and the value of
stats_vq_elem is NULL or non-NULL depending on which side the ball is.
It looks like stats_vq_elem should be part of the QEMU state that is
preserved across migrations, although I'll admit that I am unfamiliar
with how migrations work and may be missing something important.

Thanks!
Ladi

> Thanks!
> Liang
Paolo Bonzini June 30, 2016, 2:35 p.m. UTC | #4
On 30/06/2016 09:31, Liang Li wrote:
> After live migration, 'guest-stats' can't get the expected memory
> status in the guest. This issue is caused by commit 4eae2a657d.
> The value of 's->stats_vq_elem' will be NULL after live migration,
> and the check in the function 'balloon_stats_poll_cb()' will
> prevent the 'virtio_notify()' from executing. So guest will not
> update the memory status.
> 
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Ladi Prosek <lprosek@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/virtio/virtio-balloon.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index 557d3f9..cc6947f 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -98,13 +98,19 @@ static void balloon_stats_poll_cb(void *opaque)
>  {
>      VirtIOBalloon *s = opaque;
>      VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +    VirtQueueElement elem = {0};
>  
> -    if (s->stats_vq_elem == NULL || !balloon_stats_supported(s)) {
> +    if (!balloon_stats_supported(s)) {
>          /* re-schedule */
>          balloon_stats_change_timer(s, s->stats_poll_interval);
>          return;
>      }
>  
> +    if (s->stats_vq_elem == NULL) {
> +        virtqueue_push(s->svq, &elem, 0);
> +        virtio_notify(vdev, s->svq);
> +        return;
> +    }
>      virtqueue_push(s->svq, s->stats_vq_elem, s->stats_vq_offset);
>      virtio_notify(vdev, s->svq);
>      g_free(s->stats_vq_elem);
> 

Hi, the right fix is to migrate s->stats_vq_elem if it is not NULL.  See
how it's done in hw/char/virtio-serial.c's virtio_serial_save_device
(save) and fetch_active_ports_list (load).

Paolo
Li, Liang Z June 30, 2016, 4:43 p.m. UTC | #5
> >> What if we are not in the just-after-live-migration situation though?

> >> If the guest simply didn't add a buffer to the queue for some reason,

> >> wouldn't this newly added push/notify break the balloon protocol?

> >>

> > Could you elaborate how it happens?

> > The added code only works for the situation just after live migration. right?

> 

> I don't believe so. As eloquently stated in the spec (5.5.6.3 Memory

> Statistics), the stats queue is "atypical". If everything goes well, the

> stats_vq_elem field is indeed expected to be non-NULL here in

> balloon_stats_poll_cb. But it may as well be NULL, for example if step 3. "The

> driver collects memory statistics and writes them into a new buffer." takes a

> long time and the driver doesn't make it by the time the callback fires.

> 

> Basically, the driver and device play ping-pong and the value of

> stats_vq_elem is NULL or non-NULL depending on which side the ball is.

> It looks like stats_vq_elem should be part of the QEMU state that is

> preserved across migrations, although I'll admit that I am unfamiliar with how


Migrate the stats_vq_elem to destination is better, and it can solve the similar problem
I encountered in another patch, I will look into how to do this.

Thanks for your explanation!

Liang

> migrations work and may be missing something important.

> 

> Thanks!

> Ladi

>
Li, Liang Z June 30, 2016, 4:58 p.m. UTC | #6
> > +    if (s->stats_vq_elem == NULL) {

> > +        virtqueue_push(s->svq, &elem, 0);

> > +        virtio_notify(vdev, s->svq);

> > +        return;

> > +    }

> >      virtqueue_push(s->svq, s->stats_vq_elem, s->stats_vq_offset);

> >      virtio_notify(vdev, s->svq);

> >      g_free(s->stats_vq_elem);

> >

> 

> Hi, the right fix is to migrate s->stats_vq_elem if it is not NULL.  See how it's

> done in hw/char/virtio-serial.c's virtio_serial_save_device

> (save) and fetch_active_ports_list (load).

> 

> Paolo


Thanks for your information, it's very helpful. I will send the v2.

Liang
Li, Liang Z July 1, 2016, 8:43 a.m. UTC | #7
> > +    if (s->stats_vq_elem == NULL) {

> > +        virtqueue_push(s->svq, &elem, 0);

> > +        virtio_notify(vdev, s->svq);

> > +        return;

> > +    }

> >      virtqueue_push(s->svq, s->stats_vq_elem, s->stats_vq_offset);

> >      virtio_notify(vdev, s->svq);

> >      g_free(s->stats_vq_elem);

> >

> 

> Hi, the right fix is to migrate s->stats_vq_elem if it is not NULL.  See how it's

> done in hw/char/virtio-serial.c's virtio_serial_save_device

> (save) and fetch_active_ports_list (load).

> 

> Paolo


Hi Paolo,

   If we try to migrate ' migrate s->stats_vq_elem ', how to make it works with the old version ?
   with a new feature bit? Or with some version control?
   
Thanks!
Liang
diff mbox

Patch

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 557d3f9..cc6947f 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -98,13 +98,19 @@  static void balloon_stats_poll_cb(void *opaque)
 {
     VirtIOBalloon *s = opaque;
     VirtIODevice *vdev = VIRTIO_DEVICE(s);
+    VirtQueueElement elem = {0};
 
-    if (s->stats_vq_elem == NULL || !balloon_stats_supported(s)) {
+    if (!balloon_stats_supported(s)) {
         /* re-schedule */
         balloon_stats_change_timer(s, s->stats_poll_interval);
         return;
     }
 
+    if (s->stats_vq_elem == NULL) {
+        virtqueue_push(s->svq, &elem, 0);
+        virtio_notify(vdev, s->svq);
+        return;
+    }
     virtqueue_push(s->svq, s->stats_vq_elem, s->stats_vq_offset);
     virtio_notify(vdev, s->svq);
     g_free(s->stats_vq_elem);