Message ID | 20100826060513.GF18351@amit-laptop.redhat.com |
---|---|
State | New |
On 08/26/2010 08:05 AM, Amit Shah wrote:
> This is what I have currently. It would need some timer handling in
> the save/load case as well, right?

When loading you won't have any pending "info balloon" command, so I think the timer need not be preserved across migration.

Also, 5 seconds for a stopped guest is actually a lot, so maybe Amit's original patch or a variant thereof would make sense anyway.

Paolo

On Thu, Aug 26, 2010 at 10:05:44AM +0200, Paolo Bonzini wrote:
> On 08/26/2010 08:05 AM, Amit Shah wrote:
> >This is what I have currently. It would need some timer handling in
> >the save/load case as well, right?
>
> When loading you won't have any pending "info balloon" command, so I
> think the timer need not be preserved across migration.
>
> Also, 5 seconds for a stopped guest is actually a lot, so maybe Amit's
> original patch or a variant thereof would make sense anyway.

We should have a combination of both. If we know the guest is stopped we should return immediately, otherwise we should use the timer as a way to cope with a crashed/evil guest.

Daniel

On (Thu) Aug 26 2010 [10:05:44], Paolo Bonzini wrote:
> On 08/26/2010 08:05 AM, Amit Shah wrote:
> >This is what I have currently. It would need some timer handling in
> >the save/load case as well, right?
>
> When loading you won't have any pending "info balloon" command, so I
> think the timer need not be preserved across migration.
>
> Also, 5 seconds for a stopped guest is actually a lot,

That's the problem; it's policy. Where and how to specify it?

> so maybe
> Amit's original patch or a variant thereof would make sense anyway.

This seems to be needed though -- as Anthony mentioned, a guest which has oopsed or similar, incapable of servicing the stats request, is going to block the monitor command from returning forever. So it's better to have a timeout, just that we need to decide how much it should be.

Amit

On 08/26/2010 10:17 AM, Amit Shah wrote:
>> Also, 5 seconds for a stopped guest is actually a lot,
> That's the problem; it's policy. Where and how to specify it?

For a crashed/oopsed guest even 10 seconds may be okay, as long as it's 0 for a stopped guest. We need both patches.

Paolo

On Thu, Aug 26, 2010 at 01:47:50PM +0530, Amit Shah wrote:
> On (Thu) Aug 26 2010 [10:05:44], Paolo Bonzini wrote:
> > On 08/26/2010 08:05 AM, Amit Shah wrote:
> > >This is what I have currently. It would need some timer handling in
> > >the save/load case as well, right?
> >
> > When loading you won't have any pending "info balloon" command, so I
> > think the timer need not be preserved across migration.
> >
> > Also, 5 seconds for a stopped guest is actually a lot,
>
> That's the problem; it's policy. Where and how to specify it?

It is unfortunate that this is policy, but we just have to accept that the current query-balloon command is a flawed design. IMHO we should just hardcode the timeout at 5 seconds as you do (plus immediate return for paused guests). Then focus on adding new monitor commands/events to deal with balloon query in a way that doesn't require this kind of policy in QEMU, and deprecate the existing query-balloon command.

Regards,
Daniel

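The combined policy proposed above (no wait for a paused guest, a hard-coded 5 second wait otherwise) could be sketched roughly as follows. This is only an illustration of the idea, not code from either patch: `STATS_TIMEOUT_MS`, `stats_wait_ms()` and `stats_deadline_ms()` are hypothetical names, and the `guest_running` flag stands in for whatever run-state check QEMU would actually use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant: the 5 second value discussed in the thread. */
#define STATS_TIMEOUT_MS 5000

/*
 * Hypothetical helper (not a QEMU function): how long to wait for the
 * guest to answer a stats request.  A paused guest cannot answer, so the
 * wait is zero; a running guest gets the hard-coded timeout so that a
 * crashed or hostile guest cannot block the monitor forever.
 */
static int64_t stats_wait_ms(bool guest_running)
{
    return guest_running ? STATS_TIMEOUT_MS : 0;
}

/*
 * Hypothetical helper: absolute deadline, in milliseconds, given the
 * current clock reading.  This is the value one would hand to a timer.
 */
static int64_t stats_deadline_ms(int64_t now_ms, bool guest_running)
{
    assert(now_ms >= 0);
    return now_ms + stats_wait_ms(guest_running);
}
```

With this shape the policy lives in one place: changing the timeout, or later replacing it with a configurable value, would only touch `stats_wait_ms()`.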
On Thu, 26 Aug 2010 09:28:42 +0100 "Daniel P. Berrange" <berrange@redhat.com> wrote:
> On Thu, Aug 26, 2010 at 01:47:50PM +0530, Amit Shah wrote:
> > On (Thu) Aug 26 2010 [10:05:44], Paolo Bonzini wrote:
> > > On 08/26/2010 08:05 AM, Amit Shah wrote:
> > > >This is what I have currently. It would need some timer handling in
> > > >the save/load case as well, right?
> > >
> > > When loading you won't have any pending "info balloon" command, so I
> > > think the timer need not be preserved across migration.
> > >
> > > Also, 5 seconds for a stopped guest is actually a lot,
> >
> > That's the problem; it's policy. Where and how to specify it?
>
> It is unfortunate that this is policy, but we just have to accept
> that the current query-balloon command is a flawed design. IMHO
> we should just hardcode the timeout at 5 seconds as you do (plus
> immediate return for paused guests). Then focus on adding new
> monitor commands/events to deal with balloon query in a way
> that doesn't require this kind of policy in QEMU, and deprecate
> the existing query-balloon command.

Agreed, but it's not just that: we've never correctly specified how commands that talk with the guest should behave.

*brain dump warning*

We were talking about making all commands work as synchronous and asynchronous. If we do that, then we'll need a 'global' timeout for all synchronous commands. We could have a default value and a command to set it.

*brain dump warning ends*

I really don't know what to do for 0.13. Probably the hard-coded timer is the best solution we have, but I'm wondering if it's going to cause problems in the near future, when we get proper asynchronous command support.

On 08/26/2010 03:14 AM, Daniel P. Berrange wrote:
> On Thu, Aug 26, 2010 at 10:05:44AM +0200, Paolo Bonzini wrote:
>
>> On 08/26/2010 08:05 AM, Amit Shah wrote:
>>
>>> This is what I have currently. It would need some timer handling in
>>> the save/load case as well, right?
>>>
>> When loading you won't have any pending "info balloon" command, so I
>> think the timer need not be preserved across migration.
>>
>> Also, 5 seconds for a stopped guest is actually a lot, so maybe Amit's
>> original patch or a variant thereof would make sense anyway.
>>
> We should have a combination of both. If we know the guest is stopped
> we should return immediately, otherwise we should use the timer as a
> way to cope with a crashed/evil guest.
>

Stopped doesn't necessarily mean that it's permanently stopped or even that a user has stopped it. We stop a guest during live migration and in some other cases (like on disk error). Returning immediately is an optimization on something that should be a proper fix. Otherwise, you have a guest initiated DoS attack on management tools.

Regards,

Anthony Liguori

> Daniel
>

On 08/26/2010 02:57 PM, Luiz Capitulino wrote:
> I really don't know what to do for 0.13. Probably the hard-coded timer is
> the best solution we have, but I'm wondering if it's going to cause
> problems in the near future, when we get proper asynchronous command
> support.

Just make it a different command, or make it dependent on whether the initial handshaking activated the asynchronous command capability.

Paolo

diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
index 9fe3886..1ec03b3 100644
--- a/hw/virtio-balloon.c
+++ b/hw/virtio-balloon.c
@@ -40,6 +40,7 @@ typedef struct VirtIOBalloon
     size_t stats_vq_offset;
     MonitorCompletion *stats_callback;
     void *stats_opaque_callback_data;
+    QEMUTimer *timer;
 } VirtIOBalloon;
 
 static VirtIOBalloon *to_virtio_balloon(VirtIODevice *vdev)
@@ -137,6 +138,11 @@ static void complete_stats_request(VirtIOBalloon *vb)
     vb->stats_callback = NULL;
 }
 
+static void stats_timer_expired(void *opaque)
+{
+    complete_stats_request(opaque);
+}
+
 static void virtio_balloon_receive_stats(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIOBalloon *s = DO_UPCAST(VirtIOBalloon, vdev, vdev);
@@ -148,6 +154,8 @@ static void virtio_balloon_receive_stats(VirtIODevice *vdev, VirtQueue *vq)
         return;
     }
 
+    qemu_del_timer(s->timer);
+
     /* Initialize the stats to get rid of any stale values.  This is only
      * needed to handle the case where a guest supports fewer stats than it
      * used to (ie. it has booted into an old kernel).
@@ -215,6 +223,7 @@ static void virtio_balloon_to_target(void *opaque, ram_addr_t target,
         dev->stats_callback = cb;
         dev->stats_opaque_callback_data = cb_data;
         if (dev->vdev.guest_features & (1 << VIRTIO_BALLOON_F_STATS_VQ)) {
+            qemu_mod_timer(dev->timer, qemu_get_clock(rt_clock) + 5000);
             virtqueue_push(dev->svq, &dev->stats_vq_elem, dev->stats_vq_offset);
             virtio_notify(&dev->vdev, dev->svq);
         } else {
@@ -267,6 +276,8 @@ VirtIODevice *virtio_balloon_init(DeviceState *dev)
     s->dvq = virtio_add_queue(&s->vdev, 128, virtio_balloon_handle_output);
     s->svq = virtio_add_queue(&s->vdev, 128, virtio_balloon_receive_stats);
 
+    s->timer = qemu_new_timer(rt_clock, stats_timer_expired, s);
+
     reset_stats(s);
 
     qemu_add_balloon_handler(virtio_balloon_to_target, s);
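
For readers following the timer lifecycle in the patch above (arm on request, cancel when the guest replies, complete the request on expiry), here is a small standalone model of that flow. It does not use any QEMU API: `FakeBalloon` and the `fake_*` functions are hypothetical stand-ins, the clock is passed in by hand, and the guard in `fake_complete()` against completing twice is an assumption about how `complete_stats_request()` would need to behave, not something visible in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the balloon device state in the patch. */
typedef struct FakeBalloon {
    bool request_pending;   /* a stats request is waiting for an answer */
    bool timer_armed;       /* models an armed one-shot timer */
    int64_t deadline_ms;    /* absolute expiry time of that timer */
    int completions;        /* how many times a request was completed */
} FakeBalloon;

/* Models complete_stats_request(); assumed to be a no-op if nothing is pending. */
static void fake_complete(FakeBalloon *b)
{
    if (!b->request_pending) {
        return;
    }
    b->request_pending = false;
    b->completions++;
}

/* Models the request path: arm the timer 5000 ms ahead, then ask the guest. */
static void fake_request_stats(FakeBalloon *b, int64_t now_ms)
{
    b->request_pending = true;
    b->timer_armed = true;
    b->deadline_ms = now_ms + 5000;
}

/* Models the receive path: cancel the timer, then complete the request. */
static void fake_guest_replied(FakeBalloon *b)
{
    b->timer_armed = false;
    fake_complete(b);
}

/* Models the timer firing: if the deadline has passed, complete anyway. */
static void fake_clock_tick(FakeBalloon *b, int64_t now_ms)
{
    if (b->timer_armed && now_ms >= b->deadline_ms) {
        b->timer_armed = false;
        fake_complete(b);
    }
}
```

In this model a request is completed exactly once, either by the guest's reply or by the timeout, whichever comes first; a late tick after a reply does nothing because the timer has already been disarmed.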