Patchwork [PATCHv2,2/2] tap: mark fd handler as device

login
register
mail settings
Submitter Michael S. Tsirkin
Date Nov. 4, 2010, 6:06 p.m.
Message ID <5f91d600c749e66de107f60298c5ebd36645beff.1288892774.git.mst@redhat.com>
Download mbox | patch
Permalink /patch/70153/
State New
Headers show

Comments

Michael S. Tsirkin - Nov. 4, 2010, 6:06 p.m.
There's no reason for tap to run when VM is stopped.
If we let it, it confuses the bridge on TX
and corrupts DMA memory on RX.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/vhost_net.c |    2 +-
 net/tap.c      |    3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)
Juan Quintela - Nov. 15, 2010, 2:52 p.m.
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> There's no reason for tap to run when VM is stopped.
> If we let it, it confuses the bridge on TX
> and corrupts DMA memory on RX.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

once here, what handlers make sense to run while stopped?
/me can think of the normal console, non live migration, loadvm and not
much more.  Perhaps it is easier to just move the other way around?

Later, Juan.
Anthony Liguori - Nov. 15, 2010, 2:55 p.m.
On 11/15/2010 08:52 AM, Juan Quintela wrote:
> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
>    
>> There's no reason for tap to run when VM is stopped.
>> If we let it, it confuses the bridge on TX
>> and corrupts DMA memory on RX.
>>
>> Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
>>      
> once here, what handlers make sense to run while stopped?
> /me can think of the normal console, non live migration, loadvm and not
> much more.  Perhaps it is easier to just move the other way around?
>    

I'm not sure I concur that this is really a problem.

Semantically, I don't think that stop has to imply that the guest memory 
no longer changes.

Regards,

Anthony Liguori

> Later, Juan.
>
Michael S. Tsirkin - Nov. 15, 2010, 3:18 p.m.
On Mon, Nov 15, 2010 at 08:55:07AM -0600, Anthony Liguori wrote:
> On 11/15/2010 08:52 AM, Juan Quintela wrote:
> >"Michael S. Tsirkin"<mst@redhat.com>  wrote:
> >>There's no reason for tap to run when VM is stopped.
> >>If we let it, it confuses the bridge on TX
> >>and corrupts DMA memory on RX.
> >>
> >>Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
> >once here, what handlers make sense to run while stopped?
> >/me can think of the normal console, non live migration, loadvm and not
> >much more.  Perhaps it is easier to just move the other way around?
> 
> I'm not sure I concur that this is really a problem.
> Semantically, I don't think that stop has to imply that the guest
> memory no longer changes.
> 
> Regards,
> 
> Anthony Liguori
> 
> >Later, Juan.

Well, I do not really know about vmstop that is not for migration.
For most vmstop calls are for migration. And there, the problems are very
real.

First, it's not just memory.  At least for network transmit, sending out
packets with the same MAC from two locations is a problem. See?

For memory, it is much worse:  any memory changes can either get
discarded or not.  This breaks consistency guarantees that guest relies
upon. Imagine virtio index getting updated but content not being
updated.  See?
Michael S. Tsirkin - Nov. 15, 2010, 3:21 p.m.
On Mon, Nov 15, 2010 at 03:52:54PM +0100, Juan Quintela wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > There's no reason for tap to run when VM is stopped.
> > If we let it, it confuses the bridge on TX
> > and corrupts DMA memory on RX.
> >
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> once here, what handlers make sense to run while stopped?
> /me can think of the normal console, non live migration, loadvm and not
> much more.  Perhaps it is easier to just move the other way around?
> 
> Later, Juan.

vnc? SDL?
Yes, more devices need to be stopped than not, but I
tread carefully to avoid breaking existing functionality.
If you could solve it for all devices in one swoop, that'd be
great. I'm not up to it.
Anthony Liguori - Nov. 15, 2010, 4:09 p.m.
On 11/15/2010 09:18 AM, Michael S. Tsirkin wrote:
> On Mon, Nov 15, 2010 at 08:55:07AM -0600, Anthony Liguori wrote:
>    
>> On 11/15/2010 08:52 AM, Juan Quintela wrote:
>>      
>>> "Michael S. Tsirkin"<mst@redhat.com>   wrote:
>>>        
>>>> There's no reason for tap to run when VM is stopped.
>>>> If we let it, it confuses the bridge on TX
>>>> and corrupts DMA memory on RX.
>>>>
>>>> Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
>>>>          
>>> once here, what handlers make sense to run while stopped?
>>> /me can think of the normal console, non live migration, loadvm and not
>>> much more.  Perhaps it is easier to just move the other way around?
>>>        
>> I'm not sure I concur that this is really a problem.
>> Semantically, I don't think that stop has to imply that the guest
>> memory no longer changes.
>>
>> Regards,
>>
>> Anthony Liguori
>>
>>      
>>> Later, Juan.
>>>        
> Well, I do not really know about vmstop that is not for migration.
>    

They are separate.  For instance, we don't rely on stop to pause pending 
disk I/O because we don't serialize pending disk I/O operations.  
Instead, we flush all pending I/O and rely on the fact that disk I/O 
requests are always submitted in the context of a vCPU operation.  This 
assumption breaks down though with ioeventfd so we need to revisit it.

> For most vmstop calls are for migration. And there, the problems are very
> real.
>
> First, it's not just memory.  At least for network transmit, sending out
> packets with the same MAC from two locations is a problem. See?
>    

I agree it's a problem but I'm not sure that just marking fd handlers 
really helps.

Bottom halves can also trigger transmits.  I think that if we put 
something in the network layer that just queued packets if the vm is 
stopped, it would be a more robust solution to the problem.

> For memory, it is much worse:  any memory changes can either get
> discarded or not.  This breaks consistency guarantees that guest relies
> upon. Imagine virtio index getting updated but content not being
> updated.  See?
>    

If you suppress any I/O then the memory changes don't matter because the 
same changes will happen on the destination too.

I think this basic problem is the same as Kemari.  We can either attempt 
to totally freeze a guest which means stopping all callbacks that are 
device related or we can prevent I/O from happening which should 
introduce enough determinism to fix the problem in practice.

Regards,

Anthony Liguori
Michael S. Tsirkin - Nov. 15, 2010, 8:26 p.m.
On Mon, Nov 15, 2010 at 10:09:29AM -0600, Anthony Liguori wrote:
> On 11/15/2010 09:18 AM, Michael S. Tsirkin wrote:
> >On Mon, Nov 15, 2010 at 08:55:07AM -0600, Anthony Liguori wrote:
> >>On 11/15/2010 08:52 AM, Juan Quintela wrote:
> >>>"Michael S. Tsirkin"<mst@redhat.com>   wrote:
> >>>>There's no reason for tap to run when VM is stopped.
> >>>>If we let it, it confuses the bridge on TX
> >>>>and corrupts DMA memory on RX.
> >>>>
> >>>>Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
> >>>once here, what handlers make sense to run while stopped?
> >>>/me can think of the normal console, non live migration, loadvm and not
> >>>much more.  Perhaps it is easier to just move the other way around?
> >>I'm not sure I concur that this is really a problem.
> >>Semantically, I don't think that stop has to imply that the guest
> >>memory no longer changes.
> >>
> >>Regards,
> >>
> >>Anthony Liguori
> >>
> >>>Later, Juan.
> >Well, I do not really know about vmstop that is not for migration.
> 
> They are separate.  For instance, we don't rely on stop to pause
> pending disk I/O because we don't serialize pending disk I/O
> operations.  Instead, we flush all pending I/O and rely on the fact
> that disk I/O requests are always submitted in the context of a vCPU
> operation.  This assumption breaks down though with ioeventfd so we
> need to revisit it.
> 
> >For most vmstop calls are for migration. And there, the problems are very
> >real.
> >
> >First, it's not just memory.  At least for network transmit, sending out
> >packets with the same MAC from two locations is a problem. See?
> 
> I agree it's a problem but I'm not sure that just marking fd
> handlers really helps.
> 
> Bottom halves can also trigger transmits.

Are there any system ones?
Can we just stop processing them?

>  I think that if we put
> something in the network layer that just queued packets if the vm is
> stopped, it would be a more robust solution to the problem.

Will only work for -net. The problem is for anything
that can trigger activity when vm is stopped.

> >For memory, it is much worse:  any memory changes can either get
> >discarded or not.  This breaks consistency guarantees that guest relies
> >upon. Imagine virtio index getting updated but content not being
> >updated.  See?
> 
> If you suppress any I/O then the memory changes don't matter because
> the same changes will happen on the destination too.

They matter, and same changes won't happen.
Example:

virtio used index is in page 1, it can point at data in page 2.
device writes into data, *then* into index. Order matters,
but won't be preserved: migration assumes memory does not
change after vmstop, and so it might send old values for
data but new values for index. Result will be invalid
data coming into guest.

On the destination guest will pick up the index and
get bad (stale) data.


> I think this basic problem is the same as Kemari.  We can either
> attempt to totally freeze a guest which means stopping all callbacks
> that are device related or we can prevent I/O from happening which
> should introduce enough determinism to fix the problem in practice.
> 
> Regards,
> 
> Anthony Liguori


See above. IMO it's a different problem. Unlike Kemari,
I don't really see any drawbacks to stop all callbacks.
Do you?
Anthony Liguori - Nov. 15, 2010, 8:37 p.m.
On 11/15/2010 02:26 PM, Michael S. Tsirkin wrote:
> Are there any system ones?
>    

There's one in qemu-char.c.  Not sure it's easy to just say for the 
future that there can't be BHs that get delivered during vmstop.

> Can we just stop processing them?
>    

We ran into a lot of problems trying to do this with emulating 
synchronous I/O in the block layer.  If we can avoid the 
stop-dispatching handlers approach, I think we'll have far fewer problems.

>>   I think that if we put
>> something in the network layer that just queued packets if the vm is
>> stopped, it would be a more robust solution to the problem.
>>      
> Will only work for -net. The problem is for anything
> that can trigger activity when vm is stopped.
>    

Activity or external I/O?

My assertion is that if we can stop things from hitting the disk, 
escaping the network, or leaving a character device, we're solid.

>>> For memory, it is much worse:  any memory changes can either get
>>> discarded or not.  This breaks consistency guarantees that guest relies
>>> upon. Imagine virtio index getting updated but content not being
>>> updated.  See?
>>>        
>> If you suppress any I/O then the memory changes don't matter because
>> the same changes will happen on the destination too.
>>      
> They matter, and same changes won't happen.
> Example:
>
> virtio used index is in page 1, it can point at data in page 2.
> device writes into data, *then* into index. Order matters,
> but won't be preserved: migration assumes memory does not
> change after vmstop, and so it might send old values for
> data but new values for index. Result will be invalid
> data coming into guest.
>    

No, this scenario is invalid.

Migration copies data while the guest is live and whenever a change 
happens, updates the dirty bitmap (that's why cpu_physical_memory_rw 
matters).

Once the VM is stopped, we never return to the main loop before 
completing migration so nothing else gets an opportunity to run.  This 
means when we send a page, we guarantee it won't be made dirty against 
until after the migration completes.

Once the migration completes, the contents of memory may change but 
that's okay.  As long as you stop packets from being sent, if you need 
to resume the guest, it'll pick up where it left off.

> On the destination guest will pick up the index and
> get bad (stale) data.
>    

If you're seeing this happen with vhost, you aren't updating dirty bits 
correctly.  AFAICT, this cannot happen with userspace device models.

>    
>> I think this basic problem is the same as Kemari.  We can either
>> attempt to totally freeze a guest which means stopping all callbacks
>> that are device related or we can prevent I/O from happening which
>> should introduce enough determinism to fix the problem in practice.
>>
>> Regards,
>>
>> Anthony Liguori
>>      
>
> See above. IMO it's a different problem. Unlike Kemari,
> I don't really see any drawbacks to stop all callbacks.
> Do you?
>    

I think it's going to be extremely difficult to get right and keep 
right.  A bunch of code needs to be converted to make us safe and then 
becomes brittle as it's very simple to change things in such a way that 
we're broken again.

Regards,

Anthony Liguori
Stefan Hajnoczi - Nov. 15, 2010, 8:53 p.m.
On Mon, Nov 15, 2010 at 4:09 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
> On 11/15/2010 09:18 AM, Michael S. Tsirkin wrote:
>>
>> On Mon, Nov 15, 2010 at 08:55:07AM -0600, Anthony Liguori wrote:
>>
>>>
>>> On 11/15/2010 08:52 AM, Juan Quintela wrote:
>>>
>>>>
>>>> "Michael S. Tsirkin"<mst@redhat.com>   wrote:
>>>>
>>>>>
>>>>> There's no reason for tap to run when VM is stopped.
>>>>> If we let it, it confuses the bridge on TX
>>>>> and corrupts DMA memory on RX.
>>>>>
>>>>> Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
>>>>>
>>>>
>>>> once here, what handlers make sense to run while stopped?
>>>> /me can think of the normal console, non live migration, loadvm and not
>>>> much more.  Perhaps it is easier to just move the other way around?
>>>>
>>>
>>> I'm not sure I concur that this is really a problem.
>>> Semantically, I don't think that stop has to imply that the guest
>>> memory no longer changes.
>>>
>>> Regards,
>>>
>>> Anthony Liguori
>>>
>>>
>>>>
>>>> Later, Juan.
>>>>
>>
>> Well, I do not really know about vmstop that is not for migration.
>>
>
> They are separate.  For instance, we don't rely on stop to pause pending
> disk I/O because we don't serialize pending disk I/O operations.  Instead,
> we flush all pending I/O and rely on the fact that disk I/O requests are
> always submitted in the context of a vCPU operation.  This assumption breaks
> down though with ioeventfd so we need to revisit it.

vhost has a solution for this: register a VMChangeStateHandler
function that stops ioeventfd while vm_stop(0) is in effect.  Then
poll to see if there is work pending.

I will add equivalent functionality to virtio-ioeventfd.

Stefan
Anthony Liguori - Nov. 15, 2010, 9:05 p.m.
On 11/15/2010 02:53 PM, Stefan Hajnoczi wrote:
>
> vhost has a solution for this: register a VMChangeStateHandler
> function that stops ioeventfd while vm_stop(0) is in effect.  Then
> poll to see if there is work pending.
>
> I will add equivalent functionality to virtio-ioeventfd.
>    

I still think that stopping this at the net/block layer is going to be a 
lot more robust in the long term.  All net/block traffic flows through a 
single set of interfaces whereas getting the devices to completely stop 
themselves requires touching every device and making sure that noone 
adds back vmstop-unfriendly behavior down the road.

Regards,

Anthony Liguori

> Stefan
>
Michael S. Tsirkin - Nov. 15, 2010, 10:44 p.m.
On Mon, Nov 15, 2010 at 03:05:39PM -0600, Anthony Liguori wrote:
> On 11/15/2010 02:53 PM, Stefan Hajnoczi wrote:
> >
> >vhost has a solution for this: register a VMChangeStateHandler
> >function that stops ioeventfd while vm_stop(0) is in effect.  Then
> >poll to see if there is work pending.
> >
> >I will add equivalent functionality to virtio-ioeventfd.
> 
> I still think that stopping this at the net/block layer is going to
> be a lot more robust in the long term.
>  All net/block traffic flows
> through a single set of interfaces whereas getting the devices to
> completely stop themselves requires touching every device and making
> sure that noone adds back vmstop-unfriendly behavior down the road.
> 
> Regards,
> 
> Anthony Liguori

So I'm not sure I understand what is proposed here.
Care posting a patch?
Anthony Liguori - Nov. 15, 2010, 11:46 p.m.
On 11/15/2010 04:44 PM, Michael S. Tsirkin wrote:
> So I'm not sure I understand what is proposed here.
> Care posting a patch?
>    

I *think* just adding:

if (vm_running == 0) {
     return qemu_net_queue_append(queue, sender, flags, data, size, NULL);
}

To qemu_net_queue_send() along with a state notifier to kick the queue 
on start would do the trick.

Regards,

Anthony Liguori

Patch

diff --git a/hw/vhost_net.c b/hw/vhost_net.c
index c068be1..7061d6c 100644
--- a/hw/vhost_net.c
+++ b/hw/vhost_net.c
@@ -139,7 +139,7 @@  int vhost_net_start(struct vhost_net *net,
     }
 
     net->vc->info->poll(net->vc, false);
-    qemu_set_fd_handler(net->backend, NULL, NULL, NULL);
+    qemu_set_fd_handler3(true, net->backend, NULL, NULL, NULL, NULL);
     file.fd = net->backend;
     for (file.index = 0; file.index < net->dev.nvqs; ++file.index) {
         r = ioctl(net->dev.control, VHOST_NET_SET_BACKEND, &file);
diff --git a/net/tap.c b/net/tap.c
index 4afb314..246ef01 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -71,7 +71,8 @@  static void tap_writable(void *opaque);
 
 static void tap_update_fd_handler(TAPState *s)
 {
-    qemu_set_fd_handler2(s->fd,
+    qemu_set_fd_handler3(true,
+                         s->fd,
                          s->read_poll  ? tap_can_send : NULL,
                          s->read_poll  ? tap_send     : NULL,
                          s->write_poll ? tap_writable : NULL,