
[v2] net: Flush queued packets when guest resumes

Message ID 1436232067-29144-1-git-send-email-famz@redhat.com
State New

Commit Message

Fam Zheng July 7, 2015, 1:21 a.m. UTC
Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends,
net queues need to be explicitly flushed after qemu_can_send_packet()
returns false, because the netdev side will disable the polling of fd.

This fixes the case of "cont" after "stop" (or migration).

Signed-off-by: Fam Zheng <famz@redhat.com>

---

v2: Unify with VM stop handler. (Stefan)
---
 net/net.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

Comments

Jason Wang July 7, 2015, 7:44 a.m. UTC | #1
On 07/07/2015 09:21 AM, Fam Zheng wrote:
> Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends,
> net queues need to be explicitly flushed after qemu_can_send_packet()
> returns false, because the netdev side will disable the polling of fd.
>
> This fixes the case of "cont" after "stop" (or migration).
>
> Signed-off-by: Fam Zheng <famz@redhat.com>
>
> ---
>
> v2: Unify with VM stop handler. (Stefan)
> ---
>  net/net.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/net/net.c b/net/net.c
> index 6ff7fec..28a5597 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -1257,14 +1257,19 @@ void qmp_set_link(const char *name, bool up, Error **errp)
>  static void net_vm_change_state_handler(void *opaque, int running,
>                                          RunState state)
>  {
> -    /* Complete all queued packets, to guarantee we don't modify
> -     * state later when VM is not running.
> -     */
> -    if (!running) {
> -        NetClientState *nc;
> -        NetClientState *tmp;
> +    NetClientState *nc;
> +    NetClientState *tmp;
>  
> -        QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> +    QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> +        if (running) {
> +            /* Flush queued packets and wake up backends. */
> +            if (nc->peer && qemu_can_send_packet(nc)) {
> +                qemu_flush_queued_packets(nc->peer);
> +            }
> +        } else {
> +            /* Complete all queued packets, to guarantee we don't modify
> +             * state later when VM is not running.
> +             */
>              qemu_flush_or_purge_queued_packets(nc, true);
>          }

Looks like qemu_can_send_packet() checks both nc->peer and runstate. So
probably, we can simplify this to:

if (qemu_can_send_packet(nc))
    qemu_flush_queued_packets(nc->peer);
else
    qemu_flush_or_purge_queued_packets(nc, true);

>      }
Michael S. Tsirkin July 7, 2015, 8:13 a.m. UTC | #2
On Tue, Jul 07, 2015 at 09:21:07AM +0800, Fam Zheng wrote:
> Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends,
> net queues need to be explicitly flushed after qemu_can_send_packet()
> returns false, because the netdev side will disable the polling of fd.
> 
> This fixes the case of "cont" after "stop" (or migration).
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>

Note virtio has its own handler which must be used to
flush packets - this one might run too early or too late.

> ---
> 
> v2: Unify with VM stop handler. (Stefan)
> ---
>  net/net.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/net/net.c b/net/net.c
> index 6ff7fec..28a5597 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -1257,14 +1257,19 @@ void qmp_set_link(const char *name, bool up, Error **errp)
>  static void net_vm_change_state_handler(void *opaque, int running,
>                                          RunState state)
>  {
> -    /* Complete all queued packets, to guarantee we don't modify
> -     * state later when VM is not running.
> -     */
> -    if (!running) {
> -        NetClientState *nc;
> -        NetClientState *tmp;
> +    NetClientState *nc;
> +    NetClientState *tmp;
>  
> -        QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> +    QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> +        if (running) {
> +            /* Flush queued packets and wake up backends. */
> +            if (nc->peer && qemu_can_send_packet(nc)) {
> +                qemu_flush_queued_packets(nc->peer);
> +            }
> +        } else {
> +            /* Complete all queued packets, to guarantee we don't modify
> +             * state later when VM is not running.
> +             */
>              qemu_flush_or_purge_queued_packets(nc, true);
>          }
>      }
> -- 
> 2.4.3
Stefan Hajnoczi July 7, 2015, 8:51 a.m. UTC | #3
On Tue, Jul 07, 2015 at 09:21:07AM +0800, Fam Zheng wrote:
> Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends,
> net queues need to be explicitly flushed after qemu_can_send_packet()
> returns false, because the netdev side will disable the polling of fd.
> 
> This fixes the case of "cont" after "stop" (or migration).
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> 
> ---
> 
> v2: Unify with VM stop handler. (Stefan)

Thanks!  I'm happy with this but I'll wait for you to respond to Jason's
comment.

> ---
>  net/net.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/net/net.c b/net/net.c
> index 6ff7fec..28a5597 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -1257,14 +1257,19 @@ void qmp_set_link(const char *name, bool up, Error **errp)
>  static void net_vm_change_state_handler(void *opaque, int running,
>                                          RunState state)
>  {
> -    /* Complete all queued packets, to guarantee we don't modify
> -     * state later when VM is not running.
> -     */
> -    if (!running) {
> -        NetClientState *nc;
> -        NetClientState *tmp;
> +    NetClientState *nc;
> +    NetClientState *tmp;
>  
> -        QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> +    QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> +        if (running) {
> +            /* Flush queued packets and wake up backends. */
> +            if (nc->peer && qemu_can_send_packet(nc)) {
> +                qemu_flush_queued_packets(nc->peer);
> +            }
> +        } else {
> +            /* Complete all queued packets, to guarantee we don't modify
> +             * state later when VM is not running.
> +             */
>              qemu_flush_or_purge_queued_packets(nc, true);
>          }
>      }
> -- 
> 2.4.3
>
Jason Wang July 7, 2015, 8:58 a.m. UTC | #4
On 07/07/2015 04:13 PM, Michael S. Tsirkin wrote:
> On Tue, Jul 07, 2015 at 09:21:07AM +0800, Fam Zheng wrote:
>> Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends,
>> net queues need to be explicitly flushed after qemu_can_send_packet()
>> returns false, because the netdev side will disable the polling of fd.
>>
>> This fixes the case of "cont" after "stop" (or migration).
>>
>> Signed-off-by: Fam Zheng <famz@redhat.com>
> Note virtio has its own handler which must be used to
> flush packets - this one might run too early or too late.

If it runs too early (DRIVER_OK is not set), then either the packet will be
dropped (if this patch is used together with "drop virtio_net_can_receive()"),
or the queue will be purged since qemu_can_send_packet() returns false. If it
runs too late, at least tap read poll will be enabled. So it still looks ok?

>
>> ---
>>
>> v2: Unify with VM stop handler. (Stefan)
>> ---
>>  net/net.c | 19 ++++++++++++-------
>>  1 file changed, 12 insertions(+), 7 deletions(-)
>>
>> diff --git a/net/net.c b/net/net.c
>> index 6ff7fec..28a5597 100644
>> --- a/net/net.c
>> +++ b/net/net.c
>> @@ -1257,14 +1257,19 @@ void qmp_set_link(const char *name, bool up, Error **errp)
>>  static void net_vm_change_state_handler(void *opaque, int running,
>>                                          RunState state)
>>  {
>> -    /* Complete all queued packets, to guarantee we don't modify
>> -     * state later when VM is not running.
>> -     */
>> -    if (!running) {
>> -        NetClientState *nc;
>> -        NetClientState *tmp;
>> +    NetClientState *nc;
>> +    NetClientState *tmp;
>>  
>> -        QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
>> +    QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
>> +        if (running) {
>> +            /* Flush queued packets and wake up backends. */
>> +            if (nc->peer && qemu_can_send_packet(nc)) {
>> +                qemu_flush_queued_packets(nc->peer);
>> +            }
>> +        } else {
>> +            /* Complete all queued packets, to guarantee we don't modify
>> +             * state later when VM is not running.
>> +             */
>>              qemu_flush_or_purge_queued_packets(nc, true);
>>          }
>>      }
>> -- 
>> 2.4.3
Fam Zheng July 7, 2015, 9:03 a.m. UTC | #5
On Tue, 07/07 15:44, Jason Wang wrote:
> 
> 
> On 07/07/2015 09:21 AM, Fam Zheng wrote:
> > Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends,
> > net queues need to be explicitly flushed after qemu_can_send_packet()
> > returns false, because the netdev side will disable the polling of fd.
> >
> > This fixes the case of "cont" after "stop" (or migration).
> >
> > Signed-off-by: Fam Zheng <famz@redhat.com>
> >
> > ---
> >
> > v2: Unify with VM stop handler. (Stefan)
> > ---
> >  net/net.c | 19 ++++++++++++-------
> >  1 file changed, 12 insertions(+), 7 deletions(-)
> >
> > diff --git a/net/net.c b/net/net.c
> > index 6ff7fec..28a5597 100644
> > --- a/net/net.c
> > +++ b/net/net.c
> > @@ -1257,14 +1257,19 @@ void qmp_set_link(const char *name, bool up, Error **errp)
> >  static void net_vm_change_state_handler(void *opaque, int running,
> >                                          RunState state)
> >  {
> > -    /* Complete all queued packets, to guarantee we don't modify
> > -     * state later when VM is not running.
> > -     */
> > -    if (!running) {
> > -        NetClientState *nc;
> > -        NetClientState *tmp;
> > +    NetClientState *nc;
> > +    NetClientState *tmp;
> >  
> > -        QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> > +    QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> > +        if (running) {
> > +            /* Flush queued packets and wake up backends. */
> > +            if (nc->peer && qemu_can_send_packet(nc)) {
> > +                qemu_flush_queued_packets(nc->peer);
> > +            }
> > +        } else {
> > +            /* Complete all queued packets, to guarantee we don't modify
> > +             * state later when VM is not running.
> > +             */
> >              qemu_flush_or_purge_queued_packets(nc, true);
> >          }
> 
> Looks like qemu_can_send_packet() checks both nc->peer and runstate. So
> probably, we can simplify this to:
> 
> if (qemu_can_send_packet(nc))
>     qemu_flush_queued_packets(nc->peer);
> else
>     qemu_flush_or_purge_queued_packets(nc, true);
> 
> >      }
> 

qemu_can_send_packet returns 1 if !nc->peer, so this doesn't work.

Fam
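
[For reference — a rough paraphrase of qemu_can_send_packet() from net/net.c
around this time, not an exact copy of the tree under discussion; it shows why
a client without a peer still reports that it can send:]

int qemu_can_send_packet(NetClientState *sender)
{
    int vm_running = runstate_is_running();

    if (!vm_running) {
        return 0;
    }

    if (!sender->peer) {
        return 1;   /* no peer to ask, so report "can send" */
    }

    if (sender->peer->receive_disabled) {
        return 0;
    } else if (sender->peer->info->can_receive &&
               !sender->peer->info->can_receive(sender->peer)) {
        return 0;
    }
    return 1;
}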
Fam Zheng July 7, 2015, 9:09 a.m. UTC | #6
On Tue, 07/07 11:13, Michael S. Tsirkin wrote:
> On Tue, Jul 07, 2015 at 09:21:07AM +0800, Fam Zheng wrote:
> > Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends,
> > net queues need to be explicitly flushed after qemu_can_send_packet()
> > returns false, because the netdev side will disable the polling of fd.
> > 
> > This fixes the case of "cont" after "stop" (or migration).
> > 
> > Signed-off-by: Fam Zheng <famz@redhat.com>
> 
> Note virtio has its own handler which must be used to
> flush packets - this one might run too early or too late.

Which handler do you mean? I don't think virtio-net handles resume now. (If it
does, we should probably drop that handler together with this change, since the
flush is needed by all NICs.)

Fam

> 
> > ---
> > 
> > v2: Unify with VM stop handler. (Stefan)
> > ---
> >  net/net.c | 19 ++++++++++++-------
> >  1 file changed, 12 insertions(+), 7 deletions(-)
> > 
> > diff --git a/net/net.c b/net/net.c
> > index 6ff7fec..28a5597 100644
> > --- a/net/net.c
> > +++ b/net/net.c
> > @@ -1257,14 +1257,19 @@ void qmp_set_link(const char *name, bool up, Error **errp)
> >  static void net_vm_change_state_handler(void *opaque, int running,
> >                                          RunState state)
> >  {
> > -    /* Complete all queued packets, to guarantee we don't modify
> > -     * state later when VM is not running.
> > -     */
> > -    if (!running) {
> > -        NetClientState *nc;
> > -        NetClientState *tmp;
> > +    NetClientState *nc;
> > +    NetClientState *tmp;
> >  
> > -        QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> > +    QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> > +        if (running) {
> > +            /* Flush queued packets and wake up backends. */
> > +            if (nc->peer && qemu_can_send_packet(nc)) {
> > +                qemu_flush_queued_packets(nc->peer);
> > +            }
> > +        } else {
> > +            /* Complete all queued packets, to guarantee we don't modify
> > +             * state later when VM is not running.
> > +             */
> >              qemu_flush_or_purge_queued_packets(nc, true);
> >          }
> >      }
> > -- 
> > 2.4.3
Michael S. Tsirkin July 7, 2015, 9:10 a.m. UTC | #7
On Tue, Jul 07, 2015 at 04:58:29PM +0800, Jason Wang wrote:
> 
> 
> On 07/07/2015 04:13 PM, Michael S. Tsirkin wrote:
> > On Tue, Jul 07, 2015 at 09:21:07AM +0800, Fam Zheng wrote:
> >> Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends,
> >> net queues need to be explicitly flushed after qemu_can_send_packet()
> >> returns false, because the netdev side will disable the polling of fd.
> >>
> >> This fixes the case of "cont" after "stop" (or migration).
> >>
> >> Signed-off-by: Fam Zheng <famz@redhat.com>
> > Note virtio has its own handler which must be used to
> > flush packets - this one might run too early or too late.
> 
> If it runs too early (DRIVER_OK is not set), then either the packet will be
> dropped (if this patch is used together with "drop virtio_net_can_receive()"),
> or the queue will be purged since qemu_can_send_packet() returns false. If it
> runs too late, at least tap read poll will be enabled. So it still looks ok?

It's still not helpful for virtio. So which cards are fixed?

I just find the comment
	This fixes the case of "cont" after "stop" (or migration)
too vague.
Can you please include the info about what's broken in the commit log?

> >
> >> ---
> >>
> >> v2: Unify with VM stop handler. (Stefan)
> >> ---
> >>  net/net.c | 19 ++++++++++++-------
> >>  1 file changed, 12 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/net/net.c b/net/net.c
> >> index 6ff7fec..28a5597 100644
> >> --- a/net/net.c
> >> +++ b/net/net.c
> >> @@ -1257,14 +1257,19 @@ void qmp_set_link(const char *name, bool up, Error **errp)
> >>  static void net_vm_change_state_handler(void *opaque, int running,
> >>                                          RunState state)
> >>  {
> >> -    /* Complete all queued packets, to guarantee we don't modify
> >> -     * state later when VM is not running.
> >> -     */
> >> -    if (!running) {
> >> -        NetClientState *nc;
> >> -        NetClientState *tmp;
> >> +    NetClientState *nc;
> >> +    NetClientState *tmp;
> >>  
> >> -        QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> >> +    QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> >> +        if (running) {
> >> +            /* Flush queued packets and wake up backends. */
> >> +            if (nc->peer && qemu_can_send_packet(nc)) {
> >> +                qemu_flush_queued_packets(nc->peer);
> >> +            }
> >> +        } else {
> >> +            /* Complete all queued packets, to guarantee we don't modify
> >> +             * state later when VM is not running.
> >> +             */
> >>              qemu_flush_or_purge_queued_packets(nc, true);
> >>          }
> >>      }
> >> -- 
> >> 2.4.3
Michael S. Tsirkin July 7, 2015, 9:19 a.m. UTC | #8
On Tue, Jul 07, 2015 at 05:09:09PM +0800, Fam Zheng wrote:
> On Tue, 07/07 11:13, Michael S. Tsirkin wrote:
> > On Tue, Jul 07, 2015 at 09:21:07AM +0800, Fam Zheng wrote:
> > > Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends,
> > > net queues need to be explicitly flushed after qemu_can_send_packet()
> > > returns false, because the netdev side will disable the polling of fd.
> > > 
> > > This fixes the case of "cont" after "stop" (or migration).
> > > 
> > > Signed-off-by: Fam Zheng <famz@redhat.com>
> > 
> > Note virtio has its own handler which must be used to
> > flush packets - this one might run too early or too late.
> 
> Which handler do you mean? I don't think virtio-net handles resume now. (If it
> does, we should probably drop that handler together with this change, since the
> flush is needed by all NICs.)
> 
> Fam

virtio_vmstate_change

It's all far from trivial. I suspect this whack-a-mole approach of
spreading purges here and there will only create more bugs.

Why would we ever need to process network packets when
VM is not running? I don't see any point to it.
How about we simply stop the job processing network on
vm stop and restart on vm start?



> > 
> > > ---
> > > 
> > > v2: Unify with VM stop handler. (Stefan)
> > > ---
> > >  net/net.c | 19 ++++++++++++-------
> > >  1 file changed, 12 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/net/net.c b/net/net.c
> > > index 6ff7fec..28a5597 100644
> > > --- a/net/net.c
> > > +++ b/net/net.c
> > > @@ -1257,14 +1257,19 @@ void qmp_set_link(const char *name, bool up, Error **errp)
> > >  static void net_vm_change_state_handler(void *opaque, int running,
> > >                                          RunState state)
> > >  {
> > > -    /* Complete all queued packets, to guarantee we don't modify
> > > -     * state later when VM is not running.
> > > -     */
> > > -    if (!running) {
> > > -        NetClientState *nc;
> > > -        NetClientState *tmp;
> > > +    NetClientState *nc;
> > > +    NetClientState *tmp;
> > >  
> > > -        QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> > > +    QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> > > +        if (running) {
> > > +            /* Flush queued packets and wake up backends. */
> > > +            if (nc->peer && qemu_can_send_packet(nc)) {
> > > +                qemu_flush_queued_packets(nc->peer);
> > > +            }
> > > +        } else {
> > > +            /* Complete all queued packets, to guarantee we don't modify
> > > +             * state later when VM is not running.
> > > +             */
> > >              qemu_flush_or_purge_queued_packets(nc, true);
> > >          }
> > >      }
> > > -- 
> > > 2.4.3
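
[For context — the handler Michael refers to, paraphrased from
hw/virtio/virtio.c of that period (details may differ from the exact tree):
it recomputes whether the backend should run from the VM state and the
DRIVER_OK status bit, and re-runs virtio_set_status() around stop/start,
which is where virtio restarts or quiesces its own backends:]

static void virtio_vmstate_change(void *opaque, int running, RunState state)
{
    VirtIODevice *vdev = opaque;
    BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
    bool backend_run = running && (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK);

    vdev->vm_running = running;

    if (backend_run) {
        virtio_set_status(vdev, vdev->status);   /* restart backends */
    }

    if (k->vmstate_change) {
        k->vmstate_change(qbus->parent, backend_run);
    }

    if (!backend_run) {
        virtio_set_status(vdev, vdev->status);   /* quiesce backends */
    }
}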
Fam Zheng July 7, 2015, 10:02 a.m. UTC | #9
On Tue, 07/07 12:10, Michael S. Tsirkin wrote:
> On Tue, Jul 07, 2015 at 04:58:29PM +0800, Jason Wang wrote:
> > 
> > 
> > On 07/07/2015 04:13 PM, Michael S. Tsirkin wrote:
> > > On Tue, Jul 07, 2015 at 09:21:07AM +0800, Fam Zheng wrote:
> > >> Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends,
> > >> net queues need to be explicitly flushed after qemu_can_send_packet()
> > >> returns false, because the netdev side will disable the polling of fd.
> > >>
> > >> This fixes the case of "cont" after "stop" (or migration).
> > >>
> > >> Signed-off-by: Fam Zheng <famz@redhat.com>
> > > Note virtio has its own handler which must be used to
> > > flush packets - this one might run too early or too late.
> > 
> > If it runs too early (DRIVER_OK is not set), then either the packet will be
> > dropped (if this patch is used together with "drop virtio_net_can_receive()"),
> > or the queue will be purged since qemu_can_send_packet() returns false. If it
> > runs too late, at least tap read poll will be enabled. So it still looks ok?
> 
> It's still not helpful for virtio. So which cards are fixed?
> 
> I just find the comment
> 	This fixes the case of "cont" after "stop" (or migration)
> too vague.
> Can you please include the info about what's broken in the commit log?

It does fix virtio-net, as well as most other cards.  HMP "stop" followed by
"cont" breaks networking; I tested this with the host pinging the guest while
doing it.

Fam

> 
> > >
> > >> ---
> > >>
> > >> v2: Unify with VM stop handler. (Stefan)
> > >> ---
> > >>  net/net.c | 19 ++++++++++++-------
> > >>  1 file changed, 12 insertions(+), 7 deletions(-)
> > >>
> > >> diff --git a/net/net.c b/net/net.c
> > >> index 6ff7fec..28a5597 100644
> > >> --- a/net/net.c
> > >> +++ b/net/net.c
> > >> @@ -1257,14 +1257,19 @@ void qmp_set_link(const char *name, bool up, Error **errp)
> > >>  static void net_vm_change_state_handler(void *opaque, int running,
> > >>                                          RunState state)
> > >>  {
> > >> -    /* Complete all queued packets, to guarantee we don't modify
> > >> -     * state later when VM is not running.
> > >> -     */
> > >> -    if (!running) {
> > >> -        NetClientState *nc;
> > >> -        NetClientState *tmp;
> > >> +    NetClientState *nc;
> > >> +    NetClientState *tmp;
> > >>  
> > >> -        QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> > >> +    QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> > >> +        if (running) {
> > >> +            /* Flush queued packets and wake up backends. */
> > >> +            if (nc->peer && qemu_can_send_packet(nc)) {
> > >> +                qemu_flush_queued_packets(nc->peer);
> > >> +            }
> > >> +        } else {
> > >> +            /* Complete all queued packets, to guarantee we don't modify
> > >> +             * state later when VM is not running.
> > >> +             */
> > >>              qemu_flush_or_purge_queued_packets(nc, true);
> > >>          }
> > >>      }
> > >> -- 
> > >> 2.4.3
Jason Wang July 8, 2015, 9:40 a.m. UTC | #10
On 07/07/2015 05:03 PM, Fam Zheng wrote:
> On Tue, 07/07 15:44, Jason Wang wrote:
>>
>> On 07/07/2015 09:21 AM, Fam Zheng wrote:
>>> Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends,
>>> net queues need to be explicitly flushed after qemu_can_send_packet()
>>> returns false, because the netdev side will disable the polling of fd.
>>>
>>> This fixes the case of "cont" after "stop" (or migration).
>>>
>>> Signed-off-by: Fam Zheng <famz@redhat.com>
>>>
>>> ---
>>>
>>> v2: Unify with VM stop handler. (Stefan)
>>> ---
>>>  net/net.c | 19 ++++++++++++-------
>>>  1 file changed, 12 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/net/net.c b/net/net.c
>>> index 6ff7fec..28a5597 100644
>>> --- a/net/net.c
>>> +++ b/net/net.c
>>> @@ -1257,14 +1257,19 @@ void qmp_set_link(const char *name, bool up, Error **errp)
>>>  static void net_vm_change_state_handler(void *opaque, int running,
>>>                                          RunState state)
>>>  {
>>> -    /* Complete all queued packets, to guarantee we don't modify
>>> -     * state later when VM is not running.
>>> -     */
>>> -    if (!running) {
>>> -        NetClientState *nc;
>>> -        NetClientState *tmp;
>>> +    NetClientState *nc;
>>> +    NetClientState *tmp;
>>>  
>>> -        QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
>>> +    QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
>>> +        if (running) {
>>> +            /* Flush queued packets and wake up backends. */
>>> +            if (nc->peer && qemu_can_send_packet(nc)) {
>>> +                qemu_flush_queued_packets(nc->peer);
>>> +            }
>>> +        } else {
>>> +            /* Complete all queued packets, to guarantee we don't modify
>>> +             * state later when VM is not running.
>>> +             */
>>>              qemu_flush_or_purge_queued_packets(nc, true);
>>>          }
>> Looks like qemu_can_send_packet() checks both nc->peer and runstate. So
>> probably, we can simplify this to:
>>
>> if (qemu_can_send_packet(nc))
>>     qemu_flush_queued_packets(nc->peer);
>> else
>>     qemu_flush_or_purge_queued_packets(nc, true);
>>
>>>      }
> qemu_can_send_packet returns 1 if !nc->peer, so this doesn't work.
>
> Fam

Yes, I was wrong.

Btw, instead of depending on the VM change state handler (which seems racy
with other state change handlers), can we do this in places like vm_start()
and vm_stop(), similar to how we drain and flush the block queues during vm
stop?
Stefan Hajnoczi July 9, 2015, 9:25 a.m. UTC | #11
On Tue, Jul 07, 2015 at 12:10:15PM +0300, Michael S. Tsirkin wrote:
> On Tue, Jul 07, 2015 at 04:58:29PM +0800, Jason Wang wrote:
> > 
> > 
> > On 07/07/2015 04:13 PM, Michael S. Tsirkin wrote:
> > > On Tue, Jul 07, 2015 at 09:21:07AM +0800, Fam Zheng wrote:
> > >> Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends,
> > >> net queues need to be explicitly flushed after qemu_can_send_packet()
> > >> returns false, because the netdev side will disable the polling of fd.
> > >>
> > >> This fixes the case of "cont" after "stop" (or migration).
> > >>
> > >> Signed-off-by: Fam Zheng <famz@redhat.com>
> > > Note virtio has its own handler which must be used to
> > > flush packets - this one might run too early or too late.
> > 
> > If it runs too early (DRIVER_OK is not set), then either the packet will be
> > dropped (if this patch is used together with "drop virtio_net_can_receive()"),
> > or the queue will be purged since qemu_can_send_packet() returns false. If it
> > runs too late, at least tap read poll will be enabled. So it still looks ok?
> 
> It's still not helpful for virtio. So which cards are fixed?
> 
> I just find the comment
> 	This fixes the case of "cont" after "stop" (or migration)
> too vague.
> Can you please include the info about what's broken in the commit log?

Max Filippov reported that it's needed to fix the gdb stub:
https://lists.gnu.org/archive/html/qemu-devel/2015-07/msg01015.html
Fam Zheng July 10, 2015, 9:03 a.m. UTC | #12
On Tue, 07/07 12:19, Michael S. Tsirkin wrote:
> On Tue, Jul 07, 2015 at 05:09:09PM +0800, Fam Zheng wrote:
> > On Tue, 07/07 11:13, Michael S. Tsirkin wrote:
> > > On Tue, Jul 07, 2015 at 09:21:07AM +0800, Fam Zheng wrote:
> > > > Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends,
> > > > net queues need to be explicitly flushed after qemu_can_send_packet()
> > > > returns false, because the netdev side will disable the polling of fd.
> > > > 
> > > > This fixes the case of "cont" after "stop" (or migration).
> > > > 
> > > > Signed-off-by: Fam Zheng <famz@redhat.com>
> > > 
> > > Note virtio has its own handler which must be used to
> > > flush packets - this one might run too early or too late.
> > 
> > Which handler do you mean? I don't think virtio-net handles resume now. (If it
> > does, we should probably drop that handler together with this change, since the
> > flush is needed by all NICs.)
> > 
> > Fam
> 
> virtio_vmstate_change
> 
> It's all far from trivial. I suspect this whack-a-mole approach of
> spreading purges here and there will only create more bugs.
> 
> Why would we ever need to process network packets when
> VM is not running? I don't see any point to it.
> How about we simply stop the job processing network on
> vm stop and restart on vm start?

I suppose it is too much for 2.4. I think this approach, adding
qemu_flush_queued_packets(), is consistent with its existing usage (when a
device transitions from inactive to active), as in e1000_write_config.

How about applying this, and we work on "stopping tap when VM not running"
for 2.5?

Fam
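
[A minimal, purely hypothetical sketch of the "stopping tap when VM not
running" idea — not part of this patch or any posted series. It assumes a
per-backend VM change state handler and uses tap_read_poll() from net/tap.c
to toggle the fd read handler; the handler name and registration below are
illustrative only:]

static void tap_vm_change_state_handler(void *opaque, int running,
                                        RunState state)
{
    TAPState *s = opaque;

    /* Stop reading from the tap fd while the VM is stopped, so packets
     * are not queued against a stopped guest; resume on VM start. */
    tap_read_poll(s, running);
}

/* ...and during tap backend setup (hypothetical placement): */
/* qemu_add_vm_change_state_handler(tap_vm_change_state_handler, s); */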
Stefan Hajnoczi July 14, 2015, 12:20 p.m. UTC | #13
On Fri, Jul 10, 2015 at 05:03:22PM +0800, Fam Zheng wrote:
> On Tue, 07/07 12:19, Michael S. Tsirkin wrote:
> > On Tue, Jul 07, 2015 at 05:09:09PM +0800, Fam Zheng wrote:
> > > On Tue, 07/07 11:13, Michael S. Tsirkin wrote:
> > > > On Tue, Jul 07, 2015 at 09:21:07AM +0800, Fam Zheng wrote:
> > > > > Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends,
> > > > > net queues need to be explicitly flushed after qemu_can_send_packet()
> > > > > returns false, because the netdev side will disable the polling of fd.
> > > > > 
> > > > > This fixes the case of "cont" after "stop" (or migration).
> > > > > 
> > > > > Signed-off-by: Fam Zheng <famz@redhat.com>
> > > > 
> > > > Note virtio has its own handler which must be used to
> > > > flush packets - this one might run too early or too late.
> > > 
> > > Which handler do you mean? I don't think virtio-net handles resume now. (If it
> > > does, we should probably drop that handler together with this change, since the
> > > flush is needed by all NICs.)
> > > 
> > > Fam
> > 
> > virtio_vmstate_change
> > 
> > It's all far from trivial. I suspect this whack-a-mole approach of
> > spreading purges here and there will only create more bugs.
> > 
> > Why would we ever need to process network packets when
> > VM is not running? I don't see any point to it.
> > How about we simply stop the job processing network on
> > vm stop and restart on vm start?
> 
> I suppose it is too much for 2.4. I think this approach, adding
> qemu_flush_queued_packets(), is consistent with its existing usage (when a
> device is becoming active from inactive), like in e1000_write_config.
> 
> How about applying this and let's work on "stopping tap when VM not running"
> for 2.5?

Jason is happy with this and with the virtio-net .can_receive() patch.

Michael: Any further comments?  Are you okay with this patch too?
Michael S. Tsirkin July 14, 2015, 12:42 p.m. UTC | #14
On Tue, Jul 14, 2015 at 01:20:03PM +0100, Stefan Hajnoczi wrote:
> On Fri, Jul 10, 2015 at 05:03:22PM +0800, Fam Zheng wrote:
> > On Tue, 07/07 12:19, Michael S. Tsirkin wrote:
> > > On Tue, Jul 07, 2015 at 05:09:09PM +0800, Fam Zheng wrote:
> > > > On Tue, 07/07 11:13, Michael S. Tsirkin wrote:
> > > > > On Tue, Jul 07, 2015 at 09:21:07AM +0800, Fam Zheng wrote:
> > > > > > Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends,
> > > > > > net queues need to be explicitly flushed after qemu_can_send_packet()
> > > > > > returns false, because the netdev side will disable the polling of fd.
> > > > > > 
> > > > > > This fixes the case of "cont" after "stop" (or migration).
> > > > > > 
> > > > > > Signed-off-by: Fam Zheng <famz@redhat.com>
> > > > > 
> > > > > Note virtio has its own handler which must be used to
> > > > > flush packets - this one might run too early or too late.
> > > > 
> > > > Which handler do you mean? I don't think virtio-net handles resume now. (If it
> > > > does, we should probably drop that handler together with this change, since the
> > > > flush is needed by all NICs.)
> > > > 
> > > > Fam
> > > 
> > > virtio_vmstate_change
> > > 
> > > It's all far from trivial. I suspect this whack-a-mole approach of
> > > spreading purges here and there will only create more bugs.
> > > 
> > > Why would we ever need to process network packets when
> > > VM is not running? I don't see any point to it.
> > > How about we simply stop the job processing network on
> > > vm stop and restart on vm start?
> > 
> > I suppose it is too much for 2.4. I think this approach, adding
> > qemu_flush_queued_packets(), is consistent with its existing usage (when a
> > device is becoming active from inactive), like in e1000_write_config.
> > 
> > How about applying this and let's work on "stopping tap when VM not running"
> > for 2.5?
> 
> Jason is happy with this and with the virtio-net .can_receive() patch.
> 
> Michael: Any further comments?  Are you okay with this patch too?

I think it doesn't help virtio - am I wrong?
Fam Zheng July 15, 2015, 2:50 a.m. UTC | #15
On Tue, 07/14 15:42, Michael S. Tsirkin wrote:
> On Tue, Jul 14, 2015 at 01:20:03PM +0100, Stefan Hajnoczi wrote:
> > On Fri, Jul 10, 2015 at 05:03:22PM +0800, Fam Zheng wrote:
> > > On Tue, 07/07 12:19, Michael S. Tsirkin wrote:
> > > > On Tue, Jul 07, 2015 at 05:09:09PM +0800, Fam Zheng wrote:
> > > > > On Tue, 07/07 11:13, Michael S. Tsirkin wrote:
> > > > > > On Tue, Jul 07, 2015 at 09:21:07AM +0800, Fam Zheng wrote:
> > > > > > > Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends,
> > > > > > > net queues need to be explicitly flushed after qemu_can_send_packet()
> > > > > > > returns false, because the netdev side will disable the polling of fd.
> > > > > > > 
> > > > > > > This fixes the case of "cont" after "stop" (or migration).
> > > > > > > 
> > > > > > > Signed-off-by: Fam Zheng <famz@redhat.com>
> > > > > > 
> > > > > > Note virtio has its own handler which must be used to
> > > > > > flush packets - this one might run too early or too late.
> > > > > 
> > > > > Which handler do you mean? I don't think virtio-net handles resume now. (If it
> > > > > does, we should probably drop that handler together with this change, since the
> > > > > flush is needed by all NICs.)
> > > > > 
> > > > > Fam
> > > > 
> > > > virtio_vmstate_change
> > > > 
> > > > It's all far from trivial. I suspect this whack-a-mole approach of
> > > > spreading purges here and there will only create more bugs.
> > > > 
> > > > Why would we ever need to process network packets when
> > > > VM is not running? I don't see any point to it.
> > > > How about we simply stop the job processing network on
> > > > vm stop and restart on vm start?
> > > 
> > > I suppose it is too much for 2.4. I think this approach, adding
> > > qemu_flush_queued_packets(), is consistent with its existing usage (when a
> > > device is becoming active from inactive), like in e1000_write_config.
> > > 
> > > How about applying this and let's work on "stopping tap when VM not running"
> > > for 2.5?
> > 
> > Jason is happy with this and with the virtio-net .can_receive() patch.
> > 
> > Michael: Any further comments?  Are you okay with this patch too?

FWIW, I am sending another virtio-net patch to flush at .set_status.

> 
> I think it doesn't help virtio - am I wrong?
> 

Right, virtio doesn't need this because it gets notified with '.set_status()'.
But this patch does help:

http://lists.nongnu.org/archive/html/qemu-devel/2015-07/msg01396.html

(as well as all other NICs).

Fam
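
[A hedged sketch of the ".set_status" idea mentioned above — not the actual
follow-up patch, and the helper name below is illustrative only. The point is
to flush each queue's peer as the guest (re)starts it, from virtio-net's own
.set_status handler rather than from the generic VM change state handler:]

/* Illustrative helper; would be called from virtio_net_set_status()
 * for each virtqueue pair once its started/stopped state is known. */
static void virtio_net_flush_queue_on_start(VirtIONet *n, int queue_index,
                                            bool queue_started)
{
    if (queue_started) {
        /* The guest driver just (re)enabled this queue: drain whatever
         * the peer queued while the NIC could not receive. */
        qemu_flush_queued_packets(qemu_get_subqueue(n->nic, queue_index));
    }
}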

Patch

diff --git a/net/net.c b/net/net.c
index 6ff7fec..28a5597 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1257,14 +1257,19 @@  void qmp_set_link(const char *name, bool up, Error **errp)
 static void net_vm_change_state_handler(void *opaque, int running,
                                         RunState state)
 {
-    /* Complete all queued packets, to guarantee we don't modify
-     * state later when VM is not running.
-     */
-    if (!running) {
-        NetClientState *nc;
-        NetClientState *tmp;
+    NetClientState *nc;
+    NetClientState *tmp;
 
-        QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
+    QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
+        if (running) {
+            /* Flush queued packets and wake up backends. */
+            if (nc->peer && qemu_can_send_packet(nc)) {
+                qemu_flush_queued_packets(nc->peer);
+            }
+        } else {
+            /* Complete all queued packets, to guarantee we don't modify
+             * state later when VM is not running.
+             */
             qemu_flush_or_purge_queued_packets(nc, true);
         }
     }