
[V2,9/9] vhost: zerocopy: poll vq in zerocopy callback

Message ID 20120502034254.11782.27314.stgit@amd-6168-8-1.englab.nay.redhat.com
State Awaiting Upstream, archived
Delegated to: David Miller

Commit Message

Jason Wang May 2, 2012, 3:42 a.m. UTC
We add used buffers and signal the guest in the worker thread, but we did not
poll the virtqueue during the zero-copy callback. This can cause the used-buffer
adding and guest signalling to be missed during zerocopy. Solve this by polling
the virtqueue in the callback so that it wakes up the worker.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Shirley Ma May 15, 2012, 4:50 p.m. UTC | #1
On Wed, 2012-05-02 at 11:42 +0800, Jason Wang wrote:
> We add used and signal guest in worker thread but did not poll the
> virtqueue
> during the zero copy callback. This may lead the missing of adding and
> signalling during zerocopy. Solve this by polling the virtqueue and
> let it
> wakeup the worker during callback.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vhost/vhost.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 947f00d..7b75fdf 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1604,6 +1604,7 @@ void vhost_zerocopy_callback(void *arg)
>         struct vhost_ubuf_ref *ubufs = ubuf->arg;
>         struct vhost_virtqueue *vq = ubufs->vq;
> 
> +       vhost_poll_queue(&vq->poll);
>         /* set len = 1 to mark this desc buffers done DMA */
>         vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
>         kref_put(&ubufs->kref, vhost_zerocopy_done_signal);

Doing so, we might have redundant vhost_poll_queue() calls. Do you know in
which scenario adding and signaling might be missed during
zerocopy?

Thanks
Shirley

Jason Wang May 16, 2012, 2:58 a.m. UTC | #2
On 05/16/2012 12:50 AM, Shirley Ma wrote:
> On Wed, 2012-05-02 at 11:42 +0800, Jason Wang wrote:
>> We add used and signal guest in worker thread but did not poll the
>> virtqueue
>> during the zero copy callback. This may lead the missing of adding and
>> signalling during zerocopy. Solve this by polling the virtqueue and
>> let it
>> wakeup the worker during callback.
>>
>> Signed-off-by: Jason Wang<jasowang@redhat.com>
>> ---
>>   drivers/vhost/vhost.c |    1 +
>>   1 files changed, 1 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index 947f00d..7b75fdf 100644
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -1604,6 +1604,7 @@ void vhost_zerocopy_callback(void *arg)
>>          struct vhost_ubuf_ref *ubufs = ubuf->arg;
>>          struct vhost_virtqueue *vq = ubufs->vq;
>>
>> +       vhost_poll_queue(&vq->poll);
>>          /* set len = 1 to mark this desc buffers done DMA */
>>          vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
>>          kref_put(&ubufs->kref, vhost_zerocopy_done_signal);
> Doing so, we might have redundant vhost_poll_queue(). Do you know in
> which scenario there might be missing of adding and signaling during
> zerocopy?

Yes. Since we only do signaling and used-buffer adding during tx work, if there's no tx 
work pending when the skb is sent, we may lose the opportunity to let the guest 
know about the completion. This is easy to reproduce with a netperf test.

Thanks
> Thanks
> Shirley
>

Shirley Ma May 16, 2012, 3:10 p.m. UTC | #3
On Wed, 2012-05-16 at 10:58 +0800, Jason Wang wrote:
> >>   drivers/vhost/vhost.c |    1 +
> >>   1 files changed, 1 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> >> index 947f00d..7b75fdf 100644
> >> --- a/drivers/vhost/vhost.c
> >> +++ b/drivers/vhost/vhost.c
> >> @@ -1604,6 +1604,7 @@ void vhost_zerocopy_callback(void *arg)
> >>          struct vhost_ubuf_ref *ubufs = ubuf->arg;
> >>          struct vhost_virtqueue *vq = ubufs->vq;
> >>
> >> +       vhost_poll_queue(&vq->poll);
> >>          /* set len = 1 to mark this desc buffers done DMA */
> >>          vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
> >>          kref_put(&ubufs->kref, vhost_zerocopy_done_signal);
> > Doing so, we might have redundant vhost_poll_queue(). Do you know in
> > which scenario there might be missing of adding and signaling during
> > zerocopy?
> 
> Yes, as we only do signaling and adding during tx work, if there's no
> tx 
> work when the skb were sent, we may lose the opportunity to let guest 
> know about the completion. It's easy to be reproduced with netperf
> test. 

The reason the host signals the guest is to free guest tx buffers. If
there is no tx work, then it's not necessary to signal the guest unless
the guest runs out of memory. The pending buffers will be released when
the virtio_net device is gone.

What's the behavior of the netperf test when you hit this situation?

Thanks
Shirley

Michael S. Tsirkin May 16, 2012, 3:14 p.m. UTC | #4
On Wed, May 16, 2012 at 08:10:27AM -0700, Shirley Ma wrote:
> On Wed, 2012-05-16 at 10:58 +0800, Jason Wang wrote:
> > >>   drivers/vhost/vhost.c |    1 +
> > >>   1 files changed, 1 insertions(+), 0 deletions(-)
> > >>
> > >> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > >> index 947f00d..7b75fdf 100644
> > >> --- a/drivers/vhost/vhost.c
> > >> +++ b/drivers/vhost/vhost.c
> > >> @@ -1604,6 +1604,7 @@ void vhost_zerocopy_callback(void *arg)
> > >>          struct vhost_ubuf_ref *ubufs = ubuf->arg;
> > >>          struct vhost_virtqueue *vq = ubufs->vq;
> > >>
> > >> +       vhost_poll_queue(&vq->poll);
> > >>          /* set len = 1 to mark this desc buffers done DMA */
> > >>          vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
> > >>          kref_put(&ubufs->kref, vhost_zerocopy_done_signal);
> > > Doing so, we might have redundant vhost_poll_queue(). Do you know in
> > > which scenario there might be missing of adding and signaling during
> > > zerocopy?
> > 
> > Yes, as we only do signaling and adding during tx work, if there's no
> > tx 
> > work when the skb were sent, we may lose the opportunity to let guest 
> > know about the completion. It's easy to be reproduced with netperf
> > test. 
> 
> The reason which host signals guest is to free guest tx buffers, if
> there is no tx work, then it's not necessary to signal the guest unless
> guest runs out of memory. The pending buffers will be released
> virtio_net device gone.
> 
> What's the behavior of netperf test when you hit this situation?
> 
> Thanks
> Shirley

IIRC guest networking seems to be lost.
Shirley Ma May 16, 2012, 5:32 p.m. UTC | #5
On Wed, 2012-05-16 at 18:14 +0300, Michael S. Tsirkin wrote:
> On Wed, May 16, 2012 at 08:10:27AM -0700, Shirley Ma wrote:
> > On Wed, 2012-05-16 at 10:58 +0800, Jason Wang wrote:
> > > >>   drivers/vhost/vhost.c |    1 +
> > > >>   1 files changed, 1 insertions(+), 0 deletions(-)
> > > >>
> > > >> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > >> index 947f00d..7b75fdf 100644
> > > >> --- a/drivers/vhost/vhost.c
> > > >> +++ b/drivers/vhost/vhost.c
> > > >> @@ -1604,6 +1604,7 @@ void vhost_zerocopy_callback(void *arg)
> > > >>          struct vhost_ubuf_ref *ubufs = ubuf->arg;
> > > >>          struct vhost_virtqueue *vq = ubufs->vq;
> > > >>
> > > >> +       vhost_poll_queue(&vq->poll);
> > > >>          /* set len = 1 to mark this desc buffers done DMA */
> > > >>          vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
> > > >>          kref_put(&ubufs->kref, vhost_zerocopy_done_signal);
> > > > Doing so, we might have redundant vhost_poll_queue(). Do you
> know in
> > > > which scenario there might be missing of adding and signaling
> during
> > > > zerocopy?
> > > 
> > > Yes, as we only do signaling and adding during tx work, if there's
> no
> > > tx 
> > > work when the skb were sent, we may lose the opportunity to let
> guest 
> > > know about the completion. It's easy to be reproduced with netperf
> > > test. 
> > 
> > The reason which host signals guest is to free guest tx buffers, if
> > there is no tx work, then it's not necessary to signal the guest
> unless
> > guest runs out of memory. The pending buffers will be released
> > virtio_net device gone.
> > 
> > What's the behavior of netperf test when you hit this situation?
> > 
> > Thanks
> > Shirley
> 
> IIRC guest networking seems to be lost. 

It seems vhost_enable_notify() is missing somewhere else?

Thanks
Shirley

Michael S. Tsirkin May 16, 2012, 6:36 p.m. UTC | #6
On Wed, May 16, 2012 at 10:32:05AM -0700, Shirley Ma wrote:
> On Wed, 2012-05-16 at 18:14 +0300, Michael S. Tsirkin wrote:
> > On Wed, May 16, 2012 at 08:10:27AM -0700, Shirley Ma wrote:
> > > On Wed, 2012-05-16 at 10:58 +0800, Jason Wang wrote:
> > > > >>   drivers/vhost/vhost.c |    1 +
> > > > >>   1 files changed, 1 insertions(+), 0 deletions(-)
> > > > >>
> > > > >> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > >> index 947f00d..7b75fdf 100644
> > > > >> --- a/drivers/vhost/vhost.c
> > > > >> +++ b/drivers/vhost/vhost.c
> > > > >> @@ -1604,6 +1604,7 @@ void vhost_zerocopy_callback(void *arg)
> > > > >>          struct vhost_ubuf_ref *ubufs = ubuf->arg;
> > > > >>          struct vhost_virtqueue *vq = ubufs->vq;
> > > > >>
> > > > >> +       vhost_poll_queue(&vq->poll);
> > > > >>          /* set len = 1 to mark this desc buffers done DMA */
> > > > >>          vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
> > > > >>          kref_put(&ubufs->kref, vhost_zerocopy_done_signal);
> > > > > Doing so, we might have redundant vhost_poll_queue(). Do you
> > know in
> > > > > which scenario there might be missing of adding and signaling
> > during
> > > > > zerocopy?
> > > > 
> > > > Yes, as we only do signaling and adding during tx work, if there's
> > no
> > > > tx 
> > > > work when the skb were sent, we may lose the opportunity to let
> > guest 
> > > > know about the completion. It's easy to be reproduced with netperf
> > > > test. 
> > > 
> > > The reason which host signals guest is to free guest tx buffers, if
> > > there is no tx work, then it's not necessary to signal the guest
> > unless
> > > guest runs out of memory. The pending buffers will be released
> > > virtio_net device gone.
> > > 
> > > What's the behavior of netperf test when you hit this situation?
> > > 
> > > Thanks
> > > Shirley
> > 
> > IIRC guest networking seems to be lost. 
> 
> It seems vhost_enable_notify is missing in somewhere else?
> 
> Thanks
> Shirley

Dunno. I see virtio sending packets but they do not get
to tun on the host. Debugging.
Shirley Ma May 16, 2012, 7:08 p.m. UTC | #7
On Wed, 2012-05-16 at 21:36 +0300, Michael S. Tsirkin wrote:
> On Wed, May 16, 2012 at 10:32:05AM -0700, Shirley Ma wrote:
> > On Wed, 2012-05-16 at 18:14 +0300, Michael S. Tsirkin wrote:
> > > On Wed, May 16, 2012 at 08:10:27AM -0700, Shirley Ma wrote:
> > > > On Wed, 2012-05-16 at 10:58 +0800, Jason Wang wrote:
> > > > > >>   drivers/vhost/vhost.c |    1 +
> > > > > >>   1 files changed, 1 insertions(+), 0 deletions(-)
> > > > > >>
> > > > > >> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > > >> index 947f00d..7b75fdf 100644
> > > > > >> --- a/drivers/vhost/vhost.c
> > > > > >> +++ b/drivers/vhost/vhost.c
> > > > > >> @@ -1604,6 +1604,7 @@ void vhost_zerocopy_callback(void
> *arg)
> > > > > >>          struct vhost_ubuf_ref *ubufs = ubuf->arg;
> > > > > >>          struct vhost_virtqueue *vq = ubufs->vq;
> > > > > >>
> > > > > >> +       vhost_poll_queue(&vq->poll);
> > > > > >>          /* set len = 1 to mark this desc buffers done DMA
> */
> > > > > >>          vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
> > > > > >>          kref_put(&ubufs->kref,
> vhost_zerocopy_done_signal);
> > > > > > Doing so, we might have redundant vhost_poll_queue(). Do you
> > > know in
> > > > > > which scenario there might be missing of adding and
> signaling
> > > during
> > > > > > zerocopy?
> > > > > 
> > > > > Yes, as we only do signaling and adding during tx work, if
> there's
> > > no
> > > > > tx 
> > > > > work when the skb were sent, we may lose the opportunity to
> let
> > > guest 
> > > > > know about the completion. It's easy to be reproduced with
> netperf
> > > > > test. 
> > > > 
> > > > The reason which host signals guest is to free guest tx buffers,
> if
> > > > there is no tx work, then it's not necessary to signal the guest
> > > unless
> > > > guest runs out of memory. The pending buffers will be released
> > > > virtio_net device gone.
> > > > 
> > > > What's the behavior of netperf test when you hit this situation?
> > > > 
> > > > Thanks
> > > > Shirley
> > > 
> > > IIRC guest networking seems to be lost. 
> > 
> > It seems vhost_enable_notify is missing in somewhere else?
> > 
> > Thanks
> > Shirley
> 
> Donnu. I see virtio sending packets but they do not get
> to tun on host. debugging. 

I looked at the code; an if (zerocopy) check is needed around the code below:

+	if (zerocopy) {
		num_pends = likely(vq->upend_idx >= vq->done_idx) ?
			    (vq->upend_idx - vq->done_idx) :
			    (vq->upend_idx + UIO_MAXIOV - vq->done_idx);
		if (unlikely(num_pends > VHOST_MAX_PEND)) {
			tx_poll_start(net, sock);
			vhost_poll_queue(&vq->poll);
			set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
			break;
		}
+	}
	if (unlikely(vhost_enable_notify(&net->dev, vq))) {
		vhost_disable_notify(&net->dev, vq);
		continue;
	}
	break;


Second, is it possible that the problem comes from tx_poll_start()? In
some situations, vhost_poll_wakeup() is not being called?

Thanks
Shirley



Jason Wang May 17, 2012, 2:50 a.m. UTC | #8
On 05/17/2012 01:32 AM, Shirley Ma wrote:
> On Wed, 2012-05-16 at 18:14 +0300, Michael S. Tsirkin wrote:
>> On Wed, May 16, 2012 at 08:10:27AM -0700, Shirley Ma wrote:
>>> On Wed, 2012-05-16 at 10:58 +0800, Jason Wang wrote:
>>>>>>    drivers/vhost/vhost.c |    1 +
>>>>>>    1 files changed, 1 insertions(+), 0 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>>>>> index 947f00d..7b75fdf 100644
>>>>>> --- a/drivers/vhost/vhost.c
>>>>>> +++ b/drivers/vhost/vhost.c
>>>>>> @@ -1604,6 +1604,7 @@ void vhost_zerocopy_callback(void *arg)
>>>>>>           struct vhost_ubuf_ref *ubufs = ubuf->arg;
>>>>>>           struct vhost_virtqueue *vq = ubufs->vq;
>>>>>>
>>>>>> +       vhost_poll_queue(&vq->poll);
>>>>>>           /* set len = 1 to mark this desc buffers done DMA */
>>>>>>           vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
>>>>>>           kref_put(&ubufs->kref, vhost_zerocopy_done_signal);
>>>>> Doing so, we might have redundant vhost_poll_queue(). Do you
>> know in
>>>>> which scenario there might be missing of adding and signaling
>> during
>>>>> zerocopy?
>>>> Yes, as we only do signaling and adding during tx work, if there's
>> no
>>>> tx
>>>> work when the skb were sent, we may lose the opportunity to let
>> guest
>>>> know about the completion. It's easy to be reproduced with netperf
>>>> test.
>>> The reason which host signals guest is to free guest tx buffers, if
>>> there is no tx work, then it's not necessary to signal the guest
>> unless
>>> guest runs out of memory. The pending buffers will be released
>>> virtio_net device gone.

Looks like we only free the skbs in .ndo_start_xmit().
>>>
>>> What's the behavior of netperf test when you hit this situation?
>>>
>>> Thanks
>>> Shirley
>> IIRC guest networking seems to be lost.
> It seems vhost_enable_notify is missing in somewhere else?
>
> Thanks
> Shirley
>

The problem is that we may stop the tx queue when there is not enough capacity to 
place packets; at that moment we depend on the tx interrupt to 
re-enable the tx queue. So if we don't poll the vhost during the callback, 
the guest may lose the tx interrupt needed to re-enable the tx queue, which could 
stall the whole tx queue.

Thanks
Shirley Ma May 17, 2012, 3:34 p.m. UTC | #9
On Thu, 2012-05-17 at 10:50 +0800, Jason Wang wrote:
> The problem is we may stop the tx queue when there no enough capacity
> to 
> place packets, at this moment  we depends on the tx interrupt to 
> re-enable the tx queue. So if we didn't poll the vhost during
> callback, 
> guest may lose the tx interrupt to re-enable the tx queue which could 
> stall the whole tx queue. 

VHOST_MAX_PEND should handle the capacity.

Hasn't the above situation already been handled in the handle_tx() code?:
...
                        if (unlikely(num_pends > VHOST_MAX_PEND)) {
                                tx_poll_start(net, sock);
                                set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
                                break;
                        }
...

Thanks
Shirley

Jason Wang May 18, 2012, 9:58 a.m. UTC | #10
On 05/17/2012 11:34 PM, Shirley Ma wrote:
> On Thu, 2012-05-17 at 10:50 +0800, Jason Wang wrote:
>> The problem is we may stop the tx queue when there no enough capacity
>> to
>> place packets, at this moment  we depends on the tx interrupt to
>> re-enable the tx queue. So if we didn't poll the vhost during
>> callback,
>> guest may lose the tx interrupt to re-enable the tx queue which could
>> stall the whole tx queue.
> VHOST_MAX_PEND should handle the capacity.
>
> Hasn't the above situation been handled in handle_tx() code?:
> ...
>                          if (unlikely(num_pends>  VHOST_MAX_PEND)) {
>                                  tx_poll_start(net, sock);
>                                  set_bit(SOCK_ASYNC_NOSPACE,&sock->flags);
>                                  break;
>                          }
> ...
>
> Thanks
> Shirley

It may not help because:

- tx polling depends on skb_orphan(), which is often called by the device 
driver when it places the packet into the device's queue rather than when 
the packet is actually sent, so it is too early for vhost to be notified.

- it only works when the number of pending DMAs exceeds VHOST_MAX_PEND; it's 
highly possible that the guest needs to be notified when the number of pending 
packets isn't that large.

So this piece of code may not help and could be removed, and we need to 
poll the virtqueue during the zerocopy callback (though this could be 
further optimized, it may not be easy).
Shirley Ma May 18, 2012, 3:29 p.m. UTC | #11
On Fri, 2012-05-18 at 17:58 +0800, Jason Wang wrote:
> On 05/17/2012 11:34 PM, Shirley Ma wrote:
> > On Thu, 2012-05-17 at 10:50 +0800, Jason Wang wrote:
> >> The problem is we may stop the tx queue when there no enough
> capacity
> >> to
> >> place packets, at this moment  we depends on the tx interrupt to
> >> re-enable the tx queue. So if we didn't poll the vhost during
> >> callback,
> >> guest may lose the tx interrupt to re-enable the tx queue which
> could
> >> stall the whole tx queue.
> > VHOST_MAX_PEND should handle the capacity.
> >
> > Hasn't the above situation been handled in handle_tx() code?:
> > ...
> >                          if (unlikely(num_pends>  VHOST_MAX_PEND)) {
> >                                  tx_poll_start(net, sock);
> >
> set_bit(SOCK_ASYNC_NOSPACE,&sock->flags);
> >                                  break;
> >                          }
> > ...
> >
> > Thanks
> > Shirley
> 
> It may not help in because:
> 
> - tx polling depends on skb_orphan() which is often called by device 
> driver when it place the packet into the queue of the devices instead 
> of  when the packets were sent. So it was too early for vhost to be 
> notified.
Then do you think it's better to replace it with vhost_poll_queue() here
instead?

> - it only works when the pending DMAs exceeds VHOST_MAX_PEND, it's 
> highly possible that guest needs to be notified when the pending
> packets 
> isn't so much.
In which situation does the guest need to be notified when there is no TX,
other than when buffers run out?

> So this piece of code may not help and could be removed and we need
> to 
> poll the virt-queue during zerocopy callback ( through it could be 
> further optimized but may not be easy). 

Thanks
Shirley

Jason Wang May 21, 2012, 5:22 a.m. UTC | #12
On 05/17/2012 03:08 AM, Shirley Ma wrote:
> On Wed, 2012-05-16 at 21:36 +0300, Michael S. Tsirkin wrote:
>> On Wed, May 16, 2012 at 10:32:05AM -0700, Shirley Ma wrote:
>>> On Wed, 2012-05-16 at 18:14 +0300, Michael S. Tsirkin wrote:
>>>> On Wed, May 16, 2012 at 08:10:27AM -0700, Shirley Ma wrote:
>>>>> On Wed, 2012-05-16 at 10:58 +0800, Jason Wang wrote:
>>>>>>>>    drivers/vhost/vhost.c |    1 +
>>>>>>>>    1 files changed, 1 insertions(+), 0 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>>>>>>> index 947f00d..7b75fdf 100644
>>>>>>>> --- a/drivers/vhost/vhost.c
>>>>>>>> +++ b/drivers/vhost/vhost.c
>>>>>>>> @@ -1604,6 +1604,7 @@ void vhost_zerocopy_callback(void
>> *arg)
>>>>>>>>           struct vhost_ubuf_ref *ubufs = ubuf->arg;
>>>>>>>>           struct vhost_virtqueue *vq = ubufs->vq;
>>>>>>>>
>>>>>>>> +       vhost_poll_queue(&vq->poll);
>>>>>>>>           /* set len = 1 to mark this desc buffers done DMA
>> */
>>>>>>>>           vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
>>>>>>>>           kref_put(&ubufs->kref,
>> vhost_zerocopy_done_signal);
>>>>>>> Doing so, we might have redundant vhost_poll_queue(). Do you
>>>> know in
>>>>>>> which scenario there might be missing of adding and
>> signaling
>>>> during
>>>>>>> zerocopy?
>>>>>> Yes, as we only do signaling and adding during tx work, if
>> there's
>>>> no
>>>>>> tx
>>>>>> work when the skb were sent, we may lose the opportunity to
>> let
>>>> guest
>>>>>> know about the completion. It's easy to be reproduced with
>> netperf
>>>>>> test.
>>>>> The reason which host signals guest is to free guest tx buffers,
>> if
>>>>> there is no tx work, then it's not necessary to signal the guest
>>>> unless
>>>>> guest runs out of memory. The pending buffers will be released
>>>>> virtio_net device gone.
>>>>>
>>>>> What's the behavior of netperf test when you hit this situation?
>>>>>
>>>>> Thanks
>>>>> Shirley
>>>> IIRC guest networking seems to be lost.
>>> It seems vhost_enable_notify is missing in somewhere else?
>>>
>>> Thanks
>>> Shirley
>> Donnu. I see virtio sending packets but they do not get
>> to tun on host. debugging.
> I looked at the code, if (zerocopy) check is needed for below code:
>
> +	if (zerocopy) {
>                          num_pends = likely(vq->upend_idx>= vq->done_idx) ?
>                                      (vq->upend_idx - vq->done_idx) :
>                                      (vq->upend_idx + UIO_MAXIOV - vq->done_idx);
>                          if (unlikely(num_pends>  VHOST_MAX_PEND)) {
>                                  tx_poll_start(net, sock);
> 				vhost_poll_queue
>                                  set_bit(SOCK_ASYNC_NOSPACE,&sock->flags);
>                                  break;
>                          }
> +	}
>                          if (unlikely(vhost_enable_notify(&net->dev, vq))) {
>                                  vhost_disable_notify(&net->dev, vq);
>                                  continue;
>                          }
>                          break;

No objection to adding this, but it makes no difference as far as I can see, 
since we won't touch upend_idx and done_idx when zerocopy is not used.

>
> Second, whether it's possible the problem comes from tx_poll_start()? In
> some situation, vhost_poll_wakeup() is not being called?
>
> Thanks
> Shirley
Jason Wang May 21, 2012, 6:05 a.m. UTC | #13
On 05/18/2012 11:29 PM, Shirley Ma wrote:
> On Fri, 2012-05-18 at 17:58 +0800, Jason Wang wrote:
>> On 05/17/2012 11:34 PM, Shirley Ma wrote:
>>> On Thu, 2012-05-17 at 10:50 +0800, Jason Wang wrote:
>>>> The problem is we may stop the tx queue when there no enough
>> capacity
>>>> to
>>>> place packets, at this moment  we depends on the tx interrupt to
>>>> re-enable the tx queue. So if we didn't poll the vhost during
>>>> callback,
>>>> guest may lose the tx interrupt to re-enable the tx queue which
>> could
>>>> stall the whole tx queue.
>>> VHOST_MAX_PEND should handle the capacity.
>>>
>>> Hasn't the above situation been handled in handle_tx() code?:
>>> ...
>>>                           if (unlikely(num_pends>   VHOST_MAX_PEND)) {
>>>                                   tx_poll_start(net, sock);
>>>
>> set_bit(SOCK_ASYNC_NOSPACE,&sock->flags);
>>>                                   break;
>>>                           }
>>> ...
>>>
>>> Thanks
>>> Shirley
>> It may not help in because:
>>
>> - tx polling depends on skb_orphan() which is often called by device
>> driver when it place the packet into the queue of the devices instead
>> of  when the packets were sent. So it was too early for vhost to be
>> notified.
> Then do you think it's better to replace with vhost_poll_queue here
> instead?

Just like what this patch does: calling vhost_poll_queue() in 
vhost_zerocopy_callback().
>> - it only works when the pending DMAs exceeds VHOST_MAX_PEND, it's
>> highly possible that guest needs to be notified when the pending
>> packets
>> isn't so much.
> In which situation the guest needs to be notified when there is no TX
> besides buffers run out?

Consider a guest that calls virtqueue_enable_cb_delayed(), which means it only 
needs to be notified after 3/4 of its pending buffers (about 178 buffers, 
(256-MAX_SKB_FRAGS-2)*3/4) have been sent by the host. So vhost_net would notify 
the guest when about 60 buffers are still pending. Since tx polling is only 
enabled when the number of pending packets exceeds VHOST_MAX_PEND (128), tx work 
would not be scheduled, and the guest would never get the interrupt it 
expects to re-enable the queue.

And as we've discussed, tx-polling-based adding and signaling happens too 
early for vhost.
>> So this piece of code may not help and could be removed and we need
>> to
>> poll the virt-queue during zerocopy callback ( through it could be
>> further optimized but may not be easy).
> Thanks
> Shirley
Shirley Ma May 21, 2012, 3:42 p.m. UTC | #14
On Mon, 2012-05-21 at 14:05 +0800, Jason Wang wrote:
> >> - tx polling depends on skb_orphan() which is often called by
> device
> >> driver when it place the packet into the queue of the devices
> instead
> >> of  when the packets were sent. So it was too early for vhost to be
> >> notified.
> > Then do you think it's better to replace with vhost_poll_queue here
> > instead?
> 
> Just like what does this patch do - calling vhost_poll_queue() in 
> vhost_zerocopy_callback().
> >> - it only works when the pending DMAs exceeds VHOST_MAX_PEND, it's
> >> highly possible that guest needs to be notified when the pending
> >> packets
> >> isn't so much.
> > In which situation the guest needs to be notified when there is no
> TX
> > besides buffers run out?
> 
> Consider guest call virtqueue_enable_cb_delayed() which means it only 
> need to be notified when 3/4 of pending buffers ( about 178 buffers 
> (256-MAX_SKB_FRAGS-2)*3/4 ) were sent by host. So vhost_net would
> notify 
> guest when about 60 buffers were pending. Since tx polling is only 
> enabled when pending packets exceeds VHOST_MAX_PEND 128, so tx work 
> would not be notified to run and guest would never get the interrupt
> it 
> expected to re-enable the queue.

So it seems we still need vhost_enable_notify() in handle_tx when there
is no tx work in the zerocopy case.

Do you know which is more expensive: the cost of vhost_poll_queue()
in each zerocopy callback, or calling vhost_enable_notify()?

Have you compared the results after removing the code below from handle_tx()?

-                       if (unlikely(num_pends > VHOST_MAX_PEND)) {
-                                tx_poll_start(net, sock);
-                                set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
-                                break;
-                        }

> 
> And just like what we've discussed, tx polling based adding and 
> signaling is too early for vhost. 

Thanks
Shirley

Shirley Ma May 21, 2012, 4:12 p.m. UTC | #15
On Mon, 2012-05-21 at 08:42 -0700, Shirley Ma wrote:
> On Mon, 2012-05-21 at 14:05 +0800, Jason Wang wrote:
> > >> - tx polling depends on skb_orphan() which is often called by
> > device
> > >> driver when it place the packet into the queue of the devices
> > instead
> > >> of  when the packets were sent. So it was too early for vhost to
> be
> > >> notified.
> > > Then do you think it's better to replace with vhost_poll_queue
> here
> > > instead?
> > 
> > Just like what does this patch do - calling vhost_poll_queue() in 
> > vhost_zerocopy_callback().
> > >> - it only works when the pending DMAs exceeds VHOST_MAX_PEND,
> it's
> > >> highly possible that guest needs to be notified when the pending
> > >> packets
> > >> isn't so much.
> > > In which situation the guest needs to be notified when there is no
> > TX
> > > besides buffers run out?
> > 
> > Consider guest call virtqueue_enable_cb_delayed() which means it
> only 
> > need to be notified when 3/4 of pending buffers ( about 178 buffers 
> > (256-MAX_SKB_FRAGS-2)*3/4 ) were sent by host. So vhost_net would
> > notify 
> > guest when about 60 buffers were pending. Since tx polling is only 
> > enabled when pending packets exceeds VHOST_MAX_PEND 128, so tx work 
> > would not be notified to run and guest would never get the interrupt
> > it 
> > expected to re-enable the queue.
> 
> So it seems we still need vhost_enable_notify() in handle_tx when there
> is no tx in zerocopy case.
> 
> Do you know which one is more expensive: the cost of
> vhost_poll_queue()
> in each zerocopy callback or calling vhost_enable_notify()?
> 
> Have you compared the results by removing below code in handle_tx()?
> 
> -                       if (unlikely(num_pends > VHOST_MAX_PEND)) {
> -                                tx_poll_start(net, sock);
> -                                set_bit(SOCK_ASYNC_NOSPACE,
> &sock->flags);
> -                                break;
> -                        }
> > 
> > And just like what we've discussed, tx polling based adding and 
> > signaling is too early for vhost. 
> 

Then it could be too early for vhost to notify the guest anywhere in
handle_tx for zerocopy. We might instead need to move all notification
for zerocopy out of handle_tx and into the vhost zerocopy callback.

Adding vhost_poll_queue() unconditionally in the vhost zerocopy callback
would consume unnecessary CPU.

We need to think about a better solution here.

Thanks
Shirley



Jason Wang May 22, 2012, 10:05 a.m. UTC | #16
On 05/21/2012 11:42 PM, Shirley Ma wrote:
> On Mon, 2012-05-21 at 14:05 +0800, Jason Wang wrote:
>>>> - tx polling depends on skb_orphan() which is often called by
>> device
>>>> driver when it place the packet into the queue of the devices
>> instead
>>>> of  when the packets were sent. So it was too early for vhost to be
>>>> notified.
>>> Then do you think it's better to replace with vhost_poll_queue here
>>> instead?
>> Just like what does this patch do - calling vhost_poll_queue() in
>> vhost_zerocopy_callback().
>>>> - it only works when the pending DMAs exceeds VHOST_MAX_PEND, it's
>>>> highly possible that guest needs to be notified when the pending
>>>> packets
>>>> isn't so much.
>>> In which situation the guest needs to be notified when there is no
>> TX
>>> besides buffers run out?
>> Consider guest call virtqueue_enable_cb_delayed() which means it only
>> need to be notified when 3/4 of pending buffers ( about 178 buffers
>> (256-MAX_SKB_FRAGS-2)*3/4 ) were sent by host. So vhost_net would
>> notify
>> guest when about 60 buffers were pending. Since tx polling is only
>> enabled when pending packets exceeds VHOST_MAX_PEND 128, so tx work
>> would not be notified to run and guest would never get the interrupt
>> it
>> expected to re-enable the queue.
> So it seems we still need vhost_enable_notify() in handle_tx when there
> is no tx in zerocopy case.
>
> Do you know which one is more expensive: the cost of vhost_poll_queue()
> in each zerocopy callback or calling vhost_enable_notify()?

I didn't follow here; do you mean vhost_signal()?
>
> Have you compared the results by removing below code in handle_tx()?
>
> -                       if (unlikely(num_pends>  VHOST_MAX_PEND)) {
> -                                tx_poll_start(net, sock);
> -                                set_bit(SOCK_ASYNC_NOSPACE,&sock->flags);
> -                                break;
> -                        }

I remember I ran some basic tests when I sent this patch; there was not
much increase in CPU utilization. I will double-check this.
>> And just like what we've discussed, tx polling based adding and
>> signaling is too early for vhost.
> Thanks
> Shirley
>

Jason Wang May 22, 2012, 10:13 a.m. UTC | #17
On 05/22/2012 12:12 AM, Shirley Ma wrote:
> Then it could be too early for vhost to notify guest anywhere in
> handle_tx for zerocopy. Then we might need to remove any notification in
> handle_tx for zerocopy to vhost zerocopy callback instead.
>
> Adding vhost_poll_queue in vhost zerocopy callback unconditionally would
> consume unnecessary cpu.
>
> We need to think about a better solution here.

A possible idea is to call vhost_poll_queue() only when the packet at
used_event has been sent: if there is no out-of-order completion, vhost
can signal the guest; if there is, let the pending packets before
used_event call vhost_poll_queue().
> Thanks
> Shirley

Shirley Ma May 22, 2012, 3:55 p.m. UTC | #18
On Tue, 2012-05-22 at 18:05 +0800, Jason Wang wrote:
> On 05/21/2012 11:42 PM, Shirley Ma wrote:
> > On Mon, 2012-05-21 at 14:05 +0800, Jason Wang wrote:
> >>>> - tx polling depends on skb_orphan() which is often called by
> >> device
> >>>> driver when it place the packet into the queue of the devices
> >> instead
> >>>> of  when the packets were sent. So it was too early for vhost to
> be
> >>>> notified.
> >>> Then do you think it's better to replace with vhost_poll_queue
> here
> >>> instead?
> >> Just like what does this patch do - calling vhost_poll_queue() in
> >> vhost_zerocopy_callback().
> >>>> - it only works when the pending DMAs exceeds VHOST_MAX_PEND,
> it's
> >>>> highly possible that guest needs to be notified when the pending
> >>>> packets
> >>>> isn't so much.
> >>> In which situation the guest needs to be notified when there is no
> >> TX
> >>> besides buffers run out?
> >> Consider guest call virtqueue_enable_cb_delayed() which means it
> only
> >> need to be notified when 3/4 of pending buffers ( about 178 buffers
> >> (256-MAX_SKB_FRAGS-2)*3/4 ) were sent by host. So vhost_net would
> >> notify
> >> guest when about 60 buffers were pending. Since tx polling is only
> >> enabled when pending packets exceeds VHOST_MAX_PEND 128, so tx work
> >> would not be notified to run and guest would never get the
> interrupt
> >> it
> >> expected to re-enable the queue.
> > So it seems we still need vhost_enable_notify() in handle_tx when
> there
> > is no tx in zerocopy case.
> >
> > Do you know which one is more expensive: the cost of
> vhost_poll_queue()
> > in each zerocopy callback or calling vhost_enable_notify()?
> 
> Didn't follow here, do you mean vhost_signal() here? 

I meant modifying the zerocopy code in handle_tx as below:

+	if (zcopy) {
                        /* If more outstanding DMAs, queue the work.
                         * Handle upend_idx wrap around
                         */
                        num_pends = likely(vq->upend_idx >= vq->done_idx) ?
                                    (vq->upend_idx - vq->done_idx) :
                                    (vq->upend_idx + UIO_MAXIOV - vq->done_idx);
+			/* zerocopy vhost_enable_notify is under zerocopy callback
+			 * since it could be too early to notify here */
+			break;
+	}
-                       if (unlikely(num_pends > VHOST_MAX_PEND)) {
-                                tx_poll_start(net, sock);
-                                set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
-                                break;
-                        }
                        if (unlikely(vhost_enable_notify(&net->dev, vq))) {
                                vhost_disable_notify(&net->dev, vq);
                                continue;
                        }
                        break;

Thanks
Shirley

Jason Wang May 23, 2012, 10:31 a.m. UTC | #19
On 05/22/2012 11:55 PM, Shirley Ma wrote:
> On Tue, 2012-05-22 at 18:05 +0800, Jason Wang wrote:
>> On 05/21/2012 11:42 PM, Shirley Ma wrote:
>>> On Mon, 2012-05-21 at 14:05 +0800, Jason Wang wrote:
>>>>>> - tx polling depends on skb_orphan() which is often called by
>>>> device
>>>>>> driver when it place the packet into the queue of the devices
>>>> instead
>>>>>> of  when the packets were sent. So it was too early for vhost to
>> be
>>>>>> notified.
>>>>> Then do you think it's better to replace with vhost_poll_queue
>> here
>>>>> instead?
>>>> Just like what does this patch do - calling vhost_poll_queue() in
>>>> vhost_zerocopy_callback().
>>>>>> - it only works when the pending DMAs exceeds VHOST_MAX_PEND,
>> it's
>>>>>> highly possible that guest needs to be notified when the pending
>>>>>> packets
>>>>>> isn't so much.
>>>>> In which situation the guest needs to be notified when there is no
>>>> TX
>>>>> besides buffers run out?
>>>> Consider guest call virtqueue_enable_cb_delayed() which means it
>> only
>>>> need to be notified when 3/4 of pending buffers ( about 178 buffers
>>>> (256-MAX_SKB_FRAGS-2)*3/4 ) were sent by host. So vhost_net would
>>>> notify
>>>> guest when about 60 buffers were pending. Since tx polling is only
>>>> enabled when pending packets exceeds VHOST_MAX_PEND 128, so tx work
>>>> would not be notified to run and guest would never get the
>> interrupt
>>>> it
>>>> expected to re-enable the queue.
>>> So it seems we still need vhost_enable_notify() in handle_tx when
>> there
>>> is no tx in zerocopy case.
>>>
>>> Do you know which one is more expensive: the cost of
>> vhost_poll_queue()
>>> in each zerocopy callback or calling vhost_enable_notify()?
>> Didn't follow here, do you mean vhost_signal() here?
> I meant removing the code in handle_tx for zerocopy as below:
>
> +	if (zcopy) {
>                          /* If more outstanding DMAs, queue the work.
>                           * Handle upend_idx wrap around
>                           */
>                          num_pends = likely(vq->upend_idx>= vq->done_idx) ?
>                                      (vq->upend_idx - vq->done_idx) :
>                                      (vq->upend_idx + UIO_MAXIOV - vq->done_idx);
> +			/* zerocopy vhost_enable_notify is under zerocopy callback
> +			 * since it could be too early to notify here */
> +			break;
> +	}
> -                       if (unlikely(num_pends>  VHOST_MAX_PEND)) {
> -                                tx_poll_start(net, sock);
> -                                set_bit(SOCK_ASYNC_NOSPACE,&sock->flags);
> -                                break;
> -                        }
>                          if (unlikely(vhost_enable_notify(&net->dev, vq))) {
>                                  vhost_disable_notify(&net->dev, vq);
>                                  continue;
>                          }
>                          break;

I don't think this can work well, as notification from the guest would
be disabled forever.

>
> Thanks
> Shirley
>


Patch

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 947f00d..7b75fdf 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1604,6 +1604,7 @@  void vhost_zerocopy_callback(void *arg)
 	struct vhost_ubuf_ref *ubufs = ubuf->arg;
 	struct vhost_virtqueue *vq = ubufs->vq;
 
+	vhost_poll_queue(&vq->poll);
 	/* set len = 1 to mark this desc buffers done DMA */
 	vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
 	kref_put(&ubufs->kref, vhost_zerocopy_done_signal);