[{"id":1775803,"web_url":"http://patchwork.ozlabs.org/comment/1775803/","msgid":"<20170926222223-mutt-send-email-mst@kernel.org>","list_archive_url":null,"date":"2017-09-26T19:25:22","subject":"Re: [PATCH net-next RFC 5/5] vhost_net: basic tx virtqueue batched\n\tprocessing","submitter":{"id":2235,"url":"http://patchwork.ozlabs.org/api/people/2235/","name":"Michael S. Tsirkin","email":"mst@redhat.com"},"content":"On Fri, Sep 22, 2017 at 04:02:35PM +0800, Jason Wang wrote:\n> This patch implements basic batched processing of tx virtqueue by\n> prefetching desc indices and updating used ring in a batch. For\n> non-zerocopy case, vq->heads were used for storing the prefetched\n> indices and updating used ring. It is also a requirement for doing\n> more batching on top. For zerocopy case and for simplicity, batched\n> processing were simply disabled by only fetching and processing one\n> descriptor at a time, this could be optimized in the future.\n> \n> XDP_DROP (without touching skb) on tun (with Moongen in guest) with\n> zercopy disabled:\n> \n> Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz:\n> Before: 3.20Mpps\n> After:  3.90Mpps (+22%)\n> \n> No differences were seen with zerocopy enabled.\n> \n> Signed-off-by: Jason Wang <jasowang@redhat.com>\n\nSo where is the speedup coming from? I'd guess the ring is\nhot in cache, it's faster to access it in one go, then\npass many packets to net stack. Is that right?\n\nAnother possibility is better code cache locality.\n\nSo how about this patchset is refactored:\n\n1. use existing APIs just first get packets then\n   transmit them all then use them all\n2. 
add new APIs and move the loop into vhost core\n   for more speedups\n\n\n\n> ---\n>  drivers/vhost/net.c   | 215 ++++++++++++++++++++++++++++----------------------\n>  drivers/vhost/vhost.c |   2 +-\n>  2 files changed, 121 insertions(+), 96 deletions(-)\n> \n> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c\n> index c89640e..c439892 100644\n> --- a/drivers/vhost/net.c\n> +++ b/drivers/vhost/net.c\n> @@ -408,27 +408,25 @@ static int vhost_net_enable_vq(struct vhost_net *n,\n>  \treturn vhost_poll_start(poll, sock->file);\n>  }\n>  \n> -static int vhost_net_tx_get_vq_desc(struct vhost_net *net,\n> -\t\t\t\t    struct vhost_virtqueue *vq,\n> -\t\t\t\t    struct iovec iov[], unsigned int iov_size,\n> -\t\t\t\t    unsigned int *out_num, unsigned int *in_num)\n> +static bool vhost_net_tx_avail(struct vhost_net *net,\n> +\t\t\t       struct vhost_virtqueue *vq)\n>  {\n>  \tunsigned long uninitialized_var(endtime);\n> -\tint r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),\n> -\t\t\t\t  out_num, in_num, NULL, NULL);\n>  \n> -\tif (r == vq->num && vq->busyloop_timeout) {\n> -\t\tpreempt_disable();\n> -\t\tendtime = busy_clock() + vq->busyloop_timeout;\n> -\t\twhile (vhost_can_busy_poll(vq->dev, endtime) &&\n> -\t\t       vhost_vq_avail_empty(vq->dev, vq))\n> -\t\t\tcpu_relax();\n> -\t\tpreempt_enable();\n> -\t\tr = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),\n> -\t\t\t\t      out_num, in_num, NULL, NULL);\n> -\t}\n> +\tif (!vq->busyloop_timeout)\n> +\t\treturn false;\n>  \n> -\treturn r;\n> +\tif (!vhost_vq_avail_empty(vq->dev, vq))\n> +\t\treturn true;\n> +\n> +\tpreempt_disable();\n> +\tendtime = busy_clock() + vq->busyloop_timeout;\n> +\twhile (vhost_can_busy_poll(vq->dev, endtime) &&\n> +\t\tvhost_vq_avail_empty(vq->dev, vq))\n> +\t\tcpu_relax();\n> +\tpreempt_enable();\n> +\n> +\treturn !vhost_vq_avail_empty(vq->dev, vq);\n>  }\n>  \n>  static bool vhost_exceeds_maxpend(struct vhost_net *net)\n> @@ -446,8 +444,9 @@ static void handle_tx(struct 
vhost_net *net)\n>  {\n>  \tstruct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];\n>  \tstruct vhost_virtqueue *vq = &nvq->vq;\n> +\tstruct vring_used_elem used, *heads = vq->heads;\n>  \tunsigned out, in;\n> -\tint head;\n> +\tint avails, head;\n>  \tstruct msghdr msg = {\n>  \t\t.msg_name = NULL,\n>  \t\t.msg_namelen = 0,\n> @@ -461,6 +460,7 @@ static void handle_tx(struct vhost_net *net)\n>  \tstruct socket *sock;\n>  \tstruct vhost_net_ubuf_ref *uninitialized_var(ubufs);\n>  \tbool zcopy, zcopy_used;\n> +\tint i, batched = VHOST_NET_BATCH;\n>  \n>  \tmutex_lock(&vq->mutex);\n>  \tsock = vq->private_data;\n> @@ -475,6 +475,12 @@ static void handle_tx(struct vhost_net *net)\n>  \thdr_size = nvq->vhost_hlen;\n>  \tzcopy = nvq->ubufs;\n>  \n> +\t/* Disable zerocopy batched fetching for simplicity */\n> +\tif (zcopy) {\n> +\t\theads = &used;\n> +\t\tbatched = 1;\n> +\t}\n> +\n>  \tfor (;;) {\n>  \t\t/* Release DMAs done buffers first */\n>  \t\tif (zcopy)\n> @@ -486,95 +492,114 @@ static void handle_tx(struct vhost_net *net)\n>  \t\tif (unlikely(vhost_exceeds_maxpend(net)))\n>  \t\t\tbreak;\n>  \n> -\t\thead = vhost_net_tx_get_vq_desc(net, vq, vq->iov,\n> -\t\t\t\t\t\tARRAY_SIZE(vq->iov),\n> -\t\t\t\t\t\t&out, &in);\n> +\t\tavails = vhost_prefetch_desc_indices(vq, heads, batched, !zcopy);\n>  \t\t/* On error, stop handling until the next kick. */\n> -\t\tif (unlikely(head < 0))\n> +\t\tif (unlikely(avails < 0))\n>  \t\t\tbreak;\n> -\t\t/* Nothing new?  Wait for eventfd to tell us they refilled. */\n> -\t\tif (head == vq->num) {\n> +\t\t/* Nothing new?  Busy poll for a while or wait for\n> +\t\t * eventfd to tell us they refilled. 
*/\n> +\t\tif (!avails) {\n> +\t\t\tif (vhost_net_tx_avail(net, vq))\n> +\t\t\t\tcontinue;\n>  \t\t\tif (unlikely(vhost_enable_notify(&net->dev, vq))) {\n>  \t\t\t\tvhost_disable_notify(&net->dev, vq);\n>  \t\t\t\tcontinue;\n>  \t\t\t}\n>  \t\t\tbreak;\n>  \t\t}\n> -\t\tif (in) {\n> -\t\t\tvq_err(vq, \"Unexpected descriptor format for TX: \"\n> -\t\t\t       \"out %d, int %d\\n\", out, in);\n> -\t\t\tbreak;\n> -\t\t}\n> -\t\t/* Skip header. TODO: support TSO. */\n> -\t\tlen = iov_length(vq->iov, out);\n> -\t\tiov_iter_init(&msg.msg_iter, WRITE, vq->iov, out, len);\n> -\t\tiov_iter_advance(&msg.msg_iter, hdr_size);\n> -\t\t/* Sanity check */\n> -\t\tif (!msg_data_left(&msg)) {\n> -\t\t\tvq_err(vq, \"Unexpected header len for TX: \"\n> -\t\t\t       \"%zd expected %zd\\n\",\n> -\t\t\t       len, hdr_size);\n> -\t\t\tbreak;\n> -\t\t}\n> -\t\tlen = msg_data_left(&msg);\n> -\n> -\t\tzcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN\n> -\t\t\t\t   && (nvq->upend_idx + 1) % UIO_MAXIOV !=\n> -\t\t\t\t      nvq->done_idx\n> -\t\t\t\t   && vhost_net_tx_select_zcopy(net);\n> -\n> -\t\t/* use msg_control to pass vhost zerocopy ubuf info to skb */\n> -\t\tif (zcopy_used) {\n> -\t\t\tstruct ubuf_info *ubuf;\n> -\t\t\tubuf = nvq->ubuf_info + nvq->upend_idx;\n> -\n> -\t\t\tvq->heads[nvq->upend_idx].id = cpu_to_vhost32(vq, head);\n> -\t\t\tvq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;\n> -\t\t\tubuf->callback = vhost_zerocopy_callback;\n> -\t\t\tubuf->ctx = nvq->ubufs;\n> -\t\t\tubuf->desc = nvq->upend_idx;\n> -\t\t\trefcount_set(&ubuf->refcnt, 1);\n> -\t\t\tmsg.msg_control = ubuf;\n> -\t\t\tmsg.msg_controllen = sizeof(ubuf);\n> -\t\t\tubufs = nvq->ubufs;\n> -\t\t\tatomic_inc(&ubufs->refcount);\n> -\t\t\tnvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV;\n> -\t\t} else {\n> -\t\t\tmsg.msg_control = NULL;\n> -\t\t\tubufs = NULL;\n> -\t\t}\n> +\t\tfor (i = 0; i < avails; i++) {\n> +\t\t\thead = __vhost_get_vq_desc(vq, vq->iov,\n> +\t\t\t\t\t\t   ARRAY_SIZE(vq->iov),\n> 
+\t\t\t\t\t\t   &out, &in, NULL, NULL,\n> +\t\t\t\t\t       vhost16_to_cpu(vq, heads[i].id));\n> +\t\t\tif (in) {\n> +\t\t\t\tvq_err(vq, \"Unexpected descriptor format for \"\n> +\t\t\t\t\t   \"TX: out %d, int %d\\n\", out, in);\n> +\t\t\t\tgoto out;\n> +\t\t\t}\n>  \n> -\t\ttotal_len += len;\n> -\t\tif (total_len < VHOST_NET_WEIGHT &&\n> -\t\t    !vhost_vq_avail_empty(&net->dev, vq) &&\n> -\t\t    likely(!vhost_exceeds_maxpend(net))) {\n> -\t\t\tmsg.msg_flags |= MSG_MORE;\n> -\t\t} else {\n> -\t\t\tmsg.msg_flags &= ~MSG_MORE;\n> -\t\t}\n> +\t\t\t/* Skip header. TODO: support TSO. */\n> +\t\t\tlen = iov_length(vq->iov, out);\n> +\t\t\tiov_iter_init(&msg.msg_iter, WRITE, vq->iov, out, len);\n> +\t\t\tiov_iter_advance(&msg.msg_iter, hdr_size);\n> +\t\t\t/* Sanity check */\n> +\t\t\tif (!msg_data_left(&msg)) {\n> +\t\t\t\tvq_err(vq, \"Unexpected header len for TX: \"\n> +\t\t\t\t\t\"%zd expected %zd\\n\",\n> +\t\t\t\t\tlen, hdr_size);\n> +\t\t\t\tgoto out;\n> +\t\t\t}\n> +\t\t\tlen = msg_data_left(&msg);\n>  \n> -\t\t/* TODO: Check specific error and bomb out unless ENOBUFS? 
*/\n> -\t\terr = sock->ops->sendmsg(sock, &msg, len);\n> -\t\tif (unlikely(err < 0)) {\n> +\t\t\tzcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN\n> +\t\t\t\t     && (nvq->upend_idx + 1) % UIO_MAXIOV !=\n> +\t\t\t\t\tnvq->done_idx\n> +\t\t\t\t     && vhost_net_tx_select_zcopy(net);\n> +\n> +\t\t\t/* use msg_control to pass vhost zerocopy ubuf\n> +\t\t\t * info to skb\n> +\t\t\t */\n>  \t\t\tif (zcopy_used) {\n> -\t\t\t\tvhost_net_ubuf_put(ubufs);\n> -\t\t\t\tnvq->upend_idx = ((unsigned)nvq->upend_idx - 1)\n> -\t\t\t\t\t% UIO_MAXIOV;\n> +\t\t\t\tstruct ubuf_info *ubuf;\n> +\t\t\t\tubuf = nvq->ubuf_info + nvq->upend_idx;\n> +\n> +\t\t\t\tvq->heads[nvq->upend_idx].id =\n> +\t\t\t\t\tcpu_to_vhost32(vq, head);\n> +\t\t\t\tvq->heads[nvq->upend_idx].len =\n> +\t\t\t\t\tVHOST_DMA_IN_PROGRESS;\n> +\t\t\t\tubuf->callback = vhost_zerocopy_callback;\n> +\t\t\t\tubuf->ctx = nvq->ubufs;\n> +\t\t\t\tubuf->desc = nvq->upend_idx;\n> +\t\t\t\trefcount_set(&ubuf->refcnt, 1);\n> +\t\t\t\tmsg.msg_control = ubuf;\n> +\t\t\t\tmsg.msg_controllen = sizeof(ubuf);\n> +\t\t\t\tubufs = nvq->ubufs;\n> +\t\t\t\tatomic_inc(&ubufs->refcount);\n> +\t\t\t\tnvq->upend_idx =\n> +\t\t\t\t\t(nvq->upend_idx + 1) % UIO_MAXIOV;\n> +\t\t\t} else {\n> +\t\t\t\tmsg.msg_control = NULL;\n> +\t\t\t\tubufs = NULL;\n> +\t\t\t}\n> +\n> +\t\t\ttotal_len += len;\n> +\t\t\tif (total_len < VHOST_NET_WEIGHT &&\n> +\t\t\t\t!vhost_vq_avail_empty(&net->dev, vq) &&\n> +\t\t\t\tlikely(!vhost_exceeds_maxpend(net))) {\n> +\t\t\t\tmsg.msg_flags |= MSG_MORE;\n> +\t\t\t} else {\n> +\t\t\t\tmsg.msg_flags &= ~MSG_MORE;\n> +\t\t\t}\n> +\n> +\t\t\t/* TODO: Check specific error and bomb out\n> +\t\t\t * unless ENOBUFS?\n> +\t\t\t */\n> +\t\t\terr = sock->ops->sendmsg(sock, &msg, len);\n> +\t\t\tif (unlikely(err < 0)) {\n> +\t\t\t\tif (zcopy_used) {\n> +\t\t\t\t\tvhost_net_ubuf_put(ubufs);\n> +\t\t\t\t\tnvq->upend_idx =\n> +\t\t\t\t   ((unsigned)nvq->upend_idx - 1) % UIO_MAXIOV;\n> +\t\t\t\t}\n> +\t\t\t\tvhost_discard_vq_desc(vq, 1);\n> 
+\t\t\t\tgoto out;\n> +\t\t\t}\n> +\t\t\tif (err != len)\n> +\t\t\t\tpr_debug(\"Truncated TX packet: \"\n> +\t\t\t\t\t\" len %d != %zd\\n\", err, len);\n> +\t\t\tif (!zcopy) {\n> +\t\t\t\tvhost_add_used_idx(vq, 1);\n> +\t\t\t\tvhost_signal(&net->dev, vq);\n> +\t\t\t} else if (!zcopy_used) {\n> +\t\t\t\tvhost_add_used_and_signal(&net->dev,\n> +\t\t\t\t\t\t\t  vq, head, 0);\n> +\t\t\t} else\n> +\t\t\t\tvhost_zerocopy_signal_used(net, vq);\n> +\t\t\tvhost_net_tx_packet(net);\n> +\t\t\tif (unlikely(total_len >= VHOST_NET_WEIGHT)) {\n> +\t\t\t\tvhost_poll_queue(&vq->poll);\n> +\t\t\t\tgoto out;\n>  \t\t\t}\n> -\t\t\tvhost_discard_vq_desc(vq, 1);\n> -\t\t\tbreak;\n> -\t\t}\n> -\t\tif (err != len)\n> -\t\t\tpr_debug(\"Truncated TX packet: \"\n> -\t\t\t\t \" len %d != %zd\\n\", err, len);\n> -\t\tif (!zcopy_used)\n> -\t\t\tvhost_add_used_and_signal(&net->dev, vq, head, 0);\n> -\t\telse\n> -\t\t\tvhost_zerocopy_signal_used(net, vq);\n> -\t\tvhost_net_tx_packet(net);\n> -\t\tif (unlikely(total_len >= VHOST_NET_WEIGHT)) {\n> -\t\t\tvhost_poll_queue(&vq->poll);\n> -\t\t\tbreak;\n>  \t\t}\n>  \t}\n>  out:\n> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c\n> index 6532cda..8764df5 100644\n> --- a/drivers/vhost/vhost.c\n> +++ b/drivers/vhost/vhost.c\n> @@ -392,7 +392,7 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev)\n>  \t\tvq->indirect = kmalloc(sizeof *vq->indirect * UIO_MAXIOV,\n>  \t\t\t\t       GFP_KERNEL);\n>  \t\tvq->log = kmalloc(sizeof *vq->log * UIO_MAXIOV, GFP_KERNEL);\n> -\t\tvq->heads = kmalloc(sizeof *vq->heads * UIO_MAXIOV, GFP_KERNEL);\n> +\t\tvq->heads = kzalloc(sizeof *vq->heads * UIO_MAXIOV, GFP_KERNEL);\n>  \t\tif (!vq->indirect || !vq->log || !vq->heads)\n>  \t\t\tgoto err_nomem;\n>  \t}\n> -- \n> 2.7.4","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) 
smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)"],"Date":"Tue, 26 Sep 2017 22:25:22 +0300","From":"\"Michael S. 
Tsirkin\" <mst@redhat.com>","To":"Jason Wang <jasowang@redhat.com>","Cc":"virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,\n\tlinux-kernel@vger.kernel.org, kvm@vger.kernel.org","Subject":"Re: [PATCH net-next RFC 5/5] vhost_net: basic tx virtqueue batched\n\tprocessing","Message-ID":"<20170926222223-mutt-send-email-mst@kernel.org>","References":"<1506067355-5771-1-git-send-email-jasowang@redhat.com>\n\t<1506067355-5771-6-git-send-email-jasowang@redhat.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<1506067355-5771-6-git-send-email-jasowang@redhat.com>","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.12","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.27]);\n\tTue, 26 Sep 2017 19:25:23 +0000 (UTC)","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1775989,"web_url":"http://patchwork.ozlabs.org/comment/1775989/","msgid":"<16ea7512-d770-21ef-edb6-3ada51f08592@redhat.com>","list_archive_url":null,"date":"2017-09-27T02:04:18","subject":"Re: [PATCH net-next RFC 5/5] vhost_net: basic tx virtqueue batched\n\tprocessing","submitter":{"id":5225,"url":"http://patchwork.ozlabs.org/api/people/5225/","name":"Jason Wang","email":"jasowang@redhat.com"},"content":"On 2017年09月27日 03:25, Michael S. Tsirkin wrote:\n> On Fri, Sep 22, 2017 at 04:02:35PM +0800, Jason Wang wrote:\n>> This patch implements basic batched processing of tx virtqueue by\n>> prefetching desc indices and updating used ring in a batch. For\n>> non-zerocopy case, vq->heads were used for storing the prefetched\n>> indices and updating used ring. It is also a requirement for doing\n>> more batching on top. 
For zerocopy case and for simplicity, batched\n>> processing were simply disabled by only fetching and processing one\n>> descriptor at a time, this could be optimized in the future.\n>>\n>> XDP_DROP (without touching skb) on tun (with Moongen in guest) with\n>> zercopy disabled:\n>>\n>> Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz:\n>> Before: 3.20Mpps\n>> After:  3.90Mpps (+22%)\n>>\n>> No differences were seen with zerocopy enabled.\n>>\n>> Signed-off-by: Jason Wang <jasowang@redhat.com>\n> So where is the speedup coming from? I'd guess the ring is\n> hot in cache, it's faster to access it in one go, then\n> pass many packets to net stack. Is that right?\n>\n> Another possibility is better code cache locality.\n\nYes, I think the speed up comes from:\n\n- less cache misses\n- less cache line bounce when virtqueue is about to be full (guest is \nfaster than host which is the case of MoonGen)\n- less memory barriers\n- possible faster copy speed by using copy_to_user() on modern CPUs\n\n>\n> So how about this patchset is refactored:\n>\n> 1. use existing APIs just first get packets then\n>     transmit them all then use them all\n\nLooks like current API can not get packets first, it only support get \npacket one by one (if you mean vhost_get_vq_desc()). And used ring \nupdating may get more misses in this case.\n\n> 2. 
add new APIs and move the loop into vhost core\n>     for more speedups\n\nI don't see any advantages, looks like just need some e.g callbacks in \nthis case.\n\nThanks","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)"],"Subject":"Re: [PATCH net-next RFC 5/5] vhost_net: basic tx virtqueue batched\n\tprocessing","To":"\"Michael S. 
Tsirkin\" <mst@redhat.com>","Cc":"virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,\n\tlinux-kernel@vger.kernel.org, kvm@vger.kernel.org","References":"<1506067355-5771-1-git-send-email-jasowang@redhat.com>\n\t<1506067355-5771-6-git-send-email-jasowang@redhat.com>\n\t<20170926222223-mutt-send-email-mst@kernel.org>","From":"Jason Wang <jasowang@redhat.com>","Message-ID":"<16ea7512-d770-21ef-edb6-3ada51f08592@redhat.com>","Date":"Wed, 27 Sep 2017 10:04:18 +0800","User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101\n\tThunderbird/52.3.0","MIME-Version":"1.0","In-Reply-To":"<20170926222223-mutt-send-email-mst@kernel.org>","Content-Type":"text/plain; charset=utf-8; format=flowed","Content-Transfer-Encoding":"8bit","Content-Language":"en-US","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.16","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.25]);\n\tWed, 27 Sep 2017 02:04:25 +0000 (UTC)","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1776655,"web_url":"http://patchwork.ozlabs.org/comment/1776655/","msgid":"<20170928011724-mutt-send-email-mst@kernel.org>","list_archive_url":null,"date":"2017-09-27T22:19:40","subject":"Re: [PATCH net-next RFC 5/5] vhost_net: basic tx virtqueue batched\n\tprocessing","submitter":{"id":2235,"url":"http://patchwork.ozlabs.org/api/people/2235/","name":"Michael S. Tsirkin","email":"mst@redhat.com"},"content":"On Wed, Sep 27, 2017 at 10:04:18AM +0800, Jason Wang wrote:\n> \n> \n> On 2017年09月27日 03:25, Michael S. Tsirkin wrote:\n> > On Fri, Sep 22, 2017 at 04:02:35PM +0800, Jason Wang wrote:\n> > > This patch implements basic batched processing of tx virtqueue by\n> > > prefetching desc indices and updating used ring in a batch. For\n> > > non-zerocopy case, vq->heads were used for storing the prefetched\n> > > indices and updating used ring. 
It is also a requirement for doing\n> > > more batching on top. For zerocopy case and for simplicity, batched\n> > > processing were simply disabled by only fetching and processing one\n> > > descriptor at a time, this could be optimized in the future.\n> > > \n> > > XDP_DROP (without touching skb) on tun (with Moongen in guest) with\n> > > zercopy disabled:\n> > > \n> > > Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz:\n> > > Before: 3.20Mpps\n> > > After:  3.90Mpps (+22%)\n> > > \n> > > No differences were seen with zerocopy enabled.\n> > > \n> > > Signed-off-by: Jason Wang <jasowang@redhat.com>\n> > So where is the speedup coming from? I'd guess the ring is\n> > hot in cache, it's faster to access it in one go, then\n> > pass many packets to net stack. Is that right?\n> > \n> > Another possibility is better code cache locality.\n> \n> Yes, I think the speed up comes from:\n> \n> - less cache misses\n> - less cache line bounce when virtqueue is about to be full (guest is faster\n> than host which is the case of MoonGen)\n> - less memory barriers\n> - possible faster copy speed by using copy_to_user() on modern CPUs\n> \n> > \n> > So how about this patchset is refactored:\n> > \n> > 1. use existing APIs just first get packets then\n> >     transmit them all then use them all\n> \n> Looks like current API can not get packets first, it only support get packet\n> one by one (if you mean vhost_get_vq_desc()). And used ring updating may get\n> more misses in this case.\n\nRight. So if you do\n\nfor (...)\n\tvhost_get_vq_desc\n\n\nthen later\n\nfor (...)\n\tvhost_add_used\n\n\nthen you get most of benefits except maybe code cache misses\nand copy_to_user.\n\n\n\n\n\n\n\n> > 2. 
add new APIs and move the loop into vhost core\n> >     for more speedups\n> \n> I don't see any advantages, looks like just need some e.g callbacks in this\n> case.\n> \n> Thanks\n\nIIUC callbacks pretty much destroy the code cache locality advantages,\nIP is jumping around too much.","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)"],"Date":"Thu, 28 Sep 2017 
01:19:40 +0300","From":"\"Michael S. Tsirkin\" <mst@redhat.com>","To":"Jason Wang <jasowang@redhat.com>","Cc":"virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,\n\tlinux-kernel@vger.kernel.org, kvm@vger.kernel.org","Subject":"Re: [PATCH net-next RFC 5/5] vhost_net: basic tx virtqueue batched\n\tprocessing","Message-ID":"<20170928011724-mutt-send-email-mst@kernel.org>","References":"<1506067355-5771-1-git-send-email-jasowang@redhat.com>\n\t<1506067355-5771-6-git-send-email-jasowang@redhat.com>\n\t<20170926222223-mutt-send-email-mst@kernel.org>\n\t<16ea7512-d770-21ef-edb6-3ada51f08592@redhat.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","Content-Transfer-Encoding":"8bit","In-Reply-To":"<16ea7512-d770-21ef-edb6-3ada51f08592@redhat.com>","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.12","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.27]);\n\tWed, 27 Sep 2017 22:19:41 +0000 (UTC)","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1776708,"web_url":"http://patchwork.ozlabs.org/comment/1776708/","msgid":"<CAF=yD-LuKXo93kUYS_1fhL88jGmS6LHiTzi=JpSoRucNSp3k4g@mail.gmail.com>","list_archive_url":null,"date":"2017-09-28T00:55:13","subject":"Re: [PATCH net-next RFC 5/5] vhost_net: basic tx virtqueue batched\n\tprocessing","submitter":{"id":67615,"url":"http://patchwork.ozlabs.org/api/people/67615/","name":"Willem de Bruijn","email":"willemdebruijn.kernel@gmail.com"},"content":"> @@ -461,6 +460,7 @@ static void handle_tx(struct vhost_net *net)\n>         struct socket *sock;\n>         struct vhost_net_ubuf_ref *uninitialized_var(ubufs);\n>         bool zcopy, zcopy_used;\n> +       int i, batched = VHOST_NET_BATCH;\n>\n>         mutex_lock(&vq->mutex);\n>         sock = vq->private_data;\n> @@ -475,6 +475,12 @@ static void handle_tx(struct vhost_net *net)\n>  
       hdr_size = nvq->vhost_hlen;\n>         zcopy = nvq->ubufs;\n>\n> +       /* Disable zerocopy batched fetching for simplicity */\n\nThis special case can perhaps be avoided if we no longer block\non vhost_exceeds_maxpend, but revert to copying.\n\n> +       if (zcopy) {\n> +               heads = &used;\n\nCan this special case of batchsize 1 not use vq->heads?\n\n> +               batched = 1;\n> +       }\n> +\n>         for (;;) {\n>                 /* Release DMAs done buffers first */\n>                 if (zcopy)\n> @@ -486,95 +492,114 @@ static void handle_tx(struct vhost_net *net)\n>                 if (unlikely(vhost_exceeds_maxpend(net)))\n>                         break;\n\n> +                       /* TODO: Check specific error and bomb out\n> +                        * unless ENOBUFS?\n> +                        */\n> +                       err = sock->ops->sendmsg(sock, &msg, len);\n> +                       if (unlikely(err < 0)) {\n> +                               if (zcopy_used) {\n> +                                       vhost_net_ubuf_put(ubufs);\n> +                                       nvq->upend_idx =\n> +                                  ((unsigned)nvq->upend_idx - 1) % UIO_MAXIOV;\n> +                               }\n> +                               vhost_discard_vq_desc(vq, 1);\n> +                               goto out;\n> +                       }\n> +                       if (err != len)\n> +                               pr_debug(\"Truncated TX packet: \"\n> +                                       \" len %d != %zd\\n\", err, len);\n> +                       if (!zcopy) {\n> +                               vhost_add_used_idx(vq, 1);\n> +                               vhost_signal(&net->dev, vq);\n> +                       } else if (!zcopy_used) {\n> +                               vhost_add_used_and_signal(&net->dev,\n> +                                                         vq, head, 0);\n\nWhile batching, perhaps can 
also move this producer index update\nout of the loop and using vhost_add_used_and_signal_n.\n\n> +                       } else\n> +                               vhost_zerocopy_signal_used(net, vq);\n> +                       vhost_net_tx_packet(net);\n> +                       if (unlikely(total_len >= VHOST_NET_WEIGHT)) {\n> +                               vhost_poll_queue(&vq->poll);\n> +                               goto out;\n>                         }\n> -                       vhost_discard_vq_desc(vq, 1);\n> -                       break;\n> -               }\n> -               if (err != len)\n> -                       pr_debug(\"Truncated TX packet: \"\n> -                                \" len %d != %zd\\n\", err, len);\n> -               if (!zcopy_used)\n> -                       vhost_add_used_and_signal(&net->dev, vq, head, 0);\n> -               else\n> -                       vhost_zerocopy_signal_used(net, vq);\n> -               vhost_net_tx_packet(net);\n> -               if (unlikely(total_len >= VHOST_NET_WEIGHT)) {\n> -                       vhost_poll_queue(&vq->poll);\n> -                       break;\n\nThis patch touches many lines just for indentation. 
If having to touch\nthese lines anyway (dirtying git blame), it may be a good time to move\nthe processing of a single descriptor code into a separate helper function.\nAnd while breaking up, perhaps another helper for setting up ubuf_info.\nIf you agree, preferably in a separate noop refactor patch that precedes\nthe functional changes.","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"PLzNS5WM\"; dkim-atps=neutral"],"MIME-Version":"1.0","In-Reply-To":"<1506067355-5771-6-git-send-email-jasowang@redhat.com>","References":"<1506067355-5771-1-git-send-email-jasowang@redhat.com>\n\t<1506067355-5771-6-git-send-email-jasowang@redhat.com>","From":"Willem de Bruijn <willemdebruijn.kernel@gmail.com>","Date":"Wed, 27 Sep 2017 20:55:13 -0400","Message-ID":"<CAF=yD-LuKXo93kUYS_1fhL88jGmS6LHiTzi=JpSoRucNSp3k4g@mail.gmail.com>","Subject":"Re: [PATCH net-next RFC 5/5] vhost_net: basic tx virtqueue batched\n\tprocessing","To":"Jason Wang <jasowang@redhat.com>","Cc":"\"Michael S. 
Tsirkin\" <mst@redhat.com>,\n\tvirtualization@lists.linux-foundation.org,\n\tNetwork Development <netdev@vger.kernel.org>,\n\tLKML <linux-kernel@vger.kernel.org>, kvm@vger.kernel.org","Content-Type":"text/plain; charset=\"UTF-8\"","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1776803,"web_url":"http://patchwork.ozlabs.org/comment/1776803/","msgid":"<632b9bdd-4459-8659-e491-ba125adc2038@redhat.com>","list_archive_url":null,"date":"2017-09-28T07:02:48","subject":"Re: [PATCH net-next RFC 5/5] vhost_net: basic tx virtqueue batched\n\tprocessing","submitter":{"id":5225,"url":"http://patchwork.ozlabs.org/api/people/5225/","name":"Jason Wang","email":"jasowang@redhat.com"},"content":"On 2017年09月28日 06:19, Michael S. Tsirkin wrote:\n> On Wed, Sep 27, 2017 at 10:04:18AM +0800, Jason Wang wrote:\n>>\n>> On 2017年09月27日 03:25, Michael S. Tsirkin wrote:\n>>> On Fri, Sep 22, 2017 at 04:02:35PM +0800, Jason Wang wrote:\n>>>> This patch implements basic batched processing of tx virtqueue by\n>>>> prefetching desc indices and updating used ring in a batch. For\n>>>> non-zerocopy case, vq->heads were used for storing the prefetched\n>>>> indices and updating used ring. It is also a requirement for doing\n>>>> more batching on top. For zerocopy case and for simplicity, batched\n>>>> processing were simply disabled by only fetching and processing one\n>>>> descriptor at a time, this could be optimized in the future.\n>>>>\n>>>> XDP_DROP (without touching skb) on tun (with Moongen in guest) with\n>>>> zercopy disabled:\n>>>>\n>>>> Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz:\n>>>> Before: 3.20Mpps\n>>>> After:  3.90Mpps (+22%)\n>>>>\n>>>> No differences were seen with zerocopy enabled.\n>>>>\n>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>\n>>> So where is the speedup coming from? 
I'd guess the ring is\n>>> hot in cache, it's faster to access it in one go, then\n>>> pass many packets to net stack. Is that right?\n>>>\n>>> Another possibility is better code cache locality.\n>> Yes, I think the speed up comes from:\n>>\n>> - less cache misses\n>> - less cache line bounce when virtqueue is about to be full (guest is faster\n>> than host which is the case of MoonGen)\n>> - less memory barriers\n>> - possible faster copy speed by using copy_to_user() on modern CPUs\n>>\n>>> So how about this patchset is refactored:\n>>>\n>>> 1. use existing APIs just first get packets then\n>>>      transmit them all then use them all\n>> Looks like current API can not get packets first, it only support get packet\n>> one by one (if you mean vhost_get_vq_desc()). And used ring updating may get\n>> more misses in this case.\n> Right. So if you do\n>\n> for (...)\n> \tvhost_get_vq_desc\n>\n>\n> then later\n>\n> for (...)\n> \tvhost_add_used\n>\n>\n> then you get most of benefits except maybe code cache misses\n> and copy_to_user.\n>\n\nIf you mean vhost_add_used_n(), then more barriers and userspace memory \naccess as well. And you still need to cache vring_used_elem in vq->heads \nor elsewhere. Since you do not batch reading avail ring indexes, more \nwrite miss will happen if the ring is about to be full. So it looks \nslower than what is done in this patch?\n\nAnd if we want to do more batching on top (do we want this?), we still \nneed new APIs anyway.\n\nThanks\n\n>\n>\n>\n>\n>\n>>> 2. 
add new APIs and move the loop into vhost core\n>>>      for more speedups\n>> I don't see any advantages, looks like just need some e.g callbacks in this\n>> case.\n>>\n>> Thanks\n> IUC callbacks pretty much destroy the code cache locality advantages,\n> IP is jumping around too much.\n>\n>","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ext-mx08.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx08.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=jasowang@redhat.com"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3y2lyS4SYLz9t5C\n\tfor <patchwork-incoming@ozlabs.org>;\n\tThu, 28 Sep 2017 17:03:12 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751580AbdI1HC5 (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tThu, 28 Sep 2017 03:02:57 -0400","from mx1.redhat.com ([209.132.183.28]:52232 \"EHLO mx1.redhat.com\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S1751246AbdI1HCz (ORCPT <rfc822;netdev@vger.kernel.org>);\n\tThu, 28 Sep 2017 03:02:55 -0400","from smtp.corp.redhat.com\n\t(int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id 76ACCC081F34;\n\tThu, 28 Sep 2017 07:02:55 +0000 (UTC)","from [10.72.12.139] (ovpn-12-139.pek2.redhat.com [10.72.12.139])\n\tby smtp.corp.redhat.com (Postfix) with ESMTPS id 889018F884;\n\tThu, 28 Sep 2017 07:02:51 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com 
76ACCC081F34","Subject":"Re: [PATCH net-next RFC 5/5] vhost_net: basic tx virtqueue batched\n\tprocessing","To":"\"Michael S. Tsirkin\" <mst@redhat.com>","Cc":"virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,\n\tlinux-kernel@vger.kernel.org, kvm@vger.kernel.org","References":"<1506067355-5771-1-git-send-email-jasowang@redhat.com>\n\t<1506067355-5771-6-git-send-email-jasowang@redhat.com>\n\t<20170926222223-mutt-send-email-mst@kernel.org>\n\t<16ea7512-d770-21ef-edb6-3ada51f08592@redhat.com>\n\t<20170928011724-mutt-send-email-mst@kernel.org>","From":"Jason Wang <jasowang@redhat.com>","Message-ID":"<632b9bdd-4459-8659-e491-ba125adc2038@redhat.com>","Date":"Thu, 28 Sep 2017 15:02:48 +0800","User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101\n\tThunderbird/52.3.0","MIME-Version":"1.0","In-Reply-To":"<20170928011724-mutt-send-email-mst@kernel.org>","Content-Type":"text/plain; charset=utf-8; format=flowed","Content-Transfer-Encoding":"8bit","Content-Language":"en-US","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.14","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.32]);\n\tThu, 28 Sep 2017 07:02:55 +0000 (UTC)","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1776835,"web_url":"http://patchwork.ozlabs.org/comment/1776835/","msgid":"<a0b47264-75b9-4ab5-3c78-7b08cee7995c@redhat.com>","list_archive_url":null,"date":"2017-09-28T07:50:14","subject":"Re: [PATCH net-next RFC 5/5] vhost_net: basic tx virtqueue batched\n\tprocessing","submitter":{"id":5225,"url":"http://patchwork.ozlabs.org/api/people/5225/","name":"Jason Wang","email":"jasowang@redhat.com"},"content":"On 2017年09月28日 08:55, Willem de Bruijn wrote:\n>> @@ -461,6 +460,7 @@ static void handle_tx(struct vhost_net *net)\n>>          struct socket *sock;\n>>          struct vhost_net_ubuf_ref *uninitialized_var(ubufs);\n>>          bool 
zcopy, zcopy_used;\n>> +       int i, batched = VHOST_NET_BATCH;\n>>\n>>          mutex_lock(&vq->mutex);\n>>          sock = vq->private_data;\n>> @@ -475,6 +475,12 @@ static void handle_tx(struct vhost_net *net)\n>>          hdr_size = nvq->vhost_hlen;\n>>          zcopy = nvq->ubufs;\n>>\n>> +       /* Disable zerocopy batched fetching for simplicity */\n> This special case can perhaps be avoided if we no longer block\n> on vhost_exceeds_maxpend, but revert to copying.\n\nYes, I think so. For simplicity, I do it for data copy first. If the \nidea is convinced, I will try to do zerocopy on top.\n\n>\n>> +       if (zcopy) {\n>> +               heads = &used;\n> Can this special case of batchsize 1 not use vq->heads?\n\nIt doesn't in fact?\n\n>\n>> +               batched = 1;\n>> +       }\n>> +\n>>          for (;;) {\n>>                  /* Release DMAs done buffers first */\n>>                  if (zcopy)\n>> @@ -486,95 +492,114 @@ static void handle_tx(struct vhost_net *net)\n>>                  if (unlikely(vhost_exceeds_maxpend(net)))\n>>                          break;\n>> +                       /* TODO: Check specific error and bomb out\n>> +                        * unless ENOBUFS?\n>> +                        */\n>> +                       err = sock->ops->sendmsg(sock, &msg, len);\n>> +                       if (unlikely(err < 0)) {\n>> +                               if (zcopy_used) {\n>> +                                       vhost_net_ubuf_put(ubufs);\n>> +                                       nvq->upend_idx =\n>> +                                  ((unsigned)nvq->upend_idx - 1) % UIO_MAXIOV;\n>> +                               }\n>> +                               vhost_discard_vq_desc(vq, 1);\n>> +                               goto out;\n>> +                       }\n>> +                       if (err != len)\n>> +                               pr_debug(\"Truncated TX packet: \"\n>> +                                       \" len %d != 
%zd\\n\", err, len);\n>> +                       if (!zcopy) {\n>> +                               vhost_add_used_idx(vq, 1);\n>> +                               vhost_signal(&net->dev, vq);\n>> +                       } else if (!zcopy_used) {\n>> +                               vhost_add_used_and_signal(&net->dev,\n>> +                                                         vq, head, 0);\n> While batching, perhaps can also move this producer index update\n> out of the loop and using vhost_add_used_and_signal_n.\n\nYes.\n\n>\n>> +                       } else\n>> +                               vhost_zerocopy_signal_used(net, vq);\n>> +                       vhost_net_tx_packet(net);\n>> +                       if (unlikely(total_len >= VHOST_NET_WEIGHT)) {\n>> +                               vhost_poll_queue(&vq->poll);\n>> +                               goto out;\n>>                          }\n>> -                       vhost_discard_vq_desc(vq, 1);\n>> -                       break;\n>> -               }\n>> -               if (err != len)\n>> -                       pr_debug(\"Truncated TX packet: \"\n>> -                                \" len %d != %zd\\n\", err, len);\n>> -               if (!zcopy_used)\n>> -                       vhost_add_used_and_signal(&net->dev, vq, head, 0);\n>> -               else\n>> -                       vhost_zerocopy_signal_used(net, vq);\n>> -               vhost_net_tx_packet(net);\n>> -               if (unlikely(total_len >= VHOST_NET_WEIGHT)) {\n>> -                       vhost_poll_queue(&vq->poll);\n>> -                       break;\n> This patch touches many lines just for indentation. 
If having to touch\n> these lines anyway (dirtying git blame), it may be a good time to move\n> the processing of a single descriptor code into a separate helper function.\n> And while breaking up, perhaps another helper for setting up ubuf_info.\n> If you agree, preferably in a separate noop refactor patch that precedes\n> the functional changes.\n\nRight and it looks better, will try to do this.\n\nThanks","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ext-mx01.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx01.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=jasowang@redhat.com"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3y2n194RFFz9t2c\n\tfor <patchwork-incoming@ozlabs.org>;\n\tThu, 28 Sep 2017 17:50:37 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751979AbdI1Hu0 (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tThu, 28 Sep 2017 03:50:26 -0400","from mx1.redhat.com ([209.132.183.28]:47502 \"EHLO mx1.redhat.com\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S1751592AbdI1HuZ (ORCPT <rfc822;netdev@vger.kernel.org>);\n\tThu, 28 Sep 2017 03:50:25 -0400","from smtp.corp.redhat.com\n\t(int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id 2E5BCA058A;\n\tThu, 28 Sep 2017 07:50:25 +0000 (UTC)","from [10.72.12.139] (ovpn-12-139.pek2.redhat.com [10.72.12.139])\n\tby smtp.corp.redhat.com (Postfix) with ESMTPS id 
D755B63BCE;\n\tThu, 28 Sep 2017 07:50:18 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com 2E5BCA058A","Subject":"Re: [PATCH net-next RFC 5/5] vhost_net: basic tx virtqueue batched\n\tprocessing","To":"Willem de Bruijn <willemdebruijn.kernel@gmail.com>","Cc":"\"Michael S. Tsirkin\" <mst@redhat.com>,\n\tvirtualization@lists.linux-foundation.org,\n\tNetwork Development <netdev@vger.kernel.org>,\n\tLKML <linux-kernel@vger.kernel.org>, kvm@vger.kernel.org","References":"<1506067355-5771-1-git-send-email-jasowang@redhat.com>\n\t<1506067355-5771-6-git-send-email-jasowang@redhat.com>\n\t<CAF=yD-LuKXo93kUYS_1fhL88jGmS6LHiTzi=JpSoRucNSp3k4g@mail.gmail.com>","From":"Jason Wang <jasowang@redhat.com>","Message-ID":"<a0b47264-75b9-4ab5-3c78-7b08cee7995c@redhat.com>","Date":"Thu, 28 Sep 2017 15:50:14 +0800","User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101\n\tThunderbird/52.3.0","MIME-Version":"1.0","In-Reply-To":"<CAF=yD-LuKXo93kUYS_1fhL88jGmS6LHiTzi=JpSoRucNSp3k4g@mail.gmail.com>","Content-Type":"text/plain; charset=utf-8; format=flowed","Content-Transfer-Encoding":"8bit","Content-Language":"en-US","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.15","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.25]);\n\tThu, 28 Sep 2017 07:50:25 +0000 (UTC)","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1776837,"web_url":"http://patchwork.ozlabs.org/comment/1776837/","msgid":"<1455d9db-3f2f-4139-8729-29fb03dd73aa@redhat.com>","list_archive_url":null,"date":"2017-09-28T07:52:59","subject":"Re: [PATCH net-next RFC 5/5] vhost_net: basic tx virtqueue batched\n\tprocessing","submitter":{"id":5225,"url":"http://patchwork.ozlabs.org/api/people/5225/","name":"Jason Wang","email":"jasowang@redhat.com"},"content":"On 2017年09月28日 06:19, Michael S. Tsirkin wrote:\n>\n>>> 2. 
add new APIs and move the loop into vhost core\n>>>      for more speedups\n>> I don't see any advantages, looks like just need some e.g callbacks in this\n>> case.\n>>\n>> Thanks\n> IUC callbacks pretty much destroy the code cache locality advantages,\n> IP is jumping around too much.\n\nI may miss something, but we need net specific code anyway in this case?\n\nThanks","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ext-mx06.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx06.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=jasowang@redhat.com"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3y2n4H4z98z9t5x\n\tfor <patchwork-incoming@ozlabs.org>;\n\tThu, 28 Sep 2017 17:53:19 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1752332AbdI1HxK (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tThu, 28 Sep 2017 03:53:10 -0400","from mx1.redhat.com ([209.132.183.28]:34254 \"EHLO mx1.redhat.com\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S1752223AbdI1HxI (ORCPT <rfc822;netdev@vger.kernel.org>);\n\tThu, 28 Sep 2017 03:53:08 -0400","from smtp.corp.redhat.com\n\t(int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id 2906C9818E;\n\tThu, 28 Sep 2017 07:53:08 +0000 (UTC)","from [10.72.12.139] (ovpn-12-139.pek2.redhat.com [10.72.12.139])\n\tby smtp.corp.redhat.com (Postfix) with ESMTPS id 6701682717;\n\tThu, 28 Sep 2017 07:53:02 +0000 
(UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com 2906C9818E","Subject":"Re: [PATCH net-next RFC 5/5] vhost_net: basic tx virtqueue batched\n\tprocessing","To":"\"Michael S. Tsirkin\" <mst@redhat.com>","Cc":"virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,\n\tlinux-kernel@vger.kernel.org, kvm@vger.kernel.org","References":"<1506067355-5771-1-git-send-email-jasowang@redhat.com>\n\t<1506067355-5771-6-git-send-email-jasowang@redhat.com>\n\t<20170926222223-mutt-send-email-mst@kernel.org>\n\t<16ea7512-d770-21ef-edb6-3ada51f08592@redhat.com>\n\t<20170928011724-mutt-send-email-mst@kernel.org>","From":"Jason Wang <jasowang@redhat.com>","Message-ID":"<1455d9db-3f2f-4139-8729-29fb03dd73aa@redhat.com>","Date":"Thu, 28 Sep 2017 15:52:59 +0800","User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101\n\tThunderbird/52.3.0","MIME-Version":"1.0","In-Reply-To":"<20170928011724-mutt-send-email-mst@kernel.org>","Content-Type":"text/plain; charset=utf-8; format=flowed","Content-Transfer-Encoding":"8bit","Content-Language":"en-US","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.12","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.30]);\n\tThu, 28 Sep 2017 07:53:08 +0000 (UTC)","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}}]