{"id":817372,"url":"http://patchwork.ozlabs.org/api/patches/817372/?format=json","web_url":"http://patchwork.ozlabs.org/project/netdev/patch/1506067355-5771-6-git-send-email-jasowang@redhat.com/","project":{"id":7,"url":"http://patchwork.ozlabs.org/api/projects/7/?format=json","name":"Linux network development","link_name":"netdev","list_id":"netdev.vger.kernel.org","list_email":"netdev@vger.kernel.org","web_url":null,"scm_url":null,"webscm_url":null,"list_archive_url":"","list_archive_url_format":"","commit_url_format":""},"msgid":"<1506067355-5771-6-git-send-email-jasowang@redhat.com>","list_archive_url":null,"date":"2017-09-22T08:02:35","name":"[net-next,RFC,5/5] vhost_net: basic tx virtqueue batched processing","commit_ref":null,"pull_url":null,"state":"rfc","archived":true,"hash":"ab74d78a25bf0491fd86574563f4c05fbc1280cc","submitter":{"id":5225,"url":"http://patchwork.ozlabs.org/api/people/5225/?format=json","name":"Jason Wang","email":"jasowang@redhat.com"},"delegate":{"id":34,"url":"http://patchwork.ozlabs.org/api/users/34/?format=json","username":"davem","first_name":"David","last_name":"Miller","email":"davem@davemloft.net"},"mbox":"http://patchwork.ozlabs.org/project/netdev/patch/1506067355-5771-6-git-send-email-jasowang@redhat.com/mbox/","series":[{"id":4565,"url":"http://patchwork.ozlabs.org/api/series/4565/?format=json","web_url":"http://patchwork.ozlabs.org/project/netdev/list/?series=4565","date":"2017-09-22T08:02:35","name":"batched tx processing in vhost_net","version":1,"mbox":"http://patchwork.ozlabs.org/series/4565/mbox/"}],"comments":"http://patchwork.ozlabs.org/api/patches/817372/comments/","check":"pending","checks":"http://patchwork.ozlabs.org/api/patches/817372/checks/","tags":{},"related":[],"headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) 
smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ext-mx03.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx03.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=jasowang@redhat.com"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xz5Zb3hncz9sNc\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri, 22 Sep 2017 18:03:19 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1752076AbdIVIDG (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tFri, 22 Sep 2017 04:03:06 -0400","from mx1.redhat.com ([209.132.183.28]:35108 \"EHLO mx1.redhat.com\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S1751995AbdIVIDC (ORCPT <rfc822;netdev@vger.kernel.org>);\n\tFri, 22 Sep 2017 04:03:02 -0400","from smtp.corp.redhat.com\n\t(int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id AA2BB7E45B;\n\tFri, 22 Sep 2017 08:03:02 +0000 (UTC)","from jason-ThinkPad-T450s.redhat.com (ovpn-12-19.pek2.redhat.com\n\t[10.72.12.19])\n\tby smtp.corp.redhat.com (Postfix) with ESMTP id 148C062675;\n\tFri, 22 Sep 2017 08:02:57 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com AA2BB7E45B","From":"Jason Wang <jasowang@redhat.com>","To":"mst@redhat.com, jasowang@redhat.com,\n\tvirtualization@lists.linux-foundation.org, netdev@vger.kernel.org,\n\tlinux-kernel@vger.kernel.org","Cc":"kvm@vger.kernel.org","Subject":"[PATCH net-next RFC 5/5] vhost_net: basic tx virtqueue batched\n\tprocessing","Date":"Fri, 22 Sep 2017 16:02:35 
+0800","Message-Id":"<1506067355-5771-6-git-send-email-jasowang@redhat.com>","In-Reply-To":"<1506067355-5771-1-git-send-email-jasowang@redhat.com>","References":"<1506067355-5771-1-git-send-email-jasowang@redhat.com>","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.15","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.27]);\n\tFri, 22 Sep 2017 08:03:02 +0000 (UTC)","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"},"content":"This patch implements basic batched processing of tx virtqueue by\nprefetching desc indices and updating used ring in a batch. For\nnon-zerocopy case, vq->heads is used for storing the prefetched\nindices and updating used ring. It is also a requirement for doing\nmore batching on top. For zerocopy case and for simplicity, batched\nprocessing is simply disabled by fetching and processing only one\ndescriptor at a time; this could be optimized in the future.\n\nXDP_DROP (without touching skb) on tun (with MoonGen in guest) with\nzerocopy disabled:\n\nIntel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz:\nBefore: 3.20Mpps\nAfter:  3.90Mpps (+22%)\n\nNo differences were seen with zerocopy enabled.\n\nSigned-off-by: Jason Wang <jasowang@redhat.com>\n---\n drivers/vhost/net.c   | 215 ++++++++++++++++++++++++++++----------------------\n drivers/vhost/vhost.c |   2 +-\n 2 files changed, 121 insertions(+), 96 deletions(-)","diff":"diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c\nindex c89640e..c439892 100644\n--- a/drivers/vhost/net.c\n+++ b/drivers/vhost/net.c\n@@ -408,27 +408,25 @@ static int vhost_net_enable_vq(struct vhost_net *n,\n \treturn vhost_poll_start(poll, sock->file);\n }\n \n-static int vhost_net_tx_get_vq_desc(struct vhost_net *net,\n-\t\t\t\t    struct vhost_virtqueue *vq,\n-\t\t\t\t    struct iovec iov[], unsigned int iov_size,\n-\t\t\t\t    unsigned int *out_num, unsigned int *in_num)\n+static bool 
vhost_net_tx_avail(struct vhost_net *net,\n+\t\t\t       struct vhost_virtqueue *vq)\n {\n \tunsigned long uninitialized_var(endtime);\n-\tint r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),\n-\t\t\t\t  out_num, in_num, NULL, NULL);\n \n-\tif (r == vq->num && vq->busyloop_timeout) {\n-\t\tpreempt_disable();\n-\t\tendtime = busy_clock() + vq->busyloop_timeout;\n-\t\twhile (vhost_can_busy_poll(vq->dev, endtime) &&\n-\t\t       vhost_vq_avail_empty(vq->dev, vq))\n-\t\t\tcpu_relax();\n-\t\tpreempt_enable();\n-\t\tr = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),\n-\t\t\t\t      out_num, in_num, NULL, NULL);\n-\t}\n+\tif (!vq->busyloop_timeout)\n+\t\treturn false;\n \n-\treturn r;\n+\tif (!vhost_vq_avail_empty(vq->dev, vq))\n+\t\treturn true;\n+\n+\tpreempt_disable();\n+\tendtime = busy_clock() + vq->busyloop_timeout;\n+\twhile (vhost_can_busy_poll(vq->dev, endtime) &&\n+\t\tvhost_vq_avail_empty(vq->dev, vq))\n+\t\tcpu_relax();\n+\tpreempt_enable();\n+\n+\treturn !vhost_vq_avail_empty(vq->dev, vq);\n }\n \n static bool vhost_exceeds_maxpend(struct vhost_net *net)\n@@ -446,8 +444,9 @@ static void handle_tx(struct vhost_net *net)\n {\n \tstruct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];\n \tstruct vhost_virtqueue *vq = &nvq->vq;\n+\tstruct vring_used_elem used, *heads = vq->heads;\n \tunsigned out, in;\n-\tint head;\n+\tint avails, head;\n \tstruct msghdr msg = {\n \t\t.msg_name = NULL,\n \t\t.msg_namelen = 0,\n@@ -461,6 +460,7 @@ static void handle_tx(struct vhost_net *net)\n \tstruct socket *sock;\n \tstruct vhost_net_ubuf_ref *uninitialized_var(ubufs);\n \tbool zcopy, zcopy_used;\n+\tint i, batched = VHOST_NET_BATCH;\n \n \tmutex_lock(&vq->mutex);\n \tsock = vq->private_data;\n@@ -475,6 +475,12 @@ static void handle_tx(struct vhost_net *net)\n \thdr_size = nvq->vhost_hlen;\n \tzcopy = nvq->ubufs;\n \n+\t/* Disable zerocopy batched fetching for simplicity */\n+\tif (zcopy) {\n+\t\theads = &used;\n+\t\tbatched = 1;\n+\t}\n+\n \tfor (;;) {\n 
\t\t/* Release DMAs done buffers first */\n \t\tif (zcopy)\n@@ -486,95 +492,114 @@ static void handle_tx(struct vhost_net *net)\n \t\tif (unlikely(vhost_exceeds_maxpend(net)))\n \t\t\tbreak;\n \n-\t\thead = vhost_net_tx_get_vq_desc(net, vq, vq->iov,\n-\t\t\t\t\t\tARRAY_SIZE(vq->iov),\n-\t\t\t\t\t\t&out, &in);\n+\t\tavails = vhost_prefetch_desc_indices(vq, heads, batched, !zcopy);\n \t\t/* On error, stop handling until the next kick. */\n-\t\tif (unlikely(head < 0))\n+\t\tif (unlikely(avails < 0))\n \t\t\tbreak;\n-\t\t/* Nothing new?  Wait for eventfd to tell us they refilled. */\n-\t\tif (head == vq->num) {\n+\t\t/* Nothing new?  Busy poll for a while or wait for\n+\t\t * eventfd to tell us they refilled. */\n+\t\tif (!avails) {\n+\t\t\tif (vhost_net_tx_avail(net, vq))\n+\t\t\t\tcontinue;\n \t\t\tif (unlikely(vhost_enable_notify(&net->dev, vq))) {\n \t\t\t\tvhost_disable_notify(&net->dev, vq);\n \t\t\t\tcontinue;\n \t\t\t}\n \t\t\tbreak;\n \t\t}\n-\t\tif (in) {\n-\t\t\tvq_err(vq, \"Unexpected descriptor format for TX: \"\n-\t\t\t       \"out %d, int %d\\n\", out, in);\n-\t\t\tbreak;\n-\t\t}\n-\t\t/* Skip header. TODO: support TSO. 
*/\n-\t\tlen = iov_length(vq->iov, out);\n-\t\tiov_iter_init(&msg.msg_iter, WRITE, vq->iov, out, len);\n-\t\tiov_iter_advance(&msg.msg_iter, hdr_size);\n-\t\t/* Sanity check */\n-\t\tif (!msg_data_left(&msg)) {\n-\t\t\tvq_err(vq, \"Unexpected header len for TX: \"\n-\t\t\t       \"%zd expected %zd\\n\",\n-\t\t\t       len, hdr_size);\n-\t\t\tbreak;\n-\t\t}\n-\t\tlen = msg_data_left(&msg);\n-\n-\t\tzcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN\n-\t\t\t\t   && (nvq->upend_idx + 1) % UIO_MAXIOV !=\n-\t\t\t\t      nvq->done_idx\n-\t\t\t\t   && vhost_net_tx_select_zcopy(net);\n-\n-\t\t/* use msg_control to pass vhost zerocopy ubuf info to skb */\n-\t\tif (zcopy_used) {\n-\t\t\tstruct ubuf_info *ubuf;\n-\t\t\tubuf = nvq->ubuf_info + nvq->upend_idx;\n-\n-\t\t\tvq->heads[nvq->upend_idx].id = cpu_to_vhost32(vq, head);\n-\t\t\tvq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;\n-\t\t\tubuf->callback = vhost_zerocopy_callback;\n-\t\t\tubuf->ctx = nvq->ubufs;\n-\t\t\tubuf->desc = nvq->upend_idx;\n-\t\t\trefcount_set(&ubuf->refcnt, 1);\n-\t\t\tmsg.msg_control = ubuf;\n-\t\t\tmsg.msg_controllen = sizeof(ubuf);\n-\t\t\tubufs = nvq->ubufs;\n-\t\t\tatomic_inc(&ubufs->refcount);\n-\t\t\tnvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV;\n-\t\t} else {\n-\t\t\tmsg.msg_control = NULL;\n-\t\t\tubufs = NULL;\n-\t\t}\n+\t\tfor (i = 0; i < avails; i++) {\n+\t\t\thead = __vhost_get_vq_desc(vq, vq->iov,\n+\t\t\t\t\t\t   ARRAY_SIZE(vq->iov),\n+\t\t\t\t\t\t   &out, &in, NULL, NULL,\n+\t\t\t\t\t       vhost16_to_cpu(vq, heads[i].id));\n+\t\t\tif (in) {\n+\t\t\t\tvq_err(vq, \"Unexpected descriptor format for \"\n+\t\t\t\t\t   \"TX: out %d, int %d\\n\", out, in);\n+\t\t\t\tgoto out;\n+\t\t\t}\n \n-\t\ttotal_len += len;\n-\t\tif (total_len < VHOST_NET_WEIGHT &&\n-\t\t    !vhost_vq_avail_empty(&net->dev, vq) &&\n-\t\t    likely(!vhost_exceeds_maxpend(net))) {\n-\t\t\tmsg.msg_flags |= MSG_MORE;\n-\t\t} else {\n-\t\t\tmsg.msg_flags &= ~MSG_MORE;\n-\t\t}\n+\t\t\t/* Skip header. 
TODO: support TSO. */\n+\t\t\tlen = iov_length(vq->iov, out);\n+\t\t\tiov_iter_init(&msg.msg_iter, WRITE, vq->iov, out, len);\n+\t\t\tiov_iter_advance(&msg.msg_iter, hdr_size);\n+\t\t\t/* Sanity check */\n+\t\t\tif (!msg_data_left(&msg)) {\n+\t\t\t\tvq_err(vq, \"Unexpected header len for TX: \"\n+\t\t\t\t\t\"%zd expected %zd\\n\",\n+\t\t\t\t\tlen, hdr_size);\n+\t\t\t\tgoto out;\n+\t\t\t}\n+\t\t\tlen = msg_data_left(&msg);\n \n-\t\t/* TODO: Check specific error and bomb out unless ENOBUFS? */\n-\t\terr = sock->ops->sendmsg(sock, &msg, len);\n-\t\tif (unlikely(err < 0)) {\n+\t\t\tzcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN\n+\t\t\t\t     && (nvq->upend_idx + 1) % UIO_MAXIOV !=\n+\t\t\t\t\tnvq->done_idx\n+\t\t\t\t     && vhost_net_tx_select_zcopy(net);\n+\n+\t\t\t/* use msg_control to pass vhost zerocopy ubuf\n+\t\t\t * info to skb\n+\t\t\t */\n \t\t\tif (zcopy_used) {\n-\t\t\t\tvhost_net_ubuf_put(ubufs);\n-\t\t\t\tnvq->upend_idx = ((unsigned)nvq->upend_idx - 1)\n-\t\t\t\t\t% UIO_MAXIOV;\n+\t\t\t\tstruct ubuf_info *ubuf;\n+\t\t\t\tubuf = nvq->ubuf_info + nvq->upend_idx;\n+\n+\t\t\t\tvq->heads[nvq->upend_idx].id =\n+\t\t\t\t\tcpu_to_vhost32(vq, head);\n+\t\t\t\tvq->heads[nvq->upend_idx].len =\n+\t\t\t\t\tVHOST_DMA_IN_PROGRESS;\n+\t\t\t\tubuf->callback = vhost_zerocopy_callback;\n+\t\t\t\tubuf->ctx = nvq->ubufs;\n+\t\t\t\tubuf->desc = nvq->upend_idx;\n+\t\t\t\trefcount_set(&ubuf->refcnt, 1);\n+\t\t\t\tmsg.msg_control = ubuf;\n+\t\t\t\tmsg.msg_controllen = sizeof(ubuf);\n+\t\t\t\tubufs = nvq->ubufs;\n+\t\t\t\tatomic_inc(&ubufs->refcount);\n+\t\t\t\tnvq->upend_idx =\n+\t\t\t\t\t(nvq->upend_idx + 1) % UIO_MAXIOV;\n+\t\t\t} else {\n+\t\t\t\tmsg.msg_control = NULL;\n+\t\t\t\tubufs = NULL;\n+\t\t\t}\n+\n+\t\t\ttotal_len += len;\n+\t\t\tif (total_len < VHOST_NET_WEIGHT &&\n+\t\t\t\t!vhost_vq_avail_empty(&net->dev, vq) &&\n+\t\t\t\tlikely(!vhost_exceeds_maxpend(net))) {\n+\t\t\t\tmsg.msg_flags |= MSG_MORE;\n+\t\t\t} else {\n+\t\t\t\tmsg.msg_flags &= 
~MSG_MORE;\n+\t\t\t}\n+\n+\t\t\t/* TODO: Check specific error and bomb out\n+\t\t\t * unless ENOBUFS?\n+\t\t\t */\n+\t\t\terr = sock->ops->sendmsg(sock, &msg, len);\n+\t\t\tif (unlikely(err < 0)) {\n+\t\t\t\tif (zcopy_used) {\n+\t\t\t\t\tvhost_net_ubuf_put(ubufs);\n+\t\t\t\t\tnvq->upend_idx =\n+\t\t\t\t   ((unsigned)nvq->upend_idx - 1) % UIO_MAXIOV;\n+\t\t\t\t}\n+\t\t\t\tvhost_discard_vq_desc(vq, 1);\n+\t\t\t\tgoto out;\n+\t\t\t}\n+\t\t\tif (err != len)\n+\t\t\t\tpr_debug(\"Truncated TX packet: \"\n+\t\t\t\t\t\" len %d != %zd\\n\", err, len);\n+\t\t\tif (!zcopy) {\n+\t\t\t\tvhost_add_used_idx(vq, 1);\n+\t\t\t\tvhost_signal(&net->dev, vq);\n+\t\t\t} else if (!zcopy_used) {\n+\t\t\t\tvhost_add_used_and_signal(&net->dev,\n+\t\t\t\t\t\t\t  vq, head, 0);\n+\t\t\t} else\n+\t\t\t\tvhost_zerocopy_signal_used(net, vq);\n+\t\t\tvhost_net_tx_packet(net);\n+\t\t\tif (unlikely(total_len >= VHOST_NET_WEIGHT)) {\n+\t\t\t\tvhost_poll_queue(&vq->poll);\n+\t\t\t\tgoto out;\n \t\t\t}\n-\t\t\tvhost_discard_vq_desc(vq, 1);\n-\t\t\tbreak;\n-\t\t}\n-\t\tif (err != len)\n-\t\t\tpr_debug(\"Truncated TX packet: \"\n-\t\t\t\t \" len %d != %zd\\n\", err, len);\n-\t\tif (!zcopy_used)\n-\t\t\tvhost_add_used_and_signal(&net->dev, vq, head, 0);\n-\t\telse\n-\t\t\tvhost_zerocopy_signal_used(net, vq);\n-\t\tvhost_net_tx_packet(net);\n-\t\tif (unlikely(total_len >= VHOST_NET_WEIGHT)) {\n-\t\t\tvhost_poll_queue(&vq->poll);\n-\t\t\tbreak;\n \t\t}\n \t}\n out:\ndiff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c\nindex 6532cda..8764df5 100644\n--- a/drivers/vhost/vhost.c\n+++ b/drivers/vhost/vhost.c\n@@ -392,7 +392,7 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev)\n \t\tvq->indirect = kmalloc(sizeof *vq->indirect * UIO_MAXIOV,\n \t\t\t\t       GFP_KERNEL);\n \t\tvq->log = kmalloc(sizeof *vq->log * UIO_MAXIOV, GFP_KERNEL);\n-\t\tvq->heads = kmalloc(sizeof *vq->heads * UIO_MAXIOV, GFP_KERNEL);\n+\t\tvq->heads = kzalloc(sizeof *vq->heads * UIO_MAXIOV, GFP_KERNEL);\n \t\tif 
(!vq->indirect || !vq->log || !vq->heads)\n \t\t\tgoto err_nomem;\n \t}\n","prefixes":["net-next","RFC","5/5"]}