From patchwork Mon May 21 09:04:22 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 917477 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40qCcz0c36z9s1R for ; Mon, 21 May 2018 19:08:50 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751892AbeEUJEw (ORCPT ); Mon, 21 May 2018 05:04:52 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:54666 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751557AbeEUJEp (ORCPT ); Mon, 21 May 2018 05:04:45 -0400 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 2504C400E9A7; Mon, 21 May 2018 09:04:45 +0000 (UTC) Received: from jason-ThinkPad-T450s.redhat.com (ovpn-12-30.pek2.redhat.com [10.72.12.30]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1D2944E68B; Mon, 21 May 2018 09:04:41 +0000 (UTC) From: Jason Wang To: mst@redhat.com, jasowang@redhat.com Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH net-next 01/12] vhost_net: introduce helper to initialize tx iov iter Date: Mon, 21 May 2018 17:04:22 +0800 Message-Id: <1526893473-20128-2-git-send-email-jasowang@redhat.com> In-Reply-To: <1526893473-20128-1-git-send-email-jasowang@redhat.com> References: <1526893473-20128-1-git-send-email-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Mon, 21 May 2018 09:04:45 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Mon, 21 May 2018 09:04:45 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'jasowang@redhat.com' RCPT:'' Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Signed-off-by: Jason Wang --- drivers/vhost/net.c | 34 +++++++++++++++++++++++----------- 1 file changed, 23 insertions(+), 11 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index c4b49fc..15d191a 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -459,6 +459,26 @@ static bool vhost_exceeds_maxpend(struct vhost_net *net) min_t(unsigned int, VHOST_MAX_PEND, vq->num >> 2); } +static size_t init_iov_iter(struct vhost_virtqueue *vq, struct iov_iter *iter, + size_t hdr_size, int out) +{ + /* Skip header. TODO: support TSO. */ + size_t len = iov_length(vq->iov, out); + + iov_iter_init(iter, WRITE, vq->iov, out, len); + iov_iter_advance(iter, hdr_size); + /* Sanity check */ + if (!iov_iter_count(iter)) { + vq_err(vq, "Unexpected header len for TX: " + "%zd expected %zd\n", + len, hdr_size); + return -EFAULT; + } + len = iov_iter_count(iter); + + return len; +} + /* Expects to be always run from workqueue - which acts as * read-size critical section for our kind of RCU. */ static void handle_tx(struct vhost_net *net) @@ -521,18 +541,10 @@ static void handle_tx(struct vhost_net *net) "out %d, int %d\n", out, in); break; } - /* Skip header. TODO: support TSO. */ - len = iov_length(vq->iov, out); - iov_iter_init(&msg.msg_iter, WRITE, vq->iov, out, len); - iov_iter_advance(&msg.msg_iter, hdr_size); - /* Sanity check */ - if (!msg_data_left(&msg)) { - vq_err(vq, "Unexpected header len for TX: " - "%zd expected %zd\n", - len, hdr_size); + + len = init_iov_iter(vq, &msg.msg_iter, hdr_size, out); + if (len < 0) break; - } - len = msg_data_left(&msg); zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN && !vhost_exceeds_maxpend(net) From patchwork Mon May 21 09:04:23 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 917475 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40qCcK58jrz9s1R for ; Mon, 21 May 2018 19:08:17 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752291AbeEUJEz (ORCPT ); Mon, 21 May 2018 05:04:55 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:43670 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751685AbeEUJEt (ORCPT ); Mon, 21 May 2018 05:04:49 -0400 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 9FFCA7C6A9; Mon, 21 May 2018 09:04:48 +0000 (UTC) Received: from jason-ThinkPad-T450s.redhat.com (ovpn-12-30.pek2.redhat.com [10.72.12.30]) by smtp.corp.redhat.com (Postfix) with ESMTP id D58FB4AF1; Mon, 21 May 2018 09:04:45 +0000 (UTC) From: Jason Wang To: mst@redhat.com, jasowang@redhat.com Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH net-next 02/12] vhost_net: introduce vhost_exceeds_weight() Date: Mon, 21 May 2018 17:04:23 +0800 Message-Id: <1526893473-20128-3-git-send-email-jasowang@redhat.com> In-Reply-To: <1526893473-20128-1-git-send-email-jasowang@redhat.com> References: <1526893473-20128-1-git-send-email-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.2]); Mon, 21 May 2018 09:04:48 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.2]); Mon, 21 May 2018 09:04:48 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'jasowang@redhat.com' RCPT:'' Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Signed-off-by: Jason Wang --- drivers/vhost/net.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 15d191a..de544ee 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -479,6 +479,12 @@ static size_t init_iov_iter(struct vhost_virtqueue *vq, struct iov_iter *iter, return len; } +static bool vhost_exceeds_weight(int pkts, int total_len) +{ + return unlikely(total_len >= VHOST_NET_WEIGHT) || + unlikely(pkts >= VHOST_NET_PKT_WEIGHT); +} + /* Expects to be always run from workqueue - which acts as * read-size critical section for our kind of RCU. */ static void handle_tx(struct vhost_net *net) @@ -570,7 +576,6 @@ static void handle_tx(struct vhost_net *net) msg.msg_control = NULL; ubufs = NULL; } - total_len += len; if (total_len < VHOST_NET_WEIGHT && !vhost_vq_avail_empty(&net->dev, vq) && @@ -600,8 +605,7 @@ static void handle_tx(struct vhost_net *net) else vhost_zerocopy_signal_used(net, vq); vhost_net_tx_packet(net); - if (unlikely(total_len >= VHOST_NET_WEIGHT) || - unlikely(++sent_pkts >= VHOST_NET_PKT_WEIGHT)) { + if (unlikely(vhost_exceeds_weight(++sent_pkts, total_len))) { vhost_poll_queue(&vq->poll); break; } @@ -887,8 +891,7 @@ static void handle_rx(struct vhost_net *net) if (unlikely(vq_log)) vhost_log_write(vq, vq_log, log, vhost_len); total_len += vhost_len; - if (unlikely(total_len >= VHOST_NET_WEIGHT) || - unlikely(++recv_pkts >= VHOST_NET_PKT_WEIGHT)) { + if (unlikely(vhost_exceeds_weight(++recv_pkts, total_len))) { vhost_poll_queue(&vq->poll); goto out; } From patchwork Mon May 21 09:04:24 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 917478 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40qCdB2ZxLz9s1R for ; Mon, 21 May 2018 19:09:02 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752731AbeEUJIu (ORCPT ); Mon, 21 May 2018 05:08:50 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:39974 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751778AbeEUJEw (ORCPT ); Mon, 21 May 2018 05:04:52 -0400 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 0B54281663DD; Mon, 21 May 2018 09:04:52 +0000 (UTC) Received: from jason-ThinkPad-T450s.redhat.com (ovpn-12-30.pek2.redhat.com [10.72.12.30]) by smtp.corp.redhat.com (Postfix) with ESMTP id 4D1BF76C0; Mon, 21 May 2018 09:04:48 +0000 (UTC) From: Jason Wang To: mst@redhat.com, jasowang@redhat.com Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH net-next 03/12] vhost_net: introduce vhost_has_more_pkts() Date: Mon, 21 May 2018 17:04:24 +0800 Message-Id: <1526893473-20128-4-git-send-email-jasowang@redhat.com> In-Reply-To: <1526893473-20128-1-git-send-email-jasowang@redhat.com> References: <1526893473-20128-1-git-send-email-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Mon, 21 May 2018 09:04:52 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Mon, 21 May 2018 09:04:52 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'jasowang@redhat.com' RCPT:'' Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Signed-off-by: Jason Wang --- drivers/vhost/net.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index de544ee..4ebac76 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -485,6 +485,13 @@ static bool vhost_exceeds_weight(int pkts, int total_len) unlikely(pkts >= VHOST_NET_PKT_WEIGHT); } +static bool vhost_has_more_pkts(struct vhost_net *net, + struct vhost_virtqueue *vq) +{ + return !vhost_vq_avail_empty(&net->dev, vq) && + likely(!vhost_exceeds_maxpend(net)); +} + /* Expects to be always run from workqueue - which acts as * read-size critical section for our kind of RCU. */ static void handle_tx(struct vhost_net *net) @@ -578,8 +585,7 @@ static void handle_tx(struct vhost_net *net) } total_len += len; if (total_len < VHOST_NET_WEIGHT && - !vhost_vq_avail_empty(&net->dev, vq) && - likely(!vhost_exceeds_maxpend(net))) { + vhost_has_more_pkts(net, vq)) { msg.msg_flags |= MSG_MORE; } else { msg.msg_flags &= ~MSG_MORE; @@ -605,7 +611,7 @@ static void handle_tx(struct vhost_net *net) else vhost_zerocopy_signal_used(net, vq); vhost_net_tx_packet(net); - if (unlikely(vhost_exceeds_weight(++sent_pkts, total_len))) { + if (vhost_exceeds_weight(++sent_pkts, total_len)) { vhost_poll_queue(&vq->poll); break; } From patchwork Mon May 21 09:04:25 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 917476 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40qCcX1M8lz9s1R for ; Mon, 21 May 2018 19:08:28 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752522AbeEUJIQ (ORCPT ); Mon, 21 May 2018 05:08:16 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:41288 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1752308AbeEUJE4 (ORCPT ); Mon, 21 May 2018 05:04:56 -0400 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id AA3F6401DEA6; Mon, 21 May 2018 09:04:55 +0000 (UTC) Received: from jason-ThinkPad-T450s.redhat.com (ovpn-12-30.pek2.redhat.com [10.72.12.30]) by smtp.corp.redhat.com (Postfix) with ESMTP id AD2CB50329; Mon, 21 May 2018 09:04:52 +0000 (UTC) From: Jason Wang To: mst@redhat.com, jasowang@redhat.com Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH net-next 04/12] vhost_net: split out datacopy logic Date: Mon, 21 May 2018 17:04:25 +0800 Message-Id: <1526893473-20128-5-git-send-email-jasowang@redhat.com> In-Reply-To: <1526893473-20128-1-git-send-email-jasowang@redhat.com> References: <1526893473-20128-1-git-send-email-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.6]); Mon, 21 May 2018 09:04:55 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.6]); Mon, 21 May 2018 09:04:55 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'jasowang@redhat.com' RCPT:'' Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Instead of mixing zerocopy and datacopy logics, this patch tries to split datacopy logic out. This results for a more compact code and specific optimization could be done on top more easily. Signed-off-by: Jason Wang --- drivers/vhost/net.c | 111 +++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 102 insertions(+), 9 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 4ebac76..4682fcc 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -492,9 +492,95 @@ static bool vhost_has_more_pkts(struct vhost_net *net, likely(!vhost_exceeds_maxpend(net)); } +static void handle_tx_copy(struct vhost_net *net) +{ + struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX]; + struct vhost_virtqueue *vq = &nvq->vq; + unsigned out, in; + int head; + struct msghdr msg = { + .msg_name = NULL, + .msg_namelen = 0, + .msg_control = NULL, + .msg_controllen = 0, + .msg_flags = MSG_DONTWAIT, + }; + size_t len, total_len = 0; + int err; + size_t hdr_size; + struct socket *sock; + struct vhost_net_ubuf_ref *uninitialized_var(ubufs); + int sent_pkts = 0; + + mutex_lock(&vq->mutex); + sock = vq->private_data; + if (!sock) + goto out; + + if (!vq_iotlb_prefetch(vq)) + goto out; + + vhost_disable_notify(&net->dev, vq); + vhost_net_disable_vq(net, vq); + + hdr_size = nvq->vhost_hlen; + + for (;;) { + head = vhost_net_tx_get_vq_desc(net, vq, vq->iov, + ARRAY_SIZE(vq->iov), + &out, &in); + /* On error, stop handling until the next kick. */ + if (unlikely(head < 0)) + break; + /* Nothing new? Wait for eventfd to tell us they refilled. */ + if (head == vq->num) { + if (unlikely(vhost_enable_notify(&net->dev, vq))) { + vhost_disable_notify(&net->dev, vq); + continue; + } + break; + } + if (in) { + vq_err(vq, "Unexpected descriptor format for TX: " + "out %d, int %d\n", out, in); + break; + } + + len = init_iov_iter(vq, &msg.msg_iter, hdr_size, out); + if (len < 0) + break; + + total_len += len; + if (total_len < VHOST_NET_WEIGHT && + vhost_has_more_pkts(net, vq)) { + msg.msg_flags |= MSG_MORE; + } else { + msg.msg_flags &= ~MSG_MORE; + } + + /* TODO: Check specific error and bomb out unless ENOBUFS? */ + err = sock->ops->sendmsg(sock, &msg, len); + if (unlikely(err < 0)) { + vhost_discard_vq_desc(vq, 1); + vhost_net_enable_vq(net, vq); + break; + } + if (err != len) + pr_debug("Truncated TX packet: " + " len %d != %zd\n", err, len); + vhost_add_used_and_signal(&net->dev, vq, head, 0); + if (vhost_exceeds_weight(++sent_pkts, total_len)) { + vhost_poll_queue(&vq->poll); + break; + } + } +out: + mutex_unlock(&vq->mutex); +} + /* Expects to be always run from workqueue - which acts as * read-size critical section for our kind of RCU. */ -static void handle_tx(struct vhost_net *net) +static void handle_tx_zerocopy(struct vhost_net *net) { struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX]; struct vhost_virtqueue *vq = &nvq->vq; @@ -512,7 +598,7 @@ static void handle_tx(struct vhost_net *net) size_t hdr_size; struct socket *sock; struct vhost_net_ubuf_ref *uninitialized_var(ubufs); - bool zcopy, zcopy_used; + bool zcopy_used; int sent_pkts = 0; mutex_lock(&vq->mutex); @@ -527,13 +613,10 @@ static void handle_tx(struct vhost_net *net) vhost_net_disable_vq(net, vq); hdr_size = nvq->vhost_hlen; - zcopy = nvq->ubufs; for (;;) { /* Release DMAs done buffers first */ - if (zcopy) - vhost_zerocopy_signal_used(net, vq); - + vhost_zerocopy_signal_used(net, vq); head = vhost_net_tx_get_vq_desc(net, vq, vq->iov, ARRAY_SIZE(vq->iov), @@ -559,9 +642,9 @@ static void handle_tx(struct vhost_net *net) if (len < 0) break; - zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN - && !vhost_exceeds_maxpend(net) - && vhost_net_tx_select_zcopy(net); + zcopy_used = len >= VHOST_GOODCOPY_LEN + && !vhost_exceeds_maxpend(net) + && vhost_net_tx_select_zcopy(net); /* use msg_control to pass vhost zerocopy ubuf info to skb */ if (zcopy_used) { @@ -620,6 +703,16 @@ static void handle_tx(struct vhost_net *net) mutex_unlock(&vq->mutex); } +static void handle_tx(struct vhost_net *net) +{ + struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX]; + + if (nvq->ubufs) + handle_tx_zerocopy(net); + else + handle_tx_copy(net); +} + static int peek_head_len(struct vhost_net_virtqueue *rvq, struct sock *sk) { struct sk_buff *head; From patchwork Mon May 21 09:04:26 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 917467 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40qCXf2yxdz9s1R for ; Mon, 21 May 2018 19:05:06 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752572AbeEUJFE (ORCPT ); Mon, 21 May 2018 05:05:04 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:41294 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751727AbeEUJFC (ORCPT ); Mon, 21 May 2018 05:05:02 -0400 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 92B9D401DEA6; Mon, 21 May 2018 09:05:01 +0000 (UTC) Received: from jason-ThinkPad-T450s.redhat.com (ovpn-12-30.pek2.redhat.com [10.72.12.30]) by smtp.corp.redhat.com (Postfix) with ESMTP id C94E04AF1; Mon, 21 May 2018 09:04:56 +0000 (UTC) From: Jason Wang To: mst@redhat.com, jasowang@redhat.com Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH net-next 05/12] vhost_net: batch update used ring for datacopy TX Date: Mon, 21 May 2018 17:04:26 +0800 Message-Id: <1526893473-20128-6-git-send-email-jasowang@redhat.com> In-Reply-To: <1526893473-20128-1-git-send-email-jasowang@redhat.com> References: <1526893473-20128-1-git-send-email-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.6]); Mon, 21 May 2018 09:05:01 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.6]); Mon, 21 May 2018 09:05:01 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'jasowang@redhat.com' RCPT:'' Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Like commit e2b3b35eb989 ("vhost_net: batch used ring update in rx"), this patches implements batch used ring update for datacopy TX (zerocopy has already done some kind of batching). Testpmd transmission from guest to ixgbe via XDP_REDIRECT shows about 15% improvement from 2.8Mpps to 3.2Mpps. Signed-off-by: Jason Wang --- drivers/vhost/net.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 4682fcc..f0639d7 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -511,6 +511,7 @@ static void handle_tx_copy(struct vhost_net *net) struct socket *sock; struct vhost_net_ubuf_ref *uninitialized_var(ubufs); int sent_pkts = 0; + s16 nheads = 0; mutex_lock(&vq->mutex); sock = vq->private_data; @@ -550,6 +551,9 @@ static void handle_tx_copy(struct vhost_net *net) if (len < 0) break; + vq->heads[nheads].id = cpu_to_vhost32(vq, head); + vq->heads[nheads].len = 0; + total_len += len; if (total_len < VHOST_NET_WEIGHT && vhost_has_more_pkts(net, vq)) { @@ -568,13 +572,20 @@ static void handle_tx_copy(struct vhost_net *net) if (err != len) pr_debug("Truncated TX packet: " " len %d != %zd\n", err, len); - vhost_add_used_and_signal(&net->dev, vq, head, 0); + if (++nheads == VHOST_RX_BATCH) { + vhost_add_used_and_signal_n(&net->dev, vq, vq->heads, + nheads); + nheads = 0; + } if (vhost_exceeds_weight(++sent_pkts, total_len)) { vhost_poll_queue(&vq->poll); break; } } out: + if (nheads) + vhost_add_used_and_signal_n(&net->dev, vq, vq->heads, + nheads); mutex_unlock(&vq->mutex); } From patchwork Mon May 21 09:04:27 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 917474 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40qCbh2rT7z9s1R for ; Mon, 21 May 2018 19:07:44 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752656AbeEUJFL (ORCPT ); Mon, 21 May 2018 05:05:11 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:52440 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751234AbeEUJFG (ORCPT ); Mon, 21 May 2018 05:05:06 -0400 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D0F4FBB409; Mon, 21 May 2018 09:05:05 +0000 (UTC) Received: from jason-ThinkPad-T450s.redhat.com (ovpn-12-30.pek2.redhat.com [10.72.12.30]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5487B4E68B; Mon, 21 May 2018 09:05:01 +0000 (UTC) From: Jason Wang To: mst@redhat.com, jasowang@redhat.com Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH net-next 06/12] tuntap: enable premmption early Date: Mon, 21 May 2018 17:04:27 +0800 Message-Id: <1526893473-20128-7-git-send-email-jasowang@redhat.com> In-Reply-To: <1526893473-20128-1-git-send-email-jasowang@redhat.com> References: <1526893473-20128-1-git-send-email-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]); Mon, 21 May 2018 09:05:05 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]); Mon, 21 May 2018 09:05:05 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'jasowang@redhat.com' RCPT:'' Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Signed-off-by: Jason Wang --- drivers/net/tun.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 44d4f3d..24ecd82 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1697,6 +1697,8 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, goto err_xdp; } } + rcu_read_unlock(); + preempt_enable(); skb = build_skb(buf, buflen); if (!skb) { @@ -1710,9 +1712,6 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, get_page(alloc_frag->page); alloc_frag->offset += buflen; - rcu_read_unlock(); - preempt_enable(); - return skb; err_redirect: From patchwork Mon May 21 09:04:28 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 917473 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40qCbQ4Slyz9s1R for ; Mon, 21 May 2018 19:07:30 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752680AbeEUJFP (ORCPT ); Mon, 21 May 2018 05:05:15 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:43678 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1752653AbeEUJFL (ORCPT ); Mon, 21 May 2018 05:05:11 -0400 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D05F97C6A9; Mon, 21 May 2018 09:05:10 +0000 (UTC) Received: from jason-ThinkPad-T450s.redhat.com (ovpn-12-30.pek2.redhat.com [10.72.12.30]) by smtp.corp.redhat.com (Postfix) with ESMTP id B533976C0; Mon, 21 May 2018 09:05:06 +0000 (UTC) From: Jason Wang To: mst@redhat.com, jasowang@redhat.com Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH net-next 07/12] tuntap: simplify error handling in tun_build_skb() Date: Mon, 21 May 2018 17:04:28 +0800 Message-Id: <1526893473-20128-8-git-send-email-jasowang@redhat.com> In-Reply-To: <1526893473-20128-1-git-send-email-jasowang@redhat.com> References: <1526893473-20128-1-git-send-email-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.2]); Mon, 21 May 2018 09:05:10 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.2]); Mon, 21 May 2018 09:05:10 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'jasowang@redhat.com' RCPT:'' Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Signed-off-by: Jason Wang --- drivers/net/tun.c | 36 ++++++++++++++---------------------- 1 file changed, 14 insertions(+), 22 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 24ecd82..f6e0f96 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1612,7 +1612,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, int len, int *skb_xdp) { struct page_frag *alloc_frag = ¤t->task_frag; - struct sk_buff *skb; + struct sk_buff *skb = NULL; struct bpf_prog *xdp_prog; int buflen = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); unsigned int delta = 0; @@ -1638,6 +1638,9 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, if (copied != len) return ERR_PTR(-EFAULT); + get_page(alloc_frag->page); + alloc_frag->offset += buflen; + /* There's a small window that XDP may be set after the check * of xdp_prog above, this should be rare and for simplicity * we do XDP on skb in case the headroom is not enough. @@ -1665,24 +1668,16 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, switch (act) { case XDP_REDIRECT: - get_page(alloc_frag->page); - alloc_frag->offset += buflen; err = xdp_do_redirect(tun->dev, &xdp, xdp_prog); xdp_do_flush_map(); if (err) - goto err_redirect; - rcu_read_unlock(); - preempt_enable(); - return NULL; + goto err_xdp; + goto out; case XDP_TX: - get_page(alloc_frag->page); - alloc_frag->offset += buflen; if (tun_xdp_tx(tun->dev, &xdp)) - goto err_redirect; + goto err_xdp; tun_xdp_flush(tun->dev); - rcu_read_unlock(); - preempt_enable(); - return NULL; + goto out; case XDP_PASS: delta = orig_data - xdp.data; len = xdp.data_end - xdp.data; @@ -1702,25 +1697,22 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, skb = build_skb(buf, buflen); if (!skb) { - rcu_read_unlock(); - preempt_enable(); - return ERR_PTR(-ENOMEM); + skb = ERR_PTR(-ENOMEM); + goto out; } skb_reserve(skb, pad - delta); skb_put(skb, len); - get_page(alloc_frag->page); - alloc_frag->offset += buflen; return skb; -err_redirect: - put_page(alloc_frag->page); err_xdp: + alloc_frag->offset -= buflen; + put_page(alloc_frag->page); +out: rcu_read_unlock(); preempt_enable(); - this_cpu_inc(tun->pcpu_stats->rx_dropped); - return NULL; + return skb; } /* Get packet from user space buffer */ From patchwork Mon May 21 09:04:29 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 917472 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40qCb572v7z9s1R for ; Mon, 21 May 2018 19:07:13 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752707AbeEUJFS (ORCPT ); Mon, 21 May 2018 05:05:18 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:54680 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1752670AbeEUJFP (ORCPT ); Mon, 21 May 2018 05:05:15 -0400 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 9C96E400E9A7; Mon, 21 May 2018 09:05:14 +0000 (UTC) Received: from jason-ThinkPad-T450s.redhat.com (ovpn-12-30.pek2.redhat.com [10.72.12.30]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9C04F4AF1; Mon, 21 May 2018 09:05:11 +0000 (UTC) From: Jason Wang To: mst@redhat.com, jasowang@redhat.com Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH net-next 08/12] tuntap: tweak on the path of non-xdp case in tun_build_skb() Date: Mon, 21 May 2018 17:04:29 +0800 Message-Id: <1526893473-20128-9-git-send-email-jasowang@redhat.com> In-Reply-To: <1526893473-20128-1-git-send-email-jasowang@redhat.com> References: <1526893473-20128-1-git-send-email-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Mon, 21 May 2018 09:05:14 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Mon, 21 May 2018 09:05:14 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'jasowang@redhat.com' RCPT:'' Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org If we're sure not to go native XDP, there's no need for several things like touching preemption counter and rcu stuffs. Signed-off-by: Jason Wang --- drivers/net/tun.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index f6e0f96..ed04291 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1645,10 +1645,12 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, * of xdp_prog above, this should be rare and for simplicity * we do XDP on skb in case the headroom is not enough. */ - if (hdr->gso_type || !xdp_prog) + if (hdr->gso_type || !xdp_prog) { *skb_xdp = 1; - else - *skb_xdp = 0; + goto build; + } + + *skb_xdp = 0; preempt_disable(); rcu_read_lock(); @@ -1695,6 +1697,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, rcu_read_unlock(); preempt_enable(); +build: skb = build_skb(buf, buflen); if (!skb) { skb = ERR_PTR(-ENOMEM); From patchwork Mon May 21 09:04:30 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 917470 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40qCZc1xzhz9s31 for ; Mon, 21 May 2018 19:06:48 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751688AbeEUJF3 (ORCPT ); Mon, 21 May 2018 05:05:29 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:54682 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751368AbeEUJFV (ORCPT ); Mon, 21 May 2018 05:05:21 -0400 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id DD8E1400E9A7; Mon, 21 May 2018 09:05:20 +0000 (UTC) Received: from jason-ThinkPad-T450s.redhat.com (ovpn-12-30.pek2.redhat.com [10.72.12.30]) by smtp.corp.redhat.com (Postfix) with ESMTP id B50844AF1; Mon, 21 May 2018 09:05:15 +0000 (UTC) From: Jason Wang To: mst@redhat.com, jasowang@redhat.com Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH net-next 09/12] tuntap: split out XDP logic Date: Mon, 21 May 2018 17:04:30 +0800 Message-Id: <1526893473-20128-10-git-send-email-jasowang@redhat.com> In-Reply-To: <1526893473-20128-1-git-send-email-jasowang@redhat.com> References: <1526893473-20128-1-git-send-email-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Mon, 21 May 2018 09:05:20 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Mon, 21 May 2018 09:05:20 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'jasowang@redhat.com' RCPT:'' Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This patch split out XDP logic into a single function. This make it to be reused by XDP batching path in the following patch. Signed-off-by: Jason Wang --- drivers/net/tun.c | 85 +++++++++++++++++++++++++++++++++---------------------- 1 file changed, 51 insertions(+), 34 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index ed04291..2560378 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1605,6 +1605,44 @@ static bool tun_can_build_skb(struct tun_struct *tun, struct tun_file *tfile, return true; } +static u32 tun_do_xdp(struct tun_struct *tun, + struct tun_file *tfile, + struct bpf_prog *xdp_prog, + struct xdp_buff *xdp, + int *err) +{ + u32 act = bpf_prog_run_xdp(xdp_prog, xdp); + + switch (act) { + case XDP_REDIRECT: + *err = xdp_do_redirect(tun->dev, xdp, xdp_prog); + xdp_do_flush_map(); + if (*err) + break; + goto out; + case XDP_TX: + *err = tun_xdp_tx(tun->dev, xdp); + if (*err) + break; + tun_xdp_flush(tun->dev); + goto out; + case XDP_PASS: + goto out; + default: + bpf_warn_invalid_xdp_action(act); + /* fall through */ + case XDP_ABORTED: + trace_xdp_exception(tun->dev, xdp_prog, act); + /* fall through */ + case XDP_DROP: + break; + } + + put_page(virt_to_head_page(xdp->data_hard_start)); +out: + return act; +} + static struct sk_buff *tun_build_skb(struct tun_struct *tun, struct tun_file *tfile, struct iov_iter *from, @@ -1615,10 +1653,10 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, struct sk_buff *skb = NULL; struct bpf_prog *xdp_prog; int buflen = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); - unsigned int delta = 0; char *buf; size_t copied; - int err, pad = TUN_RX_PAD; + int pad = TUN_RX_PAD; + int err = 0; rcu_read_lock(); xdp_prog = rcu_dereference(tun->xdp_prog); @@ -1655,9 +1693,8 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, preempt_disable(); rcu_read_lock(); xdp_prog = rcu_dereference(tun->xdp_prog); - if (xdp_prog && !*skb_xdp) { + if (xdp_prog) { struct xdp_buff xdp; - void *orig_data; u32 act; xdp.data_hard_start = buf; @@ -1665,34 +1702,14 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, xdp_set_data_meta_invalid(&xdp); xdp.data_end = xdp.data + len; xdp.rxq = &tfile->xdp_rxq; - orig_data = xdp.data; - act = bpf_prog_run_xdp(xdp_prog, &xdp); - - switch (act) { - case XDP_REDIRECT: - err = xdp_do_redirect(tun->dev, &xdp, xdp_prog); - xdp_do_flush_map(); - if (err) - goto err_xdp; - goto out; - case XDP_TX: - if (tun_xdp_tx(tun->dev, &xdp)) - goto err_xdp; - tun_xdp_flush(tun->dev); - goto out; - case XDP_PASS: - delta = orig_data - xdp.data; - len = xdp.data_end - xdp.data; - break; - default: - bpf_warn_invalid_xdp_action(act); - /* fall through */ - case XDP_ABORTED: - trace_xdp_exception(tun->dev, xdp_prog, act); - /* fall through */ - case XDP_DROP: + act = tun_do_xdp(tun, tfile, xdp_prog, &xdp, &err); + if (err) goto err_xdp; - } + if (act != XDP_PASS) + goto out; + + pad = xdp.data - xdp.data_hard_start; + len = xdp.data_end - xdp.data; } rcu_read_unlock(); preempt_enable(); @@ -1700,18 +1717,18 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, build: skb = build_skb(buf, buflen); if (!skb) { + put_page(alloc_frag->page); skb = ERR_PTR(-ENOMEM); goto out; } - skb_reserve(skb, pad - delta); + skb_reserve(skb, pad); skb_put(skb, len); return skb; err_xdp: - alloc_frag->offset -= buflen; - put_page(alloc_frag->page); + this_cpu_inc(tun->pcpu_stats->rx_dropped); out: rcu_read_unlock(); preempt_enable(); From patchwork Mon May 21 09:04:31 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 917471 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40qCZr3nR8z9s1R for ; Mon, 21 May 2018 19:07:00 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752735AbeEUJF1 (ORCPT ); Mon, 21 May 2018 05:05:27 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:41312 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751186AbeEUJFZ (ORCPT ); Mon, 21 May 2018 05:05:25 -0400 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 9138D401DEA6; Mon, 21 May 2018 09:05:24 +0000 (UTC) Received: from jason-ThinkPad-T450s.redhat.com (ovpn-12-30.pek2.redhat.com [10.72.12.30]) by smtp.corp.redhat.com (Postfix) with ESMTP id A97B64AF1; Mon, 21 May 2018 09:05:21 +0000 (UTC) From: Jason Wang To: mst@redhat.com, jasowang@redhat.com Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH net-next 10/12] vhost_net: build xdp buff Date: Mon, 21 May 2018 17:04:31 +0800 Message-Id: <1526893473-20128-11-git-send-email-jasowang@redhat.com> In-Reply-To: <1526893473-20128-1-git-send-email-jasowang@redhat.com> References: <1526893473-20128-1-git-send-email-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.6]); Mon, 21 May 2018 09:05:24 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.6]); Mon, 21 May 2018 09:05:24 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'jasowang@redhat.com' RCPT:'' Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This patch implement build XDP buffers in vhost_net. The idea is do userspace copy in vhost_net and build XDP buff based on the page. Vhost_net can then submit one or an array of XDP buffs to underlayer socket (e.g TUN). TUN can choose to do XDP or call build_skb() to build skb. To support build skb, vnet header were also stored into the header of the XDP buff. This userspace copy and XDP buffs building is key to achieve XDP batching in TUN, since TUN does not need to care about userspace copy and then can disable premmption for several XDP buffs to achieve batching from XDP. TODO: reserve headroom based on the TUN XDP. Signed-off-by: Jason Wang --- drivers/vhost/net.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 74 insertions(+) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index f0639d7..1209e84 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -492,6 +492,80 @@ static bool vhost_has_more_pkts(struct vhost_net *net, likely(!vhost_exceeds_maxpend(net)); } +#define VHOST_NET_HEADROOM 256 +#define VHOST_NET_RX_PAD (NET_IP_ALIGN + NET_SKB_PAD) + +static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq, + struct iov_iter *from, + struct xdp_buff *xdp) +{ + struct vhost_virtqueue *vq = &nvq->vq; + struct page_frag *alloc_frag = ¤t->task_frag; + struct virtio_net_hdr *gso; + size_t len = iov_iter_count(from); + int buflen = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + int pad = SKB_DATA_ALIGN(VHOST_NET_RX_PAD + VHOST_NET_HEADROOM + + nvq->sock_hlen); + int sock_hlen = nvq->sock_hlen; + void *buf; + int copied; + + if (len < nvq->sock_hlen) + return -EFAULT; + + if (SKB_DATA_ALIGN(len + pad) + + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) > PAGE_SIZE) + return -ENOSPC; + + buflen += SKB_DATA_ALIGN(len + pad); + alloc_frag->offset = ALIGN((u64)alloc_frag->offset, SMP_CACHE_BYTES); + if (unlikely(!skb_page_frag_refill(buflen, alloc_frag, GFP_KERNEL))) + return -ENOMEM; + + buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset; + + /* We store two kinds of metadata in the header which will be + * used for XDP_PASS to do build_skb(): + * offset 0: buflen + * offset sizeof(int): vnet header + */ + copied = copy_page_from_iter(alloc_frag->page, + alloc_frag->offset + sizeof(int), sock_hlen, from); + if (copied != sock_hlen) + return -EFAULT; + + gso = (struct virtio_net_hdr *)(buf + sizeof(int)); + + if ((gso->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) && + vhost16_to_cpu(vq, gso->csum_start) + + vhost16_to_cpu(vq, gso->csum_offset) + 2 > + vhost16_to_cpu(vq, gso->hdr_len)) { + gso->hdr_len = cpu_to_vhost16(vq, + vhost16_to_cpu(vq, gso->csum_start) + + vhost16_to_cpu(vq, gso->csum_offset) + 2); + + if (vhost16_to_cpu(vq, gso->hdr_len) > len) + return -EINVAL; + } + + len -= sock_hlen; + copied = copy_page_from_iter(alloc_frag->page, + alloc_frag->offset + pad, + len, from); + if (copied != len) + return -EFAULT; + + xdp->data_hard_start = buf; + xdp->data = buf + pad; + xdp->data_end = xdp->data + len; + *(int *)(xdp->data_hard_start)= buflen; + + get_page(alloc_frag->page); + alloc_frag->offset += buflen; + + return 0; +} + static void handle_tx_copy(struct vhost_net *net) { struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX]; From patchwork Mon May 21 09:04:32 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 917469 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40qCZP5dx6z9s1R for ; Mon, 21 May 2018 19:06:37 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751363AbeEUJFd (ORCPT ); Mon, 21 May 2018 05:05:33 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:46340 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1752742AbeEUJF3 (ORCPT ); Mon, 21 May 2018 05:05:29 -0400 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 8E567401EF10; Mon, 21 May 2018 09:05:28 +0000 (UTC) Received: from jason-ThinkPad-T450s.redhat.com (ovpn-12-30.pek2.redhat.com [10.72.12.30]) by smtp.corp.redhat.com (Postfix) with ESMTP id 493044E68B; Mon, 21 May 2018 09:05:24 +0000 (UTC) From: Jason Wang To: mst@redhat.com, jasowang@redhat.com Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH net-next 11/12] vhost_net: passing raw xdp buff to tun Date: Mon, 21 May 2018 17:04:32 +0800 Message-Id: <1526893473-20128-12-git-send-email-jasowang@redhat.com> In-Reply-To: <1526893473-20128-1-git-send-email-jasowang@redhat.com> References: <1526893473-20128-1-git-send-email-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Mon, 21 May 2018 09:05:28 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Mon, 21 May 2018 09:05:28 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'jasowang@redhat.com' RCPT:'' Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This patches implement a TUN specific msg_control: #define TUN_MSG_UBUF 1 #define TUN_MSG_PTR 2 struct tun_msg_ctl { int type; void *ptr; }; The first supported type is ubuf which is already used by vhost_net zerocopy code. The second is XDP buff, which allows vhost_net to pass XDP buff to TUN. This could be used to implement accepting an array of XDP buffs from vhost_net in the following patches. Signed-off-by: Jason Wang --- drivers/net/tun.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++++- drivers/vhost/net.c | 21 ++++++++++-- include/linux/if_tun.h | 7 ++++ 3 files changed, 116 insertions(+), 3 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 2560378..b586b3f 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -2387,18 +2387,107 @@ static void tun_sock_write_space(struct sock *sk) kill_fasync(&tfile->fasync, SIGIO, POLL_OUT); } +static int tun_xdp_one(struct tun_struct *tun, + struct tun_file *tfile, + struct xdp_buff *xdp) +{ + struct virtio_net_hdr *gso = xdp->data_hard_start + sizeof(int); + struct tun_pcpu_stats *stats; + struct bpf_prog *xdp_prog; + struct sk_buff *skb = NULL; + u32 rxhash = 0, act; + int buflen = *(int *)xdp->data_hard_start; + int err = 0; + bool skb_xdp = false; + + preempt_disable(); + rcu_read_lock(); + + xdp_prog = rcu_dereference(tun->xdp_prog); + if (xdp_prog) { + if (gso->gso_type) { + skb_xdp = true; + goto build; + } + xdp_set_data_meta_invalid(xdp); + xdp->rxq = &tfile->xdp_rxq; + act = tun_do_xdp(tun, tfile, xdp_prog, xdp, &err); + if (err) + goto out; + if (act != XDP_PASS) + goto out; + } + +build: + skb = build_skb(xdp->data_hard_start, buflen); + if (!skb) { + err = -ENOMEM; + goto out; + } + + if (skb_xdp) { + err = do_xdp_generic(xdp_prog, skb); + if (err != XDP_PASS) + goto out; + } + + skb_reserve(skb, xdp->data - xdp->data_hard_start); + skb_put(skb, xdp->data_end - xdp->data); + + if (virtio_net_hdr_to_skb(skb, gso, tun_is_little_endian(tun))) { + this_cpu_inc(tun->pcpu_stats->rx_frame_errors); + kfree_skb(skb); + err = -EINVAL; + goto out; + } + + skb->protocol = eth_type_trans(skb, tun->dev); + skb_reset_network_header(skb); + skb_probe_transport_header(skb, 0); + + if (!rcu_dereference(tun->steering_prog)) + rxhash = __skb_get_hash_symmetric(skb); + + netif_receive_skb(skb); + + stats = get_cpu_ptr(tun->pcpu_stats); + u64_stats_update_begin(&stats->syncp); + stats->rx_packets++; + stats->rx_bytes += skb->len; + u64_stats_update_end(&stats->syncp); + put_cpu_ptr(stats); + + if (rxhash) + tun_flow_update(tun, rxhash, tfile); + +out: + rcu_read_unlock(); + preempt_enable(); + + return err; +} + static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len) { int ret; struct tun_file *tfile = container_of(sock, struct tun_file, socket); struct tun_struct *tun = tun_get(tfile); + struct tun_msg_ctl *ctl = m->msg_control; if (!tun) return -EBADFD; - ret = tun_get_user(tun, tfile, m->msg_control, &m->msg_iter, + if (ctl && ctl->type == TUN_MSG_PTR) { + ret = tun_xdp_one(tun, tfile, ctl->ptr); + if (!ret) + ret = total_len; + goto out; + } + + ret = tun_get_user(tun, tfile, ctl ? ctl->ptr : NULL, &m->msg_iter, m->msg_flags & MSG_DONTWAIT, m->msg_flags & MSG_MORE); +out: tun_put(tun); return ret; } diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 1209e84..0d84de6 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -117,6 +117,7 @@ struct vhost_net_virtqueue { struct vhost_net_ubuf_ref *ubufs; struct ptr_ring *rx_ring; struct vhost_net_buf rxq; + struct xdp_buff xdp[VHOST_RX_BATCH]; }; struct vhost_net { @@ -570,6 +571,7 @@ static void handle_tx_copy(struct vhost_net *net) { struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX]; struct vhost_virtqueue *vq = &nvq->vq; + struct xdp_buff xdp; unsigned out, in; int head; struct msghdr msg = { @@ -584,6 +586,7 @@ static void handle_tx_copy(struct vhost_net *net) size_t hdr_size; struct socket *sock; struct vhost_net_ubuf_ref *uninitialized_var(ubufs); + struct tun_msg_ctl ctl; int sent_pkts = 0; s16 nheads = 0; @@ -628,6 +631,14 @@ static void handle_tx_copy(struct vhost_net *net) vq->heads[nheads].id = cpu_to_vhost32(vq, head); vq->heads[nheads].len = 0; + err = vhost_net_build_xdp(nvq, &msg.msg_iter, &xdp); + if (!err) { + ctl.type = TUN_MSG_PTR; + ctl.ptr = &xdp; + msg.msg_control = &ctl; + } else + msg.msg_control = NULL; + total_len += len; if (total_len < VHOST_NET_WEIGHT && vhost_has_more_pkts(net, vq)) { @@ -734,16 +745,21 @@ static void handle_tx_zerocopy(struct vhost_net *net) /* use msg_control to pass vhost zerocopy ubuf info to skb */ if (zcopy_used) { struct ubuf_info *ubuf; + struct tun_msg_ctl ctl; + ubuf = nvq->ubuf_info + nvq->upend_idx; + ctl.type = TUN_MSG_UBUF; + ctl.ptr = ubuf; + vq->heads[nvq->upend_idx].id = cpu_to_vhost32(vq, head); vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS; ubuf->callback = vhost_zerocopy_callback; ubuf->ctx = nvq->ubufs; ubuf->desc = nvq->upend_idx; refcount_set(&ubuf->refcnt, 1); - msg.msg_control = ubuf; - msg.msg_controllen = sizeof(ubuf); + msg.msg_control = &ctl; + msg.msg_controllen = sizeof(ctl); ubufs = nvq->ubufs; atomic_inc(&ubufs->refcount); nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV; @@ -751,6 +767,7 @@ static void handle_tx_zerocopy(struct vhost_net *net) msg.msg_control = NULL; ubufs = NULL; } + total_len += len; if (total_len < VHOST_NET_WEIGHT && vhost_has_more_pkts(net, vq)) { diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h index 3d2996d..ba46dce 100644 --- a/include/linux/if_tun.h +++ b/include/linux/if_tun.h @@ -19,6 +19,13 @@ #define TUN_XDP_FLAG 0x1UL +#define TUN_MSG_UBUF 1 +#define TUN_MSG_PTR 2 +struct tun_msg_ctl { + int type; + void *ptr; +}; + #if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE) struct socket *tun_get_socket(struct file *); struct ptr_ring *tun_get_tx_ring(struct file *file); From patchwork Mon May 21 09:04:33 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 917468 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40qCYh6KW0z9s1R for ; Mon, 21 May 2018 19:06:00 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752792AbeEUJFg (ORCPT ); Mon, 21 May 2018 05:05:36 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:46344 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1752766AbeEUJFd (ORCPT ); Mon, 21 May 2018 05:05:33 -0400 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 1252D401EF0C; Mon, 21 May 2018 09:05:33 +0000 (UTC) Received: from jason-ThinkPad-T450s.redhat.com (ovpn-12-30.pek2.redhat.com [10.72.12.30]) by smtp.corp.redhat.com (Postfix) with ESMTP id CB1C64E68B; Mon, 21 May 2018 09:05:28 +0000 (UTC) From: Jason Wang To: mst@redhat.com, jasowang@redhat.com Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH net-next 12/12] vhost_net: batch submitting XDP buffers to underlayer sockets Date: Mon, 21 May 2018 17:04:33 +0800 Message-Id: <1526893473-20128-13-git-send-email-jasowang@redhat.com> In-Reply-To: <1526893473-20128-1-git-send-email-jasowang@redhat.com> References: <1526893473-20128-1-git-send-email-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Mon, 21 May 2018 09:05:33 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Mon, 21 May 2018 09:05:33 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'jasowang@redhat.com' RCPT:'' Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This patch implements XDP batching for vhost_net with tun. This is done by batching XDP buffs in vhost and submit them when: - vhost_net can not build XDP buff (mostly because of the size of packet) - #batched exceeds the limitation (VHOST_NET_RX_BATCH). - tun accept a batch of XDP buff through msg_control and process them in a batch With this tun XDP can benefit from e.g batch transmission during XDP_REDIRECT or XDP_TX. Tests shows 21% improvement on TX pps (from ~3.2Mpps to ~3.9Mpps) while transmitting through testpmd from guest to host by xdp_redirect_map between tap0 and ixgbe. Signed-off-by: Jason Wang --- drivers/net/tun.c | 36 +++++++++++++++++---------- drivers/vhost/net.c | 71 ++++++++++++++++++++++++++++++++++++----------------- 2 files changed, 71 insertions(+), 36 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index b586b3f..5d16d18 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1616,7 +1616,6 @@ static u32 tun_do_xdp(struct tun_struct *tun, switch (act) { case XDP_REDIRECT: *err = xdp_do_redirect(tun->dev, xdp, xdp_prog); - xdp_do_flush_map(); if (*err) break; goto out; @@ -1624,7 +1623,6 @@ static u32 tun_do_xdp(struct tun_struct *tun, *err = tun_xdp_tx(tun->dev, xdp); if (*err) break; - tun_xdp_flush(tun->dev); goto out; case XDP_PASS: goto out; @@ -2400,9 +2398,6 @@ static int tun_xdp_one(struct tun_struct *tun, int err = 0; bool skb_xdp = false; - preempt_disable(); - rcu_read_lock(); - xdp_prog = rcu_dereference(tun->xdp_prog); if (xdp_prog) { if (gso->gso_type) { @@ -2461,15 +2456,12 @@ static int tun_xdp_one(struct tun_struct *tun, tun_flow_update(tun, rxhash, tfile); out: - rcu_read_unlock(); - preempt_enable(); - return err; } static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len) { - int ret; + int ret, i; struct tun_file *tfile = container_of(sock, struct tun_file, socket); struct tun_struct *tun = tun_get(tfile); struct tun_msg_ctl *ctl = m->msg_control; @@ -2477,10 +2469,28 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len) if (!tun) return -EBADFD; - if (ctl && ctl->type == TUN_MSG_PTR) { - ret = tun_xdp_one(tun, tfile, ctl->ptr); - if (!ret) - ret = total_len; + if (ctl && ((ctl->type & 0xF) == TUN_MSG_PTR)) { + int n = ctl->type >> 16; + + preempt_disable(); + rcu_read_lock(); + + for (i = 0; i < n; i++) { + struct xdp_buff *x = (struct xdp_buff *)ctl->ptr; + struct xdp_buff *xdp = &x[i]; + + xdp_set_data_meta_invalid(xdp); + xdp->rxq = &tfile->xdp_rxq; + tun_xdp_one(tun, tfile, xdp); + } + + xdp_do_flush_map(); + tun_xdp_flush(tun->dev); + + rcu_read_unlock(); + preempt_enable(); + + ret = total_len; goto out; } diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 0d84de6..bec4109 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -118,6 +118,7 @@ struct vhost_net_virtqueue { struct ptr_ring *rx_ring; struct vhost_net_buf rxq; struct xdp_buff xdp[VHOST_RX_BATCH]; + struct vring_used_elem heads[VHOST_RX_BATCH]; }; struct vhost_net { @@ -511,7 +512,7 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq, void *buf; int copied; - if (len < nvq->sock_hlen) + if (unlikely(len < nvq->sock_hlen)) return -EFAULT; if (SKB_DATA_ALIGN(len + pad) + @@ -567,11 +568,37 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq, return 0; } +static void vhost_tx_batch(struct vhost_net *net, + struct vhost_net_virtqueue *nvq, + struct socket *sock, + struct msghdr *msghdr, int n) +{ + struct tun_msg_ctl ctl = { + .type = n << 16 | TUN_MSG_PTR, + .ptr = nvq->xdp, + }; + int err; + + if (n == 0) + return; + + msghdr->msg_control = &ctl; + err = sock->ops->sendmsg(sock, msghdr, 0); + + if (unlikely(err < 0)) { + /* FIXME vq_err() */ + vq_err(&nvq->vq, "sendmsg err!\n"); + return; + } + vhost_add_used_and_signal_n(&net->dev, &nvq->vq, nvq->vq.heads, n); +} + +/* Expects to be always run from workqueue - which acts as + * read-size critical section for our kind of RCU. */ static void handle_tx_copy(struct vhost_net *net) { struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX]; struct vhost_virtqueue *vq = &nvq->vq; - struct xdp_buff xdp; unsigned out, in; int head; struct msghdr msg = { @@ -586,7 +613,6 @@ static void handle_tx_copy(struct vhost_net *net) size_t hdr_size; struct socket *sock; struct vhost_net_ubuf_ref *uninitialized_var(ubufs); - struct tun_msg_ctl ctl; int sent_pkts = 0; s16 nheads = 0; @@ -631,22 +657,24 @@ static void handle_tx_copy(struct vhost_net *net) vq->heads[nheads].id = cpu_to_vhost32(vq, head); vq->heads[nheads].len = 0; - err = vhost_net_build_xdp(nvq, &msg.msg_iter, &xdp); - if (!err) { - ctl.type = TUN_MSG_PTR; - ctl.ptr = &xdp; - msg.msg_control = &ctl; - } else - msg.msg_control = NULL; - total_len += len; - if (total_len < VHOST_NET_WEIGHT && - vhost_has_more_pkts(net, vq)) { - msg.msg_flags |= MSG_MORE; - } else { - msg.msg_flags &= ~MSG_MORE; + err = vhost_net_build_xdp(nvq, &msg.msg_iter, + &nvq->xdp[nheads]); + if (!err) { + if (++nheads == VHOST_RX_BATCH) { + vhost_tx_batch(net, nvq, sock, &msg, nheads); + nheads = 0; + } + goto done; + } else if (unlikely(err != -ENOSPC)) { + vq_err(vq, "Fail to build XDP buffer\n"); + break; } + vhost_tx_batch(net, nvq, sock, &msg, nheads); + msg.msg_control = NULL; + nheads = 0; + /* TODO: Check specific error and bomb out unless ENOBUFS? */ err = sock->ops->sendmsg(sock, &msg, len); if (unlikely(err < 0)) { @@ -657,11 +685,9 @@ static void handle_tx_copy(struct vhost_net *net) if (err != len) pr_debug("Truncated TX packet: " " len %d != %zd\n", err, len); - if (++nheads == VHOST_RX_BATCH) { - vhost_add_used_and_signal_n(&net->dev, vq, vq->heads, - nheads); - nheads = 0; - } + + vhost_add_used_and_signal(&net->dev, vq, head, 0); +done: if (vhost_exceeds_weight(++sent_pkts, total_len)) { vhost_poll_queue(&vq->poll); break; @@ -669,8 +695,7 @@ static void handle_tx_copy(struct vhost_net *net) } out: if (nheads) - vhost_add_used_and_signal_n(&net->dev, vq, vq->heads, - nheads); + vhost_tx_batch(net, nvq, sock, &msg, nheads); mutex_unlock(&vq->mutex); }