{"id":810300,"url":"http://patchwork.ozlabs.org/api/1.2/patches/810300/?format=json","web_url":"http://patchwork.ozlabs.org/project/netdev/patch/20170905223551.27925-2-ppenkov@google.com/","project":{"id":7,"url":"http://patchwork.ozlabs.org/api/1.2/projects/7/?format=json","name":"Linux network development","link_name":"netdev","list_id":"netdev.vger.kernel.org","list_email":"netdev@vger.kernel.org","web_url":null,"scm_url":null,"webscm_url":null,"list_archive_url":"","list_archive_url_format":"","commit_url_format":""},"msgid":"<20170905223551.27925-2-ppenkov@google.com>","list_archive_url":null,"date":"2017-09-05T22:35:50","name":"[net-next,RFC,1/2] tun: enable NAPI for TUN/TAP driver","commit_ref":null,"pull_url":null,"state":"rfc","archived":true,"hash":"b6ead5652f366caf38297e95b206138b65ac8fc5","submitter":{"id":72299,"url":"http://patchwork.ozlabs.org/api/1.2/people/72299/?format=json","name":"Petar Penkov","email":"ppenkov@google.com"},"delegate":{"id":34,"url":"http://patchwork.ozlabs.org/api/1.2/users/34/?format=json","username":"davem","first_name":"David","last_name":"Miller","email":"davem@davemloft.net"},"mbox":"http://patchwork.ozlabs.org/project/netdev/patch/20170905223551.27925-2-ppenkov@google.com/mbox/","series":[{"id":1659,"url":"http://patchwork.ozlabs.org/api/1.2/series/1659/?format=json","web_url":"http://patchwork.ozlabs.org/project/netdev/list/?series=1659","date":"2017-09-05T22:35:50","name":"Improve code coverage of syzkaller","version":1,"mbox":"http://patchwork.ozlabs.org/series/1659/mbox/"}],"comments":"http://patchwork.ozlabs.org/api/patches/810300/comments/","check":"pending","checks":"http://patchwork.ozlabs.org/api/patches/810300/checks/","tags":{},"related":[],"headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=google.com header.i=@google.com\n\theader.b=\"J8Y0k5dq\"; dkim-atps=neutral"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xn1ll0CKdz9sR9\n\tfor <patchwork-incoming@ozlabs.org>;\n\tWed,  6 Sep 2017 08:36:19 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1753992AbdIEWgQ (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tTue, 5 Sep 2017 18:36:16 -0400","from mail-pf0-f169.google.com ([209.85.192.169]:36479 \"EHLO\n\tmail-pf0-f169.google.com\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S1752834AbdIEWgM (ORCPT\n\t<rfc822;netdev@vger.kernel.org>); Tue, 5 Sep 2017 18:36:12 -0400","by mail-pf0-f169.google.com with SMTP id e199so9921270pfh.3\n\tfor <netdev@vger.kernel.org>; Tue, 05 Sep 2017 15:36:11 -0700 (PDT)","from localhost ([2620:15c:2cb:1:183d:cea1:ba48:3c2f])\n\tby smtp.gmail.com with ESMTPSA id\n\tl85sm1058pfb.176.2017.09.05.15.36.10\n\t(version=TLS1_2 cipher=AES128-SHA bits=128/128);\n\tTue, 05 Sep 2017 15:36:10 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=google.com; s=20161025;\n\th=from:to:cc:subject:date:message-id:in-reply-to:references;\n\tbh=jUwVaVG3BlQ0DQXmCHrO6iqiVEnIg/gkGPxid4qQYro=;\n\tb=J8Y0k5dqPERDO1mKNRq2JSTmMPmea2bjBdX8fSJ/cbbs3R0Zf72ceYCNuAN5WT0D7W\n\ttKErBwiP6oI9Ttrxa6WYecx9elaC1O8R82qPpt0kLj0SOb/O2Hco2okwb8eO598IgHkj\n\tuFSb6rd2DFodXqM8erTsiBPWEHr+vDx4a3DGnb48rMKGwEYLV91T94ogyWY5Cp15BufN\n\tgaxle43UjD6IO9ruGn2Dqeqizxv+tj/rHuUOmv6P2E4FmH2IwxjHh6nnYVmNipPI9YKT\n\t9lTmGg86BKb4HPpsCw2EqIJ8342HH9FRsvzO3YYIReQWJKrO2RwpvUneonH/pTnP7VR7\n\teOsw==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to\n\t:references;\n\tbh=jUwVaVG3BlQ0DQXmCHrO6iqiVEnIg/gkGPxid4qQYro=;\n\tb=is9iVCO95HKwer9AKTZNB2Qnx0kXCOu2SZwQNU/oT6DsfePzeBDwWtnZUrblSL0GHS\n\thl7Ew+o+Ub9MlWuTU10BiiiByRYRBHAherviBBsDaDyQtTUL7jdxQrDheAlIMYhy9cPV\n\tZTGOYx77VU3Fz72mQmIkp9gdGv4c/BL8lBU7QNe6pG3PFzptaYcSsO0ViKPsr23DrxYE\n\tALFCMFrfjOBXIfb24hAXUYorLkjp+CZZ10TYdJ9DIx4jr3KoZvtZ6/2updrP/09SvJgI\n\tYbQ4cT3OkjtpP97RzKrLM1RZM3VSgHe9llaPMrDlbJuWcyVAWBnROrNckAh4KCKUzapk\n\tvBGg==","X-Gm-Message-State":"AHPjjUhDUtla7uE0RP03GHd4AJ5QVRZH+C6rEtWZLIz1avAwtCXwMvb0\n\tIayAGhItRQ1Qry5HNb8HmH0T","X-Google-Smtp-Source":"ADKCNb7EahMEvdGlXAWuAyvJTL23Tdv3F9vA/3JP5L/zeTU+FGsBRu9hBzRUkr4UPfRcz6jYS6d9zg==","X-Received":"by 10.98.10.12 with SMTP id s12mr5284530pfi.127.1504650970777;\n\tTue, 05 Sep 2017 15:36:10 -0700 (PDT)","From":"Petar Penkov <ppenkov@google.com>","To":"netdev@vger.kernel.org","Cc":"Petar Penkov <ppenkov@google.com>, Eric Dumazet <edumazet@google.com>,\n\tMahesh Bandewar <maheshb@google.com>,\n\tWillem de Bruijn <willemb@google.com>, davem@davemloft.net,\n\tppenkov@stanford.edu","Subject":"[PATCH net-next RFC 1/2] tun: enable NAPI for TUN/TAP driver","Date":"Tue,  5 Sep 2017 15:35:50 -0700","Message-Id":"<20170905223551.27925-2-ppenkov@google.com>","X-Mailer":"git-send-email 2.14.1.581.gf28d330327-goog","In-Reply-To":"<20170905223551.27925-1-ppenkov@google.com>","References":"<20170905223551.27925-1-ppenkov@google.com>","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"},"content":"Changes TUN driver to use napi_gro_receive() upon receiving packets\nrather than netif_rx_ni(). Adds flag CONFIG_TUN_NAPI that enables\nthese changes and operation is not affected if the flag is disabled.\nSKBs are constructed upon packet arrival and are queued to be\nprocessed later.\n\nThe new path was evaluated with a benchmark with the following setup:\nOpen two tap devices and a receiver thread that reads in a loop for\neach device. Start one sender thread and pin all threads to different\nCPUs. Send 1M minimum UDP packets to each device and measure sending\ntime for each of the sending methods:\n\tnapi_gro_receive(): \t4.90s\n\tnetif_rx_ni(): \t\t4.90s\n\tnetif_receive_skb(): \t7.20s\n\nSigned-off-by: Petar Penkov <ppenkov@google.com>\nCc: Eric Dumazet <edumazet@google.com>\nCc: Mahesh Bandewar <maheshb@google.com>\nCc: Willem de Bruijn <willemb@google.com>\nCc: davem@davemloft.net\nCc: ppenkov@stanford.edu\n---\n drivers/net/Kconfig |   8 ++++\n drivers/net/tun.c   | 120 +++++++++++++++++++++++++++++++++++++++++++++++-----\n 2 files changed, 118 insertions(+), 10 deletions(-)","diff":"diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig\nindex 83a1616903f8..34850d71ddd1 100644\n--- a/drivers/net/Kconfig\n+++ b/drivers/net/Kconfig\n@@ -307,6 +307,14 @@ config TAP\n \t  This option is selected by any driver implementing tap user space\n \t  interface for a virtual interface to re-use core tap functionality.\n \n+config TUN_NAPI\n+\tbool \"NAPI support on tx path for TUN/TAP driver\"\n+\tdefault n\n+\tdepends on TUN\n+\t---help---\n+\t  This option allows the TUN/TAP driver to use NAPI to pass packets to\n+\t  the kernel when receiving packets from user space via write()/send().\n+\n config TUN_VNET_CROSS_LE\n \tbool \"Support for cross-endian vnet headers on little-endian kernels\"\n \tdefault n\ndiff --git a/drivers/net/tun.c b/drivers/net/tun.c\nindex 06e8f0bb2dab..d5c824e3ec42 100644\n--- a/drivers/net/tun.c\n+++ b/drivers/net/tun.c\n@@ -172,6 +172,7 @@ struct tun_file {\n \t\tu16 queue_index;\n \t\tunsigned int ifindex;\n \t};\n+\tstruct napi_struct napi;\n \tstruct list_head next;\n \tstruct tun_struct *detached;\n \tstruct skb_array tx_array;\n@@ -229,6 +230,67 @@ struct tun_struct {\n \tstruct bpf_prog __rcu *xdp_prog;\n };\n \n+static int tun_napi_receive(struct napi_struct *napi, int budget)\n+{\n+\tstruct tun_file *tfile = container_of(napi, struct tun_file, napi);\n+\tstruct sk_buff_head *queue = &tfile->sk.sk_write_queue;\n+\tstruct sk_buff_head process_queue;\n+\tstruct sk_buff *skb;\n+\tint received = 0;\n+\n+\t__skb_queue_head_init(&process_queue);\n+\n+\tspin_lock(&queue->lock);\n+\tskb_queue_splice_tail_init(queue, &process_queue);\n+\tspin_unlock(&queue->lock);\n+\n+\twhile (received < budget && (skb = __skb_dequeue(&process_queue))) {\n+\t\tnapi_gro_receive(napi, skb);\n+\t\t++received;\n+\t}\n+\n+\tif (!skb_queue_empty(&process_queue)) {\n+\t\tspin_lock(&queue->lock);\n+\t\tskb_queue_splice(&process_queue, queue);\n+\t\tspin_unlock(&queue->lock);\n+\t}\n+\n+\treturn received;\n+}\n+\n+static int tun_napi_poll(struct napi_struct *napi, int budget)\n+{\n+\tunsigned int received;\n+\n+\treceived = tun_napi_receive(napi, budget);\n+\n+\tif (received < budget)\n+\t\tnapi_complete_done(napi, received);\n+\n+\treturn received;\n+}\n+\n+static void tun_napi_init(struct tun_struct *tun, struct tun_file *tfile)\n+{\n+\tif (IS_ENABLED(CONFIG_TUN_NAPI)) {\n+\t\tnetif_napi_add(tun->dev, &tfile->napi, tun_napi_poll,\n+\t\t\t       NAPI_POLL_WEIGHT);\n+\t\tnapi_enable(&tfile->napi);\n+\t}\n+}\n+\n+static void tun_napi_disable(struct tun_file *tfile)\n+{\n+\tif (IS_ENABLED(CONFIG_TUN_NAPI))\n+\t\tnapi_disable(&tfile->napi);\n+}\n+\n+static void tun_napi_del(struct tun_file *tfile)\n+{\n+\tif (IS_ENABLED(CONFIG_TUN_NAPI))\n+\t\tnetif_napi_del(&tfile->napi);\n+}\n+\n #ifdef CONFIG_TUN_VNET_CROSS_LE\n static inline bool tun_legacy_is_little_endian(struct tun_struct *tun)\n {\n@@ -541,6 +603,11 @@ static void __tun_detach(struct tun_file *tfile, bool clean)\n \n \ttun = rtnl_dereference(tfile->tun);\n \n+\tif (tun && clean) {\n+\t\ttun_napi_disable(tfile);\n+\t\ttun_napi_del(tfile);\n+\t}\n+\n \tif (tun && !tfile->detached) {\n \t\tu16 index = tfile->queue_index;\n \t\tBUG_ON(index >= tun->numqueues);\n@@ -598,6 +665,7 @@ static void tun_detach_all(struct net_device *dev)\n \tfor (i = 0; i < n; i++) {\n \t\ttfile = rtnl_dereference(tun->tfiles[i]);\n \t\tBUG_ON(!tfile);\n+\t\ttun_napi_disable(tfile);\n \t\ttfile->socket.sk->sk_shutdown = RCV_SHUTDOWN;\n \t\ttfile->socket.sk->sk_data_ready(tfile->socket.sk);\n \t\tRCU_INIT_POINTER(tfile->tun, NULL);\n@@ -613,6 +681,7 @@ static void tun_detach_all(struct net_device *dev)\n \tsynchronize_net();\n \tfor (i = 0; i < n; i++) {\n \t\ttfile = rtnl_dereference(tun->tfiles[i]);\n+\t\ttun_napi_del(tfile);\n \t\t/* Drop read queue */\n \t\ttun_queue_purge(tfile);\n \t\tsock_put(&tfile->sk);\n@@ -677,10 +746,12 @@ static int tun_attach(struct tun_struct *tun, struct file *file, bool skip_filte\n \trcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);\n \ttun->numqueues++;\n \n-\tif (tfile->detached)\n+\tif (tfile->detached) {\n \t\ttun_enable_queue(tfile);\n-\telse\n+\t} else {\n \t\tsock_hold(&tfile->sk);\n+\t\ttun_napi_init(tun, tfile);\n+\t}\n \n \ttun_set_real_num_queues(tun);\n \n@@ -956,13 +1027,28 @@ static void tun_poll_controller(struct net_device *dev)\n \t * Tun only receives frames when:\n \t * 1) the char device endpoint gets data from user space\n \t * 2) the tun socket gets a sendmsg call from user space\n-\t * Since both of those are synchronous operations, we are guaranteed\n-\t * never to have pending data when we poll for it\n-\t * so there is nothing to do here but return.\n+\t * If NAPI is not enabled, since both of those are synchronous\n+\t * operations, we are guaranteed never to have pending data when we poll\n+\t * for it so there is nothing to do here but return.\n \t * We need this though so netpoll recognizes us as an interface that\n \t * supports polling, which enables bridge devices in virt setups to\n \t * still use netconsole\n+\t * If NAPI is enabled, however, we need to schedule polling for all\n+\t * queues.\n \t */\n+\n+\tif (IS_ENABLED(CONFIG_TUN_NAPI)) {\n+\t\tstruct tun_struct *tun = netdev_priv(dev);\n+\t\tstruct tun_file *tfile;\n+\t\tint i;\n+\n+\t\trcu_read_lock();\n+\t\tfor (i = 0; i < tun->numqueues; i++) {\n+\t\t\ttfile = rcu_dereference(tun->tfiles[i]);\n+\t\t\tnapi_schedule(&tfile->napi);\n+\t\t}\n+\t\trcu_read_unlock();\n+\t}\n \treturn;\n }\n #endif\n@@ -1535,11 +1621,25 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,\n \t}\n \n \trxhash = __skb_get_hash_symmetric(skb);\n-#ifndef CONFIG_4KSTACKS\n-\ttun_rx_batched(tun, tfile, skb, more);\n-#else\n-\tnetif_rx_ni(skb);\n-#endif\n+\n+\tif (IS_ENABLED(CONFIG_TUN_NAPI)) {\n+\t\tstruct sk_buff_head *queue = &tfile->sk.sk_write_queue;\n+\t\tint queue_len;\n+\n+\t\tspin_lock_bh(&queue->lock);\n+\t\t__skb_queue_tail(queue, skb);\n+\t\tqueue_len = skb_queue_len(queue);\n+\t\tspin_unlock(&queue->lock);\n+\n+\t\tif (!more || queue_len > NAPI_POLL_WEIGHT)\n+\t\t\tnapi_schedule(&tfile->napi);\n+\n+\t\tlocal_bh_enable();\n+\t} else if (!IS_ENABLED(CONFIG_4KSTACKS)) {\n+\t\ttun_rx_batched(tun, tfile, skb, more);\n+\t} else {\n+\t\tnetif_rx_ni(skb);\n+\t}\n \n \tstats = get_cpu_ptr(tun->pcpu_stats);\n \tu64_stats_update_begin(&stats->syncp);\n","prefixes":["net-next","RFC","1/2"]}