From patchwork Fri Dec 20 00:41:25 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: William Tu
X-Patchwork-Id: 1213843
From: William Tu
To: dev@openvswitch.org
Date: Thu, 19 Dec 2019 16:41:25 -0800
Message-Id: <1576802485-15017-1-git-send-email-u9012063@gmail.com>
X-Mailer: git-send-email 2.7.4
Cc: i.maximets@ovn.org
Subject: [ovs-dev] [PATCH RFC] WIP: netdev-tpacket: Add AF_PACKET v3 support.

Currently the performance of sending packets from userspace OVS to a
kernel veth device is pretty bad, as reported by YiYang [1]. This patch
adds AF_PACKET v3 (tpacket v3) as another way to tx/rx packets to a
Linux device, hopefully with better performance. AF_PACKET v3 should
get close to 1 Mpps, as shown in [2]. However, my current patch shows
only 1.4 Gbps with iperf TCP, so maybe I'm doing something wrong.
DPDK also has a similar implementation using AF_PACKET v2 [3].
This is still work in progress, but any feedback is welcome.

[1] https://patchwork.ozlabs.org/patch/1204939/ [2] slide 18, https://www.netdevconf.info/2.2/slides/karlsson-afpacket-talk.pdf [3] dpdk/drivers/net/af_packet/rte_eth_af_packet.c --- lib/automake.mk | 2 + lib/netdev-linux-private.h | 23 +++ lib/netdev-linux.c | 24 ++- lib/netdev-provider.h | 1 + lib/netdev-tpacket.c | 487 +++++++++++++++++++++++++++++++++++++++++++++ lib/netdev-tpacket.h | 43 ++++ lib/netdev.c | 1 + 7 files changed, 580 insertions(+), 1 deletion(-) create mode 100644 lib/netdev-tpacket.c create mode 100644 lib/netdev-tpacket.h diff --git a/lib/automake.mk b/lib/automake.mk index 17b36b43d9d7..0c635404cb43 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -398,6 +398,8 @@ lib_libopenvswitch_la_SOURCES += \ lib/netdev-linux.c \ lib/netdev-linux.h \ lib/netdev-linux-private.h \ + lib/netdev-tpacket.c \ + lib/netdev-tpacket.h \ lib/netdev-offload-tc.c \ lib/netlink-conntrack.c \ lib/netlink-conntrack.h \ diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h index f08159aa7b53..99a2c03bb2a6 100644 --- a/lib/netdev-linux-private.h +++ b/lib/netdev-linux-private.h @@ -20,6 +20,7 @@ #include #include #include +#include #include #include #include @@ -37,6 +38,24 @@ struct netdev; +/* tpacket rx and tx ring structure. */ +struct tp_ring { + struct iovec *rd; /* rd[n] points to mmap area. */ + int rd_len; + int rd_num; + char *mm; /* mmap address. */ + size_t mm_len; + unsigned int next_avail_block; + int frame_len; +}; + +struct tpacket_info { + int fd; + struct tpacket_req3 req; + struct tp_ring rxring; + struct tp_ring txring; +}; + struct netdev_rxq_linux { struct netdev_rxq up; bool is_tap; @@ -110,6 +129,10 @@ struct netdev_linux { struct netdev_afxdp_tx_lock *tx_locks; /* Array of locks for TX queues. */ #endif + + /* tpacket v3 information. 
*/ + struct tpacket_info **tps; + int n_tps; }; static bool diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index f8e59bacfb13..edfc389ee6f2 100644 --- a/lib/netdev-linux.c +++ b/lib/netdev-linux.c @@ -36,9 +36,10 @@ #include #include #include +#include #include #include -#include +//#include #include #include #include @@ -57,6 +58,7 @@ #include "openvswitch/hmap.h" #include "netdev-afxdp.h" #include "netdev-provider.h" +#include "netdev-tpacket.h" #include "netdev-vport.h" #include "netlink-notifier.h" #include "netlink-socket.h" @@ -3315,6 +3317,26 @@ const struct netdev_class netdev_afxdp_class = { .rxq_recv = netdev_afxdp_rxq_recv, }; #endif + +const struct netdev_class netdev_tpacket_class = { + NETDEV_LINUX_CLASS_COMMON, + .type = "tpacket", + .is_pmd = true, + .construct = netdev_linux_construct, + .destruct = netdev_linux_destruct, + .get_stats = netdev_linux_get_stats, + .get_features = netdev_linux_get_features, + .get_status = netdev_linux_get_status, + .set_config = netdev_tpacket_set_config, + .get_config = netdev_tpacket_get_config, + .reconfigure = netdev_tpacket_reconfigure, + .get_block_id = netdev_linux_get_block_id, + .get_numa_id = netdev_afxdp_get_numa_id, + .send = netdev_tpacket_batch_send, + .rxq_construct = netdev_linux_rxq_construct, + .rxq_destruct = netdev_linux_rxq_destruct, + .rxq_recv = netdev_tpacket_rxq_recv, +}; #define CODEL_N_QUEUES 0x0000 diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h index f109c4e66f0d..518d1dc6e02c 100644 --- a/lib/netdev-provider.h +++ b/lib/netdev-provider.h @@ -833,6 +833,7 @@ extern const struct netdev_class netdev_bsd_class; extern const struct netdev_class netdev_windows_class; #else extern const struct netdev_class netdev_linux_class; +extern const struct netdev_class netdev_tpacket_class; #endif extern const struct netdev_class netdev_internal_class; extern const struct netdev_class netdev_tap_class; diff --git a/lib/netdev-tpacket.c b/lib/netdev-tpacket.c new file mode 100644 index 
000000000000..798ce776838f --- /dev/null +++ b/lib/netdev-tpacket.c @@ -0,0 +1,487 @@ +/* + * Copyright (c) 2019 VMware, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include + +#include "netdev-linux-private.h" +#include "netdev-linux.h" +#include "netdev-tpacket.h" + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "coverage.h" +#include "dp-packet.h" +#include "dpif-netdev.h" +#include "fatal-signal.h" +#include "openvswitch/compiler.h" +#include "openvswitch/dynamic-string.h" +#include "openvswitch/list.h" +#include "openvswitch/thread.h" +#include "openvswitch/vlog.h" +#include "packets.h" +#include "socket-util.h" +#include "util.h" + +COVERAGE_DEFINE(tpacket_rx_busy); +COVERAGE_DEFINE(tpacket_tx_busy); + +VLOG_DEFINE_THIS_MODULE(netdev_tpacket); +//static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); + +/* One block contains two frames. 
*/ +#define TP_BLOCKSZ 4096 +#define TP_FRAMESZ 2048 +#define TP_NUM_DESCS 1024 +#define TP_BLOCKNR 1024 +#define TP_BLOCKNR_MASK (TP_BLOCKNR - 1) +#define TP_FRAMENR (TP_BLOCKNR * (TP_BLOCKSZ/TP_FRAMESZ)) +#define TP_FRAMENR_MASK (TP_FRAMENR -1) +#define BATCH_SIZE NETDEV_MAX_BURST + +#define barrier() __asm__ __volatile__("" : : : "memory") + +static struct tpacket_info *tpacket_configure(struct netdev_linux *dev); +static int tpacket_configure_all(struct netdev_linux *dev); +static void tpacket_destroy(struct tpacket_info *tp); +static void tpacket_destroy_all(struct netdev_linux *dev); + +static void +tpacket_fill_v3(struct tpacket_req3 *r) +{ + memset(r, 0, sizeof *r); + + r->tp_block_size = TP_BLOCKSZ; /* Minimal size of contiguous block. */ + r->tp_frame_size = TP_FRAMESZ; /* Size of frame. */ + r->tp_block_nr = TP_BLOCKNR; /* Number of blocks. */ + r->tp_frame_nr = TP_FRAMENR; /* Number of frames. */ + r->tp_retire_blk_tov = 0; /* Timeout in msecs. */ + r->tp_sizeof_priv = 0; /* Offset to private data area. 
*/ + r->tp_feature_req_word = 0; + //r->tp_feature_req_word = TP_FT_REQ_FILL_RXHASH; +} + +int +netdev_tpacket_set_config(struct netdev *netdev, + const struct smap *args OVS_UNUSED, + char **errp OVS_UNUSED) +{ + netdev_request_reconfigure(netdev); + return 0; +} + +int +netdev_tpacket_get_config(const struct netdev *netdev OVS_UNUSED, + struct smap *args OVS_UNUSED) +{ + return 0; +} + +static struct tpacket_info * +tpacket_configure(struct netdev_linux *dev) +{ + struct tpacket_req3 req; + struct tpacket_info *tp; + struct sockaddr_ll ll; + int ver, fd, ifindex; + int error, i, noqdisc; + + tp = xmalloc(sizeof *tp); + if (!tp) { + ovs_mutex_unlock(&dev->mutex); + return NULL; + } + memset(tp, 0, sizeof *tp); + + tp->fd = fd = socket(PF_PACKET, SOCK_RAW, 0); + if (fd < 0) { + VLOG_ERR("tpacket: create PF_PACKET failed: %s", ovs_strerror(errno)); + error = errno; + goto error; + } + + ver = TPACKET_V3; + error = setsockopt(fd, SOL_PACKET, PACKET_VERSION, &ver, sizeof(ver)); + if (error) { + VLOG_ERR("tpacket: set version failed: %s", ovs_strerror(errno)); + goto error; + } + + tpacket_fill_v3(&req); + error = setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof req); + if (error) { + VLOG_ERR("tpacket: set rx_ring failed: %s", ovs_strerror(errno)); + goto error; + } + error = setsockopt(fd, SOL_PACKET, PACKET_TX_RING, &req, sizeof req); + if (error) { + VLOG_ERR("tpacket: set tx_ring failed: %s", ovs_strerror(errno)); + goto error; + } + tp->req = req; + + /* Configure rx/tx ring. 
*/ + tp->rxring.mm_len = req.tp_block_size * req.tp_block_nr; + tp->rxring.mm = mmap(0, 2 * tp->rxring.mm_len, PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_LOCKED | MAP_POPULATE, fd, 0); + if (tp->rxring.mm == MAP_FAILED) { + VLOG_ERR("tpacket: mmap rx/tx ring failed: %s", ovs_strerror(errno)); + goto error; + } + tp->txring.mm_len = tp->rxring.mm_len; + tp->txring.mm = tp->rxring.mm + tp->rxring.mm_len; + + tp->rxring.rd_num = tp->txring.rd_num = req.tp_block_nr; + tp->rxring.rd_len = tp->txring.rd_len = + req.tp_block_nr * sizeof *tp->rxring.rd; + + tp->rxring.rd = xmalloc(tp->rxring.rd_len); + if (!tp->rxring.rd) { + return NULL; + } + memset(tp->rxring.rd, 0, tp->rxring.rd_len); + + tp->txring.rd = xmalloc(tp->txring.rd_len); + if (!tp->txring.rd) { + return NULL; + } + memset(tp->txring.rd, 0, tp->txring.rd_len); + + for (i = 0; i < tp->rxring.rd_num; i++) { + tp->rxring.rd[i].iov_base = tp->rxring.mm + (i * req.tp_block_size); + tp->rxring.rd[i].iov_len = req.tp_block_size; + } + for (i = 0; i < tp->txring.rd_num; i++) { + tp->txring.rd[i].iov_base = tp->txring.mm + (i * req.tp_block_size); + tp->txring.rd[i].iov_len = req.tp_block_size; + } + + noqdisc = 1; + setsockopt(fd, SOL_PACKET, PACKET_QDISC_BYPASS, + &noqdisc, sizeof(noqdisc)); + + ifindex = linux_get_ifindex(netdev_get_name(&dev->up)); + + ll.sll_family = PF_PACKET; + ll.sll_protocol = htons(ETH_P_ALL); + ll.sll_ifindex = ifindex; + ll.sll_hatype = 0; + ll.sll_pkttype = 0; + ll.sll_halen = 0; + + error = bind(fd, (struct sockaddr *)&ll, sizeof ll); + if (error) { + VLOG_ERR("tpacket: bind failed: %s", ovs_strerror(errno)); + goto error_unmap; + } + + return tp; + +error_unmap: + munmap(tp->rxring.mm, tp->rxring.mm_len * 2); +error: + if (tp) { + free(tp); + } + if (fd >= 0) { + close(fd); + } + + return NULL; +} + +static int +tpacket_configure_all(struct netdev_linux *dev) +{ + int n_tps, i; + + n_tps = dev->n_tps; + dev->tps = xcalloc(n_tps, sizeof(struct tpacket_info *)); + + for (i = 0; i < n_tps; i++) { + 
VLOG_INFO("tpacket: configure %dth queue.", i); + dev->tps[i] = tpacket_configure(dev); + if (!dev->tps[i]) { + VLOG_ERR("tpacket: configure %dth queue failed.", i); + goto error; + } + } + return 0; + +error: + tpacket_destroy_all(dev); + return EINVAL; +} + +static void +tpacket_destroy(struct tpacket_info *tp) +{ + if (!tp) { + return; + } + munmap(tp->rxring.mm, tp->rxring.mm_len * 2); /* Both rx and tx. */ + close(tp->fd); + free(tp->rxring.rd); + free(tp->txring.rd); + free(tp); +} + +static void +tpacket_destroy_all(struct netdev_linux *dev) +{ + int i; + + if (!dev->tps) { + return; + } + for (i = 0; i < dev->n_tps; i++) { + tpacket_destroy(dev->tps[i]); + } +} + +int +netdev_tpacket_reconfigure(struct netdev *netdev) +{ + struct netdev_linux *dev = netdev_linux_cast(netdev); + int err = 0; + + ovs_mutex_lock(&dev->mutex); + + netdev->n_rxq = 1; + dev->n_tps = netdev->n_rxq; + tpacket_destroy_all(dev); + + err = tpacket_configure_all(dev); + if (err) { + VLOG_ERR("%s: tpacket reconfiguration failed.", + netdev_get_name(netdev)); + } + netdev_change_seq_changed(netdev); + ovs_mutex_unlock(&dev->mutex); + + return err; +} + +static inline uint32_t +get_block_status(struct tpacket_block_desc *desc) +{ + barrier(); + return desc->hdr.bh1.block_status; +} + +static inline void +set_block_status(struct tpacket_block_desc *desc, uint32_t status) +{ + desc->hdr.bh1.block_status = status; + barrier(); +} + +static inline uint32_t +get_num_pkts(struct tpacket_block_desc *desc) +{ + return desc->hdr.bh1.num_pkts; +} + +static inline uint32_t +first_pkt_ofs(struct tpacket_block_desc *desc) +{ + return desc->hdr.bh1.offset_to_first_pkt; +} + +static uint64_t block_seq_num = 0; +static void OVS_UNUSED +check_seq_num(struct tpacket_block_desc *desc) +{ + uint64_t seq = desc->hdr.bh1.seq_num; + + if (block_seq_num + 1 != seq) { + VLOG_ERR("seq no %"PRIu64" + 1 != %"PRIu64, + block_seq_num, seq); + } else { + block_seq_num = seq; + } +} +int +netdev_tpacket_rxq_recv(struct 
netdev_rxq *rxq_, + struct dp_packet_batch *batch, + int *qfill) +{ + struct netdev_rxq_linux *rx = netdev_rxq_linux_cast(rxq_); + struct netdev *netdev = rx->up.netdev; + struct netdev_linux *dev = netdev_linux_cast(netdev); + int qid = rxq_->queue_id; + struct tpacket_block_desc *desc; + struct tpacket_info *tp; + struct tp_ring *rxring; + unsigned int block_num, n_pkts = 0; + + tp = dev->tps[qid]; + if (!tp) { + return EAGAIN; + } + rx->fd = tp->fd; + rxring = &tp->rxring; + block_num = rxring->next_avail_block; + dp_packet_batch_init(batch); + + while (n_pkts < BATCH_SIZE) { + struct tpacket3_hdr *tphdr; + struct dp_packet *packet; + uint32_t num_pkts; + char *data; + int i; + + block_num = block_num & TP_BLOCKNR_MASK; + desc = (struct tpacket_block_desc *)rxring->rd[block_num].iov_base; + while ((get_block_status(desc) & TP_STATUS_USER) == 0) { + if (batch->count == 0) { +#if 0 + struct pollfd pfd; + memset(&pfd, 0, sizeof pfd); + pfd.fd = tp->fd; + pfd.events = POLLIN | POLLERR; + pfd.events = 0; + poll(&pfd, 1, 1); +#endif + COVERAGE_INC(tpacket_rx_busy); + return EAGAIN; + } else { + goto out; + } + } + + check_seq_num(desc); + num_pkts = get_num_pkts(desc); + tphdr = (struct tpacket3_hdr *) + ((char *)desc + first_pkt_ofs(desc)); + + /* A block might have multiple frames(packets). 
*/ + for (i = 0; i < num_pkts; i++) { + data = (char *)tphdr + tphdr->tp_mac; + packet = dp_packet_clone_data_with_headroom(data, + tphdr->tp_snaplen, + DP_NETDEV_HEADROOM); + dp_packet_set_size(packet, tphdr->tp_snaplen); + dp_packet_set_rss_hash(packet, tphdr->hv1.tp_rxhash); + dp_packet_batch_add(batch, packet); + + tphdr = (struct tpacket3_hdr *)((char *)tphdr + + tphdr->tp_next_offset); + barrier(); + n_pkts++; + } + + block_num++; + rxring->next_avail_block++; + set_block_status(desc, TP_STATUS_KERNEL); + } + +out: + if (qfill) { + *qfill = 0; + } + + return 0; +} + +static inline struct tpacket3_hdr * +get_next_tx_frame(struct tp_ring *txring, int n) +{ + char *start = txring->rd[0].iov_base; + + return (struct tpacket3_hdr *)(start + (n * TP_FRAMESZ)); +} + +int +netdev_tpacket_batch_send(struct netdev *netdev, int qid, + struct dp_packet_batch *batch, + bool concurrent_txq OVS_UNUSED) +{ + struct netdev_linux *dev = netdev_linux_cast(netdev); + struct dp_packet *packet; + struct tpacket_info *tp; + struct tp_ring *txring; + unsigned int frame_num; + int error = 0; + int retries = 3; + + tp = dev->tps[qid]; + if (!tp) { + error = EAGAIN; + goto out; + } + txring = &tp->txring; + frame_num = txring->next_avail_block; + + DP_PACKET_BATCH_FOR_EACH (i, packet, batch) { + struct tpacket3_hdr *tphdr; + int size; + + frame_num = frame_num & TP_FRAMENR_MASK; + tphdr = get_next_tx_frame(txring, frame_num); +#if 0 + if (!(tphdr->tp_status & TP_STATUS_AVAILABLE)) { + COVERAGE_INC(tpacket_tx_busy); + } +#endif + if (tphdr->tp_status & + (TP_STATUS_SEND_REQUEST | TP_STATUS_SENDING)) { + barrier(); + COVERAGE_INC(tpacket_tx_busy); + error = EAGAIN; + goto out; + } + + size = dp_packet_size(packet); + tphdr->tp_snaplen = size; + tphdr->tp_len = size; + tphdr->tp_next_offset = 0; + + memcpy((char *)tphdr + TPACKET3_HDRLEN - sizeof(struct sockaddr_ll), + dp_packet_data(packet), size); + + frame_num++; + txring->next_avail_block++; + barrier(); + tphdr->tp_status = 
TP_STATUS_SEND_REQUEST; + } + +kick_retry: + error = sendto(tp->fd, NULL, 0, MSG_DONTWAIT, NULL, 0); + if (error < 0) { + if (retries-- && errno == EAGAIN) { + COVERAGE_INC(tpacket_tx_busy); + goto kick_retry; + } else { + goto out; + } + } + + return 0; + +out: + dp_packet_delete_batch(batch, true); + return error; +} diff --git a/lib/netdev-tpacket.h b/lib/netdev-tpacket.h new file mode 100644 index 000000000000..2a80f962e0b7 --- /dev/null +++ b/lib/netdev-tpacket.h @@ -0,0 +1,43 @@ +/* + * Copyright (c) 2018, 2019 VMware, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +#ifndef NETDEV_TPACKET_H +#define NETDEV_TPACKET_H 1 + +#include +#include + +struct dp_packet; +struct dp_packet_batch; +struct netdev; +struct netdev_custom_stats; +struct netdev_rxq; +struct netdev_stats; +struct smap; + +int netdev_tpacket_rxq_recv(struct netdev_rxq *rxq_, + struct dp_packet_batch *batch, + int *qfill); +int netdev_tpacket_batch_send(struct netdev *netdev_, int qid, + struct dp_packet_batch *batch, + bool concurrent_txq); +int netdev_tpacket_set_config(struct netdev *netdev, const struct smap *args, + char **errp); +int netdev_tpacket_get_config(const struct netdev *netdev, struct smap *args); +int netdev_tpacket_get_custom_stats(const struct netdev *netdev, + struct netdev_custom_stats *custom_stats); +int netdev_tpacket_reconfigure(struct netdev *netdev); +#endif /* netdev-tpacket.h */ diff --git a/lib/netdev.c b/lib/netdev.c index 405c98c687fa..3710834521d5 100644 --- a/lib/netdev.c +++ b/lib/netdev.c @@ -145,6 +145,7 @@ netdev_initialize(void) #ifdef __linux__ netdev_register_provider(&netdev_linux_class); + netdev_register_provider(&netdev_tpacket_class); netdev_register_provider(&netdev_internal_class); netdev_register_provider(&netdev_tap_class); netdev_vport_tunnel_register();
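
For readers checking the ring sizing used in lib/netdev-tpacket.c, here is a
minimal standalone sketch of the geometry arithmetic behind TP_BLOCKSZ,
TP_FRAMESZ, TP_BLOCKNR and TP_FRAMENR. It is not part of the patch. The
struct ring_geometry type and all function names are hypothetical, and the
4096-byte page size is an assumption. The checks are a simplified
approximation of the consistency rules a TPACKET ring request is expected to
satisfy, not a copy of the kernel's validation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local mirror of the four geometry fields of
 * struct tpacket_req3, so the sketch builds without kernel headers. */
struct ring_geometry {
    uint32_t block_size;   /* Bytes per block. */
    uint32_t block_nr;     /* Number of blocks. */
    uint32_t frame_size;   /* Bytes per frame slot. */
    uint32_t frame_nr;     /* Total number of frame slots. */
};

#define SKETCH_PAGE_SIZE 4096u   /* Assumed page size. */
#define SKETCH_ALIGNMENT 16u     /* Same value as TPACKET_ALIGNMENT. */

static bool
is_pow2(uint32_t x)
{
    return x && !(x & (x - 1));
}

/* Values taken from the patch: 4096-byte blocks, 2048-byte frames,
 * 1024 blocks, hence two frames per block. */
static struct ring_geometry
patch_geometry(void)
{
    struct ring_geometry g = { 4096, 1024, 2048, 0 };

    g.frame_nr = g.block_nr * (g.block_size / g.frame_size);
    return g;
}

/* Simplified consistency checks (an approximation, see lead-in). */
static bool
geometry_is_consistent(const struct ring_geometry *g)
{
    uint32_t per_block;

    if (!g->block_size || !g->block_nr || !g->frame_size) {
        return false;
    }
    if (g->block_size % SKETCH_PAGE_SIZE || g->frame_size % SKETCH_ALIGNMENT) {
        return false;
    }
    if (g->frame_size > g->block_size) {
        return false;
    }
    per_block = g->block_size / g->frame_size;
    if (per_block > UINT32_MAX / g->block_nr) {
        return false;   /* Frame count would overflow. */
    }
    return per_block * g->block_nr == g->frame_nr;
}

/* Bytes occupied by one ring (rx or tx). */
static size_t
ring_bytes(const struct ring_geometry *g)
{
    return (size_t) g->block_size * g->block_nr;
}

/* Index wrap as done with TP_BLOCKNR_MASK / TP_FRAMENR_MASK; only valid
 * when 'n' is a power of two. */
static uint32_t
wrap_index(uint32_t index, uint32_t n)
{
    assert(is_pow2(n));
    return index & (n - 1);
}
```

With the patch's values, each ring is 4 MiB, and the single mmap() call in
tpacket_configure() maps twice that to cover the rx and tx rings back to
back. The mask-based wrap only works because both TP_BLOCKNR and TP_FRAMENR
are powers of two.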