From patchwork Wed Sep 11 14:10:01 2019 X-Patchwork-Submitter: Michal Obrembski X-Patchwork-Id: 1161019
From: Michal Obrembski To: dev@openvswitch.org Date: Wed, 11 Sep 2019 16:10:01 +0200 Message-Id: <20190911141005.1346-4-michalx.obrembski@intel.com> In-Reply-To: <20190911141005.1346-1-michalx.obrembski@intel.com> References: <20190911141005.1346-1-michalx.obrembski@intel.com> Subject: [ovs-dev] [PATCH v5 3/7] netdev-dpdk: Enable TSO when using multi-seg mbufs From: Tiago Lam TCP Segmentation Offload (TSO) is a feature which enables the TCP/IP network stack to delegate segmentation of a TCP segment to the hardware NIC, thus saving compute resources. This may improve performance significantly for TCP workloads in virtualized environments. While a previous commit already added the necessary logic to netdev-dpdk to deal with packets marked for TSO, this set of changes enables TSO by default when using multi-segment mbufs. Thus, to enable TSO on the physical DPDK interfaces, only the following command needs to be issued before starting OvS: ovs-vsctl set Open_vSwitch . 
other_config:dpdk-multi-seg-mbufs=true Co-authored-by: Mark Kavanagh Signed-off-by: Mark Kavanagh Signed-off-by: Tiago Lam Signed-off-by: Michal Obrembski --- Documentation/automake.mk | 1 + Documentation/topics/dpdk/index.rst | 1 + Documentation/topics/dpdk/tso.rst | 99 +++++++++++++++++++++++++++++++++++++ NEWS | 1 + lib/netdev-dpdk.c | 70 ++++++++++++++++++++++++-- 5 files changed, 167 insertions(+), 5 deletions(-) create mode 100644 Documentation/topics/dpdk/tso.rst diff --git a/Documentation/automake.mk b/Documentation/automake.mk index 2a3214a..5955dd7 100644 --- a/Documentation/automake.mk +++ b/Documentation/automake.mk @@ -40,6 +40,7 @@ DOC_SOURCE = \ Documentation/topics/dpdk/index.rst \ Documentation/topics/dpdk/bridge.rst \ Documentation/topics/dpdk/jumbo-frames.rst \ + Documentation/topics/dpdk/tso.rst \ Documentation/topics/dpdk/memory.rst \ Documentation/topics/dpdk/pdump.rst \ Documentation/topics/dpdk/phy.rst \ diff --git a/Documentation/topics/dpdk/index.rst b/Documentation/topics/dpdk/index.rst index cf24a7b..eb2a04d 100644 --- a/Documentation/topics/dpdk/index.rst +++ b/Documentation/topics/dpdk/index.rst @@ -40,4 +40,5 @@ The DPDK Datapath /topics/dpdk/qos /topics/dpdk/pdump /topics/dpdk/jumbo-frames + /topics/dpdk/tso /topics/dpdk/memory diff --git a/Documentation/topics/dpdk/tso.rst b/Documentation/topics/dpdk/tso.rst new file mode 100644 index 0000000..14f8c39 --- /dev/null +++ b/Documentation/topics/dpdk/tso.rst @@ -0,0 +1,99 @@ +.. + Copyright 2018, Red Hat, Inc. + + Licensed under the Apache License, Version 2.0 (the "License"); you may + not use this file except in compliance with the License. You may obtain + a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
See the + License for the specific language governing permissions and limitations + under the License. + + Convention for heading levels in Open vSwitch documentation: + + ======= Heading 0 (reserved for the title in a document) + ------- Heading 1 + ~~~~~~~ Heading 2 + +++++++ Heading 3 + ''''''' Heading 4 + + Avoid deeper levels because they do not render well. + +=== +TSO +=== + +**Note:** This feature is considered experimental. + +TCP Segmentation Offload (TSO) is a mechanism which allows a TCP/IP network +stack to delegate the segmentation of an oversized TCP segment to the +underlying physical NIC, thus saving the CPU cycles that would otherwise be +spent performing the same segmentation in software and freeing them up for +more useful work. + +A common use case for TSO is virtualization, where traffic coming in from a +VM can have its TCP segmentation offloaded, thus avoiding segmentation in +software. Additionally, if the traffic is headed to a VM within the same host, +further optimization can be expected: as the traffic never leaves the machine, +no MTU needs to be accounted for, and thus no segmentation and checksum +calculations are required, which saves yet more cycles. Only when the traffic +actually leaves the host does segmentation need to happen, in which case it is +performed by the egress NIC. + +When using TSO with DPDK, the implementation relies on the multi-segment mbufs +feature, described in :doc:`/topics/dpdk/jumbo-frames`, where each mbuf +contains ~2KiB of the entire packet's data and is linked to the next mbuf that +contains the next portion of data. + +Enabling TSO +~~~~~~~~~~~~ +.. Important:: + + Once multi-segment mbufs are enabled, TSO will be enabled by default, +provided there's support for it in the underlying physical NICs attached to +OvS-DPDK.
+ +When using :doc:`vHost User ports `, TSO may be enabled in one of +two ways, as follows. + +`TSO` is enabled in OvS by the DPDK vHost User backend; when a new guest +connection is established, `TSO` is thus advertised to the guest as an +available feature: + +1. QEMU Command Line Parameter:: + + $ sudo $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 \ + ... + -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,\ + csum=on,guest_csum=on,guest_tso4=on,guest_tso6=on\ + ... + +2. Ethtool. Assuming that the guest's OS also supports `TSO`, ethtool can be +used to enable it:: + + $ ethtool -K eth0 sg on # scatter-gather is a prerequisite for TSO + $ ethtool -K eth0 tso on + $ ethtool -k eth0 + +To enable TSO in a guest, the underlying NIC must first support `TSO` - +consult your controller's datasheet for compatibility. Secondly, the NIC must +have an associated DPDK Poll Mode Driver (PMD) which supports `TSO`. + +Limitations +~~~~~~~~~~~ +The current OvS `TSO` implementation supports flat and VLAN networks only +(i.e. no support for `TSO` over tunneled connections [VxLAN, GRE, IPinIP, +etc.]). + +Also, as TSO is built on top of multi-segment mbufs, the constraints pointed +out in :doc:`/topics/dpdk/jumbo-frames` also apply for TSO. Thus, some +performance hits might be noticed when running specific functionality, like +the Userspace Connection tracker. And as mentioned in the same section, it is +paramount that a packet's headers are contained within the first mbuf (~2KiB +in size). diff --git a/NEWS b/NEWS index 1278ada..e219822 100644 --- a/NEWS +++ b/NEWS @@ -44,6 +44,7 @@ v2.12.0 - xx xxx xxxx specific subtables based on the miniflow attributes, enhancing the performance of the subtable search. * Add Linux AF_XDP support through a new experimental netdev type "afxdp". + * Add support for TSO (experimental, between DPDK interfaces only).
- OVSDB: * OVSDB clients can now resynchronize with clustered servers much more quickly after a brief disconnection, saving bandwidth and CPU time. diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 2304f28..7552caa 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -345,7 +345,8 @@ struct ingress_policer { enum dpdk_hw_ol_features { NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0, NETDEV_RX_HW_CRC_STRIP = 1 << 1, - NETDEV_RX_HW_SCATTER = 1 << 2 + NETDEV_RX_HW_SCATTER = 1 << 2, + NETDEV_TX_TSO_OFFLOAD = 1 << 3, }; /* @@ -996,8 +997,18 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq) return -ENOTSUP; } + if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) { + conf.txmode.offloads |= DEV_TX_OFFLOAD_TCP_TSO; + conf.txmode.offloads |= DEV_TX_OFFLOAD_TCP_CKSUM; + conf.txmode.offloads |= DEV_TX_OFFLOAD_IPV4_CKSUM; + } + txconf = info.default_txconf; txconf.offloads = conf.txmode.offloads; + } else if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) { + dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD; + VLOG_WARN("Failed to set Tx TSO offload in %s. 
Requires option " + "`dpdk-multi-seg-mbufs` to be enabled.", dev->up.name); } conf.intr_conf.lsc = dev->lsc_interrupt_mode; @@ -1114,6 +1125,9 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM | DEV_RX_OFFLOAD_TCP_CKSUM | DEV_RX_OFFLOAD_IPV4_CKSUM; + uint32_t tx_tso_offload_capa = DEV_TX_OFFLOAD_TCP_TSO | + DEV_TX_OFFLOAD_TCP_CKSUM | + DEV_TX_OFFLOAD_IPV4_CKSUM; rte_eth_dev_info_get(dev->port_id, &info); @@ -1140,6 +1154,18 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) dev->hw_ol_features &= ~NETDEV_RX_HW_SCATTER; } + if (dpdk_multi_segment_mbufs) { + if (info.tx_offload_capa & tx_tso_offload_capa) { + dev->hw_ol_features |= NETDEV_TX_TSO_OFFLOAD; + } else { + dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD; + VLOG_WARN("Tx TSO offload is not supported on port " DPDK_PORT_ID_FMT, dev->port_id); + } + } else { + dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD; + } + n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq); n_txq = MIN(info.max_tx_queues, dev->up.n_txq); @@ -1727,6 +1753,11 @@ netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args) } else { smap_add(args, "rx_csum_offload", "false"); } + if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) { + smap_add(args, "tx_tso_offload", "true"); + } else { + smap_add(args, "tx_tso_offload", "false"); + } smap_add(args, "lsc_interrupt_mode", dev->lsc_interrupt_mode ? "true" : "false"); } @@ -2445,9 +2476,21 @@ netdev_dpdk_qos_run(struct netdev_dpdk *dev, struct rte_mbuf **pkts, return cnt; } +/* Filters a DPDK packet by the following criteria: + * - A packet is marked for TSO but the egress dev doesn't + * support TSO; + * - A packet's pkt_len is bigger than the pre-defined + * max_packet_len, and the packet isn't marked for TSO. + * + * If any of the above cases applies, the packet is then freed + * from 'pkts'. Otherwise the packet is kept in 'pkts' + * untouched. + * + * Returns the number of unfiltered packets left in 'pkts'. 
+ */ static int -netdev_dpdk_filter_packet_len(struct netdev_dpdk *dev, struct rte_mbuf **pkts, - int pkt_cnt) +netdev_dpdk_filter_packet(struct netdev_dpdk *dev, struct rte_mbuf **pkts, + int pkt_cnt) { int i = 0; int cnt = 0; @@ -2457,6 +2500,15 @@ netdev_dpdk_filter_packet_len(struct netdev_dpdk *dev, struct rte_mbuf **pkts, for (i = 0; i < pkt_cnt; i++) { pkt = pkts[i]; + /* Drop TSO packet if there's no TSO support on egress port. */ + if ((pkt->ol_flags & PKT_TX_TCP_SEG) && + !(dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD)) { + VLOG_WARN_RL(&rl, "%s: TSO is disabled on port, TSO packet of size " "%" PRIu32 " dropped", dev->up.name, pkt->pkt_len); + rte_pktmbuf_free(pkt); + continue; + } + if (OVS_UNLIKELY(pkt->pkt_len > dev->max_packet_len)) { if (!(pkt->ol_flags & PKT_TX_TCP_SEG)) { VLOG_WARN_RL(&rl, "%s: Too big size %" PRIu32 " " @@ -2528,7 +2580,7 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid, rte_spinlock_lock(&dev->tx_q[qid].tx_lock); - cnt = netdev_dpdk_filter_packet_len(dev, cur_pkts, cnt); + cnt = netdev_dpdk_filter_packet(dev, cur_pkts, cnt); /* Check has QoS has been configured for the netdev */ cnt = netdev_dpdk_qos_run(dev, cur_pkts, cnt, true); dropped = total_pkts - cnt; @@ -2747,7 +2799,7 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int qid, int batch_cnt = dp_packet_batch_size(batch); struct rte_mbuf **pkts = (struct rte_mbuf **) batch->packets; - tx_cnt = netdev_dpdk_filter_packet_len(dev, pkts, batch_cnt); + tx_cnt = netdev_dpdk_filter_packet(dev, pkts, batch_cnt); tx_cnt = netdev_dpdk_qos_run(dev, pkts, tx_cnt, true); dropped = batch_cnt - tx_cnt; @@ -4445,6 +4497,14 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev) dev->tx_q[0].map = 0; } + if (dpdk_multi_segment_mbufs) { + dev->hw_ol_features |= NETDEV_TX_TSO_OFFLOAD; + + VLOG_DBG("%s: TSO enabled on vhost port", dev->up.name); + } else { + dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD; + } + netdev_dpdk_remap_txqs(dev); err = netdev_dpdk_mempool_configure(dev);