From patchwork Thu Jan 10 16:58:40 2019
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 1023063
X-Patchwork-Delegate: ian.stokes@intel.com
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Thu, 10 Jan 2019 16:58:40 +0000
Message-Id: <1547139522-31154-2-git-send-email-tiago.lam@intel.com>
In-Reply-To: <1547139522-31154-1-git-send-email-tiago.lam@intel.com>
References: <1547139522-31154-1-git-send-email-tiago.lam@intel.com>
Cc: i.maximets@samsung.com
Subject: [ovs-dev] [PATCH v2 1/3] netdev-dpdk: Validate packets burst before Tx.

Given that multi-segment mbufs might be sent between interfaces that support
different capabilities, and may even support different layouts of mbufs,
outgoing packets should be validated before being sent on the egress
interface. Thus, netdev_dpdk_eth_tx_burst() now calls DPDK's
rte_eth_tx_prepare() function, if and only if multi-segment mbufs are
enabled, in order to validate the following (taken from the DPDK
documentation) in a device-specific manner:

- Check if the packet meets the device's requirements for Tx offloads.
- Check limitations on the number of segments.
- Check additional requirements when debug is enabled.
- Update and/or reset required checksums when Tx offload is set for a packet.

Signed-off-by: Tiago Lam
---
 lib/netdev-dpdk.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index d6114ee..77d04fc 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2029,6 +2029,10 @@ netdev_dpdk_rxq_dealloc(struct netdev_rxq *rxq)
 
 /* Tries to transmit 'pkts' to txq 'qid' of device 'dev'.
    Takes ownership of
  * 'pkts', even in case of failure.
+ * In case multi-segment mbufs / TSO is in use, it also prepares the burst.
+ * In such cases, only the prepared packets will be sent on to Tx burst,
+ * meaning that if an invalid packet appears in 'pkts'[3], only the validated
+ * packets at indices 0, 1 and 2 will be sent.
  *
  * Returns the number of packets that weren't transmitted. */
 static inline int
@@ -2036,11 +2040,24 @@ netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid,
                          struct rte_mbuf **pkts, int cnt)
 {
     uint32_t nb_tx = 0;
+    uint16_t nb_prep = cnt;
 
-    while (nb_tx != cnt) {
+    if (dpdk_multi_segment_mbufs) {
+        /* Validate the burst of packets for Tx. */
+        nb_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt);
+        if (nb_prep != cnt) {
+            VLOG_WARN_RL(&rl, "%s: Preparing packet tx burst failed (%u/%u "
+                         "packets valid): %s", dev->up.name, nb_prep, cnt,
+                         rte_strerror(rte_errno));
+        }
+    }
+
+    /* Tx the validated burst of packets only. */
+    while (nb_tx != nb_prep) {
         uint32_t ret;
 
-        ret = rte_eth_tx_burst(dev->port_id, qid, pkts + nb_tx, cnt - nb_tx);
+        ret = rte_eth_tx_burst(dev->port_id, qid, pkts + nb_tx,
+                               nb_prep - nb_tx);
         if (!ret) {
             break;
         }

From patchwork Thu Jan 10 16:58:41 2019
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 1023064
X-Patchwork-Delegate: ian.stokes@intel.com
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Thu, 10 Jan 2019 16:58:41 +0000
Message-Id: <1547139522-31154-3-git-send-email-tiago.lam@intel.com>
In-Reply-To: <1547139522-31154-1-git-send-email-tiago.lam@intel.com>
References: <1547139522-31154-1-git-send-email-tiago.lam@intel.com>
Cc: i.maximets@samsung.com
Subject: [ovs-dev] [PATCH v2 2/3] netdev-dpdk: Consider packets marked for TSO.
Previously, TSO was being explicitly disabled on vhost interfaces, meaning
the guests wouldn't have TSO support negotiated in. With TSO negotiated and
enabled, packets are now marked for TSO through the PKT_TX_TCP_SEG flag.

In order to deal with this type of packet, a new function,
netdev_dpdk_prep_tso_packet(), has been introduced, with the main purpose of
correctly setting the l2, l3 and l4 length members of the mbuf struct, and
the appropriate ol_flags. This function supports TSO for both IPv4 and IPv6.

netdev_dpdk_prep_tso_packet() is then only called when packets are marked
with the PKT_TX_TCP_SEG flag, meaning they have been marked for TSO, and
when the packet will be traversing the NIC. Additionally, if a packet is
marked for TSO but the egress netdev doesn't support it, the packet is
dropped.
Co-authored-by: Mark Kavanagh
Signed-off-by: Mark Kavanagh
Signed-off-by: Tiago Lam
---
 lib/dp-packet.h    |  14 ++++++
 lib/netdev-bsd.c   |  11 ++++-
 lib/netdev-dpdk.c  | 121 ++++++++++++++++++++++++++++++++++++++++++-----------
 lib/netdev-dummy.c |  11 ++++-
 lib/netdev-linux.c |  15 +++++++
 5 files changed, 146 insertions(+), 26 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 970aaf2..c384416 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -104,6 +104,8 @@ static inline void dp_packet_set_size(struct dp_packet *, uint32_t);
 static inline uint16_t dp_packet_get_allocated(const struct dp_packet *);
 static inline void dp_packet_set_allocated(struct dp_packet *, uint16_t);
 
+static inline bool dp_packet_is_tso(struct dp_packet *b);
+
 void *dp_packet_resize_l2(struct dp_packet *, int increment);
 void *dp_packet_resize_l2_5(struct dp_packet *, int increment);
 static inline void *dp_packet_eth(const struct dp_packet *);
@@ -761,6 +763,12 @@ dp_packet_set_allocated(struct dp_packet *b, uint16_t s)
     b->mbuf.buf_len = s;
 }
 
+static inline bool
+dp_packet_is_tso(struct dp_packet *b)
+{
+    return (b->mbuf.ol_flags & PKT_TX_TCP_SEG) ?
+           true : false;
+}
+
 static inline void
 dp_packet_copy_mbuf_flags(struct dp_packet *dst, const struct dp_packet *src)
 {
@@ -972,6 +980,12 @@ dp_packet_get_allocated(const struct dp_packet *b)
     return b->allocated_;
 }
 
+static inline bool
+dp_packet_is_tso(struct dp_packet *b)
+{
+    return false;
+}
+
 static inline void
 dp_packet_set_allocated(struct dp_packet *b, uint16_t s)
 {
diff --git a/lib/netdev-bsd.c b/lib/netdev-bsd.c
index 278c8a9..5e8c5cc 100644
--- a/lib/netdev-bsd.c
+++ b/lib/netdev-bsd.c
@@ -700,13 +700,22 @@ netdev_bsd_send(struct netdev *netdev_, int qid OVS_UNUSED,
     }
 
     DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
+        size_t size = dp_packet_size(packet);
+
+        /* TSO is not supported by the BSD netdev. */
+        if (dp_packet_is_tso(packet)) {
+            VLOG_WARN_RL(&rl, "%s: No TSO enabled on port, TSO packet dropped "
+                         "%" PRIu32 " ", name, size);
+
+            continue;
+        }
+
         /* We need the whole data to send the packet on the device. */
         if (!dp_packet_is_linear(packet)) {
             dp_packet_linearize(packet);
         }
 
         const void *data = dp_packet_data(packet);
-        size_t size = dp_packet_size(packet);
 
         while (!error) {
             ssize_t retval;
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 77d04fc..ad7223a 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1375,14 +1375,16 @@ netdev_dpdk_vhost_construct(struct netdev *netdev)
         goto out;
     }
 
-    err = rte_vhost_driver_disable_features(dev->vhost_id,
-                                1ULL << VIRTIO_NET_F_HOST_TSO4
-                                | 1ULL << VIRTIO_NET_F_HOST_TSO6
-                                | 1ULL << VIRTIO_NET_F_CSUM);
-    if (err) {
-        VLOG_ERR("rte_vhost_driver_disable_features failed for vhost user "
-                 "port: %s\n", name);
-        goto out;
+    if (!dpdk_multi_segment_mbufs) {
+        err = rte_vhost_driver_disable_features(dev->vhost_id,
+                                    1ULL << VIRTIO_NET_F_HOST_TSO4
+                                    | 1ULL << VIRTIO_NET_F_HOST_TSO6
+                                    | 1ULL << VIRTIO_NET_F_CSUM);
+        if (err) {
+            VLOG_ERR("rte_vhost_driver_disable_features failed for vhost user "
+                     "client port: %s\n", dev->up.name);
+            goto out;
+        }
     }
 
     err = rte_vhost_driver_start(dev->vhost_id);
@@ -2027,6 +2029,44
 @@ netdev_dpdk_rxq_dealloc(struct netdev_rxq *rxq)
     rte_free(rx);
 }
 
+/* Should only be called if PKT_TX_TCP_SEG is set in ol_flags.
+ * Furthermore, it also sets the PKT_TX_TCP_CKSUM and PKT_TX_IP_CKSUM flags,
+ * and PKT_TX_IPV4 or PKT_TX_IPV6 in case the packet is IPv4 or IPv6,
+ * respectively. */
+static void
+netdev_dpdk_prep_tso_packet(struct rte_mbuf *mbuf, int mtu)
+{
+    struct dp_packet *pkt;
+    struct tcp_header *th;
+
+    pkt = CONTAINER_OF(mbuf, struct dp_packet, mbuf);
+    mbuf->l2_len = (char *) dp_packet_l3(pkt) - (char *) dp_packet_eth(pkt);
+    mbuf->l3_len = (char *) dp_packet_l4(pkt) - (char *) dp_packet_l3(pkt);
+    th = dp_packet_l4(pkt);
+
+    /* There's no layer 4 in the packet. */
+    if (!th) {
+        return;
+    }
+
+    mbuf->l4_len = TCP_OFFSET(th->tcp_ctl) * 4;
+    mbuf->outer_l2_len = 0;
+    mbuf->outer_l3_len = 0;
+
+    /* Reset the packet's RX RSS flag so it can be reused on egress. */
+    dp_packet_mbuf_rss_flag_reset(pkt);
+
+    if (!(mbuf->ol_flags & PKT_TX_TCP_SEG)) {
+        return;
+    }
+
+    /* Prepare the packet for egress. */
+    mbuf->ol_flags |= PKT_TX_TCP_SEG;
+    mbuf->ol_flags |= PKT_TX_TCP_CKSUM;
+    mbuf->ol_flags |= PKT_TX_IP_CKSUM;
+
+    /* Set the size of each TCP segment, based on the MTU of the device. */
+    mbuf->tso_segsz = mtu - mbuf->l3_len - mbuf->l4_len;
+}
+
 /* Tries to transmit 'pkts' to txq 'qid' of device 'dev'. Takes ownership of
  * 'pkts', even in case of failure.
  * In case multi-segment mbufs / TSO is in use, it also prepares. In such
@@ -2328,13 +2368,29 @@ netdev_dpdk_filter_packet_len(struct netdev_dpdk *dev, struct rte_mbuf **pkts,
     int cnt = 0;
     struct rte_mbuf *pkt;
 
+    /* Filter oversized packets, unless they are marked for TSO.
+     */
     for (i = 0; i < pkt_cnt; i++) {
         pkt = pkts[i];
+
         if (OVS_UNLIKELY(pkt->pkt_len > dev->max_packet_len)) {
-            VLOG_WARN_RL(&rl, "%s: Too big size %" PRIu32 " max_packet_len %d",
-                         dev->up.name, pkt->pkt_len, dev->max_packet_len);
-            rte_pktmbuf_free(pkt);
-            continue;
+            if (!(pkt->ol_flags & PKT_TX_TCP_SEG)) {
+                VLOG_WARN_RL(&rl, "%s: Too big size %" PRIu32 " "
+                             "max_packet_len %d",
+                             dev->up.name, pkt->pkt_len, dev->max_packet_len);
+                rte_pktmbuf_free(pkt);
+                continue;
+            } else {
+                if (dev->type != DPDK_DEV_VHOST) {
+                    netdev_dpdk_prep_tso_packet(pkt, dev->mtu);
+                }
+
+                /* Else the frames will not actually traverse the NIC, but
+                 * rather travel between VMs on the same host. */
+            }
+        } else {
+            if (dev->type != DPDK_DEV_VHOST) {
+                netdev_dpdk_prep_tso_packet(pkt, dev->mtu);
+            }
         }
 
         if (OVS_UNLIKELY(i != cnt)) {
@@ -2465,6 +2521,12 @@ dpdk_copy_dp_packet_to_mbuf(struct dp_packet *packet, struct rte_mbuf **head,
     fmbuf->nb_segs = nb_segs;
     fmbuf->pkt_len = size;
 
+    struct dp_packet *pkt = CONTAINER_OF(fmbuf, struct dp_packet, mbuf);
+    pkt->l2_pad_size = packet->l2_pad_size;
+    pkt->l2_5_ofs = packet->l2_5_ofs;
+    pkt->l3_ofs = packet->l3_ofs;
+    pkt->l4_ofs = packet->l4_ofs;
+
     dp_packet_mbuf_write(fmbuf, 0, size, dp_packet_data(packet));
 
     return 0;
@@ -2499,14 +2561,17 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
 
     for (i = 0; i < cnt; i++) {
         struct dp_packet *packet = batch->packets[i];
+        struct rte_mbuf *pkt = &batch->packets[i]->mbuf;
         uint32_t size = dp_packet_size(packet);
         int err = 0;
 
         if (OVS_UNLIKELY(size > dev->max_packet_len)) {
-            VLOG_WARN_RL(&rl, "Too big size %u max_packet_len %d",
-                         size, dev->max_packet_len);
-            dropped++;
-            continue;
+            if (!(pkt->ol_flags & PKT_TX_TCP_SEG)) {
+                VLOG_WARN_RL(&rl, "Too big size %u max_packet_len %d",
+                             size, dev->max_packet_len);
+                dropped++;
+                continue;
+            }
         }
 
         err = dpdk_copy_dp_packet_to_mbuf(packet, &pkts[txcnt],
@@ -2522,6 +2587,12 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct
 dp_packet_batch *batch)
         }
 
         dp_packet_copy_mbuf_flags((struct dp_packet *)pkts[txcnt], packet);
+        if (dev->type != DPDK_DEV_VHOST) {
+            /* If the packet is non-DPDK, at the very least we need to update
+             * the mbuf length members, even if TSO is not to be performed. */
+            netdev_dpdk_prep_tso_packet(pkts[txcnt], dev->mtu);
+        }
+
         txcnt++;
     }
 
@@ -4263,14 +4334,16 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)
         goto unlock;
     }
 
-    err = rte_vhost_driver_disable_features(dev->vhost_id,
-                                1ULL << VIRTIO_NET_F_HOST_TSO4
-                                | 1ULL << VIRTIO_NET_F_HOST_TSO6
-                                | 1ULL << VIRTIO_NET_F_CSUM);
-    if (err) {
-        VLOG_ERR("rte_vhost_driver_disable_features failed for vhost user "
-                 "client port: %s\n", dev->up.name);
-        goto unlock;
+    if (!dpdk_multi_segment_mbufs) {
+        err = rte_vhost_driver_disable_features(dev->vhost_id,
+                                    1ULL << VIRTIO_NET_F_HOST_TSO4
+                                    | 1ULL << VIRTIO_NET_F_HOST_TSO6
+                                    | 1ULL << VIRTIO_NET_F_CSUM);
+        if (err) {
+            VLOG_ERR("rte_vhost_driver_disable_features failed for vhost "
+                     "user client port: %s\n", dev->up.name);
+            goto unlock;
+        }
     }
 
     err = rte_vhost_driver_start(dev->vhost_id);
diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index c56c86b..8452ab6 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -1093,13 +1093,22 @@ netdev_dummy_send(struct netdev *netdev, int qid OVS_UNUSED,
     struct dp_packet *packet;
 
     DP_PACKET_BATCH_FOR_EACH(i, packet, batch) {
+        size_t size = dp_packet_size(packet);
+
+        /* TSO is not supported by the dummy netdev. */
+        if (dp_packet_is_tso(packet)) {
+            VLOG_WARN("%s: No TSO enabled on port, TSO packet dropped %ld",
+                      netdev_get_name(netdev), size);
+
+            continue;
+        }
+
         /* We need the whole data to send the packet on the device. */
         if (!dp_packet_is_linear(packet)) {
             dp_packet_linearize(packet);
         }
 
         const void *buffer = dp_packet_data(packet);
-        size_t size = dp_packet_size(packet);
 
         if (packet->packet_type != htonl(PT_ETH)) {
             error = EPFNOSUPPORT;
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index fa79b2a..1476096 100644
---
a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -1379,6 +1379,13 @@ netdev_linux_sock_batch_send(int sock, int ifindex,
     struct dp_packet *packet;
 
     DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
+        /* TSO is not supported by the Linux netdev. */
+        if (dp_packet_is_tso(packet)) {
+            VLOG_WARN_RL(&rl, "%d: No TSO enabled on port, TSO packet dropped "
+                         "%ld", sock, dp_packet_size(packet));
+            continue;
+        }
+
         /* We need the whole data to send the packet on the device. */
         if (!dp_packet_is_linear(packet)) {
             dp_packet_linearize(packet);
@@ -1437,6 +1444,14 @@ netdev_linux_tap_batch_send(struct netdev *netdev_,
         ssize_t retval;
         int error;
 
+        /* TSO is not supported by the Linux netdev. */
+        if (dp_packet_is_tso(packet)) {
+            VLOG_WARN_RL(&rl, "%s: No TSO enabled on port, TSO packet dropped "
+                         "%ld", netdev_get_name(netdev_),
+                         dp_packet_size(packet));
+
+            continue;
+        }
+
         /* We need the whole data to send the packet on the device. */
         if (!dp_packet_is_linear(packet)) {
             dp_packet_linearize(packet);

From patchwork Thu Jan 10 16:58:42 2019
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 1023065
X-Patchwork-Delegate: ian.stokes@intel.com
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Thu, 10 Jan 2019 16:58:42 +0000
Message-Id: <1547139522-31154-4-git-send-email-tiago.lam@intel.com>
In-Reply-To: <1547139522-31154-1-git-send-email-tiago.lam@intel.com>
References: <1547139522-31154-1-git-send-email-tiago.lam@intel.com>
Cc: i.maximets@samsung.com
Subject: [ovs-dev] [PATCH v2 3/3] netdev-dpdk: Enable TSO when using multi-seg mbufs

TCP Segmentation Offload (TSO) is a feature which enables the TCP/IP
network stack to delegate segmentation of a TCP segment to the
hardware NIC, thus saving compute resources. This may improve performance
significantly for TCP workloads in virtualized environments.

While a previous commit already added the necessary logic to netdev-dpdk to
deal with packets marked for TSO, this set of changes enables TSO by default
when using multi-segment mbufs. Thus, to enable TSO on the physical DPDK
interfaces, only the following command needs to be issued before starting
OvS:

    ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true

Co-authored-by: Mark Kavanagh
Signed-off-by: Mark Kavanagh
Signed-off-by: Tiago Lam
---
 Documentation/automake.mk           |   1 +
 Documentation/topics/dpdk/index.rst |   1 +
 Documentation/topics/dpdk/tso.rst   | 111 ++++++++++++++++++++++++++++++++++++
 NEWS                                |   1 +
 lib/netdev-dpdk.c                   |  63 +++++++++++++++++---
 5 files changed, 168 insertions(+), 9 deletions(-)
 create mode 100644 Documentation/topics/dpdk/tso.rst

diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index 082438e..a20deb8 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -39,6 +39,7 @@ DOC_SOURCE = \
 	Documentation/topics/dpdk/index.rst \
 	Documentation/topics/dpdk/bridge.rst \
 	Documentation/topics/dpdk/jumbo-frames.rst \
+	Documentation/topics/dpdk/tso.rst \
 	Documentation/topics/dpdk/memory.rst \
 	Documentation/topics/dpdk/pdump.rst \
 	Documentation/topics/dpdk/phy.rst \
diff --git a/Documentation/topics/dpdk/index.rst b/Documentation/topics/dpdk/index.rst
index cf24a7b..eb2a04d 100644
--- a/Documentation/topics/dpdk/index.rst
+++ b/Documentation/topics/dpdk/index.rst
@@ -40,4 +40,5 @@ The DPDK Datapath
    /topics/dpdk/qos
    /topics/dpdk/pdump
    /topics/dpdk/jumbo-frames
+   /topics/dpdk/tso
    /topics/dpdk/memory
diff --git a/Documentation/topics/dpdk/tso.rst b/Documentation/topics/dpdk/tso.rst
new file mode 100644
index 0000000..503354f
--- /dev/null
+++ b/Documentation/topics/dpdk/tso.rst
@@ -0,0 +1,111 @@
+..
+      Copyright 2018, Red Hat, Inc.
+
+      Licensed under the Apache License, Version 2.0 (the "License"); you may
+      not use this file except in compliance with the License. You may obtain
+      a copy of the License at
+
+          http://www.apache.org/licenses/LICENSE-2.0
+
+      Unless required by applicable law or agreed to in writing, software
+      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+      License for the specific language governing permissions and limitations
+      under the License.
+
+      Convention for heading levels in Open vSwitch documentation:
+
+      =======  Heading 0 (reserved for the title in a document)
+      -------  Heading 1
+      ~~~~~~~  Heading 2
+      +++++++  Heading 3
+      '''''''  Heading 4
+
+      Avoid deeper levels because they do not render well.
+
+===
+TSO
+===
+
+.. versionadded:: 2.11.0
+
+TCP Segmentation Offload (TSO) is a mechanism which enables a TCP/IP network
+stack to delegate segmentation of an oversized TCP segment to the underlying
+physical NIC, thus saving the cycles that would otherwise be required to
+perform the same segmentation in software and freeing up CPU cycles for more
+useful work.
+
+A common use case for TSO is virtualization, where traffic coming in from a
+VM can offload the TCP segmentation, thus avoiding the fragmentation in
+software. Additionally, if the traffic is headed to a VM within the same host,
+further optimization can be expected. As the traffic never leaves the machine,
+no MTU needs to be accounted for, and thus no segmentation and checksum
+calculations are required, which saves yet more cycles. Only when the traffic
+actually leaves the host does the segmentation need to happen, in which case
+it will be performed by the egress NIC.
+
+When using TSO with DPDK, the implementation relies on the multi-segment mbufs
+feature, described in :doc:`/topics/dpdk/jumbo-frames`, where each mbuf
+contains ~2KiB of the entire packet's data and is linked to the next mbuf that
+contains the next portion of data.
+
+Enabling TSO
+~~~~~~~~~~~~
+
+Once multi-segment mbufs are enabled, TSO will be enabled by default, if
+there's support for it in the underlying physical NICs attached to OvS-DPDK.
+
+When using :doc:`vHost User ports `, TSO may be enabled in one of two ways, as
+follows.
+
+`TSO` is enabled in OvS by the DPDK vHost User backend; when a new guest
+connection is established, `TSO` is thus advertised to the guest as an
+available feature:
+
+1. QEMU Command Line Parameter::
+
+    sudo $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 \
+    ...
+    -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,\
+    mrg_rxbuf=on,csum=on,gso=on,guest_csum=on,guest_tso4=on,\
+    guest_tso6=on,guest_ecn=on \
+    ...
+
+2. ethtool
+
+Assuming that the guest's OS also supports `TSO`, ethtool can be used to
+enable the same::
+
+    ethtool -K eth0 sg on    # scatter-gather is a prerequisite for TSO
+    ethtool -K eth0 tso on
+    ethtool -k eth0
+
+Note: in both methods, `mergeable buffers` are required::
+
+    sudo $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 \
+    ...
+    mrg_rxbuf=on,\
+    ...
+
+To enable TSO in a guest, the underlying NIC must first support `TSO` -
+consult your controller's datasheet for compatibility. Secondly, the NIC must
+have an associated DPDK Poll Mode Driver (PMD) which supports `TSO`.
+
+Limitations
+~~~~~~~~~~~
+
+The current OvS `TSO` implementation supports flat and VLAN networks only
+(i.e. no support for `TSO` over tunneled connections [VxLAN, GRE, IPinIP,
+etc.]).
+
+Also, as TSO is built on top of multi-segment mbufs, the constraints pointed
+out in :doc:`/topics/dpdk/jumbo-frames` also apply for TSO.
+Thus, some performance hits might be noticed when running specific
+functionality, like the Userspace Connection tracker. And, as mentioned in
+the same section, it is paramount that a packet's headers are contained
+within the first mbuf (~2KiB in size).
diff --git a/NEWS b/NEWS
index 98f5a9b..c7cfd17 100644
--- a/NEWS
+++ b/NEWS
@@ -23,6 +23,7 @@ Post-v2.10.0
      * Add option for simple round-robin based Rxq to PMD assignment.
        It can be set with pmd-rxq-assign.
      * Add support for DPDK 18.11
+     * Add support for TSO (between DPDK interfaces only).
    - Add 'symmetric_l3' hash function.
    - OVS now honors 'updelay' and 'downdelay' for bonds with LACP configured.
    - ovs-vswitchd:
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index ad7223a..7af37ee 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -374,7 +374,8 @@ struct ingress_policer {
 enum dpdk_hw_ol_features {
     NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0,
     NETDEV_RX_HW_CRC_STRIP = 1 << 1,
-    NETDEV_RX_HW_SCATTER = 1 << 2
+    NETDEV_RX_HW_SCATTER = 1 << 2,
+    NETDEV_TX_TSO_OFFLOAD = 1 << 3,
 };
 
 /*
@@ -1005,19 +1006,26 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq)
 
     /* Multi-segment-mbuf-specific setup. */
     if (dpdk_multi_segment_mbufs) {
         if (info.tx_offload_capa & DEV_TX_OFFLOAD_MULTI_SEGS) {
-            /* Enable multi-seg mbufs. DPDK PMDs typically attempt to use
-             * simple or vectorized transmit functions, neither of which are
-             * compatible with multi-segment mbufs. */
+            /* DPDK PMDs typically attempt to use simple or vectorized
+             * transmit functions, neither of which are compatible with
+             * multi-segment mbufs. Ensure that these are disabled when
+             * multi-segment mbufs are enabled.
+             */
             conf.txmode.offloads |= DEV_TX_OFFLOAD_MULTI_SEGS;
-        } else {
-            VLOG_WARN("Interface %s doesn't support multi-segment mbufs",
-                      dev->up.name);
-            conf.txmode.offloads &= ~DEV_TX_OFFLOAD_MULTI_SEGS;
+        }
+
+        if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) {
+            conf.txmode.offloads |= DEV_TX_OFFLOAD_TCP_TSO;
+            conf.txmode.offloads |= DEV_TX_OFFLOAD_TCP_CKSUM;
+            conf.txmode.offloads |= DEV_TX_OFFLOAD_IPV4_CKSUM;
         }
 
         txconf = info.default_txconf;
-        //txconf.txq_flags = ETH_TXQ_FLAGS_IGNORE;
         txconf.offloads = conf.txmode.offloads;
+    } else if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) {
+        dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD;
+        VLOG_WARN("Failed to set Tx TSO offload in %s. Requires option "
+                  "`dpdk-multi-seg-mbufs` to be enabled.", dev->up.name);
     }
 
     conf.intr_conf.lsc = dev->lsc_interrupt_mode;
@@ -1134,6 +1142,9 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
     uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM |
                                      DEV_RX_OFFLOAD_TCP_CKSUM |
                                      DEV_RX_OFFLOAD_IPV4_CKSUM;
+    uint32_t tx_tso_offload_capa = DEV_TX_OFFLOAD_TCP_TSO |
+                                   DEV_TX_OFFLOAD_TCP_CKSUM |
+                                   DEV_TX_OFFLOAD_IPV4_CKSUM;
 
     rte_eth_dev_info_get(dev->port_id, &info);
 
@@ -1160,6 +1171,18 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
         dev->hw_ol_features &= ~NETDEV_RX_HW_SCATTER;
     }
 
+    if (dpdk_multi_segment_mbufs) {
+        if (info.tx_offload_capa & tx_tso_offload_capa) {
+            dev->hw_ol_features |= NETDEV_TX_TSO_OFFLOAD;
+        } else {
+            dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD;
+            VLOG_WARN("Tx TSO offload is not supported on port "
+                      DPDK_PORT_ID_FMT, dev->port_id);
+        }
+    } else {
+        dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD;
+    }
+
     n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq);
     n_txq = MIN(info.max_tx_queues, dev->up.n_txq);
 
@@ -1684,6 +1707,11 @@ netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args)
     } else {
         smap_add(args, "rx_csum_offload", "false");
     }
+    if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) {
+        smap_add(args, "tx_tso_offload", "true");
+    } else {
+        smap_add(args,
"tx_tso_offload", "false"); + } smap_add(args, "lsc_interrupt_mode", dev->lsc_interrupt_mode ? "true" : "false"); } @@ -2372,6 +2400,15 @@ netdev_dpdk_filter_packet_len(struct netdev_dpdk *dev, struct rte_mbuf **pkts, for (i = 0; i < pkt_cnt; i++) { pkt = pkts[i]; + /* Drop TSO packet if there's no TSO support on egress port */ + if ((pkt->ol_flags & PKT_TX_TCP_SEG) && + !(dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD)) { + VLOG_WARN_RL(&rl, "%s: TSO is disabled on port, TSO packet dropped" + "%" PRIu32 " ", dev->up.name, pkt->pkt_len); + rte_pktmbuf_free(pkt); + continue; + } + if (OVS_UNLIKELY(pkt->pkt_len > dev->max_packet_len)) { if (!(pkt->ol_flags & PKT_TX_TCP_SEG)) { VLOG_WARN_RL(&rl, "%s: Too big size %" PRIu32 " " @@ -4245,6 +4282,14 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev) dev->tx_q[0].map = 0; } + if (dpdk_multi_segment_mbufs) { + dev->hw_ol_features |= NETDEV_TX_TSO_OFFLOAD; + + VLOG_DBG("%s: TSO enabled on vhost port", dev->up.name); + } else { + dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD; + } + netdev_dpdk_remap_txqs(dev); err = netdev_dpdk_mempool_configure(dev);