[{"id":1763438,"web_url":"http://patchwork.ozlabs.org/comment/1763438/","msgid":"<74F120C019F4A64C9B78E802F6AD4CC278E024CD@IRSMSX106.ger.corp.intel.com>","list_archive_url":null,"date":"2017-09-05T15:11:34","subject":"Re: [ovs-dev] [PATCH v4] netdev-dpdk: Implement TCP/UDP TX cksum in\n\tovs-dpdk side","submitter":{"id":67255,"url":"http://patchwork.ozlabs.org/api/people/67255/","name":"Ciara Loftus","email":"ciara.loftus@intel.com"},"content":"> \n> Currently, the dpdk-vhost side in ovs doesn't support tcp/udp tx cksum.\n> So L4 packets's cksum were calculated in VM side but performance is not\n> good.\n> Implementing tcp/udp tx cksum in ovs-dpdk side improves throughput in\n> VM->phy->phy->VM situation. And it makes virtio-net frontend-driver\n> support NETIF_F_SG(feature scatter-gather) as well.\n> \n> Signed-off-by: Zhenyu Gao <sysugaozhenyu@gmail.com>\n> ---\n> \n> Here is some performance number:\n\nHi Zhenyu,\n\nThanks for the code changes since v3.\nI tested a VM to VM case using iperf and observed a performance degradation when the tx cksum was offloaded to the host:\n\nchecksum in VM\n0.0-30.0 sec  10.9 GBytes  3.12 Gbits/sec\n0.0-30.0 sec  10.9 GBytes  3.11 Gbits/sec\n0.0-30.0 sec  11.0 GBytes  3.16 Gbits/sec\n\nchecksum in ovs dpdk\n0.0-30.0 sec  7.52 GBytes  2.15 Gbits/sec\n0.0-30.0 sec  7.12 GBytes  2.04 Gbits/sec\n0.0-30.0 sec  8.17 GBytes  2.34 Gbits/sec\n\nI think for this feature to be enabled we need performance to be roughly the same or better for all use cases. 
For now the gap here is too big I think.\n\nIf you wish to reproduce:\n\n1 host, 2 VMs each with 1 vhost port and flows set up to switch packets from each vhost port to the other.\n\nVM1:\nifconfig eth1 1.1.1.1/24 up\nethtool -K eth2 tx on/off\niperf -c 1.1.1.2 -i 1 -t 30\n\nVM2:\nifconfig eth1 1.1.1.2/24 up\nethtool -K eth1 tx on/off\niperf -s -i 1\n\nThanks,\nCiara\n\n> \n> Setup:\n> \n>  qperf client\n> +---------+\n> |   VM    |\n> +---------+\n>      |\n>      |                          qperf server\n> +--------------+              +------------+\n> | vswitch+dpdk |              | bare-metal |\n> +--------------+              +------------+\n>        |                            |\n>        |                            |\n>       pNic---------PhysicalSwitch----\n> \n> do cksum in ovs-dpdk: Applied this patch and execute 'ethtool -K eth0 tx on'\n> in VM side.\n>                       It offload cksum job to ovs-dpdk side.\n> \n> do cksum in VM: Applied this patch and execute 'ethtool -K eth0 tx off' in VM\n> side.\n>                 VM calculate cksum for tcp/udp packets.\n> \n> We can see huge improvment in TCP throughput if we leverage ovs-dpdk\n> cksum.\n> \n> [root@localhost ~]# qperf -t 10 -oo msg_size:1:64K:*2  host-qperf-server01\n> tcp_bw tcp_lat udp_bw udp_lat\n>   do cksum in ovs-dpdk          do cksum in VM             without this patch\n> tcp_bw:\n>     bw  =  1.9 MB/sec         bw  =  1.92 MB/sec        bw  =  1.95 MB/sec\n> tcp_bw:\n>     bw  =  3.97 MB/sec        bw  =  3.99 MB/sec        bw  =  3.98 MB/sec\n> tcp_bw:\n>     bw  =  7.75 MB/sec        bw  =  7.79 MB/sec        bw  =  7.89 MB/sec\n> tcp_bw:\n>     bw  =  14.7 MB/sec        bw  =  14.7 MB/sec        bw  =  14.9 MB/sec\n> tcp_bw:\n>     bw  =  27.7 MB/sec        bw  =  27.4 MB/sec        bw  =  28 MB/sec\n> tcp_bw:\n>     bw  =  51.1 MB/sec        bw  =  51.3 MB/sec        bw  =  51.8 MB/sec\n> tcp_bw:\n>     bw  =  86.2 MB/sec        bw  =  84.4 MB/sec        bw  =  87.6 MB/sec\n> 
tcp_bw:\n>     bw  =  141 MB/sec         bw  =  142 MB/sec        bw  =  141 MB/sec\n> tcp_bw:\n>     bw  =  203 MB/sec         bw  =  201 MB/sec        bw  =  211 MB/sec\n> tcp_bw:\n>     bw  =  267 MB/sec         bw  =  250 MB/sec        bw  =  260 MB/sec\n> tcp_bw:\n>     bw  =  324 MB/sec         bw  =  295 MB/sec        bw  =  302 MB/sec\n> tcp_bw:\n>     bw  =  397 MB/sec         bw  =  363 MB/sec        bw  =  347 MB/sec\n> tcp_bw:\n>     bw  =  765 MB/sec         bw  =  510 MB/sec        bw  =  383 MB/sec\n> tcp_bw:\n>     bw  =  850 MB/sec         bw  =  710 MB/sec        bw  =  417 MB/sec\n> tcp_bw:\n>     bw  =  1.09 GB/sec        bw  =  860 MB/sec        bw  =  444 MB/sec\n> tcp_bw:\n>     bw  =  1.17 GB/sec        bw  =  979 MB/sec        bw  =  447 MB/sec\n> tcp_bw:\n>     bw  =  1.17 GB/sec        bw  =  1.07 GB/sec       bw  =  462 MB/sec\n> tcp_lat:\n>     latency  =  29.1 us       latency  =  28.7 us        latency  =  29.1 us\n> tcp_lat:\n>     latency  =  29 us         latency  =  28.8 us        latency  =  29 us\n> tcp_lat:\n>     latency  =  29 us         latency  =  28.8 us        latency  =  29 us\n> tcp_lat:\n>     latency  =  29 us         latency  =  28.9 us        latency  =  29 us\n> tcp_lat:\n>     latency  =  29.2 us       latency  =  28.9 us        latency  =  29.1 us\n> tcp_lat:\n>     latency  =  29.1 us       latency  =  29.1 us        latency  =  29.1 us\n> tcp_lat:\n>     latency  =  29.5 us       latency  =  29.5 us        latency  =  29.5 us\n> tcp_lat:\n>     latency  =  29.8 us       latency  =  29.8 us        latency  =  29.9 us\n> tcp_lat:\n>     latency  =  30.7 us       latency  =  30.7 us        latency  =  30.7 us\n> tcp_lat:\n>     latency  =  47.1 us       latency  =  46.2 us        latency  =  47.1 us\n> tcp_lat:\n>     latency  =  52.1 us       latency  =  52.3 us        latency  =  53.3 us\n> tcp_lat:\n>     latency  =  44 us         latency  =  43.8 us        latency  =  43.2 us\n> tcp_lat:\n>     latency  =  50 
us         latency  =  46.6 us        latency  =  47.8 us\n> tcp_lat:\n>      latency  =  79.2 us      latency  =  77.9 us        latency  =  78.9 us\n> tcp_lat:\n>     latency  =  82.3 us       latency  =  81.7 us        latency  =  82.2 us\n> tcp_lat:\n>     latency  =  96.7 us       latency  =  90.8 us        latency  =  127 us\n> tcp_lat:\n>     latency  =  215 us        latency  =  177 us        latency  =  225 us\n> udp_bw:\n>     send_bw  =  422 KB/sec        send_bw  =  415 KB/sec        send_bw  =  405\n> KB/sec\n>     recv_bw  =  402 KB/sec        recv_bw  =  404 KB/sec        recv_bw  =  403\n> KB/sec\n> udp_bw:\n>     send_bw  =  845 KB/sec        send_bw  =  835 KB/sec        send_bw  =  802\n> KB/sec\n>     recv_bw  =  831 KB/sec        recv_bw  =  804 KB/sec        recv_bw  =  802\n> KB/sec\n> udp_bw:\n>     send_bw  =  1.69 MB/sec       send_bw  =  1.66 MB/sec        send_bw  =  1.62\n> MB/sec\n>     recv_bw  =  1.45 MB/sec       recv_bw  =  1.63 MB/sec        recv_bw  =   1.6\n> MB/sec\n> udp_bw:\n>     send_bw  =  3.38 MB/sec       send_bw  =  3.33 MB/sec        send_bw  =  3.24\n> MB/sec\n>     recv_bw  =  3.32 MB/sec       recv_bw  =  3.25 MB/sec        recv_bw  =  3.24\n> MB/sec\n> udp_bw:\n>     send_bw  =  6.76 MB/sec       send_bw  =  6.63 MB/sec        send_bw  =  6.47\n> MB/sec\n>     recv_bw  =  6.42 MB/sec       recv_bw  =  5.59 MB/sec        recv_bw  =  6.45\n> MB/sec\n> udp_bw:\n>     send_bw  =  13.5 MB/sec       send_bw  =  13.3 MB/sec        send_bw  =  13\n> MB/sec\n>     recv_bw  =  13.4 MB/sec       recv_bw  =  12.1 MB/sec        recv_bw  =  13\n> MB/sec\n> udp_bw:\n>     send_bw  =    27 MB/sec       send_bw  =  26.5 MB/sec        send_bw  =  25.9\n> MB/sec\n>     recv_bw  =  26.4 MB/sec       recv_bw  =  21.5 MB/sec        recv_bw  =  25.9\n> MB/sec\n> udp_bw:\n>     send_bw  =  53.8 MB/sec       send_bw  =  52.9 MB/sec        send_bw  =  51.7\n> MB/sec\n>     recv_bw  =  49.1 MB/sec       recv_bw  =  47.6 MB/sec        recv_bw 
 =  51.1\n> MB/sec\n> udp_bw:\n>     send_bw  =   108 MB/sec       send_bw  =  105 MB/sec         send_bw  =  102\n> MB/sec\n>     recv_bw  =  91.1 MB/sec       recv_bw  =  101 MB/sec         recv_bw  =  100\n> MB/sec\n> udp_bw:\n>     send_bw  =  212 MB/sec        send_bw  =  208 MB/sec         send_bw  =  203\n> MB/sec\n>     recv_bw  =  204 MB/sec        recv_bw  =  204 MB/sec         recv_bw  =  169\n> MB/sec\n> udp_bw:\n>     send_bw  =  414 MB/sec        send_bw  =  407 MB/sec         send_bw  =  398\n> MB/sec\n>     recv_bw  =  403 MB/sec        recv_bw  =  312 MB/sec         recv_bw  =  343\n> MB/sec\n> udp_bw:\n>     send_bw  =  555 MB/sec        send_bw  =  561 MB/sec         send_bw  =  557\n> MB/sec\n>     recv_bw  =  354 MB/sec        recv_bw  =  368 MB/sec         recv_bw  =  360\n> MB/sec\n> udp_bw:\n>     send_bw  =  877 MB/sec        send_bw  =  880 MB/sec         send_bw  =  868\n> MB/sec\n>     recv_bw  =  551 MB/sec        recv_bw  =  542 MB/sec         recv_bw  =  562\n> MB/sec\n> udp_bw:\n>     send_bw  =  1.1 GB/sec        send_bw  =  1.08 GB/sec        send_bw  =  1.09\n> GB/sec\n>     recv_bw  =  805 MB/sec        recv_bw  =   785 MB/sec        recv_bw  =   766\n> MB/sec\n> udp_bw:\n>     send_bw  =  1.21 GB/sec       send_bw  =  1.19 GB/sec        send_bw  =  1.22\n> GB/sec\n>     recv_bw  =   899 MB/sec       recv_bw  =   715 MB/sec        recv_bw  =   700\n> MB/sec\n> udp_bw:\n>     send_bw  =  1.31 GB/sec       send_bw  =  1.31 GB/sec        send_bw  =  1.31\n> GB/sec\n>     recv_bw  =   614 MB/sec       recv_bw  =   622 MB/sec        recv_bw  =   661\n> MB/sec\n> udp_bw:\n>     send_bw  =  0 bytes/sec       send_bw  =  0 bytes/sec        send_bw  =  0\n> bytes/sec\n>     recv_bw  =  0 bytes/sec       recv_bw  =  0 bytes/sec        recv_bw  =  0\n> bytes/sec\n> udp_lat:\n>     latency  =  25.9 us        latency  =  26.5 us        latency  =  26.5 us\n> udp_lat:\n>     latency  =  26.3 us        latency  =  26.4 us        latency  =  
26.5 us\n> udp_lat:\n>     latency  =  26 us          latency  =  26.4 us        latency  =  26.6 us\n> udp_lat:\n>     latency  =  26.1 us        latency  =  26.2 us        latency  =  26.4 us\n> udp_lat:\n>     latency  =  26.3 us        latency  =  26.5 us        latency  =  26.7 us\n> udp_lat:\n>     latency  =  26.3 us        latency  =  26.4 us        latency  =  26.5 us\n> udp_lat:\n>     latency  =  26.3 us        latency  =  26.7 us        latency  =  26.9 us\n> udp_lat:\n>     latency  =  27.1 us        latency  =  27.1 us        latency  =  27.2 us\n> udp_lat:\n>     latency  =  27.5 us        latency  =  27.8 us        latency  =  28.1 us\n> udp_lat:\n>     latency  =  28.7 us        latency  =  28.9 us        latency  =  29.1 us\n> udp_lat:\n>     latency  =  30.4 us        latency  =  30.5 us        latency  =  30.9 us\n> udp_lat:\n>     latency  =  41.2 us        latency  =  41.3 us        latency  =  41.1 us\n> udp_lat:\n>     latency  =  41.3 us        latency  =  41.5 us        latency  =  41.5 us\n> udp_lat:\n>     latency  =  64.4 us        latency  =  64.5 us        latency  =  64.2 us\n> udp_lat:\n>     latency  =  71.5 us        latency  =  71.5 us        latency  =  71.7 us\n> udp_lat:\n>     latency  =  120 us         latency  =  120 us         latency  =  120 us\n> udp_lat:\n>     latency  =  0 ns           latency  =  0 ns           latency  =  0 ns\n> \n>  lib/netdev-dpdk.c | 79\n> ++++++++++++++++++++++++++++++++++++++++++++++++++++---\n>  1 file changed, 75 insertions(+), 4 deletions(-)\n> \n> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c\n> index f58e9be..0f91def 100644\n> --- a/lib/netdev-dpdk.c\n> +++ b/lib/netdev-dpdk.c\n> @@ -31,6 +31,7 @@\n>  #include <rte_errno.h>\n>  #include <rte_eth_ring.h>\n>  #include <rte_ethdev.h>\n> +#include <rte_ip.h>\n>  #include <rte_malloc.h>\n>  #include <rte_mbuf.h>\n>  #include <rte_meter.h>\n> @@ -992,8 +993,7 @@ netdev_dpdk_vhost_construct(struct netdev\n> *netdev)\n> \n>      err = 
rte_vhost_driver_disable_features(dev->vhost_id,\n>                                  1ULL << VIRTIO_NET_F_HOST_TSO4\n> -                                | 1ULL << VIRTIO_NET_F_HOST_TSO6\n> -                                | 1ULL << VIRTIO_NET_F_CSUM);\n> +                                | 1ULL << VIRTIO_NET_F_HOST_TSO6);\n>      if (err) {\n>          VLOG_ERR(\"rte_vhost_driver_disable_features failed for vhost user \"\n>                   \"port: %s\\n\", name);\n> @@ -1455,6 +1455,76 @@ netdev_dpdk_rxq_dealloc(struct netdev_rxq\n> *rxq)\n>      rte_free(rx);\n>  }\n> \n> +static inline void\n> +netdev_dpdk_vhost_refill_l4_cksum(const char *data, struct dp_packet\n> *pkt,\n> +                                  uint8_t l4_proto, bool is_ipv4)\n> +{\n> +    void *l3hdr = (void *)(data + pkt->mbuf.l2_len);\n> +\n> +    if (l4_proto == IPPROTO_TCP) {\n> +        struct tcp_header *tcp_hdr = (struct tcp_header *)(data +\n> +                                         pkt->mbuf.l2_len + pkt->mbuf.l3_len);\n> +\n> +        tcp_hdr->tcp_csum = 0;\n> +        if (is_ipv4) {\n> +            tcp_hdr->tcp_csum = rte_ipv4_udptcp_cksum(l3hdr, tcp_hdr);\n> +        } else {\n> +            tcp_hdr->tcp_csum = rte_ipv6_udptcp_cksum(l3hdr, tcp_hdr);\n> +        }\n> +    } else if (l4_proto == IPPROTO_UDP) {\n> +        struct udp_header *udp_hdr = (struct udp_header *)(data +\n> +                                         pkt->mbuf.l2_len + pkt->mbuf.l3_len);\n> +        /* do not recalculate udp cksum if it was 0 */\n> +        if (udp_hdr->udp_csum != 0) {\n> +            udp_hdr->udp_csum = 0;\n> +            if (is_ipv4) {\n> +                /*do not calculate udp cksum if it was a fragment IP*/\n> +                if (IP_IS_FRAGMENT(((struct ipv4_hdr *)l3hdr)->\n> +                                      fragment_offset)) {\n> +                    return;\n> +                }\n> +\n> +                udp_hdr->udp_csum = rte_ipv4_udptcp_cksum(l3hdr, udp_hdr);\n> +            } else 
{\n> +                udp_hdr->udp_csum = rte_ipv6_udptcp_cksum(l3hdr, udp_hdr);\n> +            }\n> +        }\n> +    }\n> +\n> +    pkt->mbuf.ol_flags &= ~PKT_TX_L4_MASK;\n> +}\n> +\n> +static inline void\n> +netdev_dpdk_vhost_tx_csum(struct dp_packet **pkts, int pkt_cnt)\n> +{\n> +    int i;\n> +\n> +    for (i = 0; i < pkt_cnt; i++) {\n> +        ovs_be16 dl_type;\n> +        struct dp_packet *pkt = (struct dp_packet *)pkts[i];\n> +        const char *data = dp_packet_data(pkt);\n> +        void *l3hdr = (char *)(data + pkt->mbuf.l2_len);\n> +\n> +        if (!(pkt->mbuf.ol_flags & PKT_TX_L4_MASK)) {\n> +            /* DPDK vhost tags PKT_TX_L4_MASK if a L4 packet need cksum. */\n> +            continue;\n> +        }\n> +\n> +        if (OVS_UNLIKELY(pkt->mbuf.l2_len == 0 || pkt->mbuf.l3_len == 0)) {\n> +            continue;\n> +        }\n> +\n> +        dl_type = *(ovs_be16 *)(data + pkt->mbuf.l2_len - sizeof dl_type);\n> +        if (dl_type == htons(ETH_TYPE_IP)) {\n> +            uint8_t l4_proto = ((struct ipv4_hdr *)l3hdr)->next_proto_id;\n> +            netdev_dpdk_vhost_refill_l4_cksum(data, pkt, l4_proto, true);\n> +        } else if (dl_type == htons(ETH_TYPE_IPV6)) {\n> +            uint8_t l4_proto = ((struct ipv6_hdr *)l3hdr)->proto;\n> +            netdev_dpdk_vhost_refill_l4_cksum(data, pkt, l4_proto, false);\n> +        }\n> +    }\n> +}\n> +\n>  /* Tries to transmit 'pkts' to txq 'qid' of device 'dev'.  
Takes ownership of\n>   * 'pkts', even in case of failure.\n>   *\n> @@ -1646,6 +1716,8 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq\n> *rxq,\n> \n>      dp_packet_batch_init_cutlen(batch);\n>      batch->count = (int) nb_rx;\n> +    netdev_dpdk_vhost_tx_csum(batch->packets, batch->count);\n> +\n>      return 0;\n>  }\n> \n> @@ -3288,8 +3360,7 @@ netdev_dpdk_vhost_client_reconfigure(struct\n> netdev *netdev)\n> \n>          err = rte_vhost_driver_disable_features(dev->vhost_id,\n>                                      1ULL << VIRTIO_NET_F_HOST_TSO4\n> -                                    | 1ULL << VIRTIO_NET_F_HOST_TSO6\n> -                                    | 1ULL << VIRTIO_NET_F_CSUM);\n> +                                    | 1ULL << VIRTIO_NET_F_HOST_TSO6);\n>          if (err) {\n>              VLOG_ERR(\"rte_vhost_driver_disable_features failed for vhost user \"\n>                       \"client port: %s\\n\", dev->up.name);\n> --\n> 1.8.3.1","headers":{"Return-Path":"<ovs-dev-bounces@openvswitch.org>","X-Original-To":["incoming@patchwork.ozlabs.org","dev@openvswitch.org"],"Delivered-To":["patchwork-incoming@bilbo.ozlabs.org","ovs-dev@mail.linuxfoundation.org"],"Authentication-Results":"ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=openvswitch.org\n\t(client-ip=140.211.169.12; helo=mail.linuxfoundation.org;\n\tenvelope-from=ovs-dev-bounces@openvswitch.org;\n\treceiver=<UNKNOWN>)","Received":["from mail.linuxfoundation.org (mail.linuxfoundation.org\n\t[140.211.169.12])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256\n\tbits)) (No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xmqtn4tl2z9s76\n\tfor <incoming@patchwork.ozlabs.org>;\n\tWed,  6 Sep 2017 01:11:44 +1000 (AEST)","from mail.linux-foundation.org (localhost [127.0.0.1])\n\tby mail.linuxfoundation.org (Postfix) with ESMTP id C751DA70;\n\tTue,  5 Sep 2017 15:11:41 +0000 (UTC)","from smtp1.linuxfoundation.org 
(smtp1.linux-foundation.org\n\t[172.17.192.35])\n\tby mail.linuxfoundation.org (Postfix) with ESMTPS id 2894A989\n\tfor <dev@openvswitch.org>; Tue,  5 Sep 2017 15:11:41 +0000 (UTC)","from mga09.intel.com (mga09.intel.com [134.134.136.24])\n\tby smtp1.linuxfoundation.org (Postfix) with ESMTPS id 34EBD432\n\tfor <dev@openvswitch.org>; Tue,  5 Sep 2017 15:11:40 +0000 (UTC)","from orsmga004.jf.intel.com ([10.7.209.38])\n\tby orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;\n\t05 Sep 2017 08:11:39 -0700","from irsmsx151.ger.corp.intel.com ([163.33.192.59])\n\tby orsmga004.jf.intel.com with ESMTP; 05 Sep 2017 08:11:35 -0700","from irsmsx112.ger.corp.intel.com (10.108.20.5) by\n\tIRSMSX151.ger.corp.intel.com (163.33.192.59) with Microsoft SMTP\n\tServer (TLS) id 14.3.319.2; Tue, 5 Sep 2017 16:11:35 +0100","from irsmsx106.ger.corp.intel.com ([169.254.8.36]) by\n\tirsmsx112.ger.corp.intel.com ([169.254.1.110]) with mapi id\n\t14.03.0319.002; Tue, 5 Sep 2017 16:11:34 +0100"],"X-Greylist":"domain auto-whitelisted by SQLgrey-1.7.6","X-ExtLoop1":"1","X-IronPort-AV":"E=Sophos;i=\"5.41,480,1498546800\"; d=\"scan'208\";a=\"125748410\"","From":"\"Loftus, Ciara\" <ciara.loftus@intel.com>","To":"Zhenyu Gao <sysugaozhenyu@gmail.com>","Thread-Topic":"[PATCH v4] netdev-dpdk: Implement TCP/UDP TX cksum in ovs-dpdk\n\tside","Thread-Index":"AQHTI6i4tIBHZMg/4Ui7u6JjZohezqKmZtkQ","Date":"Tue, 5 Sep 2017 15:11:34 +0000","Message-ID":"<74F120C019F4A64C9B78E802F6AD4CC278E024CD@IRSMSX106.ger.corp.intel.com>","References":"<20170902050234.17169-1-sysugaozhenyu@gmail.com>","In-Reply-To":"<20170902050234.17169-1-sysugaozhenyu@gmail.com>","Accept-Language":"en-GB, 
en-US","Content-Language":"en-US","X-MS-Has-Attach":"","X-MS-TNEF-Correlator":"","x-titus-metadata-40":"eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiYjE4OTIxNGUtNWU1MS00ZTgxLTllNWQtNTJlZTM2ZGZlMGM5IiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX0lDIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE2LjUuOS4zIiwiVHJ1c3RlZExhYmVsSGFzaCI6ImZBSTJqSDVQeXNzSGlxOUVWNXkzck9kZEUyd3BSV29QUUlpVXFGUUwwaWM9In0=","x-ctpclassification":"CTP_IC","dlp-product":"dlpe-windows","dlp-version":"11.0.0.116","dlp-reaction":"no-action","x-originating-ip":"[163.33.239.181]","MIME-Version":"1.0","X-Spam-Status":"No, score=-5.0 required=5.0 tests=RCVD_IN_DNSWL_HI,\n\tRP_MATCHES_RCVD autolearn=disabled version=3.3.1","X-Spam-Checker-Version":"SpamAssassin 3.3.1 (2010-03-16) on\n\tsmtp1.linux-foundation.org","Cc":"\"dev@openvswitch.org\" <dev@openvswitch.org>","Subject":"Re: [ovs-dev] [PATCH v4] netdev-dpdk: Implement TCP/UDP TX cksum in\n\tovs-dpdk side","X-BeenThere":"ovs-dev@openvswitch.org","X-Mailman-Version":"2.1.12","Precedence":"list","List-Id":"<ovs-dev.openvswitch.org>","List-Unsubscribe":"<https://mail.openvswitch.org/mailman/options/ovs-dev>,\n\t<mailto:ovs-dev-request@openvswitch.org?subject=unsubscribe>","List-Archive":"<http://mail.openvswitch.org/pipermail/ovs-dev/>","List-Post":"<mailto:ovs-dev@openvswitch.org>","List-Help":"<mailto:ovs-dev-request@openvswitch.org?subject=help>","List-Subscribe":"<https://mail.openvswitch.org/mailman/listinfo/ovs-dev>,\n\t<mailto:ovs-dev-request@openvswitch.org?subject=subscribe>","Content-Type":"text/plain; 
charset=\"us-ascii\"","Content-Transfer-Encoding":"7bit","Sender":"ovs-dev-bounces@openvswitch.org","Errors-To":"ovs-dev-bounces@openvswitch.org"}},{"id":1764043,"web_url":"http://patchwork.ozlabs.org/comment/1764043/","msgid":"<CAHzJG=_rDZ+-8XYGKx=wvmMLTSuVyE3UWR075xBN11QwG7DY6A@mail.gmail.com>","list_archive_url":null,"date":"2017-09-06T11:43:30","subject":"Re: [ovs-dev] [PATCH v4] netdev-dpdk: Implement TCP/UDP TX cksum in\n\tovs-dpdk side","submitter":{"id":70513,"url":"http://patchwork.ozlabs.org/api/people/70513/","name":"Gao Zhenyu","email":"sysugaozhenyu@gmail.com"},"content":"Thanks for your testing; I reproduced it on my own machine.\n\nI did the testing:\n\nAbout 10% of the runs get about 8.5Gb/s throughput when \"ethtool -K eth0 tx on\"; 90% of the\ntime I get 3.5Gb/s.\nAbout 10% of the runs get about 3.5Gb/s throughput when \"ethtool -K eth0 tx off\"; 90% of the\ntime I get 8.5Gb/s.\n\nAnd this weird thing always happens when changing tx mode, like this:\n\n[root@localhost ~]# ethtool -K eth0 tx off\n[root@localhost ~]# ethtool -K eth0 tx on  <---------------------tx cksum\nis on now\n[root@localhost ~]# iperf3 -c 10.100.85.246 -i 1 -t 30\nConnecting to host 10.100.85.246, port 5201\n[  4] local 10.100.85.245 port 56004 connected to 10.100.85.246 port 5201\n[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd\n[  4]   0.00-1.00   sec   821 MBytes  6.89 Gbits/sec  846    214\nKBytes\n[  4]   1.00-2.00   sec  1.00 GBytes  8.63 Gbits/sec  676    305\nKBytes\n[  4]   2.00-3.00   sec  1.00 GBytes  8.62 Gbits/sec  839    402\nKBytes\n[  4]   3.00-4.00   sec  1.01 GBytes  8.69 Gbits/sec  787    403\nKBytes\n[  4]   4.00-5.00   sec   815 MBytes  6.84 Gbits/sec  1190    284\nKBytes\n[  4]   5.00-6.00   sec  1.01 GBytes  8.64 Gbits/sec  1247    547\nKBytes\n[  4]   6.00-7.00   sec  1.01 GBytes  8.65 Gbits/sec  765    260\nKBytes\n[  4]   7.00-8.00   sec  1.01 GBytes  8.65 Gbits/sec  1009    325\nKBytes\n[  4]   8.00-9.00   sec  1.01 GBytes  8.64 Gbits/sec  882    
356\nKBytes\n[  4]   9.00-10.00  sec  1.00 GBytes  8.60 Gbits/sec  1102    327\nKBytes\n[  4]  10.00-11.00  sec  1022 MBytes  8.57 Gbits/sec  1250    370\nKBytes\n[  4]  11.00-12.00  sec  1022 MBytes  8.57 Gbits/sec  1128    337\nKBytes\n[  4]  12.00-13.00  sec  1.00 GBytes  8.60 Gbits/sec  923    407\nKBytes\n[  4]  13.00-14.00  sec  1.01 GBytes  8.65 Gbits/sec  678    334\nKBytes\n[  4]  14.00-15.00  sec  1.01 GBytes  8.64 Gbits/sec  1102    291\nKBytes\n[  4]  15.00-16.00  sec  1.01 GBytes  8.64 Gbits/sec  761    385\nKBytes\n[  4]  16.00-17.00  sec  1019 MBytes  8.55 Gbits/sec  1160   8.48\nKBytes\n[  4]  17.00-18.00  sec  1.01 GBytes  8.65 Gbits/sec  1264    516\nKBytes\n[  4]  18.00-19.00  sec  1.00 GBytes  8.60 Gbits/sec  1010    387\nKBytes\n[  4]  19.00-20.00  sec  1.01 GBytes  8.64 Gbits/sec  1047    445\nKBytes\n[  4]  20.00-21.00  sec  1.00 GBytes  8.61 Gbits/sec  986    321\nKBytes\n[  4]  21.00-22.00  sec  1.01 GBytes  8.64 Gbits/sec  1107    385\nKBytes\n[  4]  22.00-23.00  sec  1.01 GBytes  8.64 Gbits/sec  1036    530\nKBytes\n[  4]  23.00-24.00  sec  1.00 GBytes  8.63 Gbits/sec  1471    426\nKBytes\n[  4]  24.00-25.00  sec  1.01 GBytes  8.64 Gbits/sec  1392    386\nKBytes\n[  4]  25.00-26.00  sec  1.00 GBytes  8.61 Gbits/sec  1029    225\nKBytes\n[  4]  26.00-27.00  sec  1.01 GBytes  8.64 Gbits/sec  1246    420\nKBytes\n[  4]  27.00-28.00  sec  1024 MBytes  8.59 Gbits/sec  986    392\nKBytes\n[  4]  28.00-29.00  sec   821 MBytes  6.89 Gbits/sec  1124    325\nKBytes\n[  4]  29.00-30.00  sec  1.00 GBytes  8.60 Gbits/sec  1005    290\nKBytes\n- - - - - - - - - - - - - - - - - - - - - - - - -\n[ ID] Interval           Transfer     Bandwidth       Retr\n[  4]   0.00-30.00  sec  29.5 GBytes  8.45 Gbits/sec  31048\nsender\n[  4]   0.00-30.00  sec  29.5 GBytes  8.45 Gbits/sec\nreceiver\n\niperf Done.\n\nAnd I test it again immediately, but performance become bad again.:\n\n[root@localhost ~]# iperf3 -c 10.100.85.246 -i 1 -t 30\nConnecting to host 
10.100.85.246, port 5201\n[  4] local 10.100.85.245 port 56008 connected to 10.100.85.246 port 5201\n[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd\n[  4]   0.00-1.00   sec   393 MBytes  3.29 Gbits/sec  607    335\nKBytes\n[  4]   1.00-2.00   sec  80.8 MBytes   678 Mbits/sec  817    372\nKBytes\n[  4]   2.00-3.00   sec   259 MBytes  2.18 Gbits/sec  582    544\nKBytes\n[  4]   3.00-4.00   sec   195 MBytes  1.63 Gbits/sec  403    370\nKBytes\n[  4]   4.00-5.00   sec   294 MBytes  2.46 Gbits/sec  587    346\nKBytes\n[  4]   5.00-6.00   sec   409 MBytes  3.43 Gbits/sec  719    409\nKBytes\n[  4]   6.00-7.00   sec   301 MBytes  2.52 Gbits/sec  762    411\nKBytes\n[  4]   7.00-8.00   sec   515 MBytes  4.32 Gbits/sec  654    205\nKBytes\n[  4]   8.00-9.00   sec   611 MBytes  5.12 Gbits/sec  756    431\nKBytes\n[  4]   9.00-10.00  sec   418 MBytes  3.51 Gbits/sec  646    436\nKBytes\n[  4]  10.00-11.00  sec   357 MBytes  2.99 Gbits/sec  651    337\nKBytes\n[  4]  11.00-12.00  sec   440 MBytes  3.69 Gbits/sec  575    404\nKBytes\n[  4]  12.00-13.00  sec   239 MBytes  2.00 Gbits/sec  480    399\nKBytes\n[  4]  13.00-14.00  sec   408 MBytes  3.42 Gbits/sec  634    400\nKBytes\n[  4]  14.00-15.00  sec   678 MBytes  5.69 Gbits/sec  869    462\nKBytes\n[  4]  15.00-16.00  sec   707 MBytes  5.93 Gbits/sec  987    335\nKBytes\n[  4]  16.00-17.00  sec   496 MBytes  4.16 Gbits/sec  742    332\nKBytes\n[  4]  17.00-18.00  sec   549 MBytes  4.60 Gbits/sec  468    385\nKBytes\n[  4]  18.00-19.00  sec   511 MBytes  4.28 Gbits/sec  721    291\nKBytes\n[  4]  19.00-20.00  sec   515 MBytes  4.32 Gbits/sec  957    386\nKBytes\n[  4]  20.00-21.00  sec   479 MBytes  4.02 Gbits/sec  595    373\nKBytes\n[  4]  21.00-22.00  sec   132 MBytes  1.11 Gbits/sec  442    455\nKBytes\n[  4]  22.00-23.00  sec   146 MBytes  1.22 Gbits/sec  575    345\nKBytes\n[  4]  23.00-24.00  sec   250 MBytes  2.10 Gbits/sec  822    365\nKBytes\n[  4]  24.00-25.00  sec   412 MBytes  3.46 Gbits/sec  
448    399\nKBytes\n[  4]  25.00-26.00  sec   704 MBytes  5.91 Gbits/sec  674    346\nKBytes\n[  4]  26.00-27.00  sec   946 MBytes  7.93 Gbits/sec  741    281\nKBytes\n[  4]  27.00-28.00  sec   563 MBytes  4.72 Gbits/sec  732    311\nKBytes\n[  4]  28.00-29.00  sec   426 MBytes  3.57 Gbits/sec  936    527\nKBytes\n[  4]  29.00-30.00  sec  70.0 MBytes   587 Mbits/sec  366    246\nKBytes\n- - - - - - - - - - - - - - - - - - - - - - - - -\n[ ID] Interval           Transfer     Bandwidth       Retr\n[  4]   0.00-30.00  sec  12.2 GBytes  3.50 Gbits/sec  19948\nsender\n[  4]   0.00-30.00  sec  12.2 GBytes  3.49 Gbits/sec\nreceiver\n\n\nI pinned the VM's CPUs and disabled the irqbalance service in the above testing.\n\nSo I suspect the root cause may be in the vhost implementation (maybe changing tx\nmode clears some cache?) or in TCP congestion control.\nI will do more testing on it.\n\n\nThanks\nZhenyu Gao\n\n\n\n2017-09-05 23:11 GMT+08:00 Loftus, Ciara <ciara.loftus@intel.com>:\n\n> >\n> > Currently, the dpdk-vhost side in ovs doesn't support tcp/udp tx cksum.\n> > So L4 packets's cksum were calculated in VM side but performance is not\n> > good.\n> > Implementing tcp/udp tx cksum in ovs-dpdk side improves throughput in\n> > VM->phy->phy->VM situation. 
And it makes virtio-net frontend-driver\n> > support NETIF_F_SG(feature scatter-gather) as well.\n> >\n> > Signed-off-by: Zhenyu Gao <sysugaozhenyu@gmail.com>\n> > ---\n> >\n> > Here is some performance number:\n>\n> Hi Zhenyu,\n>\n> Thanks for the code changes since v3.\n> I tested a VM to VM case using iperf and observed a performance\n> degradation when the tx cksum was offloaded to the host:\n>\n> checksum in VM\n> 0.0-30.0 sec  10.9 GBytes  3.12 Gbits/sec\n> 0.0-30.0 sec  10.9 GBytes  3.11 Gbits/sec\n> 0.0-30.0 sec  11.0 GBytes  3.16 Gbits/sec\n>\n> checksum in ovs dpdk\n> 0.0-30.0 sec  7.52 GBytes  2.15 Gbits/sec\n> 0.0-30.0 sec  7.12 GBytes  2.04 Gbits/sec\n> 0.0-30.0 sec  8.17 GBytes  2.34 Gbits/sec\n>\n> I think for this feature to enabled we need performance to be roughly the\n> same or better for all use cases. For now the gap here is too big I think.\n>\n> If you wish to reproduce:\n>\n> 1 host, 2 VMs each with 1 vhost port and flows set up to switch packets\n> from each vhost port to the other.\n>\n> VM1:\n> ifconfig eth1 1.1.1.1/24 up\n> ethtool -K eth2 tx on/off\n> iperf -c 1.1.1.2 -i 1 -t 30\n>\n> VM2:\n> ifconfig eth1 1.1.1.2/24 up\n> ethtool -K eth1 tx on/off\n> iperf -s -i 1\n>\n> Thanks,\n> Ciara\n>\n> >\n> > Setup:\n> >\n> >  qperf client\n> > +---------+\n> > |   VM    |\n> > +---------+\n> >      |\n> >      |                          qperf server\n> > +--------------+              +------------+\n> > | vswitch+dpdk |              | bare-metal |\n> > +--------------+              +------------+\n> >        |                            |\n> >        |                            |\n> >       pNic---------PhysicalSwitch----\n> >\n> > do cksum in ovs-dpdk: Applied this patch and execute 'ethtool -K eth0 tx\n> on'\n> > in VM side.\n> >                       It offload cksum job to ovs-dpdk side.\n> >\n> > do cksum in VM: Applied this patch and execute 'ethtool -K eth0 tx off'\n> in VM\n> > side.\n> >                 VM calculate cksum for tcp/udp 
packets.\n> >\n> > We can see huge improvment in TCP throughput if we leverage ovs-dpdk\n> > cksum.\n> >\n> > [root@localhost ~]# qperf -t 10 -oo msg_size:1:64K:*2\n> host-qperf-server01\n> > tcp_bw tcp_lat udp_bw udp_lat\n> >   do cksum in ovs-dpdk          do cksum in VM             without this\n> patch\n> > tcp_bw:\n> >     bw  =  1.9 MB/sec         bw  =  1.92 MB/sec        bw  =  1.95\n> MB/sec\n> > tcp_bw:\n> >     bw  =  3.97 MB/sec        bw  =  3.99 MB/sec        bw  =  3.98\n> MB/sec\n> > tcp_bw:\n> >     bw  =  7.75 MB/sec        bw  =  7.79 MB/sec        bw  =  7.89\n> MB/sec\n> > tcp_bw:\n> >     bw  =  14.7 MB/sec        bw  =  14.7 MB/sec        bw  =  14.9\n> MB/sec\n> > tcp_bw:\n> >     bw  =  27.7 MB/sec        bw  =  27.4 MB/sec        bw  =  28 MB/sec\n> > tcp_bw:\n> >     bw  =  51.1 MB/sec        bw  =  51.3 MB/sec        bw  =  51.8\n> MB/sec\n> > tcp_bw:\n> >     bw  =  86.2 MB/sec        bw  =  84.4 MB/sec        bw  =  87.6\n> MB/sec\n> > tcp_bw:\n> >     bw  =  141 MB/sec         bw  =  142 MB/sec        bw  =  141 MB/sec\n> > tcp_bw:\n> >     bw  =  203 MB/sec         bw  =  201 MB/sec        bw  =  211 MB/sec\n> > tcp_bw:\n> >     bw  =  267 MB/sec         bw  =  250 MB/sec        bw  =  260 MB/sec\n> > tcp_bw:\n> >     bw  =  324 MB/sec         bw  =  295 MB/sec        bw  =  302 MB/sec\n> > tcp_bw:\n> >     bw  =  397 MB/sec         bw  =  363 MB/sec        bw  =  347 MB/sec\n> > tcp_bw:\n> >     bw  =  765 MB/sec         bw  =  510 MB/sec        bw  =  383 MB/sec\n> > tcp_bw:\n> >     bw  =  850 MB/sec         bw  =  710 MB/sec        bw  =  417 MB/sec\n> > tcp_bw:\n> >     bw  =  1.09 GB/sec        bw  =  860 MB/sec        bw  =  444 MB/sec\n> > tcp_bw:\n> >     bw  =  1.17 GB/sec        bw  =  979 MB/sec        bw  =  447 MB/sec\n> > tcp_bw:\n> >     bw  =  1.17 GB/sec        bw  =  1.07 GB/sec       bw  =  462 MB/sec\n> > tcp_lat:\n> >     latency  =  29.1 us       latency  =  28.7 us        latency  =\n> 29.1 us\n> > tcp_lat:\n> 
>     latency  =  29 us         latency  =  28.8 us        latency  =  29 us
> tcp_lat:
>     latency  =  29 us         latency  =  28.8 us        latency  =  29 us
> tcp_lat:
>     latency  =  29 us         latency  =  28.9 us        latency  =  29 us
> tcp_lat:
>     latency  =  29.2 us       latency  =  28.9 us        latency  =  29.1 us
> tcp_lat:
>     latency  =  29.1 us       latency  =  29.1 us        latency  =  29.1 us
> tcp_lat:
>     latency  =  29.5 us       latency  =  29.5 us        latency  =  29.5 us
> tcp_lat:
>     latency  =  29.8 us       latency  =  29.8 us        latency  =  29.9 us
> tcp_lat:
>     latency  =  30.7 us       latency  =  30.7 us        latency  =  30.7 us
> tcp_lat:
>     latency  =  47.1 us       latency  =  46.2 us        latency  =  47.1 us
> tcp_lat:
>     latency  =  52.1 us       latency  =  52.3 us        latency  =  53.3 us
> tcp_lat:
>     latency  =  44 us         latency  =  43.8 us        latency  =  43.2 us
> tcp_lat:
>     latency  =  50 us         latency  =  46.6 us        latency  =  47.8 us
> tcp_lat:
>     latency  =  79.2 us       latency  =  77.9 us        latency  =  78.9 us
> tcp_lat:
>     latency  =  82.3 us       latency  =  81.7 us        latency  =  82.2 us
> tcp_lat:
>     latency  =  96.7 us       latency  =  90.8 us        latency  =  127 us
> tcp_lat:
>     latency  =  215 us        latency  =  177 us         latency  =  225 us
> udp_bw:
>     send_bw  =  422 KB/sec        send_bw  =  415 KB/sec        send_bw  =  405 KB/sec
>     recv_bw  =  402 KB/sec        recv_bw  =  404 KB/sec        recv_bw  =  403 KB/sec
> udp_bw:
>     send_bw  =  845 KB/sec        send_bw  =  835 KB/sec        send_bw  =  802 KB/sec
>     recv_bw  =  831 KB/sec        recv_bw  =  804 KB/sec        recv_bw  =  802 KB/sec
> udp_bw:
>     send_bw  =  1.69 MB/sec       send_bw  =  1.66 MB/sec       send_bw  =  1.62 MB/sec
>     recv_bw  =  1.45 MB/sec       recv_bw  =  1.63 MB/sec       recv_bw  =  1.6 MB/sec
> udp_bw:
>     send_bw  =  3.38 MB/sec       send_bw  =  3.33 MB/sec       send_bw  =  3.24 MB/sec
>     recv_bw  =  3.32 MB/sec       recv_bw  =  3.25 MB/sec       recv_bw  =  3.24 MB/sec
> udp_bw:
>     send_bw  =  6.76 MB/sec       send_bw  =  6.63 MB/sec       send_bw  =  6.47 MB/sec
>     recv_bw  =  6.42 MB/sec       recv_bw  =  5.59 MB/sec       recv_bw  =  6.45 MB/sec
> udp_bw:
>     send_bw  =  13.5 MB/sec       send_bw  =  13.3 MB/sec       send_bw  =  13 MB/sec
>     recv_bw  =  13.4 MB/sec       recv_bw  =  12.1 MB/sec       recv_bw  =  13 MB/sec
> udp_bw:
>     send_bw  =  27 MB/sec         send_bw  =  26.5 MB/sec       send_bw  =  25.9 MB/sec
>     recv_bw  =  26.4 MB/sec       recv_bw  =  21.5 MB/sec       recv_bw  =  25.9 MB/sec
> udp_bw:
>     send_bw  =  53.8 MB/sec       send_bw  =  52.9 MB/sec       send_bw  =  51.7 MB/sec
>     recv_bw  =  49.1 MB/sec       recv_bw  =  47.6 MB/sec       recv_bw  =  51.1 MB/sec
> udp_bw:
>     send_bw  =  108 MB/sec        send_bw  =  105 MB/sec        send_bw  =  102 MB/sec
>     recv_bw  =  91.1 MB/sec       recv_bw  =  101 MB/sec        recv_bw  =  100 MB/sec
> udp_bw:
>     send_bw  =  212 MB/sec        send_bw  =  208 MB/sec        send_bw  =  203 MB/sec
>     recv_bw  =  204 MB/sec        recv_bw  =  204 MB/sec        recv_bw  =  169 MB/sec
> udp_bw:
>     send_bw  =  414 MB/sec        send_bw  =  407 MB/sec        send_bw  =  398 MB/sec
>     recv_bw  =  403 MB/sec        recv_bw  =  312 MB/sec        recv_bw  =  343 MB/sec
> udp_bw:
>     send_bw  =  555 MB/sec        send_bw  =  561 MB/sec        send_bw  =  557 MB/sec
>     recv_bw  =  354 MB/sec        recv_bw  =  368 MB/sec        recv_bw  =  360 MB/sec
> udp_bw:
>     send_bw  =  877 MB/sec        send_bw  =  880 MB/sec        send_bw  =  868 MB/sec
>     recv_bw  =  551 MB/sec        recv_bw  =  542 MB/sec        recv_bw  =  562 MB/sec
> udp_bw:
>     send_bw  =  1.1 GB/sec        send_bw  =  1.08 GB/sec       send_bw  =  1.09 GB/sec
>     recv_bw  =  805 MB/sec        recv_bw  =  785 MB/sec        recv_bw  =  766 MB/sec
> udp_bw:
>     send_bw  =  1.21 GB/sec       send_bw  =  1.19 GB/sec       send_bw  =  1.22 GB/sec
>     recv_bw  =  899 MB/sec        recv_bw  =  715 MB/sec        recv_bw  =  700 MB/sec
> udp_bw:
>     send_bw  =  1.31 GB/sec       send_bw  =  1.31 GB/sec       send_bw  =  1.31 GB/sec
>     recv_bw  =  614 MB/sec        recv_bw  =  622 MB/sec        recv_bw  =  661 MB/sec
> udp_bw:
>     send_bw  =  0 bytes/sec       send_bw  =  0 bytes/sec       send_bw  =  0 bytes/sec
>     recv_bw  =  0 bytes/sec       recv_bw  =  0 bytes/sec       recv_bw  =  0 bytes/sec
> udp_lat:
>     latency  =  25.9 us        latency  =  26.5 us        latency  =  26.5 us
> udp_lat:
>     latency  =  26.3 us        latency  =  26.4 us        latency  =  26.5 us
> udp_lat:
>     latency  =  26 us          latency  =  26.4 us        latency  =  26.6 us
> udp_lat:
>     latency  =  26.1 us        latency  =  26.2 us        latency  =  26.4 us
> udp_lat:
>     latency  =  26.3 us        latency  =  26.5 us        latency  =  26.7 us
> udp_lat:
>     latency  =  26.3 us        latency  =  26.4 us        latency  =  26.5 us
> udp_lat:
>     latency  =  26.3 us        latency  =  26.7 us        latency  =  26.9 us
> udp_lat:
>     latency  =  27.1 us        latency  =  27.1 us        latency  =  27.2 us
> udp_lat:
>     latency  =  27.5 us        latency  =  27.8 us        latency  =  28.1 us
> udp_lat:
>     latency  =  28.7 us        latency  =  28.9 us        latency  =  29.1 us
> udp_lat:
>     latency  =  30.4 us        latency  =  30.5 us        latency  =  30.9 us
> udp_lat:
>     latency  =  41.2 us        latency  =  41.3 us        latency  =  41.1 us
> udp_lat:
>     latency  =  41.3 us        latency  =  41.5 us        latency  =  41.5 us
> udp_lat:
>     latency  =  64.4 us        latency  =  64.5 us        latency  =  64.2 us
> udp_lat:
>     latency  =  71.5 us        latency  =  71.5 us        latency  =  71.7 us
> udp_lat:
>     latency  =  120 us         latency  =  120 us         latency  =  120 us
> udp_lat:
>     latency  =  0 ns           latency  =  0 ns           latency  =  0 ns
>
>  lib/netdev-dpdk.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 75 insertions(+), 4 deletions(-)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index f58e9be..0f91def 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -31,6 +31,7 @@
>  #include <rte_errno.h>
>  #include <rte_eth_ring.h>
>  #include <rte_ethdev.h>
> +#include <rte_ip.h>
>  #include <rte_malloc.h>
>  #include <rte_mbuf.h>
>  #include <rte_meter.h>
> @@ -992,8 +993,7 @@ netdev_dpdk_vhost_construct(struct netdev *netdev)
>
>      err = rte_vhost_driver_disable_features(dev->vhost_id,
>                                  1ULL << VIRTIO_NET_F_HOST_TSO4
> -                                | 1ULL << VIRTIO_NET_F_HOST_TSO6
> -                                | 1ULL << VIRTIO_NET_F_CSUM);
> +                                | 1ULL << VIRTIO_NET_F_HOST_TSO6);
>      if (err) {
>          VLOG_ERR("rte_vhost_driver_disable_features failed for vhost user "
>                   "port: %s\n", name);
> @@ -1455,6 +1455,76 @@ netdev_dpdk_rxq_dealloc(struct netdev_rxq *rxq)
>      rte_free(rx);
>  }
>
> +static inline void
> +netdev_dpdk_vhost_refill_l4_cksum(const char *data, struct dp_packet *pkt,
> +                                  uint8_t l4_proto, bool is_ipv4)
> +{
> +    void *l3hdr = (void *)(data + pkt->mbuf.l2_len);
> +
> +    if (l4_proto == IPPROTO_TCP) {
> +        struct tcp_header *tcp_hdr = (struct tcp_header *)(data +
> +                                         pkt->mbuf.l2_len + pkt->mbuf.l3_len);
> +
> +        tcp_hdr->tcp_csum = 0;
> +        if (is_ipv4) {
> +            tcp_hdr->tcp_csum = rte_ipv4_udptcp_cksum(l3hdr, tcp_hdr);
> +        } else {
> +            tcp_hdr->tcp_csum = rte_ipv6_udptcp_cksum(l3hdr, tcp_hdr);
> +        }
> +    } else if (l4_proto == IPPROTO_UDP) {
> +        struct udp_header *udp_hdr = (struct udp_header *)(data +
> +                                         pkt->mbuf.l2_len + pkt->mbuf.l3_len);
> +        /* do not recalculate udp cksum if it was 0 */
> +        if (udp_hdr->udp_csum != 0) {
> +            udp_hdr->udp_csum = 0;
> +            if (is_ipv4) {
> +                /*do not calculate udp cksum if it was a fragment IP*/
> +                if (IP_IS_FRAGMENT(((struct ipv4_hdr *)l3hdr)->
> +                                      fragment_offset)) {
> +                    return;
> +                }
> +
> +                udp_hdr->udp_csum = rte_ipv4_udptcp_cksum(l3hdr, udp_hdr);
> +            } else {
> +                udp_hdr->udp_csum = rte_ipv6_udptcp_cksum(l3hdr, udp_hdr);
> +            }
> +        }
> +    }
> +
> +    pkt->mbuf.ol_flags &= ~PKT_TX_L4_MASK;
> +}
> +
> +static inline void
> +netdev_dpdk_vhost_tx_csum(struct dp_packet **pkts, int pkt_cnt)
> +{
> +    int i;
> +
> +    for (i = 0; i < pkt_cnt; i++) {
> +        ovs_be16 dl_type;
> +        struct dp_packet *pkt = (struct dp_packet *)pkts[i];
> +        const char *data = dp_packet_data(pkt);
> +        void *l3hdr = (char *)(data + pkt->mbuf.l2_len);
> +
> +        if (!(pkt->mbuf.ol_flags & PKT_TX_L4_MASK)) {
> +            /* DPDK vhost tags PKT_TX_L4_MASK if a L4 packet need cksum. */
> +            continue;
> +        }
> +
> +        if (OVS_UNLIKELY(pkt->mbuf.l2_len == 0 || pkt->mbuf.l3_len == 0)) {
> +            continue;
> +        }
> +
> +        dl_type = *(ovs_be16 *)(data + pkt->mbuf.l2_len - sizeof dl_type);
> +        if (dl_type == htons(ETH_TYPE_IP)) {
> +            uint8_t l4_proto = ((struct ipv4_hdr *)l3hdr)->next_proto_id;
> +            netdev_dpdk_vhost_refill_l4_cksum(data, pkt, l4_proto, true);
> +        } else if (dl_type == htons(ETH_TYPE_IPV6)) {
> +            uint8_t l4_proto = ((struct ipv6_hdr *)l3hdr)->proto;
> +            netdev_dpdk_vhost_refill_l4_cksum(data, pkt, l4_proto, false);
> +        }
> +    }
> +}
> +
>  /* Tries to transmit 'pkts' to txq 'qid' of device 'dev'.  Takes ownership of
>   * 'pkts', even in case of failure.
>   *
> @@ -1646,6 +1716,8 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq,
>
>      dp_packet_batch_init_cutlen(batch);
>      batch->count = (int) nb_rx;
> +    netdev_dpdk_vhost_tx_csum(batch->packets, batch->count);
> +
>      return 0;
>  }
>
> @@ -3288,8 +3360,7 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)
>
>          err = rte_vhost_driver_disable_features(dev->vhost_id,
>                                      1ULL << VIRTIO_NET_F_HOST_TSO4
> -                                    | 1ULL << VIRTIO_NET_F_HOST_TSO6
> -                                    | 1ULL << VIRTIO_NET_F_CSUM);
> +                                    | 1ULL << VIRTIO_NET_F_HOST_TSO6);
>          if (err) {
>              VLOG_ERR("rte_vhost_driver_disable_features failed for vhost user "
>                       "client port: %s\n", dev->up.name);
> --
> 1.8.3.1

From: Gao Zhenyu <sysugaozhenyu@gmail.com>
Date: Wed, 6 Sep 2017 19:43:30 +0800
To: "Loftus, Ciara" <ciara.loftus@intel.com>
Cc: "dev@openvswitch.org" <dev@openvswitch.org>
Subject: Re: [ovs-dev] [PATCH v4] netdev-dpdk: Implement TCP/UDP TX cksum in ovs-dpdk side
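[Editor's note] The patch quoted above refills the L4 checksum with DPDK's rte_ipv4_udptcp_cksum()/rte_ipv6_udptcp_cksum(), which compute the standard RFC 1071 Internet checksum (a ones'-complement sum of 16-bit words over a pseudo-header plus the TCP/UDP segment). For readers unfamiliar with that sum, here is a minimal standalone sketch of the core computation — an illustration only, not the DPDK implementation; `inet_cksum` is a hypothetical helper name and the buffer layout is made up:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum: ones'-complement sum of big-endian
 * 16-bit words, with carries folded back in and the result inverted.
 * (rte_ipv4_udptcp_cksum() additionally builds the pseudo-header from
 * the real IP header before summing; that part is omitted here.) */
static uint16_t inet_cksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t) buf[0] << 8) | buf[1];   /* one 16-bit word */
        buf += 2;
        len -= 2;
    }
    if (len) {
        sum += (uint32_t) buf[0] << 8;   /* odd trailing byte, zero-padded */
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);   /* fold carries into low 16 bits */
    }
    return (uint16_t) ~sum;
}
```

The defining property, and an easy sanity check: zero the checksum field, compute the sum over the whole segment, store the result back into the field, and a second pass over the same bytes then yields 0 — which is also why the patch zeroes tcp_csum/udp_csum before calling the DPDK helper.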