From patchwork Wed Jul 26 15:21:05 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ilya Maximets
X-Patchwork-Id: 793978
From: Ilya Maximets
To: ovs-dev@openvswitch.org, Bhanuprakash Bodireddy
Cc: Heetae Ahn, Ilya Maximets
Date: Wed, 26 Jul 2017 18:21:05 +0300
Message-id: <1501082468-22006-2-git-send-email-i.maximets@samsung.com>
In-reply-to: <1501082468-22006-1-git-send-email-i.maximets@samsung.com>
References: <1501082468-22006-1-git-send-email-i.maximets@samsung.com>
X-Mailer: git-send-email 2.7.4
Subject: [ovs-dev] [PATCH v2 1/4] dpif-netdev: Output packet batching.

While processing an incoming batch of packets, the packets are scattered
across many per-flow batches and sent separately.

This becomes an issue when more than a few flows are in use.

For example, with balanced-tcp OvS bonding over 2 ports there will be
256 datapath internal flows for each dp_hash pattern.  A single received
batch is then scattered across all 256 of those per-flow batches, and
send is invoked for each packet separately.  This behaviour greatly
degrades the overall performance of netdev_send because the vectorized
transmit functions cannot be used to their advantage.

However, half of the datapath flows (with 2 ports in the bond) will have
the same output actions.  This means that their packets can be collected
back in a single place and sent at once with a single call to
netdev_send.  This patch introduces a per-port packet batch for output
packets for that purpose.

The 'output_pkts' batch is thread local and located in the send port
cache.

Signed-off-by: Ilya Maximets
---
 lib/dpif-netdev.c | 104 ++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 82 insertions(+), 22 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 47a9fa0..075cfd2 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -498,6 +498,7 @@ struct tx_port {
     int qid;
     long long last_used;
     struct hmap_node node;
+    struct dp_packet_batch output_pkts;
 };
 
 /* PMD: Poll modes drivers.
PMD accesses devices via polling to eliminate
@@ -629,9 +630,10 @@ static void dp_netdev_execute_actions(struct dp_netdev_pmd_thread *pmd,
                                       size_t actions_len,
                                       long long now);
 static void dp_netdev_input(struct dp_netdev_pmd_thread *,
-                            struct dp_packet_batch *, odp_port_t port_no);
+                            struct dp_packet_batch *, odp_port_t port_no,
+                            long long now);
 static void dp_netdev_recirculate(struct dp_netdev_pmd_thread *,
-                                  struct dp_packet_batch *);
+                                  struct dp_packet_batch *, long long now);
 static void dp_netdev_disable_upcall(struct dp_netdev *);
 static void dp_netdev_pmd_reload_done(struct dp_netdev_pmd_thread *pmd);
 
@@ -661,6 +663,9 @@ static void dp_netdev_add_rxq_to_pmd(struct dp_netdev_pmd_thread *pmd,
 static void dp_netdev_del_rxq_from_pmd(struct dp_netdev_pmd_thread *pmd,
                                        struct rxq_poll *poll)
     OVS_REQUIRES(pmd->port_mutex);
+static void
+dp_netdev_pmd_flush_output_packets(struct dp_netdev_pmd_thread *pmd,
+                                   long long now);
 static void reconfigure_datapath(struct dp_netdev *dp)
     OVS_REQUIRES(dp->port_mutex);
 static bool dp_netdev_pmd_try_ref(struct dp_netdev_pmd_thread *pmd);
@@ -2794,6 +2799,7 @@ dpif_netdev_execute(struct dpif *dpif, struct dpif_execute *execute)
     struct dp_netdev *dp = get_dp_netdev(dpif);
     struct dp_netdev_pmd_thread *pmd;
     struct dp_packet_batch pp;
+    long long now = time_msec();
 
     if (dp_packet_size(execute->packet) < ETH_HEADER_LEN ||
         dp_packet_size(execute->packet) > UINT16_MAX) {
@@ -2836,8 +2842,8 @@ dpif_netdev_execute(struct dpif *dpif, struct dpif_execute *execute)
 
     dp_packet_batch_init_packet(&pp, execute->packet);
     dp_netdev_execute_actions(pmd, &pp, false, execute->flow,
-                              execute->actions, execute->actions_len,
-                              time_msec());
+                              execute->actions, execute->actions_len, now);
+    dp_netdev_pmd_flush_output_packets(pmd, now);
 
     if (pmd->core_id == NON_PMD_CORE_ID) {
         ovs_mutex_unlock(&dp->non_pmd_mutex);
@@ -3086,6 +3092,37 @@ cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd,
     non_atomic_ullong_add(&pmd->cycles.n[type], interval);
 }
 
+static void
+dp_netdev_pmd_flush_output_on_port(struct dp_netdev_pmd_thread *pmd,
+                                   struct tx_port *p, long long now)
+{
+    int tx_qid;
+    bool dynamic_txqs;
+
+    dynamic_txqs = p->port->dynamic_txqs;
+    if (dynamic_txqs) {
+        tx_qid = dpif_netdev_xps_get_tx_qid(pmd, p, now);
+    } else {
+        tx_qid = pmd->static_tx_qid;
+    }
+
+    netdev_send(p->port->netdev, tx_qid, &p->output_pkts, true, dynamic_txqs);
+    dp_packet_batch_init(&p->output_pkts);
+}
+
+static void
+dp_netdev_pmd_flush_output_packets(struct dp_netdev_pmd_thread *pmd,
+                                   long long now)
+{
+    struct tx_port *p;
+
+    HMAP_FOR_EACH (p, node, &pmd->send_port_cache) {
+        if (!dp_packet_batch_is_empty(&p->output_pkts)) {
+            dp_netdev_pmd_flush_output_on_port(pmd, p, now);
+        }
+    }
+}
+
 static int
 dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
                            struct netdev_rxq *rx,
@@ -3098,10 +3135,13 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
     dp_packet_batch_init(&batch);
     error = netdev_rxq_recv(rx, &batch);
     if (!error) {
+        long long now = time_msec();
+
         *recirc_depth_get() = 0;
 
         batch_cnt = batch.count;
-        dp_netdev_input(pmd, &batch, port_no);
+        dp_netdev_input(pmd, &batch, port_no, now);
+        dp_netdev_pmd_flush_output_packets(pmd, now);
     } else if (error != EAGAIN && error != EOPNOTSUPP) {
         static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
 
@@ -4378,6 +4418,7 @@ dp_netdev_add_port_tx_to_pmd(struct dp_netdev_pmd_thread *pmd,
 
     tx->port = port;
     tx->qid = -1;
+    dp_packet_batch_init(&tx->output_pkts);
 
     hmap_insert(&pmd->tx_ports, &tx->node, hash_port_no(tx->port->port_no));
     pmd->need_reload = true;
@@ -4798,7 +4839,8 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
 static void
 dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
                   struct dp_packet_batch *packets,
-                  bool md_is_valid, odp_port_t port_no)
+                  bool md_is_valid, odp_port_t port_no,
+                  long long now)
 {
     int cnt = packets->count;
 #if !defined(__CHECKER__) && !defined(_WIN32)
@@ -4810,7 +4852,6 @@ dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
     OVS_ALIGNED_VAR(CACHE_LINE_SIZE)
         struct netdev_flow_key keys[PKT_ARRAY_SIZE];
     struct packet_batch_per_flow batches[PKT_ARRAY_SIZE];
-    long long now = time_msec();
     size_t n_batches;
     odp_port_t in_port;
 
@@ -4846,16 +4887,16 @@ dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
 static void
 dp_netdev_input(struct dp_netdev_pmd_thread *pmd,
                 struct dp_packet_batch *packets,
-                odp_port_t port_no)
+                odp_port_t port_no, long long now)
 {
-    dp_netdev_input__(pmd, packets, false, port_no);
+    dp_netdev_input__(pmd, packets, false, port_no, now);
 }
 
 static void
 dp_netdev_recirculate(struct dp_netdev_pmd_thread *pmd,
-                      struct dp_packet_batch *packets)
+                      struct dp_packet_batch *packets, long long now)
 {
-    dp_netdev_input__(pmd, packets, true, 0);
+    dp_netdev_input__(pmd, packets, true, 0, now);
 }
 
 struct dp_netdev_execute_aux {
@@ -5033,18 +5074,37 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *packets_,
     case OVS_ACTION_ATTR_OUTPUT:
         p = pmd_send_port_cache_lookup(pmd, nl_attr_get_odp_port(a));
         if (OVS_LIKELY(p)) {
-            int tx_qid;
-            bool dynamic_txqs;
+            struct dp_packet *packet;
+            struct dp_packet_batch out;
 
-            dynamic_txqs = p->port->dynamic_txqs;
-            if (dynamic_txqs) {
-                tx_qid = dpif_netdev_xps_get_tx_qid(pmd, p, now);
-            } else {
-                tx_qid = pmd->static_tx_qid;
+            if (!may_steal) {
+                dp_packet_batch_clone(&out, packets_);
+                dp_packet_batch_reset_cutlen(packets_);
+                packets_ = &out;
+            }
+            dp_packet_batch_apply_cutlen(packets_);
+
+#ifdef DPDK_NETDEV
+            if (OVS_UNLIKELY(!dp_packet_batch_is_empty(&p->output_pkts)
+                             && packets_->packets[0]->source
+                                != p->output_pkts.packets[0]->source)) {
+                /* XXX: netdev-dpdk assumes that all packets in a single
+                 *      output batch have the same source.  Flush here to
+                 *      avoid memory access issues. */
+                dp_netdev_pmd_flush_output_on_port(pmd, p, now);
+            }
+#endif
+
+            if (OVS_UNLIKELY(dp_packet_batch_size(&p->output_pkts)
+                       + dp_packet_batch_size(packets_) > NETDEV_MAX_BURST)) {
+                /* Some packets were generated while processing the input
+                 * batch.  Flush here to avoid overflow. */
+                dp_netdev_pmd_flush_output_on_port(pmd, p, now);
             }
 
-            netdev_send(p->port->netdev, tx_qid, packets_, may_steal,
-                        dynamic_txqs);
+            DP_PACKET_BATCH_FOR_EACH (packet, packets_) {
+                dp_packet_batch_add(&p->output_pkts, packet);
+            }
             return;
         }
         break;
@@ -5085,7 +5145,7 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *packets_,
                 }
 
                 (*depth)++;
-                dp_netdev_recirculate(pmd, packets_);
+                dp_netdev_recirculate(pmd, packets_, now);
                 (*depth)--;
                 return;
             }
@@ -5150,7 +5210,7 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *packets_,
             }
 
             (*depth)++;
-            dp_netdev_recirculate(pmd, packets_);
+            dp_netdev_recirculate(pmd, packets_, now);
             (*depth)--;
 
             return;
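
Illustration only, not part of the patch: a minimal standalone sketch of
the per-port output batching idea described in the commit message.  All
names below (out_batch, port_send, port_output, flush_all, demo_burst,
MAX_BURST, N_PORTS) are hypothetical stand-ins rather than OVS APIs;
port_send() only counts calls where the real datapath would call
netdev_send(), and packets are plain ints.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BURST 32   /* Hypothetical stand-in for NETDEV_MAX_BURST. */
#define N_PORTS 2      /* Two bond members, as in the commit message. */

/* Simplified per-port output batch (plays the role of 'output_pkts'). */
struct out_batch {
    size_t count;
    int pkts[MAX_BURST];
};

static struct out_batch ports[N_PORTS];
static unsigned int n_send_calls;   /* Number of "device" send calls. */
static unsigned int n_pkts_sent;    /* Number of packets handed over. */

/* Stub for the device send: one call transmits the whole batch. */
static void
port_send(struct out_batch *b)
{
    n_send_calls++;
    n_pkts_sent += (unsigned int) b->count;
    b->count = 0;
}

/* Queues the packets of one per-flow batch on an output port.  If they
 * would not fit, the pending batch is flushed first to avoid overflow. */
static void
port_output(int port_no, const int *pkts, size_t n)
{
    struct out_batch *b = &ports[port_no];

    assert(n <= MAX_BURST);
    if (b->count + n > MAX_BURST) {
        port_send(b);
    }
    for (size_t i = 0; i < n; i++) {
        b->pkts[b->count++] = pkts[i];
    }
}

/* Flushes every non-empty port batch; called once per received burst. */
static void
flush_all(void)
{
    for (int i = 0; i < N_PORTS; i++) {
        if (ports[i].count) {
            port_send(&ports[i]);
        }
    }
}

/* One received burst of MAX_BURST packets, each hitting a different
 * flow, with flows alternating between the two ports.  Returns the
 * number of send calls that were needed. */
static unsigned int
demo_burst(void)
{
    n_send_calls = 0;
    n_pkts_sent = 0;
    for (int pkt = 0; pkt < MAX_BURST; pkt++) {
        port_output(pkt % N_PORTS, &pkt, 1);
    }
    flush_all();
    return n_send_calls;
}
```

In this toy setup demo_burst() needs one send call per port for the
whole burst, whereas sending each per-flow batch directly would need one
call per packet.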