From patchwork Mon Jun 19 10:11:56 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Fischetti, Antonio"
X-Patchwork-Id: 777674
From: antonio.fischetti@intel.com
To: dev@openvswitch.org
Date: Mon, 19 Jun 2017 11:11:56 +0100
Message-Id: <1497867118-4195-2-git-send-email-antonio.fischetti@intel.com>
X-Mailer: git-send-email 1.7.0.7
In-Reply-To: <1497867118-4195-1-git-send-email-antonio.fischetti@intel.com>
References: <1497867118-4195-1-git-send-email-antonio.fischetti@intel.com>
Subject: [ovs-dev] [PATCH RFC 2/4] dpif-netdev: Skip EMC lookup/insert for
 recirculated packets.

From: Antonio Fischetti

When OVS is configured as a firewall with thousands of concurrent active
connections, the EMC quickly saturates and may thrash heavily, because
original and recirculated packets keep overwriting active EMC entries in
its limited space (8k entries). This thrashing makes the EMC less
efficient than the dpcls for both lookups and insertions.

This patch makes better use of the EMC by allowing only the 'original'
packets to hit it. All recirculated packets are sent directly to the
classifier.

An empirical threshold on EMC occupancy (EMC_FULL_THRESHOLD, 50%)
triggers this logic. When EMC utilization exceeds EMC_FULL_THRESHOLD:
 - EMC insertions are allowed only for original packets. EMC insertion
   and lookup are skipped for recirculated packets.
 - Recirculated packets are sent to the classifier.

This patch depends on the previous one in this series.
It is based on the patch
"dpif-netdev: add EMC entry count and %full figure to pmd-stats-show" at:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327570.html

Signed-off-by: Antonio Fischetti
Signed-off-by: Bhanuprakash Bodireddy
Co-authored-by: Bhanuprakash Bodireddy
---

In our Connection Tracker testbench, set up with the flows

 table=0, priority=1 actions=drop
 table=0, priority=10,arp actions=NORMAL
 table=0, priority=100,ct_state=-trk,ip actions=ct(table=1)
 table=1, ct_state=+new+trk,ip,in_port=1 actions=ct(commit),output:2
 table=1, ct_state=+est+trk,ip,in_port=1 actions=output:2
 table=1, ct_state=+new+trk,ip,in_port=2 actions=drop
 table=1, ct_state=+est+trk,ip,in_port=2 actions=output:1

we saw the following performance improvement.

We measured the packet Rx rate (regardless of packet loss) in a
bidirectional test with 64B UDP packets.
Each row is a test with a different number of traffic streams. The traffic
generator is set so that each stream establishes one UDP connection.
The Mpps columns report the Rx rates on the two sides.

 Traffic |    Orig    |     Orig      |  +changes  |   +changes
 Streams |   [Mpps]   | [EMC entries] |   [Mpps]   | [EMC entries]
---------+------------+---------------+------------+---------------
      10 |  3.4, 3.4  |      20       |  3.4, 3.4  |      20
     100 |  2.6, 2.7  |     200       |  2.6, 2.7  |     201
   1,000 |  2.4, 2.4  |    2009       |  2.4, 2.4  |    1994
   2,000 |  2.2, 2.2  |    3903       |  2.2, 2.2  |    3900
   3,000 |  2.1, 2.1  |    5473       |  2.2, 2.2  |    4798
   4,000 |  2.0, 2.0  |    6478       |  2.2, 2.2  |    5663
  10,000 |  1.8, 1.9  |    8070       |  2.0, 2.0  |    7347
 100,000 |  1.7, 1.7  |    8192       |  1.8, 1.8  |    8192

 lib/dpif-netdev.c | 46 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 40 insertions(+), 6 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index fd2ed52..64a3cd4 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4538,6 +4538,8 @@ dp_netdev_queue_batches(struct dp_packet *pkt,
     packet_batch_per_flow_update(batch, pkt, mf);
 }
 
+#define EMC_FULL_THRESHOLD 0x0000F000
+
 /* Try to process all ('cnt') the 'packets' using only the exact match cache
  * 'pmd->flow_cache'. If a flow is not found for a packet 'packets[i]', the
  * miniflow is copied into 'keys' and the packet pointer is moved at the
@@ -4582,6 +4584,19 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
             pkt_metadata_prefetch_init(&packets[i+1]->md);
         }
 
+        /*
+         * EMC lookup is skipped when one or both of the following
+         * two cases occurs:
+         *
+         *   - EMC is disabled.  This is detected from cur_min.
+         *
+         *   - The EMC occupancy exceeds EMC_FULL_THRESHOLD and the
+         *     packet to be classified is being recirculated.  When this
+         *     happens also EMC insertions are skipped for recirculated
+         *     packets.  So that EMC is used just to store entries which
+         *     are hit from the 'original' packets.  This way the EMC
+         *     thrashing is mitigated with a benefit on performance.
+         */
         if (!md_is_valid) {
             pkt_metadata_init(&packet->md, port_no);
             miniflow_extract(packet, &key->mf);
@@ -4603,11 +4618,18 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
         } else {
             /* Recirculated packets. */
             miniflow_extract(packet, &key->mf);
-            if (OVS_LIKELY(cur_min)) {
-                key->hash = dpif_netdev_packet_get_rss_hash(packet, &key->mf);
-                flow = emc_lookup(flow_cache, key);
-            } else {
+            if (flow_cache->n_entries & EMC_FULL_THRESHOLD) {
+                /* EMC occupancy is over the threshold.  We skip EMC
+                 * lookup for recirculated packets. */
                 flow = NULL;
+            } else {
+                if (OVS_LIKELY(cur_min)) {
+                    key->hash = dpif_netdev_packet_get_rss_hash(packet,
+                                                                &key->mf);
+                    flow = emc_lookup(flow_cache, key);
+                } else {
+                    flow = NULL;
+                }
             }
         }
         key->len = 0; /* Not computed yet. */
@@ -4695,7 +4717,13 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
                                              add_actions->size);
         }
         ovs_mutex_unlock(&pmd->flow_mutex);
-        emc_probabilistic_insert(pmd, key, netdev_flow);
+        /* When EMC occupancy goes over a threshold we avoid inserting new
+         * entries for recirculated packets. */
+        if (!packet->md.recirc_id) {
+            emc_probabilistic_insert(pmd, key, netdev_flow);
+        } else if (!(pmd->flow_cache.n_entries & EMC_FULL_THRESHOLD)) {
+            emc_probabilistic_insert(pmd, key, netdev_flow);
+        }
     }
 }
 
@@ -4788,7 +4816,13 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
 
         flow = dp_netdev_flow_cast(rules[i]);
 
-        emc_probabilistic_insert(pmd, &keys[i], flow);
+        /* When EMC occupancy goes over a threshold we avoid inserting new
+         * entries for recirculated packets. */
+        if (!packet->md.recirc_id) {
+            emc_probabilistic_insert(pmd, &keys[i], flow);
+        } else if (!(pmd->flow_cache.n_entries & EMC_FULL_THRESHOLD)) {
+            emc_probabilistic_insert(pmd, &keys[i], flow);
+        }
         dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches);
     }
 
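
Reading aid (not part of the patch): the commit message describes
EMC_FULL_THRESHOLD as 50% occupancy, while the diff tests it as a bit mask
(n_entries & 0x0000F000). The standalone sketch below shows why the two
agree, assuming the EMC holds at most 8192 entries as the results table
suggests. The helper names emc_over_threshold(), emc_may_use() and
emc_policy_self_check() are hypothetical and do not exist in OVS; only the
mask value and the recirc_id test are taken from the diff. The cur_min
("EMC disabled") check is left out on purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EMC_ENTRIES        8192u        /* Assumed EMC capacity (8k). */
#define EMC_FULL_THRESHOLD 0x0000F000u  /* Mask value used by the patch. */

/* Hypothetical helper.  Bits 12..15 of 'n_entries' are all clear only
 * while n_entries < 4096, so for any count up to 65535 the mask test is
 * equivalent to 'n_entries >= 4096', i.e. half of an 8192-entry EMC. */
static bool
emc_over_threshold(uint32_t n_entries)
{
    return (n_entries & EMC_FULL_THRESHOLD) != 0;
}

/* Hypothetical helper mirroring the policy in the diff: original packets
 * (recirc_id == 0) may always use the EMC, recirculated packets only
 * while the occupancy is below the threshold. */
static bool
emc_may_use(uint32_t n_entries, uint32_t recirc_id)
{
    return recirc_id == 0 || !emc_over_threshold(n_entries);
}

/* Checks the mask test against the plain comparison for every possible
 * occupancy of the assumed 8192-entry cache. */
static void
emc_policy_self_check(void)
{
    for (uint32_t n = 0; n <= EMC_ENTRIES; n++) {
        assert(emc_over_threshold(n) == (n >= EMC_ENTRIES / 2));
        assert(emc_may_use(n, 0));
        assert(emc_may_use(n, 1) == (n < EMC_ENTRIES / 2));
    }
}
```

Calling emc_policy_self_check() from a small main() exercises the whole
range. Note that the mask trick relies on the entry count staying below
65536; with a larger cache a plain comparison would be the safer form.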