From patchwork Wed Jul 17 13:00:33 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Van Haaren, Harry" X-Patchwork-Id: 1133302 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 45pcr56LTrz9s8m for ; Wed, 17 Jul 2019 23:02:45 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id C3678EF6; Wed, 17 Jul 2019 12:59:35 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 728CEEE5 for ; Wed, 17 Jul 2019 12:59:32 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 9C47271C for ; Wed, 17 Jul 2019 12:59:31 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 17 Jul 2019 05:59:31 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,274,1559545200"; d="scan'208";a="187553608" Received: from silpixa00399779.ir.intel.com (HELO silpixa00399779.ger.corp.intel.com) ([10.237.222.100]) by fmsmga001.fm.intel.com with ESMTP; 17 Jul 2019 05:59:30 -0700 From: Harry van Haaren To: dev@openvswitch.org Date: Wed, 17 Jul 2019 14:00:33 +0100 Message-Id: <20190717130033.25114-6-harry.van.haaren@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190717130033.25114-1-harry.van.haaren@intel.com> References: <20190715163636.51572-1-harry.van.haaren@intel.com> <20190717130033.25114-1-harry.van.haaren@intel.com> X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Cc: i.maximets@samsung.com Subject: [ovs-dev] [PATCH v12 5/5] dpif-netdev: Add specialized generic scalar functions X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org This commit adds a number of specialized functions, that handle common miniflow fingerprints. This enables compiler optimization, resulting in higher performance. Below a quick description of how this optimization actually works; "Specialized functions" are "instances" of the generic implementation, but the compiler is given extra context when compiling. In the case of iterating miniflow datastructures, the most interesting value to enable compile time optimizations is the loop trip count per unit. In order to create a specialized function, there is a generic implementation, which uses a for() loop without the compiler knowing the loop trip count at compile time. The loop trip count is passed in as an argument to the function: uint32_t miniflow_impl_generic(struct miniflow *mf, uint32_t loop_count) { for(uint32_t i = 0; i < loop_count; i++) // do work } In order to "specialize" the function, we call the generic implementation with hard-coded numbers - these are compile time constants! uint32_t miniflow_impl_loop5(struct miniflow *mf, uint32_t loop_count) { // use hard coded constant for compile-time constant-propogation return miniflow_impl_generic(mf, 5); } Given the compiler is aware of the loop trip count at compile time, it can perform an optimization known as "constant propogation". Combined with inlining of the miniflow_impl_generic() function, the compiler is now enabled to *compile time* unroll the loop 5x, and produce "flat" code. The last step to using the specialized functions is to utilize a function-pointer to choose the specialized (or generic) implementation. The selection of the function pointer is performed at subtable creation time, when miniflow fingerprint of the subtable is known. This technique is known as "multiple dispatch" in some literature, as it uses multiple items of information (miniflow bit counts) to select the dispatch function. By pointing the function pointer at the optimized implementation, OvS benefits from the compile time optimizations at runtime. Signed-off-by: Harry van Haaren Tested-by: Malvika Gupta --- v12: - Fix typo (Ian) - Fix missing . after comments (Ian) - Improve return value comments for probe function (Ian) - Spaces after number in optimized lookup declarations (Ilya) - Make VLOG level debug instead of info (Ilya) v11: - Use MACROs to declare and check optimized functions (Ilya) - Use captial letter for commit message (Ilya) - Rebase onto latest patchset changes - Added NEWS entry for data-path subtable specialization (Ian/Harry) - Checkpatch notes an "incorrect bracketing" in the MACROs, however I didn't find a solution that it does like. v10: - Rebase changes from previous patches. - Remove "restrict" keyword as windows CI failed, see here for details: https://ci.appveyor.com/project/istokes/ovs-q8fvv/builds/24398228 v8: - Rework to use blocks_cache from the dpcls instance, to avoid variable lenght arrays in the data-path. --- NEWS | 4 +++ lib/dpif-netdev-lookup-generic.c | 51 ++++++++++++++++++++++++++++++++ lib/dpif-netdev-private.h | 8 +++++ lib/dpif-netdev.c | 9 ++++-- 4 files changed, 70 insertions(+), 2 deletions(-) diff --git a/NEWS b/NEWS index 81130e667..4cfffb1bc 100644 --- a/NEWS +++ b/NEWS @@ -34,6 +34,10 @@ Post-v2.11.0 * 'ovs-appctl exit' now implies cleanup of non-internal ports in userspace datapath regardless of '--cleanup' option. Use '--cleanup' to remove internal ports too. + * Datapath classifer code refactored to enable function pointers to select + the lookup implementation at runtime. This enables specialization of + specific subtables based on the miniflow attributes, enhancing the + performance of the subtable search. - OVSDB: * OVSDB clients can now resynchronize with clustered servers much more quickly after a brief disconnection, saving bandwidth and CPU time. diff --git a/lib/dpif-netdev-lookup-generic.c b/lib/dpif-netdev-lookup-generic.c index b38adc2d5..83925cbaf 100644 --- a/lib/dpif-netdev-lookup-generic.c +++ b/lib/dpif-netdev-lookup-generic.c @@ -233,7 +233,58 @@ dpcls_subtable_lookup_generic(struct dpcls_subtable *subtable, const struct netdev_flow_key *keys[], struct dpcls_rule **rules) { + /* Here the runtime subtable->mf_bits counts are used, which forces the + * compiler to iterate normal for() loops. Due to this limitation in the + * compilers available optimizations, this function has lower performance + * than the below specialized functions. + */ return lookup_generic_impl(subtable, blocks_scratch, keys_map, keys, rules, subtable->mf_bits_set_unit0, subtable->mf_bits_set_unit1); } + +/* Expand out specialized functions with U0 and U1 bit attributes. */ +#define DECLARE_OPTIMIZED_LOOKUP_FUNCTION(U0, U1) \ + static uint32_t \ + dpcls_subtable_lookup_mf_u0w##U0##_u1w##U1( \ + struct dpcls_subtable *subtable, \ + uint64_t *blocks_scratch, \ + uint32_t keys_map, \ + const struct netdev_flow_key *keys[],\ + struct dpcls_rule **rules) \ + { \ + return lookup_generic_impl(subtable, blocks_scratch, keys_map, \ + keys, rules, U0, U1); \ + } \ + +DECLARE_OPTIMIZED_LOOKUP_FUNCTION(5, 1) +DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 1) +DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 0) + +/* Check if a specialized function is valid for the required subtable. */ +#define CHECK_LOOKUP_FUNCTION(U0,U1) \ + if (!f && u0_bits == U0 && u1_bits == U1) { \ + f = dpcls_subtable_lookup_mf_u0w##U0##_u1w##U1; \ + } + +/* Probe function to lookup an available specialized function. + * If capable to run the requested miniflow fingerprint, this function returns + * the most optimal implementation for that miniflow fingerprint. + * @retval Non-NULL A valid function to handle the miniflow bit pattern + * @retval NULL The requested miniflow is not supported by this implementation. + */ +dpcls_subtable_lookup_func +dpcls_subtable_generic_probe(uint32_t u0_bits, uint32_t u1_bits) +{ + dpcls_subtable_lookup_func f = NULL; + + CHECK_LOOKUP_FUNCTION(5, 1); + CHECK_LOOKUP_FUNCTION(4, 1); + CHECK_LOOKUP_FUNCTION(4, 0); + + if (f) { + VLOG_DBG("Subtable using Generic Optimized for u0 %d, u1 %d\n", + u0_bits, u1_bits); + } + return f; +} diff --git a/lib/dpif-netdev-private.h b/lib/dpif-netdev-private.h index e1ceaa641..f541343f4 100644 --- a/lib/dpif-netdev-private.h +++ b/lib/dpif-netdev-private.h @@ -69,6 +69,14 @@ dpcls_subtable_lookup_generic(struct dpcls_subtable *subtable, const struct netdev_flow_key *keys[], struct dpcls_rule **rules); +/* Probe function to select a specialized version of the generic lookup + * implementation. This provides performance benefit due to compile-time + * optimizations such as loop-unrolling. These are enabled by the compile-time + * constants in the specific function implementations. + */ +dpcls_subtable_lookup_func +dpcls_subtable_generic_probe(uint32_t u0_bit_count, uint32_t u1_bit_count); + /* A set of rules that all have the same fields wildcarded. */ struct dpcls_subtable { /* The fields are only used by writers. */ diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 1b92cd4c9..cceb8aa07 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -7735,8 +7735,13 @@ dpcls_create_subtable(struct dpcls *cls, const struct netdev_flow_key *mask) cls->blocks_scratch_size = blocks_required_per_pkt; } - /* Assign the generic lookup - this works with any miniflow fingerprint. */ - subtable->lookup_func = dpcls_subtable_lookup_generic; + /* Probe for a specialized generic lookup function. */ + subtable->lookup_func = dpcls_subtable_generic_probe(unit0, unit1); + + /* If not set, assign generic lookup. Generic works for any miniflow. */ + if (!subtable->lookup_func) { + subtable->lookup_func = dpcls_subtable_lookup_generic; + } cmap_insert(&cls->subtables_map, &subtable->cmap_node, mask->hash); /* Add the new subtable at the end of the pvector (with no hits yet) */