From patchwork Fri Feb 12 17:17:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Van Haaren, Harry" X-Patchwork-Id: 1439950 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.138; helo=whitealder.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from whitealder.osuosl.org (smtp1.osuosl.org [140.211.166.138]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4DcgF91hbyz9sTD for ; Sat, 13 Feb 2021 04:18:21 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by whitealder.osuosl.org (Postfix) with ESMTP id A88AD87627; Fri, 12 Feb 2021 17:18:19 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from whitealder.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id f6LrFVC38wir; Fri, 12 Feb 2021 17:17:47 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by whitealder.osuosl.org (Postfix) with ESMTP id A356887530; Fri, 12 Feb 2021 17:17:45 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 8D124C1D9F; Fri, 12 Feb 2021 17:17:45 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 22D3EC013A for ; Fri, 12 Feb 2021 17:17:45 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id E9B0D6F7B8 for ; Fri, 12 Feb 2021 17:17:44 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id gjcP15W5XTBq for ; Fri, 12 Feb 2021 17:17:37 +0000 (UTC) Received: by smtp3.osuosl.org (Postfix, from userid 1001) id 6D98E6F674; Fri, 12 Feb 2021 17:17:37 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by smtp3.osuosl.org (Postfix) with ESMTPS id 316646F674 for ; Fri, 12 Feb 2021 17:17:33 +0000 (UTC) IronPort-SDR: fDdvPuZN7xKGeqYaUNlm9TgSvJs9or96Pps/sfXICHDo67bw6hC/YqFg3jeXhqZL21a1K5janF 029tL3dci/Qw== X-IronPort-AV: E=McAfee;i="6000,8403,9893"; a="201595209" X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="201595209" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Feb 2021 09:17:32 -0800 IronPort-SDR: mB28iuTmDmrmrJoB+SpKzZ0/bFP2t1JZfjy+jmubbMVxgTEFVTHS/elhRdEH4RxPd+ybPDDgmk TXGk1OsuL07g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="360484843" Received: from silpixa00400633.ir.intel.com ([10.237.213.44]) by orsmga003.jf.intel.com with ESMTP; 12 Feb 2021 09:17:30 -0800 From: Harry van Haaren To: ovs-dev@openvswitch.org Date: Fri, 12 Feb 2021 17:17:03 +0000 Message-Id: <20210212171718.2189798-2-harry.van.haaren@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210212171718.2189798-1-harry.van.haaren@intel.com> References: <20210104163653.2218575-1-harry.van.haaren@intel.com> <20210212171718.2189798-1-harry.van.haaren@intel.com> MIME-Version: 1.0 Cc: i.maximets@ovn.org Subject: [ovs-dev] [PATCH v9 01/16] dpif-netdev: Refactor to multiple header files. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" Split the very large file dpif-netdev.c and the datastructures it contains into multiple header files. Each header file is responsible for the datastructures of that component. This logical split allows better reuse and modularity of the code, and reduces the very large file dpif-netdev.c to be more managable. Due to dependencies between components, it is not possible to move component in smaller granularities than this patch. To explain the dependencies better, eg: DPCLS has no deps (from dpif-netdev.c file) FLOW depends on DPCLS (struct dpcls_rule) DFC depends on DPCLS (netdev_flow_key) and FLOW (netdev_flow_key) THREAD depends on DFC (struct dfc_cache) DFC_PROC depends on THREAD (struct pmd_thread) DPCLS lookup.h/c require only DPCLS DPCLS implementations require only dpif-netdev-lookup.h. - This change was made in 2.12 release with function pointers - This commit only refactors the name to "private-dpcls.h" Signed-off-by: Harry van Haaren Co-authored-by: Cian Ferriter Signed-off-by: Cian Ferriter --- lib/automake.mk | 4 + lib/dpif-netdev-lookup-autovalidator.c | 1 - lib/dpif-netdev-lookup-avx512-gather.c | 1 - lib/dpif-netdev-lookup-generic.c | 1 - lib/dpif-netdev-lookup.h | 2 +- lib/dpif-netdev-private-dfc.h | 244 ++++++++++++ lib/dpif-netdev-private-dpcls.h | 129 ++++++ lib/dpif-netdev-private-flow.h | 162 ++++++++ lib/dpif-netdev-private-thread.h | 206 ++++++++++ lib/dpif-netdev-private.h | 100 +---- lib/dpif-netdev.c | 519 +------------------------ 11 files changed, 760 insertions(+), 609 deletions(-) create mode 100644 lib/dpif-netdev-private-dfc.h create mode 100644 lib/dpif-netdev-private-dpcls.h create mode 100644 lib/dpif-netdev-private-flow.h create mode 100644 lib/dpif-netdev-private-thread.h diff --git a/lib/automake.mk b/lib/automake.mk index 39afbff9d..0e83145b5 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -111,6 +111,10 @@ lib_libopenvswitch_la_SOURCES = \ lib/dpif-netdev-lookup-generic.c \ lib/dpif-netdev.c \ lib/dpif-netdev.h \ + lib/dpif-netdev-private-dfc.h \ + lib/dpif-netdev-private-dpcls.h \ + lib/dpif-netdev-private-flow.h \ + lib/dpif-netdev-private-thread.h \ lib/dpif-netdev-private.h \ lib/dpif-netdev-perf.c \ lib/dpif-netdev-perf.h \ diff --git a/lib/dpif-netdev-lookup-autovalidator.c b/lib/dpif-netdev-lookup-autovalidator.c index 97b59fdd0..475e1ab1e 100644 --- a/lib/dpif-netdev-lookup-autovalidator.c +++ b/lib/dpif-netdev-lookup-autovalidator.c @@ -17,7 +17,6 @@ #include #include "dpif-netdev.h" #include "dpif-netdev-lookup.h" -#include "dpif-netdev-private.h" #include "openvswitch/vlog.h" VLOG_DEFINE_THIS_MODULE(dpif_lookup_autovalidator); diff --git a/lib/dpif-netdev-lookup-avx512-gather.c b/lib/dpif-netdev-lookup-avx512-gather.c index 5e3634249..8fc1cdfa5 100644 --- a/lib/dpif-netdev-lookup-avx512-gather.c +++ b/lib/dpif-netdev-lookup-avx512-gather.c @@ -21,7 +21,6 @@ #include "dpif-netdev.h" #include "dpif-netdev-lookup.h" -#include "dpif-netdev-private.h" #include "cmap.h" #include "flow.h" #include "pvector.h" diff --git a/lib/dpif-netdev-lookup-generic.c b/lib/dpif-netdev-lookup-generic.c index b1a0cfc36..e3b6be4b6 100644 --- a/lib/dpif-netdev-lookup-generic.c +++ b/lib/dpif-netdev-lookup-generic.c @@ -17,7 +17,6 @@ #include #include "dpif-netdev.h" -#include "dpif-netdev-private.h" #include "dpif-netdev-lookup.h" #include "bitmap.h" diff --git a/lib/dpif-netdev-lookup.h b/lib/dpif-netdev-lookup.h index bd72aa29b..59f51faa0 100644 --- a/lib/dpif-netdev-lookup.h +++ b/lib/dpif-netdev-lookup.h @@ -19,7 +19,7 @@ #include #include "dpif-netdev.h" -#include "dpif-netdev-private.h" +#include "dpif-netdev-private-dpcls.h" /* Function to perform a probe for the subtable bit fingerprint. * Returns NULL if not valid, or a valid function pointer to call for this diff --git a/lib/dpif-netdev-private-dfc.h b/lib/dpif-netdev-private-dfc.h new file mode 100644 index 000000000..8f6a4899e --- /dev/null +++ b/lib/dpif-netdev-private-dfc.h @@ -0,0 +1,244 @@ +/* + * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2015 Nicira, Inc. + * Copyright (c) 2019, 2020 Intel Corporation. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef DPIF_NETDEV_PRIVATE_DFC_H +#define DPIF_NETDEV_PRIVATE_DFC_H 1 + +#include +#include + +#include "dpif.h" +#include "dpif-netdev-private-dpcls.h" +#include "dpif-netdev-private-flow.h" + +#ifdef __cplusplus +extern "C" { +#endif + +/* EMC cache and SMC cache compose the datapath flow cache (DFC) + * + * Exact match cache for frequently used flows + * + * The cache uses a 32-bit hash of the packet (which can be the RSS hash) to + * search its entries for a miniflow that matches exactly the miniflow of the + * packet. It stores the 'dpcls_rule' (rule) that matches the miniflow. + * + * A cache entry holds a reference to its 'dp_netdev_flow'. + * + * A miniflow with a given hash can be in one of EM_FLOW_HASH_SEGS different + * entries. The 32-bit hash is split into EM_FLOW_HASH_SEGS values (each of + * them is EM_FLOW_HASH_SHIFT bits wide and the remainder is thrown away). Each + * value is the index of a cache entry where the miniflow could be. + * + * + * Signature match cache (SMC) + * + * This cache stores a 16-bit signature for each flow without storing keys, and + * stores the corresponding 16-bit flow_table index to the 'dp_netdev_flow'. + * Each flow thus occupies 32bit which is much more memory efficient than EMC. + * SMC uses a set-associative design that each bucket contains + * SMC_ENTRY_PER_BUCKET number of entries. + * Since 16-bit flow_table index is used, if there are more than 2^16 + * dp_netdev_flow, SMC will miss them that cannot be indexed by a 16-bit value. + * + * + * Thread-safety + * ============= + * + * Each pmd_thread has its own private exact match cache. + * If dp_netdev_input is not called from a pmd thread, a mutex is used. + */ + +#define EM_FLOW_HASH_SHIFT 13 +#define EM_FLOW_HASH_ENTRIES (1u << EM_FLOW_HASH_SHIFT) +#define EM_FLOW_HASH_MASK (EM_FLOW_HASH_ENTRIES - 1) +#define EM_FLOW_HASH_SEGS 2 + +/* SMC uses a set-associative design. A bucket contains a set of entries that + * a flow item can occupy. For now, it uses one hash function rather than two + * as for the EMC design. */ +#define SMC_ENTRY_PER_BUCKET 4 +#define SMC_ENTRIES (1u << 20) +#define SMC_BUCKET_CNT (SMC_ENTRIES / SMC_ENTRY_PER_BUCKET) +#define SMC_MASK (SMC_BUCKET_CNT - 1) + +/* Default EMC insert probability is 1 / DEFAULT_EM_FLOW_INSERT_INV_PROB */ +#define DEFAULT_EM_FLOW_INSERT_INV_PROB 100 +#define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX / \ + DEFAULT_EM_FLOW_INSERT_INV_PROB) + +struct emc_entry { + struct dp_netdev_flow *flow; + struct netdev_flow_key key; /* key.hash used for emc hash value. */ +}; + +struct emc_cache { + struct emc_entry entries[EM_FLOW_HASH_ENTRIES]; + int sweep_idx; /* For emc_cache_slow_sweep(). */ +}; + +struct smc_bucket { + uint16_t sig[SMC_ENTRY_PER_BUCKET]; + uint16_t flow_idx[SMC_ENTRY_PER_BUCKET]; +}; + +/* Signature match cache, differentiate from EMC cache */ +struct smc_cache { + struct smc_bucket buckets[SMC_BUCKET_CNT]; +}; + +struct dfc_cache { + struct emc_cache emc_cache; + struct smc_cache smc_cache; +}; + +/* Iterate in the exact match cache through every entry that might contain a + * miniflow with hash 'HASH'. */ +#define EMC_FOR_EACH_POS_WITH_HASH(EMC, CURRENT_ENTRY, HASH) \ + for (uint32_t i__ = 0, srch_hash__ = (HASH); \ + (CURRENT_ENTRY) = &(EMC)->entries[srch_hash__ & EM_FLOW_HASH_MASK], \ + i__ < EM_FLOW_HASH_SEGS; \ + i__++, srch_hash__ >>= EM_FLOW_HASH_SHIFT) + +static inline bool +emc_entry_alive(struct emc_entry *ce) +{ + return ce->flow && !ce->flow->dead; +} + +static inline void +emc_clear_entry(struct emc_entry *ce) +{ + if (ce->flow) { + dp_netdev_flow_unref(ce->flow); + ce->flow = NULL; + } +} + +static inline void +smc_clear_entry(struct smc_bucket *b, int idx) +{ + b->flow_idx[idx] = UINT16_MAX; +} + +static inline void +emc_cache_init(struct emc_cache *flow_cache) +{ + int i; + + flow_cache->sweep_idx = 0; + for (i = 0; i < ARRAY_SIZE(flow_cache->entries); i++) { + flow_cache->entries[i].flow = NULL; + flow_cache->entries[i].key.hash = 0; + flow_cache->entries[i].key.len = sizeof(struct miniflow); + flowmap_init(&flow_cache->entries[i].key.mf.map); + } +} + +static inline void +smc_cache_init(struct smc_cache *smc_cache) +{ + int i, j; + for (i = 0; i < SMC_BUCKET_CNT; i++) { + for (j = 0; j < SMC_ENTRY_PER_BUCKET; j++) { + smc_cache->buckets[i].flow_idx[j] = UINT16_MAX; + } + } +} + +static inline void +dfc_cache_init(struct dfc_cache *flow_cache) +{ + emc_cache_init(&flow_cache->emc_cache); + smc_cache_init(&flow_cache->smc_cache); +} + +static inline void +emc_cache_uninit(struct emc_cache *flow_cache) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(flow_cache->entries); i++) { + emc_clear_entry(&flow_cache->entries[i]); + } +} + +static inline void +smc_cache_uninit(struct smc_cache *smc) +{ + int i, j; + + for (i = 0; i < SMC_BUCKET_CNT; i++) { + for (j = 0; j < SMC_ENTRY_PER_BUCKET; j++) { + smc_clear_entry(&(smc->buckets[i]), j); + } + } +} + +static inline void +dfc_cache_uninit(struct dfc_cache *flow_cache) +{ + smc_cache_uninit(&flow_cache->smc_cache); + emc_cache_uninit(&flow_cache->emc_cache); +} + +/* Check and clear dead flow references slowly (one entry at each + * invocation). */ +static inline void +emc_cache_slow_sweep(struct emc_cache *flow_cache) +{ + struct emc_entry *entry = &flow_cache->entries[flow_cache->sweep_idx]; + + if (!emc_entry_alive(entry)) { + emc_clear_entry(entry); + } + flow_cache->sweep_idx = (flow_cache->sweep_idx + 1) & EM_FLOW_HASH_MASK; +} + +/* Used to compare 'netdev_flow_key' in the exact match cache to a miniflow. + * The maps are compared bitwise, so both 'key->mf' and 'mf' must have been + * generated by miniflow_extract. */ +static inline bool +emc_flow_key_equal_mf(const struct netdev_flow_key *key, + const struct miniflow *mf) +{ + return !memcmp(&key->mf, mf, key->len); +} + +static inline struct dp_netdev_flow * +emc_lookup(struct emc_cache *cache, const struct netdev_flow_key *key) +{ + struct emc_entry *current_entry; + + EMC_FOR_EACH_POS_WITH_HASH(cache, current_entry, key->hash) { + if (current_entry->key.hash == key->hash + && emc_entry_alive(current_entry) + && emc_flow_key_equal_mf(¤t_entry->key, &key->mf)) { + + /* We found the entry with the 'key->mf' miniflow */ + return current_entry->flow; + } + } + + return NULL; +} + +#ifdef __cplusplus +} +#endif + +#endif /* dpif-netdev-private-dfc.h */ diff --git a/lib/dpif-netdev-private-dpcls.h b/lib/dpif-netdev-private-dpcls.h new file mode 100644 index 000000000..5bc579bba --- /dev/null +++ b/lib/dpif-netdev-private-dpcls.h @@ -0,0 +1,129 @@ +/* + * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2015 Nicira, Inc. + * Copyright (c) 2019, 2020 Intel Corporation. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef DPIF_NETDEV_PRIVATE_DPCLS_H +#define DPIF_NETDEV_PRIVATE_DPCLS_H 1 + +#include +#include + +#include "dpif.h" +#include "cmap.h" +#include "openvswitch/thread.h" + +#ifdef __cplusplus +extern "C" { +#endif + +/* Forward declaration for lookup_func typedef. */ +struct dpcls_subtable; +struct dpcls_rule; + +/* Must be public as it is instantiated in subtable struct below. */ +struct netdev_flow_key { + uint32_t hash; /* Hash function differs for different users. */ + uint32_t len; /* Length of the following miniflow (incl. map). */ + struct miniflow mf; + uint64_t buf[FLOW_MAX_PACKET_U64S]; +}; + +/* A rule to be inserted to the classifier. */ +struct dpcls_rule { + struct cmap_node cmap_node; /* Within struct dpcls_subtable 'rules'. */ + struct netdev_flow_key *mask; /* Subtable's mask. */ + struct netdev_flow_key flow; /* Matching key. */ + /* 'flow' must be the last field, additional space is allocated here. */ +}; + +/* Lookup function for a subtable in the dpcls. This function is called + * by each subtable with an array of packets, and a bitmask of packets to + * perform the lookup on. Using a function pointer gives flexibility to + * optimize the lookup function based on subtable properties and the + * CPU instruction set available at runtime. + */ +typedef +uint32_t (*dpcls_subtable_lookup_func)(struct dpcls_subtable *subtable, + uint32_t keys_map, + const struct netdev_flow_key *keys[], + struct dpcls_rule **rules); + +/* A set of rules that all have the same fields wildcarded. */ +struct dpcls_subtable { + /* The fields are only used by writers. */ + struct cmap_node cmap_node OVS_GUARDED; /* Within dpcls 'subtables_map'. */ + + /* These fields are accessed by readers. */ + struct cmap rules; /* Contains "struct dpcls_rule"s. */ + uint32_t hit_cnt; /* Number of match hits in subtable in current + optimization interval. */ + + /* Miniflow fingerprint that the subtable matches on. The miniflow "bits" + * are used to select the actual dpcls lookup implementation at subtable + * creation time. + */ + uint8_t mf_bits_set_unit0; + uint8_t mf_bits_set_unit1; + + /* The lookup function to use for this subtable. If there is a known + * property of the subtable (eg: only 3 bits of miniflow metadata is + * used for the lookup) then this can point at an optimized version of + * the lookup function for this particular subtable. */ + dpcls_subtable_lookup_func lookup_func; + + /* Caches the masks to match a packet to, reducing runtime calculations. */ + uint64_t *mf_masks; + + struct netdev_flow_key mask; /* Wildcards for fields (const). */ + /* 'mask' must be the last field, additional space is allocated here. */ +}; + +/* Iterate through netdev_flow_key TNL u64 values specified by 'FLOWMAP'. */ +#define NETDEV_FLOW_KEY_FOR_EACH_IN_FLOWMAP(VALUE, KEY, FLOWMAP) \ + MINIFLOW_FOR_EACH_IN_FLOWMAP (VALUE, &(KEY)->mf, FLOWMAP) + +/* Generates a mask for each bit set in the subtable's miniflow. */ +void +netdev_flow_key_gen_masks(const struct netdev_flow_key *tbl, + uint64_t *mf_masks, + const uint32_t mf_bits_u0, + const uint32_t mf_bits_u1); + +/* Matches a dpcls rule against the incoming packet in 'target' */ +bool dpcls_rule_matches_key(const struct dpcls_rule *rule, + const struct netdev_flow_key *target); + +static inline uint32_t +dpif_netdev_packet_get_rss_hash_orig_pkt(struct dp_packet *packet, + const struct miniflow *mf) +{ + uint32_t hash; + + if (OVS_LIKELY(dp_packet_rss_valid(packet))) { + hash = dp_packet_get_rss_hash(packet); + } else { + hash = miniflow_hash_5tuple(mf, 0); + dp_packet_set_rss_hash(packet, hash); + } + + return hash; +} + +#ifdef __cplusplus +} +#endif + +#endif /* dpif-netdev-private-dpcls.h */ diff --git a/lib/dpif-netdev-private-flow.h b/lib/dpif-netdev-private-flow.h new file mode 100644 index 000000000..ec52cf5ab --- /dev/null +++ b/lib/dpif-netdev-private-flow.h @@ -0,0 +1,162 @@ +/* + * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2015 Nicira, Inc. + * Copyright (c) 2019, 2020 Intel Corporation. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef DPIF_NETDEV_PRIVATE_FLOW_H +#define DPIF_NETDEV_PRIVATE_FLOW_H 1 + +#include +#include + +#include "dpif.h" +#include "dpif-netdev-private-dpcls.h" +#include "cmap.h" +#include "openvswitch/thread.h" + +#ifdef __cplusplus +extern "C" { +#endif + +/* Contained by struct dp_netdev_flow's 'stats' member. */ +struct dp_netdev_flow_stats { + atomic_llong used; /* Last used time, in monotonic msecs. */ + atomic_ullong packet_count; /* Number of packets matched. */ + atomic_ullong byte_count; /* Number of bytes matched. */ + atomic_uint16_t tcp_flags; /* Bitwise-OR of seen tcp_flags values. */ +}; + +/* Contained by struct dp_netdev_flow's 'last_attrs' member. */ +struct dp_netdev_flow_attrs { + atomic_bool offloaded; /* True if flow is offloaded to HW. */ + ATOMIC(const char *) dp_layer; /* DP layer the flow is handled in. */ +}; + +/* A flow in 'dp_netdev_pmd_thread's 'flow_table'. + * + * + * Thread-safety + * ============= + * + * Except near the beginning or ending of its lifespan, rule 'rule' belongs to + * its pmd thread's classifier. The text below calls this classifier 'cls'. + * + * Motivation + * ---------- + * + * The thread safety rules described here for "struct dp_netdev_flow" are + * motivated by two goals: + * + * - Prevent threads that read members of "struct dp_netdev_flow" from + * reading bad data due to changes by some thread concurrently modifying + * those members. + * + * - Prevent two threads making changes to members of a given "struct + * dp_netdev_flow" from interfering with each other. + * + * + * Rules + * ----- + * + * A flow 'flow' may be accessed without a risk of being freed during an RCU + * grace period. Code that needs to hold onto a flow for a while + * should try incrementing 'flow->ref_cnt' with dp_netdev_flow_ref(). + * + * 'flow->ref_cnt' protects 'flow' from being freed. It doesn't protect the + * flow from being deleted from 'cls' and it doesn't protect members of 'flow' + * from modification. + * + * Some members, marked 'const', are immutable. Accessing other members + * requires synchronization, as noted in more detail below. + */ +struct dp_netdev_flow { + const struct flow flow; /* Unmasked flow that created this entry. */ + /* Hash table index by unmasked flow. */ + const struct cmap_node node; /* In owning dp_netdev_pmd_thread's */ + /* 'flow_table'. */ + const struct cmap_node mark_node; /* In owning flow_mark's mark_to_flow */ + const ovs_u128 ufid; /* Unique flow identifier. */ + const ovs_u128 mega_ufid; /* Unique mega flow identifier. */ + const unsigned pmd_id; /* The 'core_id' of pmd thread owning this */ + /* flow. */ + + /* Number of references. + * The classifier owns one reference. + * Any thread trying to keep a rule from being freed should hold its own + * reference. */ + struct ovs_refcount ref_cnt; + + bool dead; + uint32_t mark; /* Unique flow mark assigned to a flow */ + + /* Statistics. */ + struct dp_netdev_flow_stats stats; + + /* Statistics and attributes received from the netdev offload provider. */ + atomic_int netdev_flow_get_result; + struct dp_netdev_flow_stats last_stats; + struct dp_netdev_flow_attrs last_attrs; + + /* Actions. */ + OVSRCU_TYPE(struct dp_netdev_actions *) actions; + + /* While processing a group of input packets, the datapath uses the next + * member to store a pointer to the output batch for the flow. It is + * reset after the batch has been sent out (See dp_netdev_queue_batches(), + * packet_batch_per_flow_init() and packet_batch_per_flow_execute()). */ + struct packet_batch_per_flow *batch; + + /* Packet classification. */ + char *dp_extra_info; /* String to return in a flow dump/get. */ + struct dpcls_rule cr; /* In owning dp_netdev's 'cls'. */ + /* 'cr' must be the last member. */ +}; + +static inline uint32_t +dp_netdev_flow_hash(const ovs_u128 *ufid) +{ + return ufid->u32[0]; +} + +/* Given the number of bits set in miniflow's maps, returns the size of the + * 'netdev_flow_key.mf' */ +static inline size_t +netdev_flow_key_size(size_t flow_u64s) +{ + return sizeof(struct miniflow) + MINIFLOW_VALUES_SIZE(flow_u64s); +} + +/* forward declaration required for EMC to unref flows */ +void dp_netdev_flow_unref(struct dp_netdev_flow *); + +/* A set of datapath actions within a "struct dp_netdev_flow". + * + * + * Thread-safety + * ============= + * + * A struct dp_netdev_actions 'actions' is protected with RCU. */ +struct dp_netdev_actions { + /* These members are immutable: they do not change during the struct's + * lifetime. */ + unsigned int size; /* Size of 'actions', in bytes. */ + struct nlattr actions[]; /* Sequence of OVS_ACTION_ATTR_* attributes. */ +}; + +#ifdef __cplusplus +} +#endif + +#endif /* dpif-netdev-private-flow.h */ diff --git a/lib/dpif-netdev-private-thread.h b/lib/dpif-netdev-private-thread.h new file mode 100644 index 000000000..a5b3ae360 --- /dev/null +++ b/lib/dpif-netdev-private-thread.h @@ -0,0 +1,206 @@ +/* + * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2015 Nicira, Inc. + * Copyright (c) 2019, 2020 Intel Corporation. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef DPIF_NETDEV_PRIVATE_THREAD_H +#define DPIF_NETDEV_PRIVATE_THREAD_H 1 + +#include +#include + +#include "dpif.h" +#include "cmap.h" + +#include "dpif-netdev-private-dfc.h" +#include "dpif-netdev-perf.h" +#include "openvswitch/thread.h" + +#ifdef __cplusplus +extern "C" { +#endif + +/* PMD Thread Structures */ + +/* A set of properties for the current processing loop that is not directly + * associated with the pmd thread itself, but with the packets being + * processed or the short-term system configuration (for example, time). + * Contained by struct dp_netdev_pmd_thread's 'ctx' member. */ +struct dp_netdev_pmd_thread_ctx { + /* Latest measured time. See 'pmd_thread_ctx_time_update()'. */ + long long now; + /* RX queue from which last packet was received. */ + struct dp_netdev_rxq *last_rxq; + /* EMC insertion probability context for the current processing cycle. */ + uint32_t emc_insert_min; +}; + +/* PMD: Poll modes drivers. PMD accesses devices via polling to eliminate + * the performance overhead of interrupt processing. Therefore netdev can + * not implement rx-wait for these devices. dpif-netdev needs to poll + * these device to check for recv buffer. pmd-thread does polling for + * devices assigned to itself. + * + * DPDK used PMD for accessing NIC. + * + * Note, instance with cpu core id NON_PMD_CORE_ID will be reserved for + * I/O of all non-pmd threads. There will be no actual thread created + * for the instance. + * + * Each struct has its own flow cache and classifier per managed ingress port. + * For packets received on ingress port, a look up is done on corresponding PMD + * thread's flow cache and in case of a miss, lookup is performed in the + * corresponding classifier of port. Packets are executed with the found + * actions in either case. + * */ +struct dp_netdev_pmd_thread { + struct dp_netdev *dp; + struct ovs_refcount ref_cnt; /* Every reference must be refcount'ed. */ + struct cmap_node node; /* In 'dp->poll_threads'. */ + + /* Per thread exact-match cache. Note, the instance for cpu core + * NON_PMD_CORE_ID can be accessed by multiple threads, and thusly + * need to be protected by 'non_pmd_mutex'. Every other instance + * will only be accessed by its own pmd thread. */ + OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct dfc_cache flow_cache; + + /* Flow-Table and classifiers + * + * Writers of 'flow_table' must take the 'flow_mutex'. Corresponding + * changes to 'classifiers' must be made while still holding the + * 'flow_mutex'. + */ + struct ovs_mutex flow_mutex; + struct cmap flow_table OVS_GUARDED; /* Flow table. */ + + /* One classifier per in_port polled by the pmd */ + struct cmap classifiers; + /* Periodically sort subtable vectors according to hit frequencies */ + long long int next_optimization; + /* End of the next time interval for which processing cycles + are stored for each polled rxq. */ + long long int rxq_next_cycle_store; + + /* Last interval timestamp. */ + uint64_t intrvl_tsc_prev; + /* Last interval cycles. */ + atomic_ullong intrvl_cycles; + + /* Current context of the PMD thread. */ + struct dp_netdev_pmd_thread_ctx ctx; + + struct seq *reload_seq; + uint64_t last_reload_seq; + + /* These are atomic variables used as a synchronization and configuration + * points for thread reload/exit. + * + * 'reload' atomic is the main one and it's used as a memory + * synchronization point for all other knobs and data. + * + * For a thread that requests PMD reload: + * + * * All changes that should be visible to the PMD thread must be made + * before setting the 'reload'. These changes could use any memory + * ordering model including 'relaxed'. + * * Setting the 'reload' atomic should occur in the same thread where + * all other PMD configuration options updated. + * * Setting the 'reload' atomic should be done with 'release' memory + * ordering model or stricter. This will guarantee that all previous + * changes (including non-atomic and 'relaxed') will be visible to + * the PMD thread. + * * To check that reload is done, thread should poll the 'reload' atomic + * to become 'false'. Polling should be done with 'acquire' memory + * ordering model or stricter. This ensures that PMD thread completed + * the reload process. + * + * For the PMD thread: + * + * * PMD thread should read 'reload' atomic with 'acquire' memory + * ordering model or stricter. This will guarantee that all changes + * made before setting the 'reload' in the requesting thread will be + * visible to the PMD thread. + * * All other configuration data could be read with any memory + * ordering model (including non-atomic and 'relaxed') but *only after* + * reading the 'reload' atomic set to 'true'. + * * When the PMD reload done, PMD should (optionally) set all the below + * knobs except the 'reload' to their default ('false') values and + * (mandatory), as the last step, set the 'reload' to 'false' using + * 'release' memory ordering model or stricter. This will inform the + * requesting thread that PMD has completed a reload cycle. + */ + atomic_bool reload; /* Do we need to reload ports? */ + atomic_bool wait_for_reload; /* Can we busy wait for the next reload? */ + atomic_bool reload_tx_qid; /* Do we need to reload static_tx_qid? */ + atomic_bool exit; /* For terminating the pmd thread. */ + + pthread_t thread; + unsigned core_id; /* CPU core id of this pmd thread. */ + int numa_id; /* numa node id of this pmd thread. */ + bool isolated; + + /* Queue id used by this pmd thread to send packets on all netdevs if + * XPS disabled for this netdev. All static_tx_qid's are unique and less + * than 'cmap_count(dp->poll_threads)'. */ + uint32_t static_tx_qid; + + /* Number of filled output batches. */ + int n_output_batches; + + struct ovs_mutex port_mutex; /* Mutex for 'poll_list' and 'tx_ports'. */ + /* List of rx queues to poll. */ + struct hmap poll_list OVS_GUARDED; + /* Map of 'tx_port's used for transmission. Written by the main thread, + * read by the pmd thread. */ + struct hmap tx_ports OVS_GUARDED; + + struct ovs_mutex bond_mutex; /* Protects updates of 'tx_bonds'. */ + /* Map of 'tx_bond's used for transmission. Written by the main thread + * and read by the pmd thread. */ + struct cmap tx_bonds; + + /* These are thread-local copies of 'tx_ports'. One contains only tunnel + * ports (that support push_tunnel/pop_tunnel), the other contains ports + * with at least one txq (that support send). A port can be in both. + * + * There are two separate maps to make sure that we don't try to execute + * OUTPUT on a device which has 0 txqs or PUSH/POP on a non-tunnel device. + * + * The instances for cpu core NON_PMD_CORE_ID can be accessed by multiple + * threads, and thusly need to be protected by 'non_pmd_mutex'. Every + * other instance will only be accessed by its own pmd thread. */ + struct hmap tnl_port_cache; + struct hmap send_port_cache; + + /* Keep track of detailed PMD performance statistics. */ + struct pmd_perf_stats perf_stats; + + /* Stats from previous iteration used by automatic pmd + * load balance logic. */ + uint64_t prev_stats[PMD_N_STATS]; + atomic_count pmd_overloaded; + + /* Set to true if the pmd thread needs to be reloaded. */ + bool need_reload; + + /* Next time when PMD should try RCU quiescing. */ + long long next_rcu_quiesce; +}; + +#ifdef __cplusplus +} +#endif + +#endif /* dpif-netdev-private-thread.h */ diff --git a/lib/dpif-netdev-private.h b/lib/dpif-netdev-private.h index 4fda1220b..d7b6fd7ec 100644 --- a/lib/dpif-netdev-private.h +++ b/lib/dpif-netdev-private.h @@ -18,95 +18,17 @@ #ifndef DPIF_NETDEV_PRIVATE_H #define DPIF_NETDEV_PRIVATE_H 1 -#include -#include - -#include "dpif.h" -#include "cmap.h" - -#ifdef __cplusplus -extern "C" { -#endif - -/* Forward declaration for lookup_func typedef. */ -struct dpcls_subtable; -struct dpcls_rule; - -/* Must be public as it is instantiated in subtable struct below. */ -struct netdev_flow_key { - uint32_t hash; /* Hash function differs for different users. */ - uint32_t len; /* Length of the following miniflow (incl. map). */ - struct miniflow mf; - uint64_t buf[FLOW_MAX_PACKET_U64S]; -}; - -/* A rule to be inserted to the classifier. */ -struct dpcls_rule { - struct cmap_node cmap_node; /* Within struct dpcls_subtable 'rules'. */ - struct netdev_flow_key *mask; /* Subtable's mask. */ - struct netdev_flow_key flow; /* Matching key. */ - /* 'flow' must be the last field, additional space is allocated here. */ -}; - -/* Lookup function for a subtable in the dpcls. This function is called - * by each subtable with an array of packets, and a bitmask of packets to - * perform the lookup on. Using a function pointer gives flexibility to - * optimize the lookup function based on subtable properties and the - * CPU instruction set available at runtime. +/* This header includes the various dpif-netdev components' header + * files in the appropriate order. Unfortunately there is a strict + * requirement in the include order due to dependences between components. + * E.g: + * DFC/EMC/SMC requires the netdev_flow_key struct + * PMD thread requires DFC_flow struct + * */ -typedef -uint32_t (*dpcls_subtable_lookup_func)(struct dpcls_subtable *subtable, - uint32_t keys_map, - const struct netdev_flow_key *keys[], - struct dpcls_rule **rules); - -/* A set of rules that all have the same fields wildcarded. */ -struct dpcls_subtable { - /* The fields are only used by writers. */ - struct cmap_node cmap_node OVS_GUARDED; /* Within dpcls 'subtables_map'. */ - - /* These fields are accessed by readers. */ - struct cmap rules; /* Contains "struct dpcls_rule"s. */ - uint32_t hit_cnt; /* Number of match hits in subtable in current - optimization interval. */ - - /* Miniflow fingerprint that the subtable matches on. The miniflow "bits" - * are used to select the actual dpcls lookup implementation at subtable - * creation time. - */ - uint8_t mf_bits_set_unit0; - uint8_t mf_bits_set_unit1; - - /* The lookup function to use for this subtable. If there is a known - * property of the subtable (eg: only 3 bits of miniflow metadata is - * used for the lookup) then this can point at an optimized version of - * the lookup function for this particular subtable. */ - dpcls_subtable_lookup_func lookup_func; - - /* Caches the masks to match a packet to, reducing runtime calculations. */ - uint64_t *mf_masks; - - struct netdev_flow_key mask; /* Wildcards for fields (const). */ - /* 'mask' must be the last field, additional space is allocated here. */ -}; - -/* Iterate through netdev_flow_key TNL u64 values specified by 'FLOWMAP'. */ -#define NETDEV_FLOW_KEY_FOR_EACH_IN_FLOWMAP(VALUE, KEY, FLOWMAP) \ - MINIFLOW_FOR_EACH_IN_FLOWMAP (VALUE, &(KEY)->mf, FLOWMAP) - -/* Generates a mask for each bit set in the subtable's miniflow. */ -void -netdev_flow_key_gen_masks(const struct netdev_flow_key *tbl, - uint64_t *mf_masks, - const uint32_t mf_bits_u0, - const uint32_t mf_bits_u1); - -/* Matches a dpcls rule against the incoming packet in 'target' */ -bool dpcls_rule_matches_key(const struct dpcls_rule *rule, - const struct netdev_flow_key *target); - -#ifdef __cplusplus -} -#endif +#include "dpif-netdev-private-flow.h" +#include "dpif-netdev-private-dpcls.h" +#include "dpif-netdev-private-dfc.h" +#include "dpif-netdev-private-thread.h" #endif /* netdev-private.h */ diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index e3fd0a07f..395a5c29d 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -17,6 +17,7 @@ #include #include "dpif-netdev.h" #include "dpif-netdev-private.h" +#include "dpif-netdev-private-dfc.h" #include #include @@ -44,6 +45,7 @@ #include "dpif.h" #include "dpif-netdev-lookup.h" #include "dpif-netdev-perf.h" +#include "dpif-netdev-private-dfc.h" #include "dpif-provider.h" #include "dummy.h" #include "fat-rwlock.h" @@ -142,90 +144,6 @@ static struct odp_support dp_netdev_support = { .ct_orig_tuple6 = true, }; -/* EMC cache and SMC cache compose the datapath flow cache (DFC) - * - * Exact match cache for frequently used flows - * - * The cache uses a 32-bit hash of the packet (which can be the RSS hash) to - * search its entries for a miniflow that matches exactly the miniflow of the - * packet. It stores the 'dpcls_rule' (rule) that matches the miniflow. - * - * A cache entry holds a reference to its 'dp_netdev_flow'. - * - * A miniflow with a given hash can be in one of EM_FLOW_HASH_SEGS different - * entries. The 32-bit hash is split into EM_FLOW_HASH_SEGS values (each of - * them is EM_FLOW_HASH_SHIFT bits wide and the remainder is thrown away). Each - * value is the index of a cache entry where the miniflow could be. - * - * - * Signature match cache (SMC) - * - * This cache stores a 16-bit signature for each flow without storing keys, and - * stores the corresponding 16-bit flow_table index to the 'dp_netdev_flow'. - * Each flow thus occupies 32bit which is much more memory efficient than EMC. - * SMC uses a set-associative design that each bucket contains - * SMC_ENTRY_PER_BUCKET number of entries. - * Since 16-bit flow_table index is used, if there are more than 2^16 - * dp_netdev_flow, SMC will miss them that cannot be indexed by a 16-bit value. - * - * - * Thread-safety - * ============= - * - * Each pmd_thread has its own private exact match cache. - * If dp_netdev_input is not called from a pmd thread, a mutex is used. - */ - -#define EM_FLOW_HASH_SHIFT 13 -#define EM_FLOW_HASH_ENTRIES (1u << EM_FLOW_HASH_SHIFT) -#define EM_FLOW_HASH_MASK (EM_FLOW_HASH_ENTRIES - 1) -#define EM_FLOW_HASH_SEGS 2 - -/* SMC uses a set-associative design. A bucket contains a set of entries that - * a flow item can occupy. For now, it uses one hash function rather than two - * as for the EMC design. */ -#define SMC_ENTRY_PER_BUCKET 4 -#define SMC_ENTRIES (1u << 20) -#define SMC_BUCKET_CNT (SMC_ENTRIES / SMC_ENTRY_PER_BUCKET) -#define SMC_MASK (SMC_BUCKET_CNT - 1) - -/* Default EMC insert probability is 1 / DEFAULT_EM_FLOW_INSERT_INV_PROB */ -#define DEFAULT_EM_FLOW_INSERT_INV_PROB 100 -#define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX / \ - DEFAULT_EM_FLOW_INSERT_INV_PROB) - -struct emc_entry { - struct dp_netdev_flow *flow; - struct netdev_flow_key key; /* key.hash used for emc hash value. */ -}; - -struct emc_cache { - struct emc_entry entries[EM_FLOW_HASH_ENTRIES]; - int sweep_idx; /* For emc_cache_slow_sweep(). */ -}; - -struct smc_bucket { - uint16_t sig[SMC_ENTRY_PER_BUCKET]; - uint16_t flow_idx[SMC_ENTRY_PER_BUCKET]; -}; - -/* Signature match cache, differentiate from EMC cache */ -struct smc_cache { - struct smc_bucket buckets[SMC_BUCKET_CNT]; -}; - -struct dfc_cache { - struct emc_cache emc_cache; - struct smc_cache smc_cache; -}; - -/* Iterate in the exact match cache through every entry that might contain a - * miniflow with hash 'HASH'. */ -#define EMC_FOR_EACH_POS_WITH_HASH(EMC, CURRENT_ENTRY, HASH) \ - for (uint32_t i__ = 0, srch_hash__ = (HASH); \ - (CURRENT_ENTRY) = &(EMC)->entries[srch_hash__ & EM_FLOW_HASH_MASK], \ - i__ < EM_FLOW_HASH_SEGS; \ - i__++, srch_hash__ >>= EM_FLOW_HASH_SHIFT) /* Simple non-wildcarding single-priority classifier. */ @@ -486,119 +404,10 @@ struct dp_netdev_port { char *rxq_affinity_list; /* Requested affinity of rx queues. */ }; -/* Contained by struct dp_netdev_flow's 'stats' member. */ -struct dp_netdev_flow_stats { - atomic_llong used; /* Last used time, in monotonic msecs. */ - atomic_ullong packet_count; /* Number of packets matched. */ - atomic_ullong byte_count; /* Number of bytes matched. */ - atomic_uint16_t tcp_flags; /* Bitwise-OR of seen tcp_flags values. */ -}; - -/* Contained by struct dp_netdev_flow's 'last_attrs' member. */ -struct dp_netdev_flow_attrs { - atomic_bool offloaded; /* True if flow is offloaded to HW. */ - ATOMIC(const char *) dp_layer; /* DP layer the flow is handled in. */ -}; - -/* A flow in 'dp_netdev_pmd_thread's 'flow_table'. - * - * - * Thread-safety - * ============= - * - * Except near the beginning or ending of its lifespan, rule 'rule' belongs to - * its pmd thread's classifier. The text below calls this classifier 'cls'. - * - * Motivation - * ---------- - * - * The thread safety rules described here for "struct dp_netdev_flow" are - * motivated by two goals: - * - * - Prevent threads that read members of "struct dp_netdev_flow" from - * reading bad data due to changes by some thread concurrently modifying - * those members. - * - * - Prevent two threads making changes to members of a given "struct - * dp_netdev_flow" from interfering with each other. - * - * - * Rules - * ----- - * - * A flow 'flow' may be accessed without a risk of being freed during an RCU - * grace period. Code that needs to hold onto a flow for a while - * should try incrementing 'flow->ref_cnt' with dp_netdev_flow_ref(). - * - * 'flow->ref_cnt' protects 'flow' from being freed. It doesn't protect the - * flow from being deleted from 'cls' and it doesn't protect members of 'flow' - * from modification. - * - * Some members, marked 'const', are immutable. Accessing other members - * requires synchronization, as noted in more detail below. - */ -struct dp_netdev_flow { - const struct flow flow; /* Unmasked flow that created this entry. */ - /* Hash table index by unmasked flow. */ - const struct cmap_node node; /* In owning dp_netdev_pmd_thread's */ - /* 'flow_table'. */ - const struct cmap_node mark_node; /* In owning flow_mark's mark_to_flow */ - const ovs_u128 ufid; /* Unique flow identifier. */ - const ovs_u128 mega_ufid; /* Unique mega flow identifier. */ - const unsigned pmd_id; /* The 'core_id' of pmd thread owning this */ - /* flow. */ - - /* Number of references. - * The classifier owns one reference. - * Any thread trying to keep a rule from being freed should hold its own - * reference. */ - struct ovs_refcount ref_cnt; - - bool dead; - uint32_t mark; /* Unique flow mark assigned to a flow */ - - /* Statistics. */ - struct dp_netdev_flow_stats stats; - - /* Statistics and attributes received from the netdev offload provider. */ - atomic_int netdev_flow_get_result; - struct dp_netdev_flow_stats last_stats; - struct dp_netdev_flow_attrs last_attrs; - - /* Actions. */ - OVSRCU_TYPE(struct dp_netdev_actions *) actions; - - /* While processing a group of input packets, the datapath uses the next - * member to store a pointer to the output batch for the flow. It is - * reset after the batch has been sent out (See dp_netdev_queue_batches(), - * packet_batch_per_flow_init() and packet_batch_per_flow_execute()). */ - struct packet_batch_per_flow *batch; - - /* Packet classification. */ - char *dp_extra_info; /* String to return in a flow dump/get. */ - struct dpcls_rule cr; /* In owning dp_netdev's 'cls'. */ - /* 'cr' must be the last member. */ -}; - -static void dp_netdev_flow_unref(struct dp_netdev_flow *); static bool dp_netdev_flow_ref(struct dp_netdev_flow *); static int dpif_netdev_flow_from_nlattrs(const struct nlattr *, uint32_t, struct flow *, bool); -/* A set of datapath actions within a "struct dp_netdev_flow". - * - * - * Thread-safety - * ============= - * - * A struct dp_netdev_actions 'actions' is protected with RCU. */ -struct dp_netdev_actions { - /* These members are immutable: they do not change during the struct's - * lifetime. */ - unsigned int size; /* Size of 'actions', in bytes. */ - struct nlattr actions[]; /* Sequence of OVS_ACTION_ATTR_* attributes. */ -}; - struct dp_netdev_actions *dp_netdev_actions_create(const struct nlattr *, size_t); struct dp_netdev_actions *dp_netdev_flow_get_actions( @@ -645,171 +454,6 @@ struct tx_bond { struct member_entry member_buckets[BOND_BUCKETS]; }; -/* A set of properties for the current processing loop that is not directly - * associated with the pmd thread itself, but with the packets being - * processed or the short-term system configuration (for example, time). - * Contained by struct dp_netdev_pmd_thread's 'ctx' member. */ -struct dp_netdev_pmd_thread_ctx { - /* Latest measured time. See 'pmd_thread_ctx_time_update()'. */ - long long now; - /* RX queue from which last packet was received. */ - struct dp_netdev_rxq *last_rxq; - /* EMC insertion probability context for the current processing cycle. */ - uint32_t emc_insert_min; -}; - -/* PMD: Poll modes drivers. PMD accesses devices via polling to eliminate - * the performance overhead of interrupt processing. Therefore netdev can - * not implement rx-wait for these devices. dpif-netdev needs to poll - * these device to check for recv buffer. pmd-thread does polling for - * devices assigned to itself. - * - * DPDK used PMD for accessing NIC. - * - * Note, instance with cpu core id NON_PMD_CORE_ID will be reserved for - * I/O of all non-pmd threads. There will be no actual thread created - * for the instance. - * - * Each struct has its own flow cache and classifier per managed ingress port. - * For packets received on ingress port, a look up is done on corresponding PMD - * thread's flow cache and in case of a miss, lookup is performed in the - * corresponding classifier of port. Packets are executed with the found - * actions in either case. - * */ -struct dp_netdev_pmd_thread { - struct dp_netdev *dp; - struct ovs_refcount ref_cnt; /* Every reference must be refcount'ed. */ - struct cmap_node node; /* In 'dp->poll_threads'. */ - - /* Per thread exact-match cache. Note, the instance for cpu core - * NON_PMD_CORE_ID can be accessed by multiple threads, and thusly - * need to be protected by 'non_pmd_mutex'. Every other instance - * will only be accessed by its own pmd thread. */ - OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct dfc_cache flow_cache; - - /* Flow-Table and classifiers - * - * Writers of 'flow_table' must take the 'flow_mutex'. Corresponding - * changes to 'classifiers' must be made while still holding the - * 'flow_mutex'. - */ - struct ovs_mutex flow_mutex; - struct cmap flow_table OVS_GUARDED; /* Flow table. */ - - /* One classifier per in_port polled by the pmd */ - struct cmap classifiers; - /* Periodically sort subtable vectors according to hit frequencies */ - long long int next_optimization; - /* End of the next time interval for which processing cycles - are stored for each polled rxq. */ - long long int rxq_next_cycle_store; - - /* Last interval timestamp. */ - uint64_t intrvl_tsc_prev; - /* Last interval cycles. */ - atomic_ullong intrvl_cycles; - - /* Current context of the PMD thread. */ - struct dp_netdev_pmd_thread_ctx ctx; - - struct seq *reload_seq; - uint64_t last_reload_seq; - - /* These are atomic variables used as a synchronization and configuration - * points for thread reload/exit. - * - * 'reload' atomic is the main one and it's used as a memory - * synchronization point for all other knobs and data. - * - * For a thread that requests PMD reload: - * - * * All changes that should be visible to the PMD thread must be made - * before setting the 'reload'. These changes could use any memory - * ordering model including 'relaxed'. - * * Setting the 'reload' atomic should occur in the same thread where - * all other PMD configuration options updated. - * * Setting the 'reload' atomic should be done with 'release' memory - * ordering model or stricter. This will guarantee that all previous - * changes (including non-atomic and 'relaxed') will be visible to - * the PMD thread. - * * To check that reload is done, thread should poll the 'reload' atomic - * to become 'false'. Polling should be done with 'acquire' memory - * ordering model or stricter. This ensures that PMD thread completed - * the reload process. - * - * For the PMD thread: - * - * * PMD thread should read 'reload' atomic with 'acquire' memory - * ordering model or stricter. This will guarantee that all changes - * made before setting the 'reload' in the requesting thread will be - * visible to the PMD thread. - * * All other configuration data could be read with any memory - * ordering model (including non-atomic and 'relaxed') but *only after* - * reading the 'reload' atomic set to 'true'. - * * When the PMD reload done, PMD should (optionally) set all the below - * knobs except the 'reload' to their default ('false') values and - * (mandatory), as the last step, set the 'reload' to 'false' using - * 'release' memory ordering model or stricter. This will inform the - * requesting thread that PMD has completed a reload cycle. - */ - atomic_bool reload; /* Do we need to reload ports? */ - atomic_bool wait_for_reload; /* Can we busy wait for the next reload? */ - atomic_bool reload_tx_qid; /* Do we need to reload static_tx_qid? */ - atomic_bool exit; /* For terminating the pmd thread. */ - - pthread_t thread; - unsigned core_id; /* CPU core id of this pmd thread. */ - int numa_id; /* numa node id of this pmd thread. */ - bool isolated; - - /* Queue id used by this pmd thread to send packets on all netdevs if - * XPS disabled for this netdev. All static_tx_qid's are unique and less - * than 'cmap_count(dp->poll_threads)'. */ - uint32_t static_tx_qid; - - /* Number of filled output batches. */ - int n_output_batches; - - struct ovs_mutex port_mutex; /* Mutex for 'poll_list' and 'tx_ports'. */ - /* List of rx queues to poll. */ - struct hmap poll_list OVS_GUARDED; - /* Map of 'tx_port's used for transmission. Written by the main thread, - * read by the pmd thread. */ - struct hmap tx_ports OVS_GUARDED; - - struct ovs_mutex bond_mutex; /* Protects updates of 'tx_bonds'. */ - /* Map of 'tx_bond's used for transmission. Written by the main thread - * and read by the pmd thread. */ - struct cmap tx_bonds; - - /* These are thread-local copies of 'tx_ports'. One contains only tunnel - * ports (that support push_tunnel/pop_tunnel), the other contains ports - * with at least one txq (that support send). A port can be in both. - * - * There are two separate maps to make sure that we don't try to execute - * OUTPUT on a device which has 0 txqs or PUSH/POP on a non-tunnel device. - * - * The instances for cpu core NON_PMD_CORE_ID can be accessed by multiple - * threads, and thusly need to be protected by 'non_pmd_mutex'. Every - * other instance will only be accessed by its own pmd thread. */ - struct hmap tnl_port_cache; - struct hmap send_port_cache; - - /* Keep track of detailed PMD performance statistics. */ - struct pmd_perf_stats perf_stats; - - /* Stats from previous iteration used by automatic pmd - * load balance logic. */ - uint64_t prev_stats[PMD_N_STATS]; - atomic_count pmd_overloaded; - - /* Set to true if the pmd thread needs to be reloaded. */ - bool need_reload; - - /* Next time when PMD should try RCU quiescing. */ - long long next_rcu_quiesce; -}; - /* Interface to netdev-based datapath. */ struct dpif_netdev { struct dpif dpif; @@ -914,90 +558,12 @@ static inline struct dpcls * dp_netdev_pmd_lookup_dpcls(struct dp_netdev_pmd_thread *pmd, odp_port_t in_port); -static inline bool emc_entry_alive(struct emc_entry *ce); -static void emc_clear_entry(struct emc_entry *ce); -static void smc_clear_entry(struct smc_bucket *b, int idx); - static void dp_netdev_request_reconfigure(struct dp_netdev *dp); static inline bool pmd_perf_metrics_enabled(const struct dp_netdev_pmd_thread *pmd); static void queue_netdev_flow_del(struct dp_netdev_pmd_thread *pmd, struct dp_netdev_flow *flow); -static void -emc_cache_init(struct emc_cache *flow_cache) -{ - int i; - - flow_cache->sweep_idx = 0; - for (i = 0; i < ARRAY_SIZE(flow_cache->entries); i++) { - flow_cache->entries[i].flow = NULL; - flow_cache->entries[i].key.hash = 0; - flow_cache->entries[i].key.len = sizeof(struct miniflow); - flowmap_init(&flow_cache->entries[i].key.mf.map); - } -} - -static void -smc_cache_init(struct smc_cache *smc_cache) -{ - int i, j; - for (i = 0; i < SMC_BUCKET_CNT; i++) { - for (j = 0; j < SMC_ENTRY_PER_BUCKET; j++) { - smc_cache->buckets[i].flow_idx[j] = UINT16_MAX; - } - } -} - -static void -dfc_cache_init(struct dfc_cache *flow_cache) -{ - emc_cache_init(&flow_cache->emc_cache); - smc_cache_init(&flow_cache->smc_cache); -} - -static void -emc_cache_uninit(struct emc_cache *flow_cache) -{ - int i; - - for (i = 0; i < ARRAY_SIZE(flow_cache->entries); i++) { - emc_clear_entry(&flow_cache->entries[i]); - } -} - -static void -smc_cache_uninit(struct smc_cache *smc) -{ - int i, j; - - for (i = 0; i < SMC_BUCKET_CNT; i++) { - for (j = 0; j < SMC_ENTRY_PER_BUCKET; j++) { - smc_clear_entry(&(smc->buckets[i]), j); - } - } -} - -static void -dfc_cache_uninit(struct dfc_cache *flow_cache) -{ - smc_cache_uninit(&flow_cache->smc_cache); - emc_cache_uninit(&flow_cache->emc_cache); -} - -/* Check and clear dead flow references slowly (one entry at each - * invocation). */ -static void -emc_cache_slow_sweep(struct emc_cache *flow_cache) -{ - struct emc_entry *entry = &flow_cache->entries[flow_cache->sweep_idx]; - - if (!emc_entry_alive(entry)) { - emc_clear_entry(entry); - } - flow_cache->sweep_idx = (flow_cache->sweep_idx + 1) & EM_FLOW_HASH_MASK; -} - /* Updates the time in PMD threads context and should be called in three cases: * * 1. PMD structure initialization: @@ -2346,19 +1912,13 @@ dp_netdev_flow_free(struct dp_netdev_flow *flow) free(flow); } -static void dp_netdev_flow_unref(struct dp_netdev_flow *flow) +void dp_netdev_flow_unref(struct dp_netdev_flow *flow) { if (ovs_refcount_unref_relaxed(&flow->ref_cnt) == 1) { ovsrcu_postpone(dp_netdev_flow_free, flow); } } -static uint32_t -dp_netdev_flow_hash(const ovs_u128 *ufid) -{ - return ufid->u32[0]; -} - static inline struct dpcls * dp_netdev_pmd_lookup_dpcls(struct dp_netdev_pmd_thread *pmd, odp_port_t in_port) @@ -2975,14 +2535,6 @@ static bool dp_netdev_flow_ref(struct dp_netdev_flow *flow) * single memcmp(). * - These functions can be inlined by the compiler. */ -/* Given the number of bits set in miniflow's maps, returns the size of the - * 'netdev_flow_key.mf' */ -static inline size_t -netdev_flow_key_size(size_t flow_u64s) -{ - return sizeof(struct miniflow) + MINIFLOW_VALUES_SIZE(flow_u64s); -} - static inline bool netdev_flow_key_equal(const struct netdev_flow_key *a, const struct netdev_flow_key *b) @@ -2991,16 +2543,6 @@ netdev_flow_key_equal(const struct netdev_flow_key *a, return a->hash == b->hash && !memcmp(&a->mf, &b->mf, a->len); } -/* Used to compare 'netdev_flow_key' in the exact match cache to a miniflow. - * The maps are compared bitwise, so both 'key->mf' and 'mf' must have been - * generated by miniflow_extract. */ -static inline bool -netdev_flow_key_equal_mf(const struct netdev_flow_key *key, - const struct miniflow *mf) -{ - return !memcmp(&key->mf, mf, key->len); -} - static inline void netdev_flow_key_clone(struct netdev_flow_key *dst, const struct netdev_flow_key *src) @@ -3067,21 +2609,6 @@ netdev_flow_key_init_masked(struct netdev_flow_key *dst, (dst_u64 - miniflow_get_values(&dst->mf)) * 8); } -static inline bool -emc_entry_alive(struct emc_entry *ce) -{ - return ce->flow && !ce->flow->dead; -} - -static void -emc_clear_entry(struct emc_entry *ce) -{ - if (ce->flow) { - dp_netdev_flow_unref(ce->flow); - ce->flow = NULL; - } -} - static inline void emc_change_entry(struct emc_entry *ce, struct dp_netdev_flow *flow, const struct netdev_flow_key *key) @@ -3147,24 +2674,6 @@ emc_probabilistic_insert(struct dp_netdev_pmd_thread *pmd, } } -static inline struct dp_netdev_flow * -emc_lookup(struct emc_cache *cache, const struct netdev_flow_key *key) -{ - struct emc_entry *current_entry; - - EMC_FOR_EACH_POS_WITH_HASH(cache, current_entry, key->hash) { - if (current_entry->key.hash == key->hash - && emc_entry_alive(current_entry) - && netdev_flow_key_equal_mf(¤t_entry->key, &key->mf)) { - - /* We found the entry with the 'key->mf' miniflow */ - return current_entry->flow; - } - } - - return NULL; -} - static inline const struct cmap_node * smc_entry_get(struct dp_netdev_pmd_thread *pmd, const uint32_t hash) { @@ -3185,12 +2694,6 @@ smc_entry_get(struct dp_netdev_pmd_thread *pmd, const uint32_t hash) return NULL; } -static void -smc_clear_entry(struct smc_bucket *b, int idx) -{ - b->flow_idx[idx] = UINT16_MAX; -} - /* Insert the flow_table index into SMC. Insertion may fail when 1) SMC is * turned off, 2) the flow_table index is larger than uint16_t can handle. * If there is already an SMC entry having same signature, the index will be @@ -6847,22 +6350,6 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet_, actions, wc, put_actions, dp->upcall_aux); } -static inline uint32_t -dpif_netdev_packet_get_rss_hash_orig_pkt(struct dp_packet *packet, - const struct miniflow *mf) -{ - uint32_t hash; - - if (OVS_LIKELY(dp_packet_rss_valid(packet))) { - hash = dp_packet_get_rss_hash(packet); - } else { - hash = miniflow_hash_5tuple(mf, 0); - dp_packet_set_rss_hash(packet, hash); - } - - return hash; -} - static inline uint32_t dpif_netdev_packet_get_rss_hash(struct dp_packet *packet, const struct miniflow *mf) From patchwork Fri Feb 12 17:17:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Van Haaren, Harry" X-Patchwork-Id: 1439945 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.133; helo=hemlock.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from hemlock.osuosl.org (smtp2.osuosl.org [140.211.166.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4DcgDR3Knlz9sVR for ; Sat, 13 Feb 2021 04:17:42 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by hemlock.osuosl.org (Postfix) with ESMTP id 0FD7F87626; Fri, 12 Feb 2021 17:17:41 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from hemlock.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 05FqeL0Dwd1N; Fri, 12 Feb 2021 17:17:40 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by hemlock.osuosl.org (Postfix) with ESMTP id 1CFE787617; Fri, 12 Feb 2021 17:17:40 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 11485C0891; Fri, 12 Feb 2021 17:17:40 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 56977C0891 for ; Fri, 12 Feb 2021 17:17:38 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 3A6B36F785 for ; Fri, 12 Feb 2021 17:17:38 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 5YoiRFZuFtqx for ; Fri, 12 Feb 2021 17:17:36 +0000 (UTC) Received: by smtp3.osuosl.org (Postfix, from userid 1001) id 8571B6F74F; Fri, 12 Feb 2021 17:17:36 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by smtp3.osuosl.org (Postfix) with ESMTPS id 9EA506F74F for ; Fri, 12 Feb 2021 17:17:34 +0000 (UTC) IronPort-SDR: krQX6ZAxgTqigyWjo/veDDem67a3LndjCIUPl+i1jTSHMaa4TJ/RPyjS+yPCItGF+0JFUz3NZB j2C35VU3jD6w== X-IronPort-AV: E=McAfee;i="6000,8403,9893"; a="201595211" X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="201595211" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Feb 2021 09:17:34 -0800 IronPort-SDR: AXdwbh2RyHOEAoNM4wWlo93yeu4AvLbEkRJxw4Mk3OBR+cl53C09rOUOir2JT/YzVw5ew6oWRu 9CB18K36Ocug== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="360484869" Received: from silpixa00400633.ir.intel.com ([10.237.213.44]) by orsmga003.jf.intel.com with ESMTP; 12 Feb 2021 09:17:32 -0800 From: Harry van Haaren To: ovs-dev@openvswitch.org Date: Fri, 12 Feb 2021 17:17:04 +0000 Message-Id: <20210212171718.2189798-3-harry.van.haaren@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210212171718.2189798-1-harry.van.haaren@intel.com> References: <20210104163653.2218575-1-harry.van.haaren@intel.com> <20210212171718.2189798-1-harry.van.haaren@intel.com> MIME-Version: 1.0 Cc: i.maximets@ovn.org Subject: [ovs-dev] [PATCH v9 02/16] dpif-netdev: Split HWOL out to own header file. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" This commit moves the datapath lookup functions required for hardware offload to a seperate file. This allows other DPIF implementations to access the lookup functions, encouraging code reuse. Signed-off-by: Harry van Haaren --- lib/automake.mk | 1 + lib/dpif-netdev-private-hwol.h | 63 ++++++++++++++++++++++++++++++++++ lib/dpif-netdev.c | 39 ++------------------- 3 files changed, 67 insertions(+), 36 deletions(-) create mode 100644 lib/dpif-netdev-private-hwol.h diff --git a/lib/automake.mk b/lib/automake.mk index 0e83145b5..9b3e06db6 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -114,6 +114,7 @@ lib_libopenvswitch_la_SOURCES = \ lib/dpif-netdev-private-dfc.h \ lib/dpif-netdev-private-dpcls.h \ lib/dpif-netdev-private-flow.h \ + lib/dpif-netdev-private-hwol.h \ lib/dpif-netdev-private-thread.h \ lib/dpif-netdev-private.h \ lib/dpif-netdev-perf.c \ diff --git a/lib/dpif-netdev-private-hwol.h b/lib/dpif-netdev-private-hwol.h new file mode 100644 index 000000000..447010ab8 --- /dev/null +++ b/lib/dpif-netdev-private-hwol.h @@ -0,0 +1,63 @@ +/* + * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2015 Nicira, Inc. + * Copyright (c) 2020 Intel Corporation. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef DPIF_NETDEV_PRIVATE_HWOL_H +#define DPIF_NETDEV_PRIVATE_HWOL_H 1 + +#include "dpif-netdev-private-flow.h" + +#define MAX_FLOW_MARK (UINT32_MAX - 1) +#define INVALID_FLOW_MARK 0 +/* Zero flow mark is used to indicate the HW to remove the mark. A packet + * marked with zero mark is received in SW without a mark at all, so it + * cannot be used as a valid mark. + */ + +struct megaflow_to_mark_data { + const struct cmap_node node; + ovs_u128 mega_ufid; + uint32_t mark; +}; + +struct flow_mark { + struct cmap megaflow_to_mark; + struct cmap mark_to_flow; + struct id_pool *pool; +}; + +/* allocated in dpif-netdev.c */ +extern struct flow_mark flow_mark; + +static inline struct dp_netdev_flow * +mark_to_flow_find(const struct dp_netdev_pmd_thread *pmd, + const uint32_t mark) +{ + struct dp_netdev_flow *flow; + + CMAP_FOR_EACH_WITH_HASH (flow, mark_node, hash_int(mark, 0), + &flow_mark.mark_to_flow) { + if (flow->mark == mark && flow->pmd_id == pmd->core_id && + flow->dead == false) { + return flow; + } + } + + return NULL; +} + + +#endif /* dpif-netdev-private-hwol.h */ diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 395a5c29d..840298f01 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -84,6 +84,8 @@ #include "util.h" #include "uuid.h" +#include "dpif-netdev-private-hwol.h" + VLOG_DEFINE_THIS_MODULE(dpif_netdev); /* Auto Load Balancing Defaults */ @@ -1953,26 +1955,8 @@ dp_netdev_pmd_find_dpcls(struct dp_netdev_pmd_thread *pmd, return cls; } -#define MAX_FLOW_MARK (UINT32_MAX - 1) -#define INVALID_FLOW_MARK 0 -/* Zero flow mark is used to indicate the HW to remove the mark. A packet - * marked with zero mark is received in SW without a mark at all, so it - * cannot be used as a valid mark. - */ - -struct megaflow_to_mark_data { - const struct cmap_node node; - ovs_u128 mega_ufid; - uint32_t mark; -}; - -struct flow_mark { - struct cmap megaflow_to_mark; - struct cmap mark_to_flow; - struct id_pool *pool; -}; -static struct flow_mark flow_mark = { +struct flow_mark flow_mark = { .megaflow_to_mark = CMAP_INITIALIZER, .mark_to_flow = CMAP_INITIALIZER, }; @@ -2141,23 +2125,6 @@ flow_mark_flush(struct dp_netdev_pmd_thread *pmd) } } -static struct dp_netdev_flow * -mark_to_flow_find(const struct dp_netdev_pmd_thread *pmd, - const uint32_t mark) -{ - struct dp_netdev_flow *flow; - - CMAP_FOR_EACH_WITH_HASH (flow, mark_node, hash_int(mark, 0), - &flow_mark.mark_to_flow) { - if (flow->mark == mark && flow->pmd_id == pmd->core_id && - flow->dead == false) { - return flow; - } - } - - return NULL; -} - static struct dp_flow_offload_item * dp_netdev_alloc_flow_offload(struct dp_netdev_pmd_thread *pmd, struct dp_netdev_flow *flow, From patchwork Fri Feb 12 17:17:05 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Van Haaren, Harry" X-Patchwork-Id: 1439946 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.138; helo=whitealder.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from whitealder.osuosl.org (smtp1.osuosl.org [140.211.166.138]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4DcgDW4SZmz9sTD for ; Sat, 13 Feb 2021 04:17:47 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by whitealder.osuosl.org (Postfix) with ESMTP id 0E9908746B; Fri, 12 Feb 2021 17:17:46 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from whitealder.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id TyGHd2lLuYAn; Fri, 12 Feb 2021 17:17:42 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by whitealder.osuosl.org (Postfix) with ESMTP id C66E68746E; Fri, 12 Feb 2021 17:17:41 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id BB5BFC1834; Fri, 12 Feb 2021 17:17:41 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 611F7C0891 for ; Fri, 12 Feb 2021 17:17:40 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 22B6C6F7A2 for ; Fri, 12 Feb 2021 17:17:40 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id BmkyqG8Dfetb for ; Fri, 12 Feb 2021 17:17:38 +0000 (UTC) Received: by smtp3.osuosl.org (Postfix, from userid 1001) id B16706F785; Fri, 12 Feb 2021 17:17:38 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by smtp3.osuosl.org (Postfix) with ESMTPS id 211E66F75B for ; Fri, 12 Feb 2021 17:17:36 +0000 (UTC) IronPort-SDR: wS5a/q4YasCVXBjEbDOLUboQ3PCqZ/a48GQswwXw3H06Zh9vwLG/lYkdCth8RlU6sdf/MNm3sQ Dxuz4EZJuLww== X-IronPort-AV: E=McAfee;i="6000,8403,9893"; a="201595213" X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="201595213" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Feb 2021 09:17:35 -0800 IronPort-SDR: QeEMzVwQoAd1C8VpgtnO0QIk5ZHKo/R1XyPoOXsPOD1qudpDxuC5DAB7gBPqbV5ng95LJ15zTW 1jTOyRXK0wbw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="360484892" Received: from silpixa00400633.ir.intel.com ([10.237.213.44]) by orsmga003.jf.intel.com with ESMTP; 12 Feb 2021 09:17:34 -0800 From: Harry van Haaren To: ovs-dev@openvswitch.org Date: Fri, 12 Feb 2021 17:17:05 +0000 Message-Id: <20210212171718.2189798-4-harry.van.haaren@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210212171718.2189798-1-harry.van.haaren@intel.com> References: <20210104163653.2218575-1-harry.van.haaren@intel.com> <20210212171718.2189798-1-harry.van.haaren@intel.com> MIME-Version: 1.0 Cc: i.maximets@ovn.org Subject: [ovs-dev] [PATCH v9 03/16] dpif-netdev: Add function pointer for netdev input. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" This commit adds a function pointer to the pmd thread data structure, giving the pmd thread flexibility in its dpif-input function choice. This allows choosing of the implementation based on ISA capabilities of the runtime CPU, leading to optimizations and higher performance. Signed-off-by: Harry van Haaren --- lib/dpif-netdev-private-thread.h | 12 ++++++++++++ lib/dpif-netdev.c | 7 ++++++- 2 files changed, 18 insertions(+), 1 deletion(-) diff --git a/lib/dpif-netdev-private-thread.h b/lib/dpif-netdev-private-thread.h index a5b3ae360..089223aaf 100644 --- a/lib/dpif-netdev-private-thread.h +++ b/lib/dpif-netdev-private-thread.h @@ -47,6 +47,13 @@ struct dp_netdev_pmd_thread_ctx { uint32_t emc_insert_min; }; +/* Forward declaration for typedef */ +struct dp_netdev_pmd_thread; + +typedef void (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd, + struct dp_packet_batch *packets, + odp_port_t port_no); + /* PMD: Poll modes drivers. PMD accesses devices via polling to eliminate * the performance overhead of interrupt processing. Therefore netdev can * not implement rx-wait for these devices. dpif-netdev needs to poll @@ -101,6 +108,11 @@ struct dp_netdev_pmd_thread { /* Current context of the PMD thread. */ struct dp_netdev_pmd_thread_ctx ctx; + /* Function pointer to call for dp_netdev_input() functionality. */ + dp_netdev_input_func netdev_input_func; + /* Pointer for per-DPIF implementation scratch space. */ + void *netdev_input_func_userdata; + struct seq *reload_seq; uint64_t last_reload_seq; diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 840298f01..c0cf44852 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -4220,8 +4220,9 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd, } } } + /* Process packet batch. */ - dp_netdev_input(pmd, &batch, port_no); + pmd->netdev_input_func(pmd, &batch, port_no); /* Assign processing cycles to rx queue. */ cycles = cycle_timer_stop(&pmd->perf_stats, &timer); @@ -6005,6 +6006,10 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp, hmap_init(&pmd->tnl_port_cache); hmap_init(&pmd->send_port_cache); cmap_init(&pmd->tx_bonds); + + /* Initialize the DPIF function pointer to the default scalar version */ + pmd->netdev_input_func = dp_netdev_input; + /* init the 'flow_cache' since there is no * actual thread created for NON_PMD_CORE_ID. */ if (core_id == NON_PMD_CORE_ID) { From patchwork Fri Feb 12 17:17:06 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Van Haaren, Harry" X-Patchwork-Id: 1439949 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.138; helo=whitealder.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from whitealder.osuosl.org (smtp1.osuosl.org [140.211.166.138]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4DcgF14JH9z9sTD for ; Sat, 13 Feb 2021 04:18:13 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by whitealder.osuosl.org (Postfix) with ESMTP id EB4C887641; Fri, 12 Feb 2021 17:18:11 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from whitealder.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 5ovLcqp3ax09; Fri, 12 Feb 2021 17:17:58 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by whitealder.osuosl.org (Postfix) with ESMTP id 3A58186854; Fri, 12 Feb 2021 17:17:54 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 1E51CC1825; Fri, 12 Feb 2021 17:17:54 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 9B1C2C013A for ; Fri, 12 Feb 2021 17:17:52 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 816D66F7CA for ; Fri, 12 Feb 2021 17:17:52 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 2QOk2jhk9thU for ; Fri, 12 Feb 2021 17:17:48 +0000 (UTC) Received: by smtp3.osuosl.org (Postfix, from userid 1001) id 4668D6F8AA; Fri, 12 Feb 2021 17:17:48 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by smtp3.osuosl.org (Postfix) with ESMTPS id 3F1616F78A for ; Fri, 12 Feb 2021 17:17:38 +0000 (UTC) IronPort-SDR: 5zcuk/Gx5FNgAANX1j3A4sbSyxcuYABt5qd66Hh2oIKUj6Ag2HmHGeOxCXRTsDegKgz5hmVnmJ kVygdAeSjzhA== X-IronPort-AV: E=McAfee;i="6000,8403,9893"; a="201595216" X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="201595216" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Feb 2021 09:17:37 -0800 IronPort-SDR: 6s/Z2TUd1GTUZDY3FL9Ga81OjKfLbxytCZKKybdmqecmDhneNfelIQjqh/KUSmzpd0C8VhmTGY U90wgTx/fItQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="360484926" Received: from silpixa00400633.ir.intel.com ([10.237.213.44]) by orsmga003.jf.intel.com with ESMTP; 12 Feb 2021 09:17:36 -0800 From: Harry van Haaren To: ovs-dev@openvswitch.org Date: Fri, 12 Feb 2021 17:17:06 +0000 Message-Id: <20210212171718.2189798-5-harry.van.haaren@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210212171718.2189798-1-harry.van.haaren@intel.com> References: <20210104163653.2218575-1-harry.van.haaren@intel.com> <20210212171718.2189798-1-harry.van.haaren@intel.com> MIME-Version: 1.0 Cc: i.maximets@ovn.org Subject: [ovs-dev] [PATCH v9 04/16] dpif-avx512: Add ISA implementation of dpif. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" This commit adds the AVX512 implementation of DPIF functionality, specifically the dp_netdev_input_outer_avx512 function. This function only handles outer (no re-circulations), and is optimized to use the AVX512 ISA for packet batching and other DPIF work. Sparse is not able to handle the AVX512 intrinsics, causing compile time failures, so it is disabled for this file. Signed-off-by: Harry van Haaren Co-authored-by: Cian Ferriter Signed-off-by: Cian Ferriter --- v8: - Fixup AVX512 mask to uint32_t conversion compilation warning. --- lib/automake.mk | 5 +- lib/dpif-netdev-avx512.c | 264 +++++++++++++++++++++++++++++++ lib/dpif-netdev-private-dfc.h | 8 + lib/dpif-netdev-private-dpif.h | 32 ++++ lib/dpif-netdev-private-thread.h | 11 +- lib/dpif-netdev-private.h | 25 +++ lib/dpif-netdev.c | 70 ++++++-- 7 files changed, 399 insertions(+), 16 deletions(-) create mode 100644 lib/dpif-netdev-avx512.c create mode 100644 lib/dpif-netdev-private-dpif.h diff --git a/lib/automake.mk b/lib/automake.mk index 9b3e06db6..d945d935e 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -33,11 +33,13 @@ lib_libopenvswitchavx512_la_CFLAGS = \ -mavx512f \ -mavx512bw \ -mavx512dq \ + -mbmi \ -mbmi2 \ -fPIC \ $(AM_CFLAGS) lib_libopenvswitchavx512_la_SOURCES = \ - lib/dpif-netdev-lookup-avx512-gather.c + lib/dpif-netdev-lookup-avx512-gather.c \ + lib/dpif-netdev-avx512.c lib_libopenvswitchavx512_la_LDFLAGS = \ -static endif @@ -113,6 +115,7 @@ lib_libopenvswitch_la_SOURCES = \ lib/dpif-netdev.h \ lib/dpif-netdev-private-dfc.h \ lib/dpif-netdev-private-dpcls.h \ + lib/dpif-netdev-private-dpif.h \ lib/dpif-netdev-private-flow.h \ lib/dpif-netdev-private-hwol.h \ lib/dpif-netdev-private-thread.h \ diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c new file mode 100644 index 000000000..10228aeb0 --- /dev/null +++ b/lib/dpif-netdev-avx512.c @@ -0,0 +1,264 @@ +/* + * Copyright (c) 2020 Intel. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifdef __x86_64__ +/* Sparse cannot handle the AVX512 instructions */ +#if !defined(__CHECKER__) + +#include + +#include "dpif-netdev.h" +#include "dpif-netdev-perf.h" + +#include "dpif-netdev-private.h" +#include "dpif-netdev-private-dpcls.h" +#include "dpif-netdev-private-flow.h" +#include "dpif-netdev-private-thread.h" + +#include "dp-packet.h" +#include "netdev.h" + +#include "immintrin.h" + +/* Structure to contain per-packet metadata that must be attributed to the + * dp netdev flow. This is unfortunate to have to track per packet, however + * it's a bit awkward to maintain them in a performant way. This structure + * helps to keep two variables on a single cache line per packet. + */ +struct pkt_flow_meta { + uint16_t bytes; + uint16_t tcp_flags; +}; + +/* Structure of heap allocated memory for DPIF internals. */ +struct dpif_userdata { + OVS_ALIGNED_VAR(CACHE_LINE_SIZE) + struct netdev_flow_key keys[NETDEV_MAX_BURST]; + OVS_ALIGNED_VAR(CACHE_LINE_SIZE) + struct netdev_flow_key *key_ptrs[NETDEV_MAX_BURST]; + OVS_ALIGNED_VAR(CACHE_LINE_SIZE) + struct pkt_flow_meta pkt_meta[NETDEV_MAX_BURST]; +}; + +int32_t +dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, + struct dp_packet_batch *packets, + odp_port_t in_port) +{ + /* Allocate DPIF userdata. */ + if (OVS_UNLIKELY(!pmd->netdev_input_func_userdata)) { + pmd->netdev_input_func_userdata = + xmalloc_pagealign(sizeof(struct dpif_userdata)); + } + + struct dpif_userdata *ud = pmd->netdev_input_func_userdata; + struct netdev_flow_key *keys = ud->keys; + struct netdev_flow_key **key_ptrs = ud->key_ptrs; + struct pkt_flow_meta *pkt_meta = ud->pkt_meta; + + /* Stores the computed output: a rule pointer for each packet */ + /* The AVX512 DPIF implementation handles rules in a way that is optimized + * for reducing data-movement between HWOL/EMC/SMC and DPCLS. This is + * achieved by separating the rule arrays. Bitmasks are kept for each + * packet, indicating if it matched in the HWOL/EMC/SMC array or DPCLS + * array. Later the two arrays are merged by AVX-512 expand instructions. + */ + struct dpcls_rule *rules[NETDEV_MAX_BURST]; + struct dpcls_rule *dpcls_rules[NETDEV_MAX_BURST]; + uint32_t dpcls_key_idx = 0; + + for (uint32_t i = 0; i < NETDEV_MAX_BURST; i += 8) { + _mm512_storeu_si512(&rules[i], _mm512_setzero_si512()); + _mm512_storeu_si512(&dpcls_rules[i], _mm512_setzero_si512()); + } + + /* Prefetch each packet's metadata */ + const size_t batch_size = dp_packet_batch_size(packets); + for (int i = 0; i < batch_size; i++) { + struct dp_packet *packet = packets->packets[i]; + OVS_PREFETCH(dp_packet_data(packet)); + pkt_metadata_prefetch_init(&packet->md); + } + + /* Check if EMC or SMC are enabled */ + struct dfc_cache *cache = &pmd->flow_cache; + const uint32_t emc_enabled = pmd->ctx.emc_insert_min != 0; + const uint32_t smc_enabled = pmd->ctx.smc_enable_db; + + uint32_t emc_hits = 0; + uint32_t smc_hits = 0; + + /* a 1 bit in this mask indidcates a hit, so no DPCLS lookup on the pkt. */ + uint32_t hwol_emc_smc_hitmask = 0; + + /* Perform first packet interation */ + uint32_t lookup_pkts_bitmask = (1ULL << batch_size) - 1; + uint32_t iter = lookup_pkts_bitmask; + while (iter) { + uint32_t i = __builtin_ctz(iter); + iter = _blsr_u64(iter); + + /* Initialize packet md and do miniflow extract */ + struct dp_packet *packet = packets->packets[i]; + pkt_metadata_init(&packet->md, in_port); + struct netdev_flow_key *key = &keys[i]; + miniflow_extract(packet, &key->mf); + + /* Cache TCP and byte values for all packets */ + pkt_meta[i].bytes = dp_packet_size(packet); + pkt_meta[i].tcp_flags = miniflow_get_tcp_flags(&key->mf); + + key->len = netdev_flow_key_size(miniflow_n_values(&key->mf)); + key->hash = dpif_netdev_packet_get_rss_hash_orig_pkt(packet, &key->mf); + + struct dp_netdev_flow *f = NULL; + + if (emc_enabled) { + f = emc_lookup(&cache->emc_cache, key); + + if (f) { + rules[i] = &f->cr; + emc_hits++; + hwol_emc_smc_hitmask |= (1 << i); + continue; + } + }; + + if (smc_enabled && !f) { + f = smc_lookup_single(pmd, packet, key); + if (f) { + rules[i] = &f->cr; + smc_hits++; + hwol_emc_smc_hitmask |= (1 << i); + continue; + } + } + + /* The flow pointer was not found in HWOL/EMC/SMC, so add it to the + * dpcls input keys array for batch lookup later. + */ + key_ptrs[dpcls_key_idx] = &keys[i]; + dpcls_key_idx++; + } + + + /* DPCLS handles any packets missed by HWOL/EMC/SMC. It operates on the + * key_ptrs[] for input miniflows to match, storing results in the + * dpcls_rules[] array. + */ + if (dpcls_key_idx > 0) { + struct dpcls *cls = dp_netdev_pmd_lookup_dpcls(pmd, in_port); + if (OVS_UNLIKELY(!cls)) { + return -1; + } + int any_miss = !dpcls_lookup(cls, + (const struct netdev_flow_key **) key_ptrs, + dpcls_rules, dpcls_key_idx, NULL); + if (OVS_UNLIKELY(any_miss)) { + return -1; + } + + /* Merge DPCLS rules and HWOL/EMC/SMC rules. */ + uint32_t dpcls_idx = 0; + for (int i = 0; i < NETDEV_MAX_BURST; i += 8) { + /* Indexing here is somewhat complicated due to DPCLS output rule + * load index depending on the hitmask of HWOL/EMC/SMC. More + * packets from HWOL/EMC/SMC bitmask means less DPCLS rules are + * used. + */ + __m512i v_cache_rules = _mm512_loadu_si512(&rules[i]); + __m512i v_merged_rules = + _mm512_mask_expandloadu_epi64(v_cache_rules, + ~hwol_emc_smc_hitmask, + &dpcls_rules[dpcls_idx]); + _mm512_storeu_si512(&rules[i], v_merged_rules); + + /* Update DPCLS load index and bitmask for HWOL/EMC/SMC hits. + * There are 8 output pointer per register, subtract the + * HWOL/EMC/SMC lanes equals the number of DPCLS rules consumed. + */ + uint32_t hitmask_FF = (hwol_emc_smc_hitmask & 0xFF); + dpcls_idx += 8 - __builtin_popcountll(hitmask_FF); + hwol_emc_smc_hitmask = (hwol_emc_smc_hitmask >> 8); + } + } + + /* At this point we don't return error anymore, so commit stats here. */ + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_RECV, batch_size); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT, emc_hits); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SMC_HIT, smc_hits); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT, + dpcls_key_idx); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_LOOKUP, + dpcls_key_idx); + + /* Initialize the "Action Batch" for each flow handled below. */ + struct dp_packet_batch action_batch; + action_batch.trunc = 0; + action_batch.do_not_steal = false; + + while (lookup_pkts_bitmask) { + uint32_t rule_pkt_idx = __builtin_ctz(lookup_pkts_bitmask); + uint64_t needle = (uintptr_t) rules[rule_pkt_idx]; + + /* Parallel compare 8 flow* 's to the needle, create a bitmask. */ + uint32_t batch_bitmask = 0; + for (uint32_t j = 0; j < NETDEV_MAX_BURST; j += 8) { + /* Pre-calculate store addr */ + uint32_t num_pkts_in_batch = __builtin_popcountll(batch_bitmask); + void *store_addr = &action_batch.packets[num_pkts_in_batch]; + + /* Search for identical flow* in burst, update bitmask. */ + __m512i v_needle = _mm512_set1_epi64(needle); + __m512i v_hay = _mm512_loadu_si512(&rules[j]); + __mmask8 k_cmp_bits = _mm512_cmpeq_epi64_mask(v_needle, v_hay); + uint32_t cmp_bits = k_cmp_bits; + batch_bitmask |= cmp_bits << j; + + /* Compress & Store the batched packets. */ + struct dp_packet **packets_ptrs = &packets->packets[j]; + __m512i v_pkt_ptrs = _mm512_loadu_si512(packets_ptrs); + _mm512_mask_compressstoreu_epi64(store_addr, cmp_bits, v_pkt_ptrs); + } + + /* Strip all packets in this batch from the lookup_pkts_bitmask. */ + lookup_pkts_bitmask &= (~batch_bitmask); + action_batch.count = __builtin_popcountll(batch_bitmask); + + /* Loop over all packets in this batch, to gather the byte and tcp_flag + * values, and pass them to the execute function. It would be nice to + * optimize this away, however it is not easy to refactor in dpif. + */ + uint32_t bytes = 0; + uint16_t tcp_flags = 0; + uint32_t bitmask_iter = batch_bitmask; + for (int i = 0; i < action_batch.count; i++) { + uint32_t idx = __builtin_ctzll(bitmask_iter); + bitmask_iter = _blsr_u64(bitmask_iter); + + bytes += pkt_meta[idx].bytes; + tcp_flags |= pkt_meta[idx].tcp_flags; + } + + dp_netdev_batch_execute(pmd, &action_batch, rules[rule_pkt_idx], + bytes, tcp_flags); + } + + return 0; +} + +#endif +#endif diff --git a/lib/dpif-netdev-private-dfc.h b/lib/dpif-netdev-private-dfc.h index 8f6a4899e..2cee0a38d 100644 --- a/lib/dpif-netdev-private-dfc.h +++ b/lib/dpif-netdev-private-dfc.h @@ -81,6 +81,9 @@ extern "C" { #define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX / \ DEFAULT_EM_FLOW_INSERT_INV_PROB) +/* Forward declaration for SMC function prototype. */ +struct dp_netdev_pmd_thread; + struct emc_entry { struct dp_netdev_flow *flow; struct netdev_flow_key key; /* key.hash used for emc hash value. */ @@ -237,6 +240,11 @@ emc_lookup(struct emc_cache *cache, const struct netdev_flow_key *key) return NULL; } +struct dp_netdev_flow * +smc_lookup_single(struct dp_netdev_pmd_thread *pmd, + struct dp_packet *packet, + struct netdev_flow_key *key); + #ifdef __cplusplus } #endif diff --git a/lib/dpif-netdev-private-dpif.h b/lib/dpif-netdev-private-dpif.h new file mode 100644 index 000000000..ae9068458 --- /dev/null +++ b/lib/dpif-netdev-private-dpif.h @@ -0,0 +1,32 @@ +/* + * Copyright (c) 2020 Intel Corperation. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef DPIF_NETDEV_PRIVATE_DPIF_H +#define DPIF_NETDEV_PRIVATE_DPIF_H 1 + +#include "openvswitch/types.h" + +/* Forward declarations to avoid including files */ +struct dp_netdev_pmd_thread; +struct dp_packet_batch; + +/* Available implementations for dpif work */ +int32_t +dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, + struct dp_packet_batch *packets, + odp_port_t in_port); + +#endif /* netdev-private.h */ diff --git a/lib/dpif-netdev-private-thread.h b/lib/dpif-netdev-private-thread.h index 089223aaf..917211938 100644 --- a/lib/dpif-netdev-private-thread.h +++ b/lib/dpif-netdev-private-thread.h @@ -45,14 +45,19 @@ struct dp_netdev_pmd_thread_ctx { struct dp_netdev_rxq *last_rxq; /* EMC insertion probability context for the current processing cycle. */ uint32_t emc_insert_min; + /* Enable the SMC cache from ovsdb config */ + bool smc_enable_db; }; /* Forward declaration for typedef */ struct dp_netdev_pmd_thread; -typedef void (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd, - struct dp_packet_batch *packets, - odp_port_t port_no); +/* Typedef for DPIF functions. + * Returns a bitmask of packets to handle, possibly including upcall/misses. + */ +typedef int32_t (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd, + struct dp_packet_batch *packets, + odp_port_t port_no); /* PMD: Poll modes drivers. PMD accesses devices via polling to eliminate * the performance overhead of interrupt processing. Therefore netdev can diff --git a/lib/dpif-netdev-private.h b/lib/dpif-netdev-private.h index d7b6fd7ec..9c2237fac 100644 --- a/lib/dpif-netdev-private.h +++ b/lib/dpif-netdev-private.h @@ -31,4 +31,29 @@ #include "dpif-netdev-private-dfc.h" #include "dpif-netdev-private-thread.h" +/* Allow other implementations to lookup the DPCLS instances */ +struct dpcls * +dp_netdev_pmd_lookup_dpcls(struct dp_netdev_pmd_thread *pmd, + odp_port_t in_port); + +/* Allow other implementations to call dpcls_lookup() for subtable search */ +bool +dpcls_lookup(struct dpcls *cls, const struct netdev_flow_key *keys[], + struct dpcls_rule **rules, const size_t cnt, + int *num_lookups_p); + +/* Allow other implementations to execute actions on a batch */ +void +dp_netdev_batch_execute(struct dp_netdev_pmd_thread *pmd, + struct dp_packet_batch *packets, + struct dpcls_rule *rule, + uint32_t bytes, + uint16_t tcp_flags); + +/* Available implementations for dpif work */ +int32_t +dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, + struct dp_packet_batch *packets, + odp_port_t in_port); + #endif /* netdev-private.h */ diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index c0cf44852..f1089752e 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -185,10 +185,6 @@ static uint32_t dpcls_subtable_lookup_reprobe(struct dpcls *cls); static void dpcls_insert(struct dpcls *, struct dpcls_rule *, const struct netdev_flow_key *mask); static void dpcls_remove(struct dpcls *, struct dpcls_rule *); -static bool dpcls_lookup(struct dpcls *cls, - const struct netdev_flow_key *keys[], - struct dpcls_rule **rules, size_t cnt, - int *num_lookups_p); /* Set of supported meter flags */ #define DP_SUPPORTED_METER_FLAGS_MASK \ @@ -484,7 +480,7 @@ static void dp_netdev_execute_actions(struct dp_netdev_pmd_thread *pmd, const struct flow *flow, const struct nlattr *actions, size_t actions_len); -static void dp_netdev_input(struct dp_netdev_pmd_thread *, +static int32_t dp_netdev_input(struct dp_netdev_pmd_thread *, struct dp_packet_batch *, odp_port_t port_no); static void dp_netdev_recirculate(struct dp_netdev_pmd_thread *, struct dp_packet_batch *); @@ -556,7 +552,7 @@ dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd, bool purge); static int dpif_netdev_xps_get_tx_qid(const struct dp_netdev_pmd_thread *pmd, struct tx_port *tx); -static inline struct dpcls * +inline struct dpcls * dp_netdev_pmd_lookup_dpcls(struct dp_netdev_pmd_thread *pmd, odp_port_t in_port); @@ -1921,7 +1917,7 @@ void dp_netdev_flow_unref(struct dp_netdev_flow *flow) } } -static inline struct dpcls * +inline struct dpcls * dp_netdev_pmd_lookup_dpcls(struct dp_netdev_pmd_thread *pmd, odp_port_t in_port) { @@ -2721,7 +2717,7 @@ dp_netdev_pmd_lookup_flow(struct dp_netdev_pmd_thread *pmd, int *lookup_num_p) { struct dpcls *cls; - struct dpcls_rule *rule; + struct dpcls_rule *rule = NULL; odp_port_t in_port = u32_to_odp(MINIFLOW_GET_U32(&key->mf, in_port.odp_port)); struct dp_netdev_flow *netdev_flow = NULL; @@ -4222,7 +4218,10 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd, } /* Process packet batch. */ - pmd->netdev_input_func(pmd, &batch, port_no); + int32_t ret = pmd->netdev_input_func(pmd, &batch, port_no); + if (ret) { + dp_netdev_input(pmd, &batch, port_no); + } /* Assign processing cycles to rx queue. */ cycles = cycle_timer_stop(&pmd->perf_stats, &timer); @@ -5227,6 +5226,8 @@ dpif_netdev_run(struct dpif *dpif) non_pmd->ctx.emc_insert_min = 0; } + non_pmd->ctx.smc_enable_db = dp->smc_enable_db; + for (i = 0; i < port->n_rxq; i++) { if (!netdev_rxq_enabled(port->rxqs[i].rx)) { @@ -5498,6 +5499,8 @@ reload: pmd->ctx.emc_insert_min = 0; } + pmd->ctx.smc_enable_db = pmd->dp->smc_enable_db; + process_packets = dp_netdev_process_rxq_port(pmd, poll_list[i].rxq, poll_list[i].port_no); @@ -6391,6 +6394,24 @@ packet_batch_per_flow_execute(struct packet_batch_per_flow *batch, actions->actions, actions->size); } +void +dp_netdev_batch_execute(struct dp_netdev_pmd_thread *pmd, + struct dp_packet_batch *packets, + struct dpcls_rule *rule, + uint32_t bytes, + uint16_t tcp_flags) +{ + /* Gets action* from the rule */ + struct dp_netdev_flow *flow = dp_netdev_flow_cast(rule); + struct dp_netdev_actions *actions = dp_netdev_flow_get_actions(flow); + + dp_netdev_flow_used(flow, dp_packet_batch_size(packets), bytes, + tcp_flags, pmd->ctx.now / 1000); + const uint32_t steal = 1; + dp_netdev_execute_actions(pmd, packets, steal, &flow->flow, + actions->actions, actions->size); +} + static inline void dp_netdev_queue_batches(struct dp_packet *pkt, struct dp_netdev_flow *flow, uint16_t tcp_flags, @@ -6495,6 +6516,30 @@ smc_lookup_batch(struct dp_netdev_pmd_thread *pmd, pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SMC_HIT, n_smc_hit); } +struct dp_netdev_flow * +smc_lookup_single(struct dp_netdev_pmd_thread *pmd, + struct dp_packet *packet, + struct netdev_flow_key *key) +{ + const struct cmap_node *flow_node = smc_entry_get(pmd, key->hash); + + if (OVS_LIKELY(flow_node != NULL)) { + struct dp_netdev_flow *flow = NULL; + + CMAP_NODE_FOR_EACH (flow, node, flow_node) { + /* Since we dont have per-port megaflow to check the port + * number, we need to verify that the input ports match. */ + if (OVS_LIKELY(dpcls_rule_matches_key(&flow->cr, key) && + flow->flow.in_port.odp_port == packet->md.in_port.odp_port)) { + + return (void *) flow; + } + } + } + + return NULL; +} + /* Try to process all ('cnt') the 'packets' using only the datapath flow cache * 'pmd->flow_cache'. If a flow is not found for a packet 'packets[i]', the * miniflow is copied into 'keys' and the packet pointer is moved at the @@ -6900,12 +6945,13 @@ dp_netdev_input__(struct dp_netdev_pmd_thread *pmd, } } -static void +static int32_t dp_netdev_input(struct dp_netdev_pmd_thread *pmd, struct dp_packet_batch *packets, odp_port_t port_no) { dp_netdev_input__(pmd, packets, false, port_no); + return 0; } static void @@ -8346,7 +8392,7 @@ netdev_flow_key_gen_masks(const struct netdev_flow_key *tbl, /* Returns true if 'target' satisfies 'key' in 'mask', that is, if each 1-bit * in 'mask' the values in 'key' and 'target' are the same. */ -bool +inline bool ALWAYS_INLINE dpcls_rule_matches_key(const struct dpcls_rule *rule, const struct netdev_flow_key *target) { @@ -8372,7 +8418,7 @@ dpcls_rule_matches_key(const struct dpcls_rule *rule, * priorities, instead returning any rule which matches the flow. * * Returns true if all miniflows found a corresponding rule. */ -static bool +bool dpcls_lookup(struct dpcls *cls, const struct netdev_flow_key *keys[], struct dpcls_rule **rules, const size_t cnt, int *num_lookups_p) From patchwork Fri Feb 12 17:17:07 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Van Haaren, Harry" X-Patchwork-Id: 1439947 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.138; helo=whitealder.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from whitealder.osuosl.org (smtp1.osuosl.org [140.211.166.138]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4DcgDk5rVYz9sTD for ; Sat, 13 Feb 2021 04:17:58 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by whitealder.osuosl.org (Postfix) with ESMTP id 3CEE987616; Fri, 12 Feb 2021 17:17:57 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from whitealder.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id OMpgZkUkvLUP; Fri, 12 Feb 2021 17:17:50 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by whitealder.osuosl.org (Postfix) with ESMTP id DA17C875EE; Fri, 12 Feb 2021 17:17:50 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id C8DD7C1834; Fri, 12 Feb 2021 17:17:50 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id D1EEDC013A for ; Fri, 12 Feb 2021 17:17:49 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 936E56F8B8 for ; Fri, 12 Feb 2021 17:17:49 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id kd1gHtMb8Qvl for ; Fri, 12 Feb 2021 17:17:48 +0000 (UTC) Received: by smtp3.osuosl.org (Postfix, from userid 1001) id C0AD96F8AB; Fri, 12 Feb 2021 17:17:48 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by smtp3.osuosl.org (Postfix) with ESMTPS id EB6816F79E for ; Fri, 12 Feb 2021 17:17:39 +0000 (UTC) IronPort-SDR: ulExf0wuJp7Q6CJOsj+nfuhHFLOr9ZEQgIjGcDrgJyV6V8S4BdRXevzpxD49neTtUEgMrQ226s ZodynCngfwAw== X-IronPort-AV: E=McAfee;i="6000,8403,9893"; a="201595220" X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="201595220" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Feb 2021 09:17:39 -0800 IronPort-SDR: eh06UsKwn48LeXvI0GKUzieqMC8xxc670DXYgO1VcQYdCeNRE/hyW1/vXhIm/zQsNOAmnMtL2k 5szS+Xz5tCgA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="360484943" Received: from silpixa00400633.ir.intel.com ([10.237.213.44]) by orsmga003.jf.intel.com with ESMTP; 12 Feb 2021 09:17:38 -0800 From: Harry van Haaren To: ovs-dev@openvswitch.org Date: Fri, 12 Feb 2021 17:17:07 +0000 Message-Id: <20210212171718.2189798-6-harry.van.haaren@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210212171718.2189798-1-harry.van.haaren@intel.com> References: <20210104163653.2218575-1-harry.van.haaren@intel.com> <20210212171718.2189798-1-harry.van.haaren@intel.com> MIME-Version: 1.0 Cc: i.maximets@ovn.org Subject: [ovs-dev] [PATCH v9 05/16] dpif-avx512: Add HWOL support to avx512 dpif. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" Partial hardware offload is implemented in a very similar way to the scalar dpif. Signed-off-by: Harry van Haaren --- lib/dpif-netdev-avx512.c | 28 +++++++++++++++++++++++++--- 1 file changed, 25 insertions(+), 3 deletions(-) diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c index 10228aeb0..caba1fa1c 100644 --- a/lib/dpif-netdev-avx512.c +++ b/lib/dpif-netdev-avx512.c @@ -27,6 +27,7 @@ #include "dpif-netdev-private-dpcls.h" #include "dpif-netdev-private-flow.h" #include "dpif-netdev-private-thread.h" +#include "dpif-netdev-private-hwol.h" #include "dp-packet.h" #include "netdev.h" @@ -111,9 +112,32 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, uint32_t i = __builtin_ctz(iter); iter = _blsr_u64(iter); - /* Initialize packet md and do miniflow extract */ + /* Get packet pointer from bitmask and packet md */ struct dp_packet *packet = packets->packets[i]; pkt_metadata_init(&packet->md, in_port); + + struct dp_netdev_flow *f = NULL; + + /* Check for partial hardware offload mark */ + uint32_t mark; + if (dp_packet_has_flow_mark(packet, &mark)) { + f = mark_to_flow_find(pmd, mark); + if (f) { + rules[i] = &f->cr; + + /* This is nasty - instead of using the HWOL provided flow, + * parse the packet data anyway to find the location of the TCP + * header to extract the TCP flags for the rule. + */ + pkt_meta[i].tcp_flags = parse_tcp_flags(packet); + + pkt_meta[i].bytes = dp_packet_size(packet); + hwol_emc_smc_hitmask |= (1 << i); + continue; + } + } + + /* Do miniflow extract into keys */ struct netdev_flow_key *key = &keys[i]; miniflow_extract(packet, &key->mf); @@ -124,8 +148,6 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, key->len = netdev_flow_key_size(miniflow_n_values(&key->mf)); key->hash = dpif_netdev_packet_get_rss_hash_orig_pkt(packet, &key->mf); - struct dp_netdev_flow *f = NULL; - if (emc_enabled) { f = emc_lookup(&cache->emc_cache, key); From patchwork Fri Feb 12 17:17:08 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Van Haaren, Harry" X-Patchwork-Id: 1439948 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.133; helo=hemlock.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from hemlock.osuosl.org (smtp2.osuosl.org [140.211.166.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4DcgDm1zM7z9sTD for ; Sat, 13 Feb 2021 04:18:00 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by hemlock.osuosl.org (Postfix) with ESMTP id C5ABC87630; Fri, 12 Feb 2021 17:17:58 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from hemlock.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 7wF5PQKrKy+g; Fri, 12 Feb 2021 17:17:56 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by hemlock.osuosl.org (Postfix) with ESMTP id CE65C87633; Fri, 12 Feb 2021 17:17:56 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 83BAAC1834; Fri, 12 Feb 2021 17:17:56 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 7091FC1825 for ; Fri, 12 Feb 2021 17:17:55 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 43CBE6F8CC for ; Fri, 12 Feb 2021 17:17:55 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id uEMofOamzTNK for ; Fri, 12 Feb 2021 17:17:53 +0000 (UTC) Received: by smtp3.osuosl.org (Postfix, from userid 1001) id 5E9E86F8AA; Fri, 12 Feb 2021 17:17:53 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by smtp3.osuosl.org (Postfix) with ESMTPS id 97DD26F7B3 for ; Fri, 12 Feb 2021 17:17:41 +0000 (UTC) IronPort-SDR: a4diprrz6aq2zdKehLdCj4gZ2EreZMA6ydtlMu2qIO72Dx7rdYIO9oVAkgEOX/BTmSgTw5gzhP YSURrANwNPyg== X-IronPort-AV: E=McAfee;i="6000,8403,9893"; a="201595222" X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="201595222" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Feb 2021 09:17:41 -0800 IronPort-SDR: Y0AtV/tysqL+Aisk2mns+cIyb5/Ef9eGUwrZ67PUCWTe1Y3dGC5rjUdWBFbshdMqqpsBHQKu/I ZjbKHZq4dbPA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="360484967" Received: from silpixa00400633.ir.intel.com ([10.237.213.44]) by orsmga003.jf.intel.com with ESMTP; 12 Feb 2021 09:17:39 -0800 From: Harry van Haaren To: ovs-dev@openvswitch.org Date: Fri, 12 Feb 2021 17:17:08 +0000 Message-Id: <20210212171718.2189798-7-harry.van.haaren@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210212171718.2189798-1-harry.van.haaren@intel.com> References: <20210104163653.2218575-1-harry.van.haaren@intel.com> <20210212171718.2189798-1-harry.van.haaren@intel.com> MIME-Version: 1.0 Cc: i.maximets@ovn.org Subject: [ovs-dev] [PATCH v9 06/16] dpif-netdev: Add command to switch dpif implementation. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" This commit adds a new command to allow the user to switch the active DPIF implementation at runtime. A probe function is executed before switching the DPIF implementation, to ensure the CPU is capable of running the ISA required. For example, the below code will switch to the AVX512 enabled DPIF assuming that the runtime CPU is capable of running AVX512 instructions: $ ovs-appctl dpif-netdev/dpif-set dpif_avx512 A new configuration flag is added to allow selection of the default DPIF. This is useful for running the unit-tests against the available DPIF implementations, without modifying each unit test. The design of the testing & validation for ISA optimized DPIF implementations is based around the work already upstream for DPCLS. Note however that a DPCLS lookup has no state or side-effects, allowing the auto-validator implementation to perform multiple lookups and provide consistent statistic counters. The DPIF component does have state, so running two implementations in parallel and comparing output is not a valid testing method, as there are changes in DPIF statistic counters (side effects). As a result, the DPIF is tested directly against the unit-tests. Signed-off-by: Harry van Haaren Co-authored-by: Cian Ferriter Signed-off-by: Cian Ferriter --- acinclude.m4 | 15 ++++++ configure.ac | 1 + lib/automake.mk | 1 + lib/dpif-netdev-avx512.c | 14 +++++ lib/dpif-netdev-private-dpif.c | 92 ++++++++++++++++++++++++++++++++ lib/dpif-netdev-private-dpif.h | 43 ++++++++++++++- lib/dpif-netdev-private-thread.h | 12 +---- lib/dpif-netdev.c | 86 +++++++++++++++++++++++++++-- 8 files changed, 248 insertions(+), 16 deletions(-) create mode 100644 lib/dpif-netdev-private-dpif.c diff --git a/acinclude.m4 b/acinclude.m4 index 435685c93..c9b0d56d6 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -30,6 +30,21 @@ AC_DEFUN([OVS_CHECK_DPCLS_AUTOVALIDATOR], [ fi ]) +dnl Set OVS DPIF default implementation at configure time for running the unit +dnl tests on the whole codebase without modifying tests per DPIF impl +AC_DEFUN([OVS_CHECK_DPIF_AVX512_DEFAULT], [ + AC_ARG_ENABLE([dpif-default-avx512], + [AC_HELP_STRING([--enable-dpif-default-avx512], [Enable DPIF AVX512 implementation as default.])], + [dpifavx512=yes],[dpifavx512=no]) + AC_MSG_CHECKING([whether DPIF AVX512 is default implementation]) + if test "$dpifavx512" != yes; then + AC_MSG_RESULT([no]) + else + OVS_CFLAGS="$OVS_CFLAGS -DDPIF_AVX512_DEFAULT" + AC_MSG_RESULT([yes]) + fi +]) + dnl OVS_ENABLE_WERROR AC_DEFUN([OVS_ENABLE_WERROR], [AC_ARG_ENABLE( diff --git a/configure.ac b/configure.ac index c077034d4..e45685a6c 100644 --- a/configure.ac +++ b/configure.ac @@ -185,6 +185,7 @@ OVS_ENABLE_WERROR OVS_ENABLE_SPARSE OVS_CTAGS_IDENTIFIERS OVS_CHECK_DPCLS_AUTOVALIDATOR +OVS_CHECK_DPIF_AVX512_DEFAULT OVS_CHECK_BINUTILS_AVX512 AC_ARG_VAR(KARCH, [Kernel Architecture String]) diff --git a/lib/automake.mk b/lib/automake.mk index d945d935e..5e493ebaf 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -115,6 +115,7 @@ lib_libopenvswitch_la_SOURCES = \ lib/dpif-netdev.h \ lib/dpif-netdev-private-dfc.h \ lib/dpif-netdev-private-dpcls.h \ + lib/dpif-netdev-private-dpif.c \ lib/dpif-netdev-private-dpif.h \ lib/dpif-netdev-private-flow.h \ lib/dpif-netdev-private-hwol.h \ diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c index caba1fa1c..fff469e10 100644 --- a/lib/dpif-netdev-avx512.c +++ b/lib/dpif-netdev-avx512.c @@ -19,6 +19,7 @@ #if !defined(__CHECKER__) #include +#include #include "dpif-netdev.h" #include "dpif-netdev-perf.h" @@ -54,6 +55,19 @@ struct dpif_userdata { struct pkt_flow_meta pkt_meta[NETDEV_MAX_BURST]; }; +int32_t +dp_netdev_input_outer_avx512_probe(void) +{ + int avx512f_available = dpdk_get_cpu_has_isa("x86_64", "avx512f"); + int bmi2_available = dpdk_get_cpu_has_isa("x86_64", "bmi2"); + + if (!avx512f_available || !bmi2_available) { + return -ENOTSUP; + } + + return 0; +} + int32_t dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, struct dp_packet_batch *packets, diff --git a/lib/dpif-netdev-private-dpif.c b/lib/dpif-netdev-private-dpif.c new file mode 100644 index 000000000..9e1f3b8f9 --- /dev/null +++ b/lib/dpif-netdev-private-dpif.c @@ -0,0 +1,92 @@ +/* + * Copyright (c) 2020 Intel Corporation. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include +#include + +#include "dpif-netdev-private-dpif.h" +#include "util.h" +#include "openvswitch/vlog.h" + +VLOG_DEFINE_THIS_MODULE(dpif_netdev_impl); + +/* Actual list of implementations goes here. */ +static struct dpif_netdev_impl_info_t dpif_impls[] = { + /* The default scalar C code implementation. */ + { .func = dp_netdev_input, + .probe = NULL, + .name = "dpif_scalar", }, + +#if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD && __SSE4_2__) + /* Only available on x86_64 bit builds with SSE 4.2 used for OVS core. */ + { .func = dp_netdev_input_outer_avx512, + .probe = dp_netdev_input_outer_avx512_probe, + .name = "dpif_avx512", }, +#endif +}; + +dp_netdev_input_func +dp_netdev_impl_get_default(void) +{ + int dpif_idx = 0; + +/* Configure time overriding to run test suite on all implementations. */ +#if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD && __SSE4_2__) +#ifdef DPIF_AVX512_DEFAULT + ovs_assert(dpif_impls[1].func == dp_netdev_input_outer_avx512); + if (!dp_netdev_input_outer_avx512_probe()) { + dpif_idx = 1; + }; +#endif +#endif + + VLOG_INFO("Default DPIF implementation is %s.\n", + dpif_impls[dpif_idx].name); + dp_netdev_input_func func = dpif_impls[dpif_idx].func; + + return func; +} + + +/* This function checks all available DPIF implementations, and selects the + * returns the function pointer to the one requested by "name". + */ +int32_t +dp_netdev_impl_get_by_name(const char *name, dp_netdev_input_func *out_func) +{ + ovs_assert(name); + ovs_assert(out_func); + + uint32_t i; + + for (i = 0; i < ARRAY_SIZE(dpif_impls); i++) { + if (strcmp(dpif_impls[i].name, name) == 0) { + /* Probe function is optional - so check it is set before exec. */ + if (dpif_impls[i].probe) { + int probe_ok = dpif_impls[i].probe(); + if (probe_ok) { + *out_func = NULL; + return probe_ok; + } + } + *out_func = dpif_impls[i].func; + return 0; + } + } + + return -EINVAL; +} diff --git a/lib/dpif-netdev-private-dpif.h b/lib/dpif-netdev-private-dpif.h index ae9068458..a09f90acc 100644 --- a/lib/dpif-netdev-private-dpif.h +++ b/lib/dpif-netdev-private-dpif.h @@ -23,7 +23,48 @@ struct dp_netdev_pmd_thread; struct dp_packet_batch; -/* Available implementations for dpif work */ +/* Typedef for DPIF functions. + * Returns a bitmask of packets to handle, possibly including upcall/misses. + */ +typedef int32_t (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd, + struct dp_packet_batch *packets, + odp_port_t port_no); + +/* Probe a DPIF implementation. This allows the implementation to validate CPU + * ISA availability. Returns 0 if not available, returns 1 is valid to use. + */ +typedef int32_t (*dp_netdev_input_func_probe)(void); + +/* Structure describing each available DPIF implmeentation. */ +struct dpif_netdev_impl_info_t { + /* Function pointer to execute to have this DPIF implementation run. */ + dp_netdev_input_func func; + /* Function pointer to execute to check the CPU ISA is available to run. + * May be NULL, which implies that it is always valid to use. + */ + dp_netdev_input_func_probe probe; + /* Name used to select this DPIF implementation. */ + const char *name; +}; + +/* This function checks all available DPIF implementations, and selects the + * returns the function pointer to the one requested by "name". + */ +int32_t +dp_netdev_impl_get_by_name(const char *name, dp_netdev_input_func *out_func); + +/* Returns the ./configure selected DPIF as default, used to initialize. */ +dp_netdev_input_func dp_netdev_impl_get_default(void); + +/* Available implementations of DPIF below */ +int32_t +dp_netdev_input(struct dp_netdev_pmd_thread *pmd, + struct dp_packet_batch *packets, + odp_port_t in_port); + +/* AVX512 enabled DPIF implementation and probe functions */ +int32_t +dp_netdev_input_outer_avx512_probe(void); int32_t dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, struct dp_packet_batch *packets, diff --git a/lib/dpif-netdev-private-thread.h b/lib/dpif-netdev-private-thread.h index 917211938..aac2342a7 100644 --- a/lib/dpif-netdev-private-thread.h +++ b/lib/dpif-netdev-private-thread.h @@ -28,6 +28,8 @@ #include "dpif-netdev-perf.h" #include "openvswitch/thread.h" +#include "dpif-netdev-private-dpif.h" + #ifdef __cplusplus extern "C" { #endif @@ -49,16 +51,6 @@ struct dp_netdev_pmd_thread_ctx { bool smc_enable_db; }; -/* Forward declaration for typedef */ -struct dp_netdev_pmd_thread; - -/* Typedef for DPIF functions. - * Returns a bitmask of packets to handle, possibly including upcall/misses. - */ -typedef int32_t (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd, - struct dp_packet_batch *packets, - odp_port_t port_no); - /* PMD: Poll modes drivers. PMD accesses devices via polling to eliminate * the performance overhead of interrupt processing. Therefore netdev can * not implement rx-wait for these devices. dpif-netdev needs to poll diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index f1089752e..564d94a97 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -480,8 +480,8 @@ static void dp_netdev_execute_actions(struct dp_netdev_pmd_thread *pmd, const struct flow *flow, const struct nlattr *actions, size_t actions_len); -static int32_t dp_netdev_input(struct dp_netdev_pmd_thread *, - struct dp_packet_batch *, odp_port_t port_no); +int32_t dp_netdev_input(struct dp_netdev_pmd_thread *, + struct dp_packet_batch *, odp_port_t port_no); static void dp_netdev_recirculate(struct dp_netdev_pmd_thread *, struct dp_packet_batch *); @@ -992,6 +992,78 @@ dpif_netdev_subtable_lookup_set(struct unixctl_conn *conn, int argc, ds_destroy(&reply); } +static void +dpif_netdev_impl_set(struct unixctl_conn *conn, int argc, + const char *argv[], void *aux OVS_UNUSED) +{ + /* This function requires just one parameter, the DPIF name. + * A second optional parameter can identify the datapath instance. + */ + const char *dpif_name = argv[1]; + + static const char *error_description[2] = { + "Unknown DPIF implementation", + "CPU doesn't support the required instruction for", + }; + + dp_netdev_input_func new_func; + int32_t err = dp_netdev_impl_get_by_name(dpif_name, &new_func); + if (err) { + struct ds reply = DS_EMPTY_INITIALIZER; + ds_put_format(&reply, "DPIF implementation not available: %s %s.\n", + error_description[ (err == -ENOTSUP) ], dpif_name); + const char *reply_str = ds_cstr(&reply); + unixctl_command_reply(conn, reply_str); + VLOG_INFO("%s", reply_str); + ds_destroy(&reply); + return; + } + + /* argv[2] is optional datapath instance. If no datapath name is provided + * and only one datapath exists, the one existing datapath is reprobed. + */ + ovs_mutex_lock(&dp_netdev_mutex); + struct dp_netdev *dp = NULL; + + if (argc == 3) { + dp = shash_find_data(&dp_netdevs, argv[2]); + } else if (shash_count(&dp_netdevs) == 1) { + dp = shash_first(&dp_netdevs)->data; + } + + if (!dp) { + ovs_mutex_unlock(&dp_netdev_mutex); + unixctl_command_reply_error(conn, + "please specify an existing datapath"); + return; + } + + /* Get PMD threads list */ + size_t n; + struct dp_netdev_pmd_thread **pmd_list; + sorted_poll_thread_list(dp, &pmd_list, &n); + + for (size_t i = 0; i < n; i++) { + struct dp_netdev_pmd_thread *pmd = pmd_list[i]; + if (pmd->core_id == NON_PMD_CORE_ID) { + continue; + } + + /* Set PMD threads DPIF implementation to requested one */ + pmd->netdev_input_func = *new_func; + }; + + ovs_mutex_unlock(&dp_netdev_mutex); + + /* Reply with success to command */ + struct ds reply = DS_EMPTY_INITIALIZER; + ds_put_format(&reply, "DPIF implementation set to %s.\n", dpif_name); + const char *reply_str = ds_cstr(&reply); + unixctl_command_reply(conn, reply_str); + VLOG_INFO("%s", reply_str); + ds_destroy(&reply); +} + static void dpif_netdev_pmd_rebalance(struct unixctl_conn *conn, int argc, const char *argv[], void *aux OVS_UNUSED) @@ -1214,6 +1286,10 @@ dpif_netdev_init(void) unixctl_command_register("dpif-netdev/subtable-lookup-prio-get", "", 0, 0, dpif_netdev_subtable_lookup_get, NULL); + unixctl_command_register("dpif-netdev/dpif-set", + "[dpif implementation name] [dp]", + 1, 2, dpif_netdev_impl_set, + NULL); return 0; } @@ -6010,8 +6086,8 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp, hmap_init(&pmd->send_port_cache); cmap_init(&pmd->tx_bonds); - /* Initialize the DPIF function pointer to the default scalar version */ - pmd->netdev_input_func = dp_netdev_input; + /* Initialize DPIF function pointer to the default configured version. */ + pmd->netdev_input_func = dp_netdev_impl_get_default(); /* init the 'flow_cache' since there is no * actual thread created for NON_PMD_CORE_ID. */ @@ -6945,7 +7021,7 @@ dp_netdev_input__(struct dp_netdev_pmd_thread *pmd, } } -static int32_t +int32_t dp_netdev_input(struct dp_netdev_pmd_thread *pmd, struct dp_packet_batch *packets, odp_port_t port_no) From patchwork Fri Feb 12 17:17:09 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Van Haaren, Harry" X-Patchwork-Id: 1439952 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.138; helo=whitealder.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from whitealder.osuosl.org (smtp1.osuosl.org [140.211.166.138]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4DcgFP3x2vz9sVR for ; Sat, 13 Feb 2021 04:18:33 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by whitealder.osuosl.org (Postfix) with ESMTP id 07D478764B; Fri, 12 Feb 2021 17:18:32 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from whitealder.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ddYIWj5ZQ3fI; Fri, 12 Feb 2021 17:18:28 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by whitealder.osuosl.org (Postfix) with ESMTP id C290B87609; Fri, 12 Feb 2021 17:18:11 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id A4F94C0891; Fri, 12 Feb 2021 17:18:11 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id E4D88C013A for ; Fri, 12 Feb 2021 17:18:09 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id D0D516F7A2 for ; Fri, 12 Feb 2021 17:18:09 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id CKIPBdw04M4g for ; Fri, 12 Feb 2021 17:18:07 +0000 (UTC) Received: by smtp3.osuosl.org (Postfix, from userid 1001) id 754F46F8D9; Fri, 12 Feb 2021 17:18:07 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by smtp3.osuosl.org (Postfix) with ESMTPS id 278366F7C3 for ; Fri, 12 Feb 2021 17:17:43 +0000 (UTC) IronPort-SDR: D6HeQIYwC+bGoIGArXsrOdaUN/+XyqtIgG7Tjnysz7Ws0c5qLVQFkAw1J4H9H+g+M3FjkmEt3i m3ZcXpr4fzUw== X-IronPort-AV: E=McAfee;i="6000,8403,9893"; a="201595224" X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="201595224" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Feb 2021 09:17:42 -0800 IronPort-SDR: qsOZivAo9QyFvzDyen3I8e3398t1oLxj2xYO9ZgRxmOZsWqDle1PATvdP2F5AnrxUvNMzbk49g M4LOstohboNA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="360484993" Received: from silpixa00400633.ir.intel.com ([10.237.213.44]) by orsmga003.jf.intel.com with ESMTP; 12 Feb 2021 09:17:41 -0800 From: Harry van Haaren To: ovs-dev@openvswitch.org Date: Fri, 12 Feb 2021 17:17:09 +0000 Message-Id: <20210212171718.2189798-8-harry.van.haaren@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210212171718.2189798-1-harry.van.haaren@intel.com> References: <20210104163653.2218575-1-harry.van.haaren@intel.com> <20210212171718.2189798-1-harry.van.haaren@intel.com> MIME-Version: 1.0 Cc: i.maximets@ovn.org Subject: [ovs-dev] [PATCH v9 07/16] dpif-netdev: Add command to get dpif implementations. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" This commit adds a new command to retrieve the list of available DPIF implementations. This can be used by to check what implementations of the DPIF are available in any given OVS binary. Usage: $ ovs-appctl dpif-netdev/dpif-get Signed-off-by: Harry van Haaren --- lib/dpif-netdev-private-dpif.c | 7 +++++++ lib/dpif-netdev-private-dpif.h | 6 ++++++ lib/dpif-netdev.c | 24 ++++++++++++++++++++++++ 3 files changed, 37 insertions(+) diff --git a/lib/dpif-netdev-private-dpif.c b/lib/dpif-netdev-private-dpif.c index 9e1f3b8f9..c5021fe9f 100644 --- a/lib/dpif-netdev-private-dpif.c +++ b/lib/dpif-netdev-private-dpif.c @@ -61,6 +61,13 @@ dp_netdev_impl_get_default(void) return func; } +uint32_t +dp_netdev_impl_get(const struct dpif_netdev_impl_info_t **out_impls) +{ + ovs_assert(out_impls); + *out_impls = dpif_impls; + return ARRAY_SIZE(dpif_impls); +} /* This function checks all available DPIF implementations, and selects the * returns the function pointer to the one requested by "name". diff --git a/lib/dpif-netdev-private-dpif.h b/lib/dpif-netdev-private-dpif.h index a09f90acc..99fbda943 100644 --- a/lib/dpif-netdev-private-dpif.h +++ b/lib/dpif-netdev-private-dpif.h @@ -47,6 +47,12 @@ struct dpif_netdev_impl_info_t { const char *name; }; +/* This function returns all available implementations to the caller. The + * quantity of implementations is returned by the int return value. + */ +uint32_t +dp_netdev_impl_get(const struct dpif_netdev_impl_info_t **out_impls); + /* This function checks all available DPIF implementations, and selects the * returns the function pointer to the one requested by "name". */ diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 564d94a97..dff844f99 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -992,6 +992,27 @@ dpif_netdev_subtable_lookup_set(struct unixctl_conn *conn, int argc, ds_destroy(&reply); } +static void +dpif_netdev_impl_get(struct unixctl_conn *conn, int argc OVS_UNUSED, + const char *argv[] OVS_UNUSED, void *aux OVS_UNUSED) +{ + const struct dpif_netdev_impl_info_t *dpif_impls; + uint32_t count = dp_netdev_impl_get(&dpif_impls); + if (count == 0) { + unixctl_command_reply_error(conn, "error getting dpif names"); + return; + } + + /* Add all dpif functions to reply string. */ + struct ds reply = DS_EMPTY_INITIALIZER; + ds_put_cstr(&reply, "Available DPIF implementations:\n"); + for (uint32_t i = 0; i < count; i++) { + ds_put_format(&reply, " %s\n", dpif_impls[i].name); + } + unixctl_command_reply(conn, ds_cstr(&reply)); + ds_destroy(&reply); +} + static void dpif_netdev_impl_set(struct unixctl_conn *conn, int argc, const char *argv[], void *aux OVS_UNUSED) @@ -1290,6 +1311,9 @@ dpif_netdev_init(void) "[dpif implementation name] [dp]", 1, 2, dpif_netdev_impl_set, NULL); + unixctl_command_register("dpif-netdev/dpif-get", "", + 0, 0, dpif_netdev_impl_get, + NULL); return 0; } From patchwork Fri Feb 12 17:17:10 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Van Haaren, Harry" X-Patchwork-Id: 1439951 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.133; helo=hemlock.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from hemlock.osuosl.org (smtp2.osuosl.org [140.211.166.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4DcgFB0yJwz9sVR for ; Sat, 13 Feb 2021 04:18:22 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by hemlock.osuosl.org (Postfix) with ESMTP id A07588761F; Fri, 12 Feb 2021 17:18:20 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from hemlock.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 1ABxIpK8eauE; Fri, 12 Feb 2021 17:18:19 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by hemlock.osuosl.org (Postfix) with ESMTP id CC699875FD; Fri, 12 Feb 2021 17:18:19 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id ADFEEC1834; Fri, 12 Feb 2021 17:18:19 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 0CFCDC013A for ; Fri, 12 Feb 2021 17:18:18 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id CDD0F6F8B2 for ; Fri, 12 Feb 2021 17:18:17 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id C-z71zFcEcOb for ; Fri, 12 Feb 2021 17:18:16 +0000 (UTC) Received: by smtp3.osuosl.org (Postfix, from userid 1001) id 337CD6F8F2; Fri, 12 Feb 2021 17:18:16 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by smtp3.osuosl.org (Postfix) with ESMTPS id 19EE96F674 for ; Fri, 12 Feb 2021 17:17:44 +0000 (UTC) IronPort-SDR: BTArY4KHetGwratODpcap7ZvKS+oTLnndP0jQD7E7L/HD40Xa18xct8UucEMSjYalk8BDn7RTg vdGoAUlQWznw== X-IronPort-AV: E=McAfee;i="6000,8403,9893"; a="201595228" X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="201595228" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Feb 2021 09:17:44 -0800 IronPort-SDR: mPjWsk62sLzb6EU83pvQnIrGUyDJSaQc1kBfRJqkrMaGhf/K+DDgc0aA2h7U64FU4NQz/pD48c dME4X9q5A34w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="360485027" Received: from silpixa00400633.ir.intel.com ([10.237.213.44]) by orsmga003.jf.intel.com with ESMTP; 12 Feb 2021 09:17:43 -0800 From: Harry van Haaren To: ovs-dev@openvswitch.org Date: Fri, 12 Feb 2021 17:17:10 +0000 Message-Id: <20210212171718.2189798-9-harry.van.haaren@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210212171718.2189798-1-harry.van.haaren@intel.com> References: <20210104163653.2218575-1-harry.van.haaren@intel.com> <20210212171718.2189798-1-harry.van.haaren@intel.com> MIME-Version: 1.0 Cc: i.maximets@ovn.org Subject: [ovs-dev] [PATCH v9 08/16] docs/dpdk/bridge: Add dpif performance section. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Cian Ferriter This section details how two new commands can be used to list and select the different dpif implementations. It also details how a non default dpif implementation can be tested with the OVS unit test suite. Add NEWS updates for the dpif-netdev.c refactor and the new dpif implementations/commands. Signed-off-by: Cian Ferriter --- v8: - Merge NEWS file items into one Userspace Datapath: heading --- Documentation/topics/dpdk/bridge.rst | 37 ++++++++++++++++++++++++++++ NEWS | 6 ++++- 2 files changed, 42 insertions(+), 1 deletion(-) diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst index 526d5c959..ca90d7bdb 100644 --- a/Documentation/topics/dpdk/bridge.rst +++ b/Documentation/topics/dpdk/bridge.rst @@ -214,3 +214,40 @@ implementation :: Compile OVS in debug mode to have `ovs_assert` statements error out if there is a mis-match in the DPCLS lookup implementation. + +Datapath Interface Performance +------------------------------ + +The datapath interface (DPIF) or dp_netdev_input() is responsible for taking +packets through the major components of the userspace datapath; such as +miniflow_extract, EMC, SMC and DPCLS lookups, and a lot of the performance +stats associated with the datapath. + +Just like with the SIMD DPCLS work above, SIMD can be applied to the DPIF to +improve performance. + +OVS provides multiple implementations of the DPIF. These can be listed with the +following command :: + + $ ovs-appctl dpif-netdev/dpif-get + Available DPIF implementations: + dpif_scalar + dpif_avx512 + +By default, dpif_scalar is used. The DPIF implementation can be selected by +name :: + + $ ovs-appctl dpif-netdev/dpif-set dpif_avx512 + DPIF implementation set to dpif_avx512. + + $ ovs-appctl dpif-netdev/dpif-set dpif_scalar + DPIF implementation set to dpif_scalar. + +Running Unit Tests with AVX512 DPIF +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Since the AVX512 DPIF is disabled by default, a compile time option is +available in order to test it with the OVS unit test suite. When building with +a CPU that supports AVX512, use the following configure option :: + + $ ./configure --enable-dpif-default-avx512 diff --git a/NEWS b/NEWS index a7bffce97..a03e9d7be 100644 --- a/NEWS +++ b/NEWS @@ -2,7 +2,11 @@ Post-v2.15.0 --------------------- - In ovs-vsctl and vtep-ctl, the "find" command now accept new operators {in} and {not-in}. - + - Userspace Datapath: + * Refactor lib/dpif-netdev.c to multiple header files. + * Add avx512 implementation of dpif which can process non recirculated + packets. It supports partial HWOL, EMC, SMC and DPCLS lookups. + * Add commands to get and set the dpif implementations. v2.15.0 - xx xxx xxxx --------------------- From patchwork Fri Feb 12 17:17:11 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Van Haaren, Harry" X-Patchwork-Id: 1439953 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.137; helo=fraxinus.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from fraxinus.osuosl.org (smtp4.osuosl.org [140.211.166.137]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4DcgFf3dcmz9sTD for ; Sat, 13 Feb 2021 04:18:46 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by fraxinus.osuosl.org (Postfix) with ESMTP id B5AC986E19; Fri, 12 Feb 2021 17:18:44 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from fraxinus.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ia8mV1Rh0pdK; Fri, 12 Feb 2021 17:18:41 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by fraxinus.osuosl.org (Postfix) with ESMTP id 1BBC386DE5; Fri, 12 Feb 2021 17:18:24 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id D826DC1834; Fri, 12 Feb 2021 17:18:23 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 5A1C0C0891 for ; Fri, 12 Feb 2021 17:18:22 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 52D726F8EB for ; Fri, 12 Feb 2021 17:18:22 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 3Sk_yz5GUQit for ; Fri, 12 Feb 2021 17:18:21 +0000 (UTC) Received: by smtp3.osuosl.org (Postfix, from userid 1001) id 1D8E06F8F7; Fri, 12 Feb 2021 17:18:21 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by smtp3.osuosl.org (Postfix) with ESMTPS id 808846F794 for ; Fri, 12 Feb 2021 17:17:46 +0000 (UTC) IronPort-SDR: ncCKa6pETGGiynqj99mStA4+yT7BCYXEaCtryjSJimvxTyDWQa5WaFqF9Ywfnd4QxR5alf9yvq vtroe9GcnjFw== X-IronPort-AV: E=McAfee;i="6000,8403,9893"; a="201595229" X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="201595229" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Feb 2021 09:17:46 -0800 IronPort-SDR: XK52IcrwJ43nFeGb8A0yTrLlVSIZzk6mMnopcWzQfEXTmQW+j5zgqiF5a8mf9aF8jLDSO4pRT1 MX4cXmOu5RCw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="360485053" Received: from silpixa00400633.ir.intel.com ([10.237.213.44]) by orsmga003.jf.intel.com with ESMTP; 12 Feb 2021 09:17:44 -0800 From: Harry van Haaren To: ovs-dev@openvswitch.org Date: Fri, 12 Feb 2021 17:17:11 +0000 Message-Id: <20210212171718.2189798-10-harry.van.haaren@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210212171718.2189798-1-harry.van.haaren@intel.com> References: <20210104163653.2218575-1-harry.van.haaren@intel.com> <20210212171718.2189798-1-harry.van.haaren@intel.com> MIME-Version: 1.0 Cc: i.maximets@ovn.org Subject: [ovs-dev] [PATCH v9 09/16] dpif-netdev/dpcls: Refactor function names to dpcls. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" This commit refactors the function names from netdev_* namespace to the dpcls_* namespace, as they are only used by dpcls code. With the name change, it becomes more obvious that the functions belong to dpcls functionality, and in the dpif-netdev-private-dpcls.h header file. Signed-off-by: Harry van Haaren --- lib/dpif-netdev-private-dpcls.h | 6 ++---- lib/dpif-netdev.c | 21 ++++++++++----------- 2 files changed, 12 insertions(+), 15 deletions(-) diff --git a/lib/dpif-netdev-private-dpcls.h b/lib/dpif-netdev-private-dpcls.h index 5bc579bba..e66cae3f4 100644 --- a/lib/dpif-netdev-private-dpcls.h +++ b/lib/dpif-netdev-private-dpcls.h @@ -97,10 +97,8 @@ struct dpcls_subtable { /* Generates a mask for each bit set in the subtable's miniflow. */ void -netdev_flow_key_gen_masks(const struct netdev_flow_key *tbl, - uint64_t *mf_masks, - const uint32_t mf_bits_u0, - const uint32_t mf_bits_u1); +dpcls_flow_key_gen_masks(const struct netdev_flow_key *tbl, uint64_t *mf_masks, + const uint32_t mf_bits_u0, const uint32_t mf_bits_u1); /* Matches a dpcls rule against the incoming packet in 'target' */ bool dpcls_rule_matches_key(const struct dpcls_rule *rule, diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index dff844f99..5e83755d7 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -8278,7 +8278,7 @@ dpcls_create_subtable(struct dpcls *cls, const struct netdev_flow_key *mask) subtable->mf_bits_set_unit0 = unit0; subtable->mf_bits_set_unit1 = unit1; subtable->mf_masks = xmalloc(sizeof(uint64_t) * (unit0 + unit1)); - netdev_flow_key_gen_masks(mask, subtable->mf_masks, unit0, unit1); + dpcls_flow_key_gen_masks(mask, subtable->mf_masks, unit0, unit1); /* Get the preferred subtable search function for this (u0,u1) subtable. * The function is guaranteed to always return a valid implementation, and @@ -8453,11 +8453,10 @@ dpcls_remove(struct dpcls *cls, struct dpcls_rule *rule) } } -/* Inner loop for mask generation of a unit, see netdev_flow_key_gen_masks. */ +/* Inner loop for mask generation of a unit, see dpcls_flow_key_gen_masks. */ static inline void -netdev_flow_key_gen_mask_unit(uint64_t iter, - const uint64_t count, - uint64_t *mf_masks) +dpcls_flow_key_gen_mask_unit(uint64_t iter, const uint64_t count, + uint64_t *mf_masks) { int i; for (i = 0; i < count; i++) { @@ -8478,16 +8477,16 @@ netdev_flow_key_gen_mask_unit(uint64_t iter, * @param mf_bits_unit0 Number of bits set in unit0 of the miniflow */ void -netdev_flow_key_gen_masks(const struct netdev_flow_key *tbl, - uint64_t *mf_masks, - const uint32_t mf_bits_u0, - const uint32_t mf_bits_u1) +dpcls_flow_key_gen_masks(const struct netdev_flow_key *tbl, + uint64_t *mf_masks, + const uint32_t mf_bits_u0, + const uint32_t mf_bits_u1) { uint64_t iter_u0 = tbl->mf.map.bits[0]; uint64_t iter_u1 = tbl->mf.map.bits[1]; - netdev_flow_key_gen_mask_unit(iter_u0, mf_bits_u0, &mf_masks[0]); - netdev_flow_key_gen_mask_unit(iter_u1, mf_bits_u1, &mf_masks[mf_bits_u0]); + dpcls_flow_key_gen_mask_unit(iter_u0, mf_bits_u0, &mf_masks[0]); + dpcls_flow_key_gen_mask_unit(iter_u1, mf_bits_u1, &mf_masks[mf_bits_u0]); } /* Returns true if 'target' satisfies 'key' in 'mask', that is, if each 1-bit From patchwork Fri Feb 12 17:17:12 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Van Haaren, Harry" X-Patchwork-Id: 1439959 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.136; helo=smtp3.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4DcgJy4BQLz9sTD for ; Sat, 13 Feb 2021 04:21:38 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 1168C6F7C4 for ; Fri, 12 Feb 2021 17:21:36 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 5uB5Rfd2JEG6 for ; Fri, 12 Feb 2021 17:21:32 +0000 (UTC) Received: by smtp3.osuosl.org (Postfix, from userid 1001) id 7B9D46F8D9; Fri, 12 Feb 2021 17:21:32 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp3.osuosl.org (Postfix) with ESMTP id B7B126F914; Fri, 12 Feb 2021 17:18:38 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 89657C1825; Fri, 12 Feb 2021 17:18:38 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 22CCAC1D9F for ; Fri, 12 Feb 2021 17:18:37 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id E7A306F7B8 for ; Fri, 12 Feb 2021 17:18:36 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 2KccT8_UvJrj for ; Fri, 12 Feb 2021 17:18:34 +0000 (UTC) Received: by smtp3.osuosl.org (Postfix, from userid 1001) id 3DC6A6F926; Fri, 12 Feb 2021 17:18:34 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by smtp3.osuosl.org (Postfix) with ESMTPS id 5A1376F78A for ; Fri, 12 Feb 2021 17:17:48 +0000 (UTC) IronPort-SDR: csWtwTozLmtoZgGyRIA2KIoX5kxonnnQlDVVfyDRxZhFnd7kzYqM8icp5l2a8Q9Lwh18uSX/cL 6zIgCwtvRG6A== X-IronPort-AV: E=McAfee;i="6000,8403,9893"; a="201595230" X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="201595230" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Feb 2021 09:17:48 -0800 IronPort-SDR: Fd7RKVRkuFe53Cd0kbSMD3I2nxKv2BDfSirda9wyK+DC8pVZHt5bk9m/ClnZbW+25IQMBoVfZm U8CjitQRPBxw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="360485084" Received: from silpixa00400633.ir.intel.com ([10.237.213.44]) by orsmga003.jf.intel.com with ESMTP; 12 Feb 2021 09:17:46 -0800 From: Harry van Haaren To: ovs-dev@openvswitch.org Date: Fri, 12 Feb 2021 17:17:12 +0000 Message-Id: <20210212171718.2189798-11-harry.van.haaren@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210212171718.2189798-1-harry.van.haaren@intel.com> References: <20210104163653.2218575-1-harry.van.haaren@intel.com> <20210212171718.2189798-1-harry.van.haaren@intel.com> MIME-Version: 1.0 Cc: i.maximets@ovn.org Subject: [ovs-dev] [PATCH v9 10/16] dpif-netdev/dpcls-avx512: enable 16 block processing. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" This commit implements larger subtable searches in avx512. A limitation of the previous implementation was that up to 8 blocks of miniflow data could be matched on (so a subtable with 8 blocks was handled in avx, but 9 blocks or more would fall back to scalar/generic). This limitation is removed in this patch, where up to 16 blocks of subtable can be matched on. From an implementation perspective, the key to enabling 16 blocks over 8 blocks was to do bitmask calculation up front, and then use the pre-calculated bitmasks for 2x passes of the "blocks gather" routine. The bitmasks need to be shifted for k-mask usage in the upper (8-15) block range, but it is relatively trivial. This also helps in case expanding to 24 blocks is desired in future. The implementation of the 2nd iteration to handle > 8 blocks is behind a conditional branch which checks the total number of bits. This helps the specialized versions of the function that have a miniflow fingerprint of less-than-or-equal 8 blocks, as the code can be statically stripped out of those functions. Specialized functions that do require more than 8 blocks will have the branch removed and unconditionally execute the 2nd blocks gather routine. Lastly, the _any() flavour will have the conditional branch, and the branch predictor may mispredict a bit, but per burst will likely get most packets correct (particularly towards the middle and end of a burst). The code has been run with unit tests under autovalidation and passes all cases, and unit test coverage has been checked to ensure the 16 block code paths are executing. Signed-off-by: Harry van Haaren --- v9: Fixup post 2.15 rebase on NEWS v8: Add NEWS entry --- NEWS | 1 + lib/dpif-netdev-lookup-avx512-gather.c | 203 ++++++++++++++++++------- 2 files changed, 147 insertions(+), 57 deletions(-) diff --git a/NEWS b/NEWS index a03e9d7be..d3b9221ed 100644 --- a/NEWS +++ b/NEWS @@ -7,6 +7,7 @@ Post-v2.15.0 * Add avx512 implementation of dpif which can process non recirculated packets. It supports partial HWOL, EMC, SMC and DPCLS lookups. * Add commands to get and set the dpif implementations. + * Enable AVX512 optimized DPCLS to search subtables with larger miniflows. v2.15.0 - xx xxx xxxx --------------------- diff --git a/lib/dpif-netdev-lookup-avx512-gather.c b/lib/dpif-netdev-lookup-avx512-gather.c index 8fc1cdfa5..1f27c0536 100644 --- a/lib/dpif-netdev-lookup-avx512-gather.c +++ b/lib/dpif-netdev-lookup-avx512-gather.c @@ -34,7 +34,21 @@ * AVX512 code at a time. */ #define NUM_U64_IN_ZMM_REG (8) -#define BLOCKS_CACHE_SIZE (NETDEV_MAX_BURST * NUM_U64_IN_ZMM_REG) + +/* This implementation of AVX512 gather allows up to 16 blocks of MF data to be + * present in the blocks_cache, hence the multiply by 2 in the blocks count. + */ +#define MF_BLOCKS_PER_PACKET (NUM_U64_IN_ZMM_REG * 2) + +/* Blocks cache size is the maximum number of miniflow blocks that this + * implementation of lookup can handle. + */ +#define BLOCKS_CACHE_SIZE (NETDEV_MAX_BURST * MF_BLOCKS_PER_PACKET) + +/* The gather instruction can handle a scale for the size of the items to + * gather. For uint64_t data, this scale is 8. + */ +#define GATHER_SCALE_8 (8) VLOG_DEFINE_THIS_MODULE(dpif_lookup_avx512_gather); @@ -69,22 +83,83 @@ netdev_rule_matches_key(const struct dpcls_rule *rule, { const uint64_t *keyp = miniflow_get_values(&rule->flow.mf); const uint64_t *maskp = miniflow_get_values(&rule->mask->mf); - const uint32_t lane_mask = (1 << mf_bits_total) - 1; + const uint32_t lane_mask = (1ULL << mf_bits_total) - 1; /* Always load a full cache line from blocks_cache. Other loads must be * trimmed to the amount of data required for mf_bits_total blocks. */ - __m512i v_blocks = _mm512_loadu_si512(&block_cache[0]); - __m512i v_mask = _mm512_maskz_loadu_epi64(lane_mask, &maskp[0]); - __m512i v_key = _mm512_maskz_loadu_epi64(lane_mask, &keyp[0]); + uint32_t res_mask; + + { + __m512i v_blocks = _mm512_loadu_si512(&block_cache[0]); + __m512i v_mask = _mm512_maskz_loadu_epi64(lane_mask, &maskp[0]); + __m512i v_key = _mm512_maskz_loadu_epi64(lane_mask, &keyp[0]); + __m512i v_data = _mm512_and_si512(v_blocks, v_mask); + res_mask = _mm512_mask_cmpeq_epi64_mask(lane_mask, v_data, v_key); + } - __m512i v_data = _mm512_and_si512(v_blocks, v_mask); - uint32_t res_mask = _mm512_mask_cmpeq_epi64_mask(lane_mask, v_data, v_key); + if (mf_bits_total > 8) { + uint32_t lane_mask_gt8 = lane_mask >> 8; + __m512i v_blocks = _mm512_loadu_si512(&block_cache[8]); + __m512i v_mask = _mm512_maskz_loadu_epi64(lane_mask_gt8, &maskp[8]); + __m512i v_key = _mm512_maskz_loadu_epi64(lane_mask_gt8, &keyp[8]); + __m512i v_data = _mm512_and_si512(v_blocks, v_mask); + uint32_t c = _mm512_mask_cmpeq_epi64_mask(lane_mask_gt8, v_data, + v_key); + res_mask |= (c << 8); + } - /* returns 1 assuming result of SIMD compare is all blocks. */ + /* returns 1 assuming result of SIMD compare is all blocks matching. */ return res_mask == lane_mask; } +/* Takes u0 and u1 inputs, and gathers the next 8 blocks to be stored + * contigously into the blocks cache. Note that the pointers and bitmasks + * passed into this function must be incremented for handling next 8 blocks. + */ +static inline ALWAYS_INLINE __m512i +avx512_blocks_gather(__m512i v_u0, /* reg of u64 of all u0 bits */ + __m512i v_u1, /* reg of u64 of all u1 bits */ + const uint64_t *pkt_blocks, /* ptr pkt blocks to load */ + const void *tbl_blocks, /* ptr to blocks in tbl */ + const void *tbl_mf_masks, /* ptr to subtable mf masks */ + __mmask64 u1_bcast_msk, /* mask of u1 lanes */ + const uint64_t pkt_mf_u0_pop, /* num bits in u0 of pkt */ + __mmask64 zero_mask, /* maskz if pkt not have mf bit */ + __mmask64 u64_lanes_mask) /* total lane count to use */ +{ + /* Suggest to compiler to load tbl blocks ahead of gather() */ + __m512i v_tbl_blocks = _mm512_maskz_loadu_epi64(u64_lanes_mask, + tbl_blocks); + + /* Blend u0 and u1 bits together for these 8 blocks */ + __m512i v_pkt_bits = _mm512_mask_blend_epi64(u1_bcast_msk, v_u0, v_u1); + + /* Load pre-created tbl miniflow bitmasks, bitwise AND with them */ + __m512i v_tbl_masks = _mm512_maskz_loadu_epi64(u64_lanes_mask, + tbl_mf_masks); + __m512i v_masks = _mm512_and_si512(v_pkt_bits, v_tbl_masks); + + /* Manual AVX512 popcount for u64 lanes. */ + __m512i v_popcnts = _mm512_popcnt_epi64_manual(v_masks); + + /* Add popcounts and offset for u1 bits. */ + __m512i v_idx_u0_offset = _mm512_maskz_set1_epi64(u1_bcast_msk, + pkt_mf_u0_pop); + __m512i v_indexes = _mm512_add_epi64(v_popcnts, v_idx_u0_offset); + + /* Gather u64 blocks from packet miniflow. */ + __m512i v_zeros = _mm512_setzero_si512(); + __m512i v_blocks = _mm512_mask_i64gather_epi64(v_zeros, u64_lanes_mask, + v_indexes, pkt_blocks, + GATHER_SCALE_8); + + /* Mask pkt blocks with subtable blocks, k-mask to zero lanes */ + __m512i v_masked_blocks = _mm512_maskz_and_epi64(zero_mask, v_blocks, + v_tbl_blocks); + return v_masked_blocks; +} + static inline uint32_t ALWAYS_INLINE avx512_lookup_impl(struct dpcls_subtable *subtable, uint32_t keys_map, @@ -94,76 +169,86 @@ avx512_lookup_impl(struct dpcls_subtable *subtable, const uint32_t bit_count_u1) { OVS_ALIGNED_VAR(CACHE_LINE_SIZE)uint64_t block_cache[BLOCKS_CACHE_SIZE]; - - const uint32_t bit_count_total = bit_count_u0 + bit_count_u1; - int i; uint32_t hashes[NETDEV_MAX_BURST]; + const uint32_t n_pkts = __builtin_popcountll(keys_map); ovs_assert(NETDEV_MAX_BURST >= n_pkts); + const uint32_t bit_count_total = bit_count_u0 + bit_count_u1; + const uint64_t bit_count_total_mask = (1ULL << bit_count_total) - 1; + const uint64_t tbl_u0 = subtable->mask.mf.map.bits[0]; const uint64_t tbl_u1 = subtable->mask.mf.map.bits[1]; - /* Load subtable blocks for masking later. */ const uint64_t *tbl_blocks = miniflow_get_values(&subtable->mask.mf); - const __m512i v_tbl_blocks = _mm512_loadu_si512(&tbl_blocks[0]); - - /* Load pre-created subtable masks for each block in subtable. */ - const __mmask8 bit_count_total_mask = (1 << bit_count_total) - 1; - const __m512i v_mf_masks = _mm512_maskz_loadu_epi64(bit_count_total_mask, - subtable->mf_masks); + const uint64_t *tbl_mf_masks = subtable->mf_masks; + int i; ULLONG_FOR_EACH_1 (i, keys_map) { + /* Create mask register with packet-specific u0 offset. + * Note that as 16 blocks can be handled in total, the width of the + * mask register must be >=16. + */ const uint64_t pkt_mf_u0_bits = keys[i]->mf.map.bits[0]; const uint64_t pkt_mf_u0_pop = __builtin_popcountll(pkt_mf_u0_bits); - - /* Pre-create register with *PER PACKET* u0 offset. */ - const __mmask8 u1_bcast_mask = (UINT8_MAX << bit_count_u0); - const __m512i v_idx_u0_offset = _mm512_maskz_set1_epi64(u1_bcast_mask, - pkt_mf_u0_pop); + const __mmask64 u1_bcast_mask = (UINT64_MAX << bit_count_u0); /* Broadcast u0, u1 bitmasks to 8x u64 lanes. */ - __m512i v_u0 = _mm512_set1_epi64(pkt_mf_u0_bits); - __m512i v_pkt_bits = _mm512_mask_set1_epi64(v_u0, u1_bcast_mask, - keys[i]->mf.map.bits[1]); - - /* Bitmask by pre-created masks. */ - __m512i v_masks = _mm512_and_si512(v_pkt_bits, v_mf_masks); - - /* Manual AVX512 popcount for u64 lanes. */ - __m512i v_popcnts = _mm512_popcnt_epi64_manual(v_masks); - - /* Offset popcounts for u1 with pre-created offset register. */ - __m512i v_indexes = _mm512_add_epi64(v_popcnts, v_idx_u0_offset); - - /* Gather u64 blocks from packet miniflow. */ - const __m512i v_zeros = _mm512_setzero_si512(); - const void *pkt_data = miniflow_get_values(&keys[i]->mf); - __m512i v_all_blocks = _mm512_mask_i64gather_epi64(v_zeros, - bit_count_total_mask, v_indexes, - pkt_data, 8); + __m512i v_u0 = _mm512_set1_epi64(keys[i]->mf.map.bits[0]); + __m512i v_u1 = _mm512_set1_epi64(keys[i]->mf.map.bits[1]); /* Zero out bits that pkt doesn't have: * - 2x pext() to extract bits from packet miniflow as needed by TBL * - Shift u1 over by bit_count of u0, OR to create zero bitmask */ - uint64_t u0_to_zero = _pext_u64(keys[i]->mf.map.bits[0], tbl_u0); - uint64_t u1_to_zero = _pext_u64(keys[i]->mf.map.bits[1], tbl_u1); - uint64_t zero_mask = (u1_to_zero << bit_count_u0) | u0_to_zero; - - /* Mask blocks using AND with subtable blocks, use k-mask to zero - * where lanes as required for this packet. - */ - __m512i v_masked_blocks = _mm512_maskz_and_epi64(zero_mask, - v_all_blocks, v_tbl_blocks); + uint64_t u0_to_zero = _pext_u64(keys[i]->mf.map.bits[0], tbl_u0); + uint64_t u1_to_zero = _pext_u64(keys[i]->mf.map.bits[1], tbl_u1); + const uint64_t zero_mask_wip = (u1_to_zero << bit_count_u0) | + u0_to_zero; + const uint64_t zero_mask = zero_mask_wip & bit_count_total_mask; + + /* Get ptr to packet data blocks */ + const uint64_t *pkt_blocks = miniflow_get_values(&keys[i]->mf); + + /* Store first 8 blocks cache, full cache line aligned. */ + __m512i v_blocks = avx512_blocks_gather(v_u0, v_u1, + &pkt_blocks[0], + &tbl_blocks[0], + &tbl_mf_masks[0], + u1_bcast_mask, + pkt_mf_u0_pop, + zero_mask, + bit_count_total_mask); + _mm512_storeu_si512(&block_cache[i * MF_BLOCKS_PER_PACKET], v_blocks); + + if (bit_count_total > 8) { + /* Shift masks over by 8. + * Pkt blocks pointer remains 0, it is incremented by popcount. + * Move tbl and mf masks pointers forward. + * Increase offsets by 8. + * Re-run same gather code. + */ + uint64_t zero_mask_gt8 = (zero_mask >> 8); + uint64_t u1_bcast_mask_gt8 = (u1_bcast_mask >> 8); + uint64_t bit_count_gt8_mask = bit_count_total_mask >> 8; + + __m512i v_blocks_gt8 = avx512_blocks_gather(v_u0, v_u1, + &pkt_blocks[0], + &tbl_blocks[8], + &tbl_mf_masks[8], + u1_bcast_mask_gt8, + pkt_mf_u0_pop, + zero_mask_gt8, + bit_count_gt8_mask); + _mm512_storeu_si512(&block_cache[(i * MF_BLOCKS_PER_PACKET) + 8], + v_blocks_gt8); + } - /* Store to blocks cache, full cache line aligned. */ - _mm512_storeu_si512(&block_cache[i * 8], v_masked_blocks); } /* Hash the now linearized blocks of packet metadata. */ ULLONG_FOR_EACH_1 (i, keys_map) { - uint64_t *block_ptr = &block_cache[i * 8]; + uint64_t *block_ptr = &block_cache[i * MF_BLOCKS_PER_PACKET]; uint32_t hash = hash_add_words64(0, block_ptr, bit_count_total); hashes[i] = hash_finish(hash, bit_count_total * 8); } @@ -183,7 +268,7 @@ avx512_lookup_impl(struct dpcls_subtable *subtable, struct dpcls_rule *rule; CMAP_NODE_FOR_EACH (rule, cmap_node, nodes[i]) { - const uint32_t cidx = i * 8; + const uint32_t cidx = i * MF_BLOCKS_PER_PACKET; uint32_t match = netdev_rule_matches_key(rule, bit_count_total, &block_cache[cidx]); if (OVS_LIKELY(match)) { @@ -220,7 +305,7 @@ DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 0) /* Check if a specialized function is valid for the required subtable. */ #define CHECK_LOOKUP_FUNCTION(U0, U1) \ - ovs_assert((U0 + U1) <= NUM_U64_IN_ZMM_REG); \ + ovs_assert((U0 + U1) <= (NUM_U64_IN_ZMM_REG * 2)); \ if (!f && u0_bits == U0 && u1_bits == U1) { \ f = dpcls_avx512_gather_mf_##U0##_##U1; \ } @@ -250,7 +335,11 @@ dpcls_subtable_avx512_gather_probe(uint32_t u0_bits, uint32_t u1_bits) CHECK_LOOKUP_FUNCTION(4, 1); CHECK_LOOKUP_FUNCTION(4, 0); - if (!f && (u0_bits + u1_bits) < NUM_U64_IN_ZMM_REG) { + /* Check if the _any looping version of the code can perform this miniflow + * lookup. Performance gain may be less pronounced due to non-specialized + * hashing, however there is usually a good performance win overall. + */ + if (!f && (u0_bits + u1_bits) < (NUM_U64_IN_ZMM_REG * 2)) { f = dpcls_avx512_gather_mf_any; VLOG_INFO("Using avx512_gather_mf_any for subtable (%d,%d)\n", u0_bits, u1_bits); From patchwork Fri Feb 12 17:17:13 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Van Haaren, Harry" X-Patchwork-Id: 1439954 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.137; helo=fraxinus.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from fraxinus.osuosl.org (smtp4.osuosl.org [140.211.166.137]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4DcgG025YGz9sVX for ; Sat, 13 Feb 2021 04:19:04 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by fraxinus.osuosl.org (Postfix) with ESMTP id D355287034; Fri, 12 Feb 2021 17:19:02 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from fraxinus.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id AiBHJIYfgd4g; Fri, 12 Feb 2021 17:19:01 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by fraxinus.osuosl.org (Postfix) with ESMTP id D4B5C870B2; Fri, 12 Feb 2021 17:18:39 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id A7C65C1D9F; Fri, 12 Feb 2021 17:18:39 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 77F29C0891 for ; Fri, 12 Feb 2021 17:18:37 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 4B95F6F7B8 for ; Fri, 12 Feb 2021 17:18:37 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id gTOdtMp8b2mX for ; Fri, 12 Feb 2021 17:18:35 +0000 (UTC) Received: by smtp3.osuosl.org (Postfix, from userid 1001) id C628F6F90C; Fri, 12 Feb 2021 17:18:35 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by smtp3.osuosl.org (Postfix) with ESMTPS id F42136F8BF for ; Fri, 12 Feb 2021 17:17:49 +0000 (UTC) IronPort-SDR: sPcrEoa8vHwmynA+6+cChy5fFmM4HVoOo42XOigU0/z9WYif/YSuoQW/EQTqjoaU25hdqJo2A0 ALcMJDgqc7Rw== X-IronPort-AV: E=McAfee;i="6000,8403,9893"; a="201595232" X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="201595232" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Feb 2021 09:17:49 -0800 IronPort-SDR: LKP9BbFeiVswSo4LT0qiaq//RHrXUkZ6/drE2VYG1se/xFy8vFuETSiM/N4stflVagEe+jg9s8 VwhPXlKyG2kA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="360485113" Received: from silpixa00400633.ir.intel.com ([10.237.213.44]) by orsmga003.jf.intel.com with ESMTP; 12 Feb 2021 09:17:48 -0800 From: Harry van Haaren To: ovs-dev@openvswitch.org Date: Fri, 12 Feb 2021 17:17:13 +0000 Message-Id: <20210212171718.2189798-12-harry.van.haaren@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210212171718.2189798-1-harry.van.haaren@intel.com> References: <20210104163653.2218575-1-harry.van.haaren@intel.com> <20210212171718.2189798-1-harry.van.haaren@intel.com> MIME-Version: 1.0 Cc: i.maximets@ovn.org Subject: [ovs-dev] [PATCH v9 11/16] dpif-netdev/dpcls: specialize more subtable signatures. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" This commit adds more subtables to be specialized. The traffic pattern here being matched is VXLAN traffic subtables, which commonly have (5,3), (9,1) and (9,4) subtable fingerprints. Signed-off-by: Harry van Haaren --- v8: Add NEWS entry. --- NEWS | 2 ++ lib/dpif-netdev-lookup-avx512-gather.c | 6 ++++++ lib/dpif-netdev-lookup-generic.c | 6 ++++++ 3 files changed, 14 insertions(+) diff --git a/NEWS b/NEWS index d3b9221ed..a5bb16da2 100644 --- a/NEWS +++ b/NEWS @@ -8,6 +8,8 @@ Post-v2.15.0 packets. It supports partial HWOL, EMC, SMC and DPCLS lookups. * Add commands to get and set the dpif implementations. * Enable AVX512 optimized DPCLS to search subtables with larger miniflows. + * Add more specialized DPCLS subtables to cover common rules, enhancing + the lookup performance. v2.15.0 - xx xxx xxxx --------------------- diff --git a/lib/dpif-netdev-lookup-avx512-gather.c b/lib/dpif-netdev-lookup-avx512-gather.c index 1f27c0536..3a684fadf 100644 --- a/lib/dpif-netdev-lookup-avx512-gather.c +++ b/lib/dpif-netdev-lookup-avx512-gather.c @@ -299,6 +299,9 @@ avx512_lookup_impl(struct dpcls_subtable *subtable, return avx512_lookup_impl(subtable, keys_map, keys, rules, U0, U1); \ } \ +DECLARE_OPTIMIZED_LOOKUP_FUNCTION(9, 4) +DECLARE_OPTIMIZED_LOOKUP_FUNCTION(9, 1) +DECLARE_OPTIMIZED_LOOKUP_FUNCTION(5, 3) DECLARE_OPTIMIZED_LOOKUP_FUNCTION(5, 1) DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 1) DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 0) @@ -331,6 +334,9 @@ dpcls_subtable_avx512_gather_probe(uint32_t u0_bits, uint32_t u1_bits) return NULL; } + CHECK_LOOKUP_FUNCTION(9, 4); + CHECK_LOOKUP_FUNCTION(9, 1); + CHECK_LOOKUP_FUNCTION(5, 3); CHECK_LOOKUP_FUNCTION(5, 1); CHECK_LOOKUP_FUNCTION(4, 1); CHECK_LOOKUP_FUNCTION(4, 0); diff --git a/lib/dpif-netdev-lookup-generic.c b/lib/dpif-netdev-lookup-generic.c index e3b6be4b6..6c74ac3a1 100644 --- a/lib/dpif-netdev-lookup-generic.c +++ b/lib/dpif-netdev-lookup-generic.c @@ -282,6 +282,9 @@ dpcls_subtable_lookup_generic(struct dpcls_subtable *subtable, return lookup_generic_impl(subtable, keys_map, keys, rules, U0, U1); \ } \ +DECLARE_OPTIMIZED_LOOKUP_FUNCTION(9, 4) +DECLARE_OPTIMIZED_LOOKUP_FUNCTION(9, 1) +DECLARE_OPTIMIZED_LOOKUP_FUNCTION(5, 3) DECLARE_OPTIMIZED_LOOKUP_FUNCTION(5, 1) DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 1) DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 0) @@ -303,6 +306,9 @@ dpcls_subtable_generic_probe(uint32_t u0_bits, uint32_t u1_bits) { dpcls_subtable_lookup_func f = NULL; + CHECK_LOOKUP_FUNCTION(9, 4); + CHECK_LOOKUP_FUNCTION(9, 1); + CHECK_LOOKUP_FUNCTION(5, 3); CHECK_LOOKUP_FUNCTION(5, 1); CHECK_LOOKUP_FUNCTION(4, 1); CHECK_LOOKUP_FUNCTION(4, 0); From patchwork Fri Feb 12 17:17:14 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Van Haaren, Harry" X-Patchwork-Id: 1439957 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.138; helo=whitealder.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from whitealder.osuosl.org (smtp1.osuosl.org [140.211.166.138]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4DcgH408gzz9sVR for ; Sat, 13 Feb 2021 04:19:57 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by whitealder.osuosl.org (Postfix) with ESMTP id 093088692C; Fri, 12 Feb 2021 17:19:56 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from whitealder.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id RBUdOrMzmG1Q; Fri, 12 Feb 2021 17:19:53 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by whitealder.osuosl.org (Postfix) with ESMTP id 5D0F086D6E; Fri, 12 Feb 2021 17:18:59 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 29146C1DA0; Fri, 12 Feb 2021 17:18:59 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id C16FAC1834 for ; Fri, 12 Feb 2021 17:18:56 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id A83DC6F945 for ; Fri, 12 Feb 2021 17:18:56 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id DiNeMmr4b-zi for ; Fri, 12 Feb 2021 17:18:55 +0000 (UTC) Received: by smtp3.osuosl.org (Postfix, from userid 1001) id CBF3E6F937; Fri, 12 Feb 2021 17:18:47 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by smtp3.osuosl.org (Postfix) with ESMTPS id 93A3C6F8A0 for ; Fri, 12 Feb 2021 17:17:51 +0000 (UTC) IronPort-SDR: Hgv4L1ILsh2GYBxU5JChDoFviavayXmI7N6furJ65x5HVGswHbnjUQMNJGHI2VkxpXMU/Vhlll E+5Ysn47ZTYA== X-IronPort-AV: E=McAfee;i="6000,8403,9893"; a="201595235" X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="201595235" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Feb 2021 09:17:51 -0800 IronPort-SDR: n4ez006+zETGH65Klh+4qRuqgA64JFIj1NA2iKDC6yUZo6z5akS+V5IquDvjC4Q44lHhryRsNZ 4dOJlt+GWBjA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="360485148" Received: from silpixa00400633.ir.intel.com ([10.237.213.44]) by orsmga003.jf.intel.com with ESMTP; 12 Feb 2021 09:17:49 -0800 From: Harry van Haaren To: ovs-dev@openvswitch.org Date: Fri, 12 Feb 2021 17:17:14 +0000 Message-Id: <20210212171718.2189798-13-harry.van.haaren@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210212171718.2189798-1-harry.van.haaren@intel.com> References: <20210104163653.2218575-1-harry.van.haaren@intel.com> <20210212171718.2189798-1-harry.van.haaren@intel.com> MIME-Version: 1.0 Cc: i.maximets@ovn.org Subject: [ovs-dev] [PATCH v9 12/16] dpdk: Cache result of CPU ISA checks. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" As a small optimization, this patch caches the result of a CPU ISA check from DPDK. Particularly in the case of running the DPCLS autovalidator (which repeatedly probes subtables) this reduces the amount of CPU ISA lookups from the DPDK level. By caching them at the OVS/dpdk.c level, the ISA checks remain runtime for the CPU where they are executed, but subsequent checks for the same ISA feature become much cheaper. Signed-off-by: Harry van Haaren Co-authored-by: Cian Ferriter Signed-off-by: Cian Ferriter --- v8: Add NEWS entry. --- NEWS | 1 + lib/dpdk.c | 28 ++++++++++++++++++++++++---- 2 files changed, 25 insertions(+), 4 deletions(-) diff --git a/NEWS b/NEWS index a5bb16da2..0a093e582 100644 --- a/NEWS +++ b/NEWS @@ -33,6 +33,7 @@ v2.15.0 - xx xxx xxxx - DPDK: * Removed support for vhost-user dequeue zero-copy. * Add support for DPDK 20.11. + * Cache results for CPU ISA checks, reduces overhead on repeated lookups. - Userspace datapath: * Add the 'pmd' option to "ovs-appctl dpctl/dump-flows", which restricts a flow dump to a single PMD thread if set. diff --git a/lib/dpdk.c b/lib/dpdk.c index 319540394..c883a4b8b 100644 --- a/lib/dpdk.c +++ b/lib/dpdk.c @@ -614,13 +614,33 @@ print_dpdk_version(void) puts(rte_version()); } +/* Avoid calling rte_cpu_get_flag_enabled() excessively, by caching the + * result of the call for each CPU flag in a static variable. To avoid + * allocating large numbers of static variables, use a uint8 as a bitfield. + * Note the macro must only return if the ISA check is done and available. + */ +#define ISA_CHECK_DONE_BIT (1 << 0) +#define ISA_AVAILABLE_BIT (1 << 1) + #define CHECK_CPU_FEATURE(feature, name_str, RTE_CPUFLAG) \ do { \ if (strncmp(feature, name_str, strlen(name_str)) == 0) { \ - int has_isa = rte_cpu_get_flag_enabled(RTE_CPUFLAG); \ - VLOG_DBG("CPU flag %s, available %s\n", name_str, \ - has_isa ? "yes" : "no"); \ - return true; \ + static uint8_t isa_check_##RTE_CPUFLAG; \ + int check = isa_check_##RTE_CPUFLAG & ISA_CHECK_DONE_BIT; \ + if (OVS_UNLIKELY(!check)) { \ + int has_isa = rte_cpu_get_flag_enabled(RTE_CPUFLAG); \ + VLOG_DBG("CPU flag %s, available %s\n", \ + name_str, has_isa ? "yes" : "no"); \ + isa_check_##RTE_CPUFLAG = ISA_CHECK_DONE_BIT; \ + if (has_isa) { \ + isa_check_##RTE_CPUFLAG |= ISA_AVAILABLE_BIT; \ + } \ + } \ + if (isa_check_##RTE_CPUFLAG & ISA_AVAILABLE_BIT) { \ + return true; \ + } else { \ + return false; \ + } \ } \ } while (0) From patchwork Fri Feb 12 17:17:15 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Van Haaren, Harry" X-Patchwork-Id: 1439955 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.137; helo=fraxinus.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from fraxinus.osuosl.org (smtp4.osuosl.org [140.211.166.137]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4DcgGF0sNNz9sTD for ; Sat, 13 Feb 2021 04:19:17 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by fraxinus.osuosl.org (Postfix) with ESMTP id A470A865E0; Fri, 12 Feb 2021 17:19:15 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from fraxinus.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id z0OA9fP6FmyM; Fri, 12 Feb 2021 17:19:12 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by fraxinus.osuosl.org (Postfix) with ESMTP id C693987008; Fri, 12 Feb 2021 17:19:01 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id AEC96C0891; Fri, 12 Feb 2021 17:19:01 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id B2969C1825 for ; Fri, 12 Feb 2021 17:18:58 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 890DF6F92E for ; Fri, 12 Feb 2021 17:18:58 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id wPYq5g78-5Xq for ; Fri, 12 Feb 2021 17:18:56 +0000 (UTC) Received: by smtp3.osuosl.org (Postfix, from userid 1001) id E6EC06F8A6; Fri, 12 Feb 2021 17:18:50 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by smtp3.osuosl.org (Postfix) with ESMTPS id 4F2B36F8AC for ; Fri, 12 Feb 2021 17:17:53 +0000 (UTC) IronPort-SDR: u8OYHDZkczkD21jE5frKbi7xSnMJib7+MI8HDGEJFWAubnrJq0XuruHEzSLj22nFNUQk4ZQvST mtWdBCnp5K+Q== X-IronPort-AV: E=McAfee;i="6000,8403,9893"; a="201595236" X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="201595236" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Feb 2021 09:17:53 -0800 IronPort-SDR: 5BH0IvggzOkZXjiVcbvwEwNuOE2BO6xskUqxOJpajNWpfuFXpI2HGRZlmbMriM3v+dAu8DQDWc e7daLxnbF29w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="360485186" Received: from silpixa00400633.ir.intel.com ([10.237.213.44]) by orsmga003.jf.intel.com with ESMTP; 12 Feb 2021 09:17:51 -0800 From: Harry van Haaren To: ovs-dev@openvswitch.org Date: Fri, 12 Feb 2021 17:17:15 +0000 Message-Id: <20210212171718.2189798-14-harry.van.haaren@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210212171718.2189798-1-harry.van.haaren@intel.com> References: <20210104163653.2218575-1-harry.van.haaren@intel.com> <20210212171718.2189798-1-harry.van.haaren@intel.com> MIME-Version: 1.0 Cc: i.maximets@ovn.org Subject: [ovs-dev] [PATCH v9 13/16] dpcls-avx512: enabling avx512 vector popcount instruction. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" This commit enables the AVX512-VPOPCNTDQ Vector Popcount instruction. This instruction is not available on every CPU that supports the AVX512-F Foundation ISA, hence it is enabled only when the additional VPOPCNTDQ ISA check is passed. The vector popcount instruction is used instead of the AVX512 popcount emulation code present in the avx512 optimized DPCLS today. It provides higher performance in the SIMD miniflow processing as that requires the popcount to calculate the miniflow block indexes. Signed-off-by: Harry van Haaren --- v8: Add NEWS entry. --- NEWS | 3 + lib/dpdk.c | 1 + lib/dpif-netdev-lookup-avx512-gather.c | 84 ++++++++++++++++++++------ 3 files changed, 70 insertions(+), 18 deletions(-) diff --git a/NEWS b/NEWS index 0a093e582..5f1e3b5e0 100644 --- a/NEWS +++ b/NEWS @@ -10,6 +10,9 @@ Post-v2.15.0 * Enable AVX512 optimized DPCLS to search subtables with larger miniflows. * Add more specialized DPCLS subtables to cover common rules, enhancing the lookup performance. + * Enable the AVX512 DPCLS implementation to use VPOPCNT instruction if the + CPU supports it. This enhances performance by using the native vpopcount + instructions, instead of the emulated version of vpopcount. v2.15.0 - xx xxx xxxx --------------------- diff --git a/lib/dpdk.c b/lib/dpdk.c index c883a4b8b..a9494a40f 100644 --- a/lib/dpdk.c +++ b/lib/dpdk.c @@ -655,6 +655,7 @@ dpdk_get_cpu_has_isa(const char *arch, const char *feature) #if __x86_64__ /* CPU flags only defined for the architecture that support it. */ CHECK_CPU_FEATURE(feature, "avx512f", RTE_CPUFLAG_AVX512F); + CHECK_CPU_FEATURE(feature, "avx512vpopcntdq", RTE_CPUFLAG_AVX512VPOPCNTDQ); CHECK_CPU_FEATURE(feature, "bmi2", RTE_CPUFLAG_BMI2); #endif diff --git a/lib/dpif-netdev-lookup-avx512-gather.c b/lib/dpif-netdev-lookup-avx512-gather.c index 3a684fadf..9a3273dc6 100644 --- a/lib/dpif-netdev-lookup-avx512-gather.c +++ b/lib/dpif-netdev-lookup-avx512-gather.c @@ -53,6 +53,15 @@ VLOG_DEFINE_THIS_MODULE(dpif_lookup_avx512_gather); + +/* Wrapper function required to enable ISA. */ +static inline __m512i +__attribute__((__target__("avx512vpopcntdq"))) +_mm512_popcnt_epi64_wrapper(__m512i v_in) +{ + return _mm512_popcnt_epi64(v_in); +} + static inline __m512i _mm512_popcnt_epi64_manual(__m512i v_in) { @@ -126,7 +135,8 @@ avx512_blocks_gather(__m512i v_u0, /* reg of u64 of all u0 bits */ __mmask64 u1_bcast_msk, /* mask of u1 lanes */ const uint64_t pkt_mf_u0_pop, /* num bits in u0 of pkt */ __mmask64 zero_mask, /* maskz if pkt not have mf bit */ - __mmask64 u64_lanes_mask) /* total lane count to use */ + __mmask64 u64_lanes_mask, /* total lane count to use */ + const uint32_t use_vpop) /* use AVX512 vpopcntdq */ { /* Suggest to compiler to load tbl blocks ahead of gather() */ __m512i v_tbl_blocks = _mm512_maskz_loadu_epi64(u64_lanes_mask, @@ -140,8 +150,15 @@ avx512_blocks_gather(__m512i v_u0, /* reg of u64 of all u0 bits */ tbl_mf_masks); __m512i v_masks = _mm512_and_si512(v_pkt_bits, v_tbl_masks); - /* Manual AVX512 popcount for u64 lanes. */ - __m512i v_popcnts = _mm512_popcnt_epi64_manual(v_masks); + /* Calculate AVX512 popcount for u64 lanes using the native instruction + * if available, or using emulation if not available. + */ + __m512i v_popcnts; + if (use_vpop) { + v_popcnts = _mm512_popcnt_epi64_wrapper(v_masks); + } else { + v_popcnts = _mm512_popcnt_epi64_manual(v_masks); + } /* Add popcounts and offset for u1 bits. */ __m512i v_idx_u0_offset = _mm512_maskz_set1_epi64(u1_bcast_msk, @@ -166,7 +183,8 @@ avx512_lookup_impl(struct dpcls_subtable *subtable, const struct netdev_flow_key *keys[], struct dpcls_rule **rules, const uint32_t bit_count_u0, - const uint32_t bit_count_u1) + const uint32_t bit_count_u1, + const uint32_t use_vpop) { OVS_ALIGNED_VAR(CACHE_LINE_SIZE)uint64_t block_cache[BLOCKS_CACHE_SIZE]; uint32_t hashes[NETDEV_MAX_BURST]; @@ -218,7 +236,8 @@ avx512_lookup_impl(struct dpcls_subtable *subtable, u1_bcast_mask, pkt_mf_u0_pop, zero_mask, - bit_count_total_mask); + bit_count_total_mask, + use_vpop); _mm512_storeu_si512(&block_cache[i * MF_BLOCKS_PER_PACKET], v_blocks); if (bit_count_total > 8) { @@ -239,7 +258,8 @@ avx512_lookup_impl(struct dpcls_subtable *subtable, u1_bcast_mask_gt8, pkt_mf_u0_pop, zero_mask_gt8, - bit_count_gt8_mask); + bit_count_gt8_mask, + use_vpop); _mm512_storeu_si512(&block_cache[(i * MF_BLOCKS_PER_PACKET) + 8], v_blocks_gt8); } @@ -288,7 +308,11 @@ avx512_lookup_impl(struct dpcls_subtable *subtable, return found_map; } -/* Expand out specialized functions with U0 and U1 bit attributes. */ +/* Expand out specialized functions with U0 and U1 bit attributes. As the + * AVX512 vpopcnt instruction is not supported on all AVX512 capable CPUs, + * create two functions for each miniflow signature. This allows the runtime + * CPU detection in probe() to select the ideal implementation. + */ #define DECLARE_OPTIMIZED_LOOKUP_FUNCTION(U0, U1) \ static uint32_t \ dpcls_avx512_gather_mf_##U0##_##U1(struct dpcls_subtable *subtable, \ @@ -296,7 +320,20 @@ avx512_lookup_impl(struct dpcls_subtable *subtable, const struct netdev_flow_key *keys[], \ struct dpcls_rule **rules) \ { \ - return avx512_lookup_impl(subtable, keys_map, keys, rules, U0, U1); \ + const uint32_t use_vpop = 0; \ + return avx512_lookup_impl(subtable, keys_map, keys, rules, \ + U0, U1, use_vpop); \ + } \ + \ + static uint32_t __attribute__((__target__("avx512vpopcntdq"))) \ + dpcls_avx512_gather_mf_##U0##_##U1##_vpop(struct dpcls_subtable *subtable,\ + uint32_t keys_map, \ + const struct netdev_flow_key *keys[], \ + struct dpcls_rule **rules) \ + { \ + const uint32_t use_vpop = 1; \ + return avx512_lookup_impl(subtable, keys_map, keys, rules, \ + U0, U1, use_vpop); \ } \ DECLARE_OPTIMIZED_LOOKUP_FUNCTION(9, 4) @@ -306,11 +343,18 @@ DECLARE_OPTIMIZED_LOOKUP_FUNCTION(5, 1) DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 1) DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 0) -/* Check if a specialized function is valid for the required subtable. */ -#define CHECK_LOOKUP_FUNCTION(U0, U1) \ +/* Check if a specialized function is valid for the required subtable. + * The use_vpop variable is used to decide if the VPOPCNT instruction can be + * used or not. + */ +#define CHECK_LOOKUP_FUNCTION(U0, U1, use_vpop) \ ovs_assert((U0 + U1) <= (NUM_U64_IN_ZMM_REG * 2)); \ if (!f && u0_bits == U0 && u1_bits == U1) { \ - f = dpcls_avx512_gather_mf_##U0##_##U1; \ + if (use_vpop) { \ + f = dpcls_avx512_gather_mf_##U0##_##U1##_vpop; \ + } else { \ + f = dpcls_avx512_gather_mf_##U0##_##U1; \ + } \ } static uint32_t @@ -318,9 +362,11 @@ dpcls_avx512_gather_mf_any(struct dpcls_subtable *subtable, uint32_t keys_map, const struct netdev_flow_key *keys[], struct dpcls_rule **rules) { + const uint32_t use_vpop = 0; return avx512_lookup_impl(subtable, keys_map, keys, rules, subtable->mf_bits_set_unit0, - subtable->mf_bits_set_unit1); + subtable->mf_bits_set_unit1, + use_vpop); } dpcls_subtable_lookup_func @@ -334,12 +380,14 @@ dpcls_subtable_avx512_gather_probe(uint32_t u0_bits, uint32_t u1_bits) return NULL; } - CHECK_LOOKUP_FUNCTION(9, 4); - CHECK_LOOKUP_FUNCTION(9, 1); - CHECK_LOOKUP_FUNCTION(5, 3); - CHECK_LOOKUP_FUNCTION(5, 1); - CHECK_LOOKUP_FUNCTION(4, 1); - CHECK_LOOKUP_FUNCTION(4, 0); + int use_vpop = dpdk_get_cpu_has_isa("x86_64", "avx512vpopcntdq"); + + CHECK_LOOKUP_FUNCTION(9, 4, use_vpop); + CHECK_LOOKUP_FUNCTION(9, 1, use_vpop); + CHECK_LOOKUP_FUNCTION(5, 3, use_vpop); + CHECK_LOOKUP_FUNCTION(5, 1, use_vpop); + CHECK_LOOKUP_FUNCTION(4, 1, use_vpop); + CHECK_LOOKUP_FUNCTION(4, 0, use_vpop); /* Check if the _any looping version of the code can perform this miniflow * lookup. Performance gain may be less pronounced due to non-specialized From patchwork Fri Feb 12 17:17:16 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Van Haaren, Harry" X-Patchwork-Id: 1439956 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.137; helo=fraxinus.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from fraxinus.osuosl.org (smtp4.osuosl.org [140.211.166.137]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4DcgGX6Zppz9sTD for ; Sat, 13 Feb 2021 04:19:32 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by fraxinus.osuosl.org (Postfix) with ESMTP id 6DE88863FF; Fri, 12 Feb 2021 17:19:31 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from fraxinus.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id UBEcp1Ftc2DJ; Fri, 12 Feb 2021 17:19:29 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by fraxinus.osuosl.org (Postfix) with ESMTP id DB30886F1B; Fri, 12 Feb 2021 17:19:21 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id BFE82C0891; Fri, 12 Feb 2021 17:19:21 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 99125C0891 for ; Fri, 12 Feb 2021 17:19:20 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 560836F976 for ; Fri, 12 Feb 2021 17:19:20 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 9HrKG-ynt1zu for ; Fri, 12 Feb 2021 17:19:18 +0000 (UTC) Received: by smtp3.osuosl.org (Postfix, from userid 1001) id EC9F86F975; Fri, 12 Feb 2021 17:19:17 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by smtp3.osuosl.org (Postfix) with ESMTPS id C425C6F79F for ; Fri, 12 Feb 2021 17:17:54 +0000 (UTC) IronPort-SDR: nEoaX8ZocR7GHAt12QAabw9PU0ybBF33bQmeo1gybBs9Vv2c9KfjuHKF6/gRmPOOBbkiSzglf7 Curntvxk6AQA== X-IronPort-AV: E=McAfee;i="6000,8403,9893"; a="201595239" X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="201595239" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Feb 2021 09:17:54 -0800 IronPort-SDR: O3CqZcc+XvuLuFL5k7zzsqE6D8pIi9KHaHffJdrO7GDapfxZnsLJkdizw1MmgGjnRkn7Gx/Ap1 BB7ni0GnJY3w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="360485215" Received: from silpixa00400633.ir.intel.com ([10.237.213.44]) by orsmga003.jf.intel.com with ESMTP; 12 Feb 2021 09:17:53 -0800 From: Harry van Haaren To: ovs-dev@openvswitch.org Date: Fri, 12 Feb 2021 17:17:16 +0000 Message-Id: <20210212171718.2189798-15-harry.van.haaren@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210212171718.2189798-1-harry.van.haaren@intel.com> References: <20210104163653.2218575-1-harry.van.haaren@intel.com> <20210212171718.2189798-1-harry.van.haaren@intel.com> MIME-Version: 1.0 Cc: i.maximets@ovn.org Subject: [ovs-dev] [PATCH v9 14/16] dpif-netdev: Optimize dp output action X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" This commit optimizes the output action, by enabling the compiler to optimize the code better through reducing code complexity. The core concept of this optimization is that the array-length checks have already been performed above the copying code, so can be removed. Removing of the per-packet length checks allows the compiler to auto-vectorize the stores using SIMD registers. Signed-off-by: Harry van Haaren --- v8: Add NEWS entry. --- NEWS | 1 + lib/dpif-netdev.c | 23 ++++++++++++++++++----- 2 files changed, 19 insertions(+), 5 deletions(-) diff --git a/NEWS b/NEWS index 5f1e3b5e0..2ffc155f9 100644 --- a/NEWS +++ b/NEWS @@ -13,6 +13,7 @@ Post-v2.15.0 * Enable the AVX512 DPCLS implementation to use VPOPCNT instruction if the CPU supports it. This enhances performance by using the native vpopcount instructions, instead of the emulated version of vpopcount. + * Optimize dp_netdev_output by enhancing compiler optimization potential. v2.15.0 - xx xxx xxxx --------------------- diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 5e83755d7..b2cf1bd46 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -7254,12 +7254,25 @@ dp_execute_output_action(struct dp_netdev_pmd_thread *pmd, pmd->n_output_batches++; } - struct dp_packet *packet; - DP_PACKET_BATCH_FOR_EACH (i, packet, packets_) { - p->output_pkts_rxqs[dp_packet_batch_size(&p->output_pkts)] = - pmd->ctx.last_rxq; - dp_packet_batch_add(&p->output_pkts, packet); + /* The above checks ensure that there is enough space in the output batch. + * Using dp_packet_batch_add() has a branch to check if the batch is full. + * This branch reduces the compiler's ability to optimize efficiently. The + * below code implements packet movement between batches without checks, + * with the required semantics of output batch perhaps containing packets. + */ + int batch_size = dp_packet_batch_size(packets_); + int out_batch_idx = dp_packet_batch_size(&p->output_pkts); + struct dp_netdev_rxq *rxq = pmd->ctx.last_rxq; + struct dp_packet_batch *output_batch = &p->output_pkts; + + for (int i = 0; i < batch_size; i++) { + struct dp_packet *packet = packets_->packets[i]; + p->output_pkts_rxqs[out_batch_idx] = rxq; + output_batch->packets[out_batch_idx] = packet; + out_batch_idx++; } + output_batch->count += batch_size; + return true; } From patchwork Fri Feb 12 17:17:17 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Van Haaren, Harry" X-Patchwork-Id: 1439958 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.138; helo=whitealder.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from whitealder.osuosl.org (smtp1.osuosl.org [140.211.166.138]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4DcgHs2gyZz9sTD for ; Sat, 13 Feb 2021 04:20:41 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by whitealder.osuosl.org (Postfix) with ESMTP id D181E85B8A; Fri, 12 Feb 2021 17:20:39 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from whitealder.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 6uO1PgnTcFcM; Fri, 12 Feb 2021 17:20:36 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by whitealder.osuosl.org (Postfix) with ESMTP id 846418761F; Fri, 12 Feb 2021 17:19:37 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 61026C1E72; Fri, 12 Feb 2021 17:19:37 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 00451C0891 for ; Fri, 12 Feb 2021 17:19:36 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id BF8756F99E for ; Fri, 12 Feb 2021 17:19:35 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id PNcEmL3xs3E2 for ; Fri, 12 Feb 2021 17:19:34 +0000 (UTC) Received: by smtp3.osuosl.org (Postfix, from userid 1001) id 76BC46F8CF; Fri, 12 Feb 2021 17:19:34 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by smtp3.osuosl.org (Postfix) with ESMTPS id C555A6F7B3 for ; Fri, 12 Feb 2021 17:17:56 +0000 (UTC) IronPort-SDR: IQ7HjIuCRn2NI2RDeZZdAEyh+l7Ur64CfQwx9xlwc+RfOY1aBgLYHAQA+DbSz9POYvSIf3Zb3o uWvywtvjKtOA== X-IronPort-AV: E=McAfee;i="6000,8403,9893"; a="201595249" X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="201595249" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Feb 2021 09:17:56 -0800 IronPort-SDR: I83lJSyapFqtkrwVTI4yPFsSLnitC+7QMXHPoWNPK91Eq4xjCh1H9vUH0OIT95xxkkUeBxESWO 1MZiWPJ05wKA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="360485252" Received: from silpixa00400633.ir.intel.com ([10.237.213.44]) by orsmga003.jf.intel.com with ESMTP; 12 Feb 2021 09:17:54 -0800 From: Harry van Haaren To: ovs-dev@openvswitch.org Date: Fri, 12 Feb 2021 17:17:17 +0000 Message-Id: <20210212171718.2189798-16-harry.van.haaren@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210212171718.2189798-1-harry.van.haaren@intel.com> References: <20210104163653.2218575-1-harry.van.haaren@intel.com> <20210212171718.2189798-1-harry.van.haaren@intel.com> MIME-Version: 1.0 Cc: i.maximets@ovn.org Subject: [ovs-dev] [PATCH v9 15/16] netdev: Optimize netdev_send_prepare_batch X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" Optimize for the best case here where all packets will be compatible with 'netdev_flags'. Signed-off-by: Harry van Haaren Co-authored-by: Cian Ferriter Signed-off-by: Cian Ferriter --- v9: rebase 2 --- NEWS | 2 ++ lib/netdev.c | 31 ++++++++++++++++++++++--------- 2 files changed, 24 insertions(+), 9 deletions(-) diff --git a/NEWS b/NEWS index 2ffc155f9..cbdcf53a1 100644 --- a/NEWS +++ b/NEWS @@ -14,6 +14,8 @@ Post-v2.15.0 CPU supports it. This enhances performance by using the native vpopcount instructions, instead of the emulated version of vpopcount. * Optimize dp_netdev_output by enhancing compiler optimization potential. + * Optimize netdev sending by assuming the happy case, and using fallback + for if the netdev doesnt meet the required HWOL needs of a packet. v2.15.0 - xx xxx xxxx --------------------- diff --git a/lib/netdev.c b/lib/netdev.c index 91e91955c..29a5f1aa9 100644 --- a/lib/netdev.c +++ b/lib/netdev.c @@ -837,20 +837,33 @@ static void netdev_send_prepare_batch(const struct netdev *netdev, struct dp_packet_batch *batch) { - struct dp_packet *packet; - size_t i, size = dp_packet_batch_size(batch); + struct dp_packet *p; + uint32_t i, size = dp_packet_batch_size(batch); + char *err_msg = NULL; - DP_PACKET_BATCH_REFILL_FOR_EACH (i, size, packet, batch) { - char *errormsg = NULL; + for (i = 0; i < size; i++) { + p = batch->packets[i]; + int pkt_ok = netdev_send_prepare_packet(netdev->ol_flags, p, &err_msg); - if (netdev_send_prepare_packet(netdev->ol_flags, packet, &errormsg)) { - dp_packet_batch_refill(batch, packet, i); + if (OVS_UNLIKELY(!pkt_ok)) { + goto refill_loop; + } + } + + return; + +refill_loop: + /* Loop through packets from the start of the batch again. This is the + * exceptional case where packets aren't compatible with 'netdev_flags'. */ + DP_PACKET_BATCH_REFILL_FOR_EACH (i, size, p, batch) { + if (netdev_send_prepare_packet(netdev->ol_flags, p, &err_msg)) { + dp_packet_batch_refill(batch, p, i); } else { - dp_packet_delete(packet); + dp_packet_delete(p); COVERAGE_INC(netdev_send_prepare_drops); VLOG_WARN_RL(&rl, "%s: Packet dropped: %s", - netdev_get_name(netdev), errormsg); - free(errormsg); + netdev_get_name(netdev), err_msg); + free(err_msg); } } } From patchwork Fri Feb 12 17:17:18 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Van Haaren, Harry" X-Patchwork-Id: 1439963 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.136; helo=smtp3.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4DcgMF0w0Jz9sCD for ; Sat, 13 Feb 2021 04:23:37 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 0C5156F8D5 for ; Fri, 12 Feb 2021 17:23:35 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id XpzUwg-fIryZ for ; Fri, 12 Feb 2021 17:23:32 +0000 (UTC) Received: by smtp3.osuosl.org (Postfix, from userid 1001) id 1C7E16F7CC; Fri, 12 Feb 2021 17:23:32 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp3.osuosl.org (Postfix) with ESMTP id A105F6F8A9; Fri, 12 Feb 2021 17:19:48 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 708ECC0891; Fri, 12 Feb 2021 17:19:48 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id D8B05C013A for ; Fri, 12 Feb 2021 17:19:46 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 9EECB6F8A9 for ; Fri, 12 Feb 2021 17:19:46 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id fEZ-Cs-r0-QU for ; Fri, 12 Feb 2021 17:19:44 +0000 (UTC) Received: by smtp3.osuosl.org (Postfix, from userid 1001) id 045456F9A2; Fri, 12 Feb 2021 17:19:43 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by smtp3.osuosl.org (Postfix) with ESMTPS id 778AF6F8C7 for ; Fri, 12 Feb 2021 17:17:58 +0000 (UTC) IronPort-SDR: qFiNcbMD3d38ES/ThIQAKGqKVtGOVA6N1dNmveoGXjqDoVnINDP4V7spSmgFoOmTdhtNqq+5Ny 8hA7orilBAsw== X-IronPort-AV: E=McAfee;i="6000,8403,9893"; a="201595253" X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="201595253" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Feb 2021 09:17:58 -0800 IronPort-SDR: RXNYirqpiFTjVqsi2r6AeMahkLk0rf19qGh5eOSnrI3FCV7F4cY1tsvedOG7CBSDRit/C+FSn1 s90zKowxdnXw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.81,174,1610438400"; d="scan'208";a="360485296" Received: from silpixa00400633.ir.intel.com ([10.237.213.44]) by orsmga003.jf.intel.com with ESMTP; 12 Feb 2021 09:17:56 -0800 From: Harry van Haaren To: ovs-dev@openvswitch.org Date: Fri, 12 Feb 2021 17:17:18 +0000 Message-Id: <20210212171718.2189798-17-harry.van.haaren@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210212171718.2189798-1-harry.van.haaren@intel.com> References: <20210104163653.2218575-1-harry.van.haaren@intel.com> <20210212171718.2189798-1-harry.van.haaren@intel.com> MIME-Version: 1.0 Cc: i.maximets@ovn.org Subject: [ovs-dev] [PATCH v9 16/16] dpif-netdev: POC of future DPIF and MFEX AVX512 optimizations X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" This is a POC patch, showing future DPIF and MFEX optimizations. The main optimization is doing MiniflowExtract in AVX512. This speeds up the specific protocol parsing a lot. Other optimizations for DPIF show value in removing complexity from the code by specialization. In particular if only DPCLS is enabled, we can avoid rebatching packets. Signed-off-by: Harry van Haaren --- lib/automake.mk | 1 + lib/dpdk.c | 1 + lib/dpif-netdev-avx512.c | 178 +++++++++++++++++++++---------- lib/dpif-netdev-private-dpif.h | 6 ++ lib/dpif-netdev-private-thread.h | 10 ++ lib/flow_avx512.h | 117 ++++++++++++++++++++ 6 files changed, 255 insertions(+), 58 deletions(-) create mode 100644 lib/flow_avx512.h diff --git a/lib/automake.mk b/lib/automake.mk index 5e493ebaf..a5dbf7f7e 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -137,6 +137,7 @@ lib_libopenvswitch_la_SOURCES = \ lib/fatal-signal.h \ lib/flow.c \ lib/flow.h \ + lib/flow_avx512.h \ lib/guarded-list.c \ lib/guarded-list.h \ lib/hash.c \ diff --git a/lib/dpdk.c b/lib/dpdk.c index a9494a40f..a82ff04b6 100644 --- a/lib/dpdk.c +++ b/lib/dpdk.c @@ -655,6 +655,7 @@ dpdk_get_cpu_has_isa(const char *arch, const char *feature) #if __x86_64__ /* CPU flags only defined for the architecture that support it. */ CHECK_CPU_FEATURE(feature, "avx512f", RTE_CPUFLAG_AVX512F); + CHECK_CPU_FEATURE(feature, "avx512vbmi", RTE_CPUFLAG_AVX512VBMI); CHECK_CPU_FEATURE(feature, "avx512vpopcntdq", RTE_CPUFLAG_AVX512VPOPCNTDQ); CHECK_CPU_FEATURE(feature, "bmi2", RTE_CPUFLAG_BMI2); #endif diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c index fff469e10..29b4b856a 100644 --- a/lib/dpif-netdev-avx512.c +++ b/lib/dpif-netdev-avx512.c @@ -35,6 +35,8 @@ #include "immintrin.h" +#include "flow_avx512.h" + /* Structure to contain per-packet metadata that must be attributed to the * dp netdev flow. This is unfortunate to have to track per packet, however * it's a bit awkward to maintain them in a performant way. This structure @@ -68,15 +70,24 @@ dp_netdev_input_outer_avx512_probe(void) return 0; } -int32_t -dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, - struct dp_packet_batch *packets, - odp_port_t in_port) +/* Specialize DPIF based on enabled options, eg for DPCLS only. */ +static inline ALWAYS_INLINE int32_t +dp_netdev_input_outer_avx512_impl(struct dp_netdev_pmd_thread *pmd, + struct dp_packet_batch *packets, + odp_port_t in_port, + uint32_t dpcls_only) { - /* Allocate DPIF userdata. */ if (OVS_UNLIKELY(!pmd->netdev_input_func_userdata)) { pmd->netdev_input_func_userdata = xmalloc_pagealign(sizeof(struct dpif_userdata)); + /* TODO: Enable MFEX selector/autovalidator as done for DPCLS. + * This code shows the POC value, not final upstream code. + * As the code uses AVX512-VBMI, check for ISA at runtime. + */ + int avx512vbmi = dpdk_get_cpu_has_isa("x86_64", "avx512vbmi"); + if (avx512vbmi) { + pmd->mfex_func = mfex_avx512_ipv4_udp; + } } struct dpif_userdata *ud = pmd->netdev_input_func_userdata; @@ -84,6 +95,14 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, struct netdev_flow_key **key_ptrs = ud->key_ptrs; struct pkt_flow_meta *pkt_meta = ud->pkt_meta; + /* TODO: make runtime command to allow users to disable/enable. + * Not all users need TCP-flags or bytes per rule, and it costs performance + * to always calculate it. Enabling this costs ~6 cycles/pkt. It will be + * enabled by default for consistency & backwards compat, but disabling + * could be investigated by users if they so desire. + */ + uint32_t do_pkt_meta = 1; + /* Stores the computed output: a rule pointer for each packet */ /* The AVX512 DPIF implementation handles rules in a way that is optimized * for reducing data-movement between HWOL/EMC/SMC and DPCLS. This is @@ -92,7 +111,8 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, * array. Later the two arrays are merged by AVX-512 expand instructions. */ struct dpcls_rule *rules[NETDEV_MAX_BURST]; - struct dpcls_rule *dpcls_rules[NETDEV_MAX_BURST]; + struct dpcls_rule *dpcls_rules_impl[NETDEV_MAX_BURST]; + struct dpcls_rule **dpcls_rules = dpcls_rules_impl; uint32_t dpcls_key_idx = 0; for (uint32_t i = 0; i < NETDEV_MAX_BURST; i += 8) { @@ -100,12 +120,8 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, _mm512_storeu_si512(&dpcls_rules[i], _mm512_setzero_si512()); } - /* Prefetch each packet's metadata */ - const size_t batch_size = dp_packet_batch_size(packets); - for (int i = 0; i < batch_size; i++) { - struct dp_packet *packet = packets->packets[i]; - OVS_PREFETCH(dp_packet_data(packet)); - pkt_metadata_prefetch_init(&packet->md); + if (dpcls_only) { + dpcls_rules = rules; } /* Check if EMC or SMC are enabled */ @@ -120,32 +136,41 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, uint32_t hwol_emc_smc_hitmask = 0; /* Perform first packet interation */ + const size_t batch_size = dp_packet_batch_size(packets); uint32_t lookup_pkts_bitmask = (1ULL << batch_size) - 1; - uint32_t iter = lookup_pkts_bitmask; - while (iter) { - uint32_t i = __builtin_ctz(iter); - iter = _blsr_u64(iter); + + const uint32_t pf_ahead = 4; + int pf = batch_size < pf_ahead ? batch_size : pf_ahead; + for (int i = 0; i < pf; i++) { + struct dp_packet *packet = packets->packets[i]; + char *pkt_data_ptr = dp_packet_data(packet); + OVS_PREFETCH(pkt_data_ptr); + pkt_metadata_prefetch_init(&packet->md); + } + + for (int i = 0; i < batch_size; i++) { + if (i + pf < batch_size) { + struct dp_packet *pfm = packets->packets[i + pf]; + char *pkt_data_ptr = dp_packet_data(pfm); + OVS_PREFETCH(pkt_data_ptr); + pkt_metadata_prefetch_init(&pfm->md); + } /* Get packet pointer from bitmask and packet md */ struct dp_packet *packet = packets->packets[i]; pkt_metadata_init(&packet->md, in_port); - struct dp_netdev_flow *f = NULL; /* Check for partial hardware offload mark */ uint32_t mark; - if (dp_packet_has_flow_mark(packet, &mark)) { + if (!dpcls_only && dp_packet_has_flow_mark(packet, &mark)) { f = mark_to_flow_find(pmd, mark); if (f) { rules[i] = &f->cr; - - /* This is nasty - instead of using the HWOL provided flow, - * parse the packet data anyway to find the location of the TCP - * header to extract the TCP flags for the rule. - */ - pkt_meta[i].tcp_flags = parse_tcp_flags(packet); - - pkt_meta[i].bytes = dp_packet_size(packet); + if (do_pkt_meta) { + pkt_meta[i].tcp_flags = parse_tcp_flags(packet); + pkt_meta[i].bytes = dp_packet_size(packet); + } hwol_emc_smc_hitmask |= (1 << i); continue; } @@ -153,16 +178,29 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, /* Do miniflow extract into keys */ struct netdev_flow_key *key = &keys[i]; - miniflow_extract(packet, &key->mf); - /* Cache TCP and byte values for all packets */ - pkt_meta[i].bytes = dp_packet_size(packet); - pkt_meta[i].tcp_flags = miniflow_get_tcp_flags(&key->mf); + const struct pkt_metadata *md = &packet->md; + if (pmd->mfex_func) { + uint32_t match = pmd->mfex_func(packet, + (struct miniflow *)&key->mf, + md->in_port.odp_port); + if (!match) { + miniflow_extract(packet, &key->mf); + } + } else { + miniflow_extract(packet, &key->mf); + } + + if (do_pkt_meta) { + /* Cache TCP and byte values for all packets */ + pkt_meta[i].bytes = dp_packet_size(packet); + pkt_meta[i].tcp_flags = miniflow_get_tcp_flags(&key->mf); + } key->len = netdev_flow_key_size(miniflow_n_values(&key->mf)); key->hash = dpif_netdev_packet_get_rss_hash_orig_pkt(packet, &key->mf); - if (emc_enabled) { + if (!dpcls_only && emc_enabled) { f = emc_lookup(&cache->emc_cache, key); if (f) { @@ -173,7 +211,7 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, } }; - if (smc_enabled && !f) { + if (!dpcls_only && smc_enabled && !f) { f = smc_lookup_single(pmd, packet, key); if (f) { rules[i] = &f->cr; @@ -207,28 +245,29 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, return -1; } - /* Merge DPCLS rules and HWOL/EMC/SMC rules. */ - uint32_t dpcls_idx = 0; - for (int i = 0; i < NETDEV_MAX_BURST; i += 8) { - /* Indexing here is somewhat complicated due to DPCLS output rule - * load index depending on the hitmask of HWOL/EMC/SMC. More - * packets from HWOL/EMC/SMC bitmask means less DPCLS rules are - * used. - */ - __m512i v_cache_rules = _mm512_loadu_si512(&rules[i]); - __m512i v_merged_rules = - _mm512_mask_expandloadu_epi64(v_cache_rules, + if (!dpcls_only) { + /* Merge DPCLS rules and HWOL/EMC/SMC rules. */ + uint32_t dpcls_idx = 0; + for (int i = 0; i < NETDEV_MAX_BURST; i += 8) { + /* Indexing here is somewhat complicated due to DPCLS output rule + * load index depending on the hitmask of HWOL/EMC/SMC. More + * packets from HWOL/EMC/SMC bitmask means less DPCLS rules are + * used. + */ + __m512i v_cache_rules = _mm512_loadu_si512(&rules[i]); + __m512i v_merged_rules = _mm512_mask_expandloadu_epi64(v_cache_rules, ~hwol_emc_smc_hitmask, &dpcls_rules[dpcls_idx]); - _mm512_storeu_si512(&rules[i], v_merged_rules); - - /* Update DPCLS load index and bitmask for HWOL/EMC/SMC hits. - * There are 8 output pointer per register, subtract the - * HWOL/EMC/SMC lanes equals the number of DPCLS rules consumed. - */ - uint32_t hitmask_FF = (hwol_emc_smc_hitmask & 0xFF); - dpcls_idx += 8 - __builtin_popcountll(hitmask_FF); - hwol_emc_smc_hitmask = (hwol_emc_smc_hitmask >> 8); + _mm512_storeu_si512(&rules[i], v_merged_rules); + + /* Update DPCLS load index and bitmask for HWOL/EMC/SMC hits. + * There are 8 output pointer per register, subtract the + * HWOL/EMC/SMC lanes equals the number of DPCLS rules consumed. + */ + uint32_t hitmask_FF = (hwol_emc_smc_hitmask & 0xFF); + dpcls_idx += 8 - __builtin_popcountll(hitmask_FF); + hwol_emc_smc_hitmask = (hwol_emc_smc_hitmask >> 8); + } } } @@ -280,13 +319,17 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, */ uint32_t bytes = 0; uint16_t tcp_flags = 0; - uint32_t bitmask_iter = batch_bitmask; - for (int i = 0; i < action_batch.count; i++) { - uint32_t idx = __builtin_ctzll(bitmask_iter); - bitmask_iter = _blsr_u64(bitmask_iter); - bytes += pkt_meta[idx].bytes; - tcp_flags |= pkt_meta[idx].tcp_flags; + /* Avoid this bitmasky/store-y work if possible */ + if (do_pkt_meta) { + uint32_t bitmask_iter = batch_bitmask; + for (int i = 0; i < action_batch.count; i++) { + uint32_t idx = __builtin_ctzll(bitmask_iter); + bitmask_iter = _blsr_u64(bitmask_iter); + + bytes += pkt_meta[idx].bytes; + tcp_flags |= pkt_meta[idx].tcp_flags; + } } dp_netdev_batch_execute(pmd, &action_batch, rules[rule_pkt_idx], @@ -296,5 +339,24 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, return 0; } +/* Specialized DPIFs remove branches/complexity in DPCLS only case. */ +int32_t +dpif_outer_avx512_wildcard(struct dp_netdev_pmd_thread *pmd, + struct dp_packet_batch *packets, + odp_port_t in_port) +{ + uint32_t dpcls_only = 0; + return dp_netdev_input_outer_avx512_impl(pmd, packets, in_port, dpcls_only); +} + +int32_t +dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, + struct dp_packet_batch *packets, + odp_port_t in_port) +{ + uint32_t dpcls_only = 1; + return dp_netdev_input_outer_avx512_impl(pmd, packets, in_port, dpcls_only); +} + #endif #endif diff --git a/lib/dpif-netdev-private-dpif.h b/lib/dpif-netdev-private-dpif.h index 99fbda943..2f5cc0d5a 100644 --- a/lib/dpif-netdev-private-dpif.h +++ b/lib/dpif-netdev-private-dpif.h @@ -71,9 +71,15 @@ dp_netdev_input(struct dp_netdev_pmd_thread *pmd, /* AVX512 enabled DPIF implementation and probe functions */ int32_t dp_netdev_input_outer_avx512_probe(void); + +/* Two specialized instances of the same DPIF impl. */ int32_t dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, struct dp_packet_batch *packets, odp_port_t in_port); +int32_t +dpif_outer_avx512_wildcard(struct dp_netdev_pmd_thread *pmd, + struct dp_packet_batch *packets, + odp_port_t in_port); #endif /* netdev-private.h */ diff --git a/lib/dpif-netdev-private-thread.h b/lib/dpif-netdev-private-thread.h index aac2342a7..523a830f0 100644 --- a/lib/dpif-netdev-private-thread.h +++ b/lib/dpif-netdev-private-thread.h @@ -51,6 +51,14 @@ struct dp_netdev_pmd_thread_ctx { bool smc_enable_db; }; +struct miniflow; +typedef uint32_t (*dp_netdev_mfex_func)(struct dp_packet *pkt, + struct miniflow *mf, uint32_t in_port); + +/* Prototype for traffic specific AVX512 MFEX */ +uint32_t mfex_avx512_ipv4_udp(struct dp_packet *pkt, + struct miniflow *mf, uint32_t in_port); + /* PMD: Poll modes drivers. PMD accesses devices via polling to eliminate * the performance overhead of interrupt processing. Therefore netdev can * not implement rx-wait for these devices. dpif-netdev needs to poll @@ -110,6 +118,8 @@ struct dp_netdev_pmd_thread { /* Pointer for per-DPIF implementation scratch space. */ void *netdev_input_func_userdata; + dp_netdev_mfex_func mfex_func; + struct seq *reload_seq; uint64_t last_reload_seq; diff --git a/lib/flow_avx512.h b/lib/flow_avx512.h new file mode 100644 index 000000000..a1bf01e5d --- /dev/null +++ b/lib/flow_avx512.h @@ -0,0 +1,117 @@ +#pragma once + +#include +#include + +#include "dp-packet.h" + +/* This file contains optimized implementations of miniflow_extract() + * for specific common traffic patterns. The optimizations allow for + * quick probing of a specific packet type, and if a match with a specific + * type is found, a shuffle like proceedure builds up the required miniflow. + * + * The functionality here can be easily auto-validated and tested against the + * scalar miniflow_extract() function. As such, manual review of the code by + * the community (although welcome) is not required. Confidence in the + * correctness of the code can be had from the autovalidation. + */ + +/* Generator for EtherType masks and values. */ +#define PATTERN_ETHERTYPE_GEN(type_b0, type_b1) \ + 0, 0, 0, 0, 0, 0, /* Ether MAC DST */ \ + 0, 0, 0, 0, 0, 0, /* Ether MAC SRC */ \ + type_b0, type_b1, /* EtherType */ + +#define PATTERN_ETHERTYPE_MASK PATTERN_ETHERTYPE_GEN(0xFF, 0xFF) +#define PATTERN_ETHERTYPE_IPV4 PATTERN_ETHERTYPE_GEN(0x08, 0x00) + +/* Generator for checking IPv4 ver, ihl, and proto */ +#define PATTERN_IPV4_GEN(VER_IHL, FLAG_OFF_B0, FLAG_OFF_B1, PROTO) \ + VER_IHL, /* Version and IHL */ \ + 0, 0, 0, /* DSCP, ECN, Total Lenght */ \ + 0, 0, /* Identification */ \ + /* Flags/Fragment offset: don't match MoreFrag (MF) or FragOffset */ \ + FLAG_OFF_B0, FLAG_OFF_B1, \ + 0, /* TTL */ \ + PROTO, /* Protocol */ \ + 0, 0, /* Header checksum */ \ + 0, 0, 0, 0, /* Src IP */ \ + 0, 0, 0, 0, /* Dst IP */ + +#define PATTERN_IPV4_MASK PATTERN_IPV4_GEN(0xFF, 0xFE, 0xFF, 0xFF) +#define PATTERN_IPV4_UDP PATTERN_IPV4_GEN(0x45, 0, 0, 0x11) + +#define NU 0 +#define PATTERN_IPV4_UDP_SHUFFLE \ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, NU, NU, /* Ether */ \ + 26, 27, 28, 29, 30, 31, 32, 33, NU, NU, NU, NU, 20, 15, 22, 23, /* IPv4 */ \ + 34, 35, 36, 37, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* UDP */ + +/* Masks for Ether()/IP()/UDP() traffic */ +static const uint8_t eth_ip_udp_mask[64] = { + PATTERN_ETHERTYPE_MASK PATTERN_IPV4_MASK +}; +static const uint8_t eth_ip_udp_values[64] = { + PATTERN_ETHERTYPE_IPV4 PATTERN_IPV4_UDP +}; +static const uint8_t eth_ip_udp_shuf[64] = { + PATTERN_IPV4_UDP_SHUFFLE +}; + +static inline void __attribute__((target("avx512vbmi"))) +avx512_ipv4_udp_store(const uint8_t *pkt, struct miniflow *mf, uint32_t in_port) +{ + int64_t u0b = 0x18a0000000000000; + int64_t u1b = 0x0000000000040401; + __m128i v_bits = {u0b, u1b}; + + /* Store mf Bits */ + uint64_t *bits = (void *)&mf->map.bits[0]; + uint64_t *blocks = miniflow_values(mf); + _mm_storeu_si128((__m128i*)bits, v_bits); + + /* Load packet and shuffle */ + __m512i v_pkt0 = _mm512_loadu_si512(&pkt[0]); + __m512i v_eth_ip_udp_shuf = _mm512_loadu_si512(eth_ip_udp_shuf); + + /* Shuffle pkt and store blocks */ + __mmask64 k_shufzero = 0b0000111111110000111111110011111111111111; + __m512i v_blk0 = _mm512_maskz_permutexvar_epi8(k_shufzero, v_eth_ip_udp_shuf, v_pkt0); + _mm512_storeu_si512(&blocks[2], v_blk0); + + uint64_t inp = ((uint64_t)in_port) << 32; + blocks[0] = inp; +} + +static inline uint32_t +avx512_ipv4_udp_probe(const uint8_t *pkt, uint32_t len) +{ + /* Packet data is masked to known IPv4/UDP parse length. */ + uint64_t klen = UINT64_MAX; + if (len < 64) { + klen = (1ULL << len) - 1; + } + + __m512i v_pkt0 = _mm512_maskz_loadu_epi8(klen, &pkt[0]); + __m512i v_eth_ip_udp_mask = _mm512_loadu_si512(eth_ip_udp_mask); + __m512i v_eth_ip_udp_vals = _mm512_loadu_si512(eth_ip_udp_values); + __m512i v_pkt0_masked = _mm512_and_si512(v_pkt0, v_eth_ip_udp_mask); + __mmask64 k_cmp = _mm512_cmpeq_epi8_mask(v_pkt0_masked, v_eth_ip_udp_vals); + + return (k_cmp == -1); +} + +uint32_t __attribute__((target("avx512vbmi"))) +mfex_avx512_ipv4_udp(struct dp_packet *dp_pkt, struct miniflow *mf, + uint32_t in_port) +{ + const uint8_t *pkt = dp_packet_data(dp_pkt); + const uint32_t size = dp_packet_size(dp_pkt); + + uint32_t match = avx512_ipv4_udp_probe(pkt, size); + if (match) { + avx512_ipv4_udp_store(pkt, mf, in_port); + return 1; + } + return 0; +}