From patchwork Wed Apr 7 09:34:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ferriter, Cian" X-Patchwork-Id: 1463260 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::137; helo=smtp4.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp4.osuosl.org (smtp4.osuosl.org [IPv6:2605:bc80:3010::137]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4FFfM05nj0z9sV5 for ; Wed, 7 Apr 2021 19:32:44 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id AC05841895; Wed, 7 Apr 2021 09:32:41 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id DJUHMg9OxNnS; Wed, 7 Apr 2021 09:32:33 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp4.osuosl.org (Postfix) with ESMTP id 2374841826; Wed, 7 Apr 2021 09:32:32 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id CC5EDC0019; Wed, 7 Apr 2021 09:32:30 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [IPv6:2605:bc80:3010::136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 0F6FFC000C for ; Wed, 7 Apr 2021 09:32:28 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id F2E5560A4E for ; Wed, 7 Apr 2021 09:32:26 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id YhTWSbNhcSMa for ; Wed, 7 Apr 2021 09:32:23 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by smtp3.osuosl.org (Postfix) with ESMTPS id A683560A43 for ; Wed, 7 Apr 2021 09:32:22 +0000 (UTC) IronPort-SDR: H3f/H/W1NVREL49kO9+9WvfbKpstWaNVknWC+PCueRNgsramblF84zMQa4xmmBUZkGgKX2gvyN Pars4r/xNlSQ== X-IronPort-AV: E=McAfee;i="6000,8403,9946"; a="173344870" X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="173344870" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Apr 2021 02:32:22 -0700 IronPort-SDR: qcO+oc5v/7EK9gCbnbllWbu85QVsTEBgP9sGh5Y35T+ht47HqfTMkf61t/tPDO8JnQ4wme5HSa zE0djXqYvudA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="381253790" Received: from silpixa00399779.ir.intel.com (HELO silpixa00399779.ger.corp.intel.com) ([10.237.223.175]) by orsmga006.jf.intel.com with ESMTP; 07 Apr 2021 02:32:20 -0700 From: Cian Ferriter To: ovs-dev@openvswitch.org Date: Wed, 7 Apr 2021 10:34:28 +0100 Message-Id: <20210407093442.41568-2-cian.ferriter@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210407093442.41568-1-cian.ferriter@intel.com> References: <20210407093442.41568-1-cian.ferriter@intel.com> Cc: i.maximets@ovn.org Subject: [ovs-dev] [v10 01/15] dpif-netdev: Refactor to multiple header files. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Harry van Haaren Split the very large file dpif-netdev.c and the datastructures it contains into multiple header files. Each header file is responsible for the datastructures of that component. This logical split allows better reuse and modularity of the code, and reduces the very large file dpif-netdev.c to be more managable. Due to dependencies between components, it is not possible to move component in smaller granularities than this patch. To explain the dependencies better, eg: DPCLS has no deps (from dpif-netdev.c file) FLOW depends on DPCLS (struct dpcls_rule) DFC depends on DPCLS (netdev_flow_key) and FLOW (netdev_flow_key) THREAD depends on DFC (struct dfc_cache) DFC_PROC depends on THREAD (struct pmd_thread) DPCLS lookup.h/c require only DPCLS DPCLS implementations require only dpif-netdev-lookup.h. - This change was made in 2.12 release with function pointers - This commit only refactors the name to "private-dpcls.h" Signed-off-by: Harry van Haaren Co-authored-by: Cian Ferriter Signed-off-by: Cian Ferriter --- lib/automake.mk | 4 + lib/dpif-netdev-lookup-autovalidator.c | 1 - lib/dpif-netdev-lookup-avx512-gather.c | 1 - lib/dpif-netdev-lookup-generic.c | 1 - lib/dpif-netdev-lookup.h | 2 +- lib/dpif-netdev-private-dfc.h | 244 ++++++++++++ lib/dpif-netdev-private-dpcls.h | 129 ++++++ lib/dpif-netdev-private-flow.h | 162 ++++++++ lib/dpif-netdev-private-thread.h | 206 ++++++++++ lib/dpif-netdev-private.h | 100 +---- lib/dpif-netdev.c | 519 +------------------------ 11 files changed, 760 insertions(+), 609 deletions(-) create mode 100644 lib/dpif-netdev-private-dfc.h create mode 100644 lib/dpif-netdev-private-dpcls.h create mode 100644 lib/dpif-netdev-private-flow.h create mode 100644 lib/dpif-netdev-private-thread.h diff --git a/lib/automake.mk b/lib/automake.mk index 39afbff9d..0e83145b5 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -111,6 +111,10 @@ lib_libopenvswitch_la_SOURCES = \ lib/dpif-netdev-lookup-generic.c \ lib/dpif-netdev.c \ lib/dpif-netdev.h \ + lib/dpif-netdev-private-dfc.h \ + lib/dpif-netdev-private-dpcls.h \ + lib/dpif-netdev-private-flow.h \ + lib/dpif-netdev-private-thread.h \ lib/dpif-netdev-private.h \ lib/dpif-netdev-perf.c \ lib/dpif-netdev-perf.h \ diff --git a/lib/dpif-netdev-lookup-autovalidator.c b/lib/dpif-netdev-lookup-autovalidator.c index 97b59fdd0..475e1ab1e 100644 --- a/lib/dpif-netdev-lookup-autovalidator.c +++ b/lib/dpif-netdev-lookup-autovalidator.c @@ -17,7 +17,6 @@ #include #include "dpif-netdev.h" #include "dpif-netdev-lookup.h" -#include "dpif-netdev-private.h" #include "openvswitch/vlog.h" VLOG_DEFINE_THIS_MODULE(dpif_lookup_autovalidator); diff --git a/lib/dpif-netdev-lookup-avx512-gather.c b/lib/dpif-netdev-lookup-avx512-gather.c index 5e3634249..8fc1cdfa5 100644 --- a/lib/dpif-netdev-lookup-avx512-gather.c +++ b/lib/dpif-netdev-lookup-avx512-gather.c @@ -21,7 +21,6 @@ #include "dpif-netdev.h" #include "dpif-netdev-lookup.h" -#include "dpif-netdev-private.h" #include "cmap.h" #include "flow.h" #include "pvector.h" diff --git a/lib/dpif-netdev-lookup-generic.c b/lib/dpif-netdev-lookup-generic.c index b1a0cfc36..e3b6be4b6 100644 --- a/lib/dpif-netdev-lookup-generic.c +++ b/lib/dpif-netdev-lookup-generic.c @@ -17,7 +17,6 @@ #include #include "dpif-netdev.h" -#include "dpif-netdev-private.h" #include "dpif-netdev-lookup.h" #include "bitmap.h" diff --git a/lib/dpif-netdev-lookup.h b/lib/dpif-netdev-lookup.h index bd72aa29b..59f51faa0 100644 --- a/lib/dpif-netdev-lookup.h +++ b/lib/dpif-netdev-lookup.h @@ -19,7 +19,7 @@ #include #include "dpif-netdev.h" -#include "dpif-netdev-private.h" +#include "dpif-netdev-private-dpcls.h" /* Function to perform a probe for the subtable bit fingerprint. * Returns NULL if not valid, or a valid function pointer to call for this diff --git a/lib/dpif-netdev-private-dfc.h b/lib/dpif-netdev-private-dfc.h new file mode 100644 index 000000000..52349a3fc --- /dev/null +++ b/lib/dpif-netdev-private-dfc.h @@ -0,0 +1,244 @@ +/* + * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2015 Nicira, Inc. + * Copyright (c) 2019, 2020, 2021 Intel Corporation. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef DPIF_NETDEV_PRIVATE_DFC_H +#define DPIF_NETDEV_PRIVATE_DFC_H 1 + +#include +#include + +#include "dpif.h" +#include "dpif-netdev-private-dpcls.h" +#include "dpif-netdev-private-flow.h" + +#ifdef __cplusplus +extern "C" { +#endif + +/* EMC cache and SMC cache compose the datapath flow cache (DFC) + * + * Exact match cache for frequently used flows + * + * The cache uses a 32-bit hash of the packet (which can be the RSS hash) to + * search its entries for a miniflow that matches exactly the miniflow of the + * packet. It stores the 'dpcls_rule' (rule) that matches the miniflow. + * + * A cache entry holds a reference to its 'dp_netdev_flow'. + * + * A miniflow with a given hash can be in one of EM_FLOW_HASH_SEGS different + * entries. The 32-bit hash is split into EM_FLOW_HASH_SEGS values (each of + * them is EM_FLOW_HASH_SHIFT bits wide and the remainder is thrown away). Each + * value is the index of a cache entry where the miniflow could be. + * + * + * Signature match cache (SMC) + * + * This cache stores a 16-bit signature for each flow without storing keys, and + * stores the corresponding 16-bit flow_table index to the 'dp_netdev_flow'. + * Each flow thus occupies 32bit which is much more memory efficient than EMC. + * SMC uses a set-associative design that each bucket contains + * SMC_ENTRY_PER_BUCKET number of entries. + * Since 16-bit flow_table index is used, if there are more than 2^16 + * dp_netdev_flow, SMC will miss them that cannot be indexed by a 16-bit value. + * + * + * Thread-safety + * ============= + * + * Each pmd_thread has its own private exact match cache. + * If dp_netdev_input is not called from a pmd thread, a mutex is used. + */ + +#define EM_FLOW_HASH_SHIFT 13 +#define EM_FLOW_HASH_ENTRIES (1u << EM_FLOW_HASH_SHIFT) +#define EM_FLOW_HASH_MASK (EM_FLOW_HASH_ENTRIES - 1) +#define EM_FLOW_HASH_SEGS 2 + +/* SMC uses a set-associative design. A bucket contains a set of entries that + * a flow item can occupy. For now, it uses one hash function rather than two + * as for the EMC design. */ +#define SMC_ENTRY_PER_BUCKET 4 +#define SMC_ENTRIES (1u << 20) +#define SMC_BUCKET_CNT (SMC_ENTRIES / SMC_ENTRY_PER_BUCKET) +#define SMC_MASK (SMC_BUCKET_CNT - 1) + +/* Default EMC insert probability is 1 / DEFAULT_EM_FLOW_INSERT_INV_PROB */ +#define DEFAULT_EM_FLOW_INSERT_INV_PROB 100 +#define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX / \ + DEFAULT_EM_FLOW_INSERT_INV_PROB) + +struct emc_entry { + struct dp_netdev_flow *flow; + struct netdev_flow_key key; /* key.hash used for emc hash value. */ +}; + +struct emc_cache { + struct emc_entry entries[EM_FLOW_HASH_ENTRIES]; + int sweep_idx; /* For emc_cache_slow_sweep(). */ +}; + +struct smc_bucket { + uint16_t sig[SMC_ENTRY_PER_BUCKET]; + uint16_t flow_idx[SMC_ENTRY_PER_BUCKET]; +}; + +/* Signature match cache, differentiate from EMC cache */ +struct smc_cache { + struct smc_bucket buckets[SMC_BUCKET_CNT]; +}; + +struct dfc_cache { + struct emc_cache emc_cache; + struct smc_cache smc_cache; +}; + +/* Iterate in the exact match cache through every entry that might contain a + * miniflow with hash 'HASH'. */ +#define EMC_FOR_EACH_POS_WITH_HASH(EMC, CURRENT_ENTRY, HASH) \ + for (uint32_t i__ = 0, srch_hash__ = (HASH); \ + (CURRENT_ENTRY) = &(EMC)->entries[srch_hash__ & EM_FLOW_HASH_MASK], \ + i__ < EM_FLOW_HASH_SEGS; \ + i__++, srch_hash__ >>= EM_FLOW_HASH_SHIFT) + +static inline bool +emc_entry_alive(struct emc_entry *ce) +{ + return ce->flow && !ce->flow->dead; +} + +static inline void +emc_clear_entry(struct emc_entry *ce) +{ + if (ce->flow) { + dp_netdev_flow_unref(ce->flow); + ce->flow = NULL; + } +} + +static inline void +smc_clear_entry(struct smc_bucket *b, int idx) +{ + b->flow_idx[idx] = UINT16_MAX; +} + +static inline void +emc_cache_init(struct emc_cache *flow_cache) +{ + int i; + + flow_cache->sweep_idx = 0; + for (i = 0; i < ARRAY_SIZE(flow_cache->entries); i++) { + flow_cache->entries[i].flow = NULL; + flow_cache->entries[i].key.hash = 0; + flow_cache->entries[i].key.len = sizeof(struct miniflow); + flowmap_init(&flow_cache->entries[i].key.mf.map); + } +} + +static inline void +smc_cache_init(struct smc_cache *smc_cache) +{ + int i, j; + for (i = 0; i < SMC_BUCKET_CNT; i++) { + for (j = 0; j < SMC_ENTRY_PER_BUCKET; j++) { + smc_cache->buckets[i].flow_idx[j] = UINT16_MAX; + } + } +} + +static inline void +dfc_cache_init(struct dfc_cache *flow_cache) +{ + emc_cache_init(&flow_cache->emc_cache); + smc_cache_init(&flow_cache->smc_cache); +} + +static inline void +emc_cache_uninit(struct emc_cache *flow_cache) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(flow_cache->entries); i++) { + emc_clear_entry(&flow_cache->entries[i]); + } +} + +static inline void +smc_cache_uninit(struct smc_cache *smc) +{ + int i, j; + + for (i = 0; i < SMC_BUCKET_CNT; i++) { + for (j = 0; j < SMC_ENTRY_PER_BUCKET; j++) { + smc_clear_entry(&(smc->buckets[i]), j); + } + } +} + +static inline void +dfc_cache_uninit(struct dfc_cache *flow_cache) +{ + smc_cache_uninit(&flow_cache->smc_cache); + emc_cache_uninit(&flow_cache->emc_cache); +} + +/* Check and clear dead flow references slowly (one entry at each + * invocation). */ +static inline void +emc_cache_slow_sweep(struct emc_cache *flow_cache) +{ + struct emc_entry *entry = &flow_cache->entries[flow_cache->sweep_idx]; + + if (!emc_entry_alive(entry)) { + emc_clear_entry(entry); + } + flow_cache->sweep_idx = (flow_cache->sweep_idx + 1) & EM_FLOW_HASH_MASK; +} + +/* Used to compare 'netdev_flow_key' in the exact match cache to a miniflow. + * The maps are compared bitwise, so both 'key->mf' and 'mf' must have been + * generated by miniflow_extract. */ +static inline bool +emc_flow_key_equal_mf(const struct netdev_flow_key *key, + const struct miniflow *mf) +{ + return !memcmp(&key->mf, mf, key->len); +} + +static inline struct dp_netdev_flow * +emc_lookup(struct emc_cache *cache, const struct netdev_flow_key *key) +{ + struct emc_entry *current_entry; + + EMC_FOR_EACH_POS_WITH_HASH(cache, current_entry, key->hash) { + if (current_entry->key.hash == key->hash + && emc_entry_alive(current_entry) + && emc_flow_key_equal_mf(¤t_entry->key, &key->mf)) { + + /* We found the entry with the 'key->mf' miniflow */ + return current_entry->flow; + } + } + + return NULL; +} + +#ifdef __cplusplus +} +#endif + +#endif /* dpif-netdev-private-dfc.h */ diff --git a/lib/dpif-netdev-private-dpcls.h b/lib/dpif-netdev-private-dpcls.h new file mode 100644 index 000000000..f223a93e4 --- /dev/null +++ b/lib/dpif-netdev-private-dpcls.h @@ -0,0 +1,129 @@ +/* + * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2015 Nicira, Inc. + * Copyright (c) 2019, 2020, 2021 Intel Corporation. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef DPIF_NETDEV_PRIVATE_DPCLS_H +#define DPIF_NETDEV_PRIVATE_DPCLS_H 1 + +#include +#include + +#include "dpif.h" +#include "cmap.h" +#include "openvswitch/thread.h" + +#ifdef __cplusplus +extern "C" { +#endif + +/* Forward declaration for lookup_func typedef. */ +struct dpcls_subtable; +struct dpcls_rule; + +/* Must be public as it is instantiated in subtable struct below. */ +struct netdev_flow_key { + uint32_t hash; /* Hash function differs for different users. */ + uint32_t len; /* Length of the following miniflow (incl. map). */ + struct miniflow mf; + uint64_t buf[FLOW_MAX_PACKET_U64S]; +}; + +/* A rule to be inserted to the classifier. */ +struct dpcls_rule { + struct cmap_node cmap_node; /* Within struct dpcls_subtable 'rules'. */ + struct netdev_flow_key *mask; /* Subtable's mask. */ + struct netdev_flow_key flow; /* Matching key. */ + /* 'flow' must be the last field, additional space is allocated here. */ +}; + +/* Lookup function for a subtable in the dpcls. This function is called + * by each subtable with an array of packets, and a bitmask of packets to + * perform the lookup on. Using a function pointer gives flexibility to + * optimize the lookup function based on subtable properties and the + * CPU instruction set available at runtime. + */ +typedef +uint32_t (*dpcls_subtable_lookup_func)(struct dpcls_subtable *subtable, + uint32_t keys_map, + const struct netdev_flow_key *keys[], + struct dpcls_rule **rules); + +/* A set of rules that all have the same fields wildcarded. */ +struct dpcls_subtable { + /* The fields are only used by writers. */ + struct cmap_node cmap_node OVS_GUARDED; /* Within dpcls 'subtables_map'. */ + + /* These fields are accessed by readers. */ + struct cmap rules; /* Contains "struct dpcls_rule"s. */ + uint32_t hit_cnt; /* Number of match hits in subtable in current + optimization interval. */ + + /* Miniflow fingerprint that the subtable matches on. The miniflow "bits" + * are used to select the actual dpcls lookup implementation at subtable + * creation time. + */ + uint8_t mf_bits_set_unit0; + uint8_t mf_bits_set_unit1; + + /* The lookup function to use for this subtable. If there is a known + * property of the subtable (eg: only 3 bits of miniflow metadata is + * used for the lookup) then this can point at an optimized version of + * the lookup function for this particular subtable. */ + dpcls_subtable_lookup_func lookup_func; + + /* Caches the masks to match a packet to, reducing runtime calculations. */ + uint64_t *mf_masks; + + struct netdev_flow_key mask; /* Wildcards for fields (const). */ + /* 'mask' must be the last field, additional space is allocated here. */ +}; + +/* Iterate through netdev_flow_key TNL u64 values specified by 'FLOWMAP'. */ +#define NETDEV_FLOW_KEY_FOR_EACH_IN_FLOWMAP(VALUE, KEY, FLOWMAP) \ + MINIFLOW_FOR_EACH_IN_FLOWMAP (VALUE, &(KEY)->mf, FLOWMAP) + +/* Generates a mask for each bit set in the subtable's miniflow. */ +void +netdev_flow_key_gen_masks(const struct netdev_flow_key *tbl, + uint64_t *mf_masks, + const uint32_t mf_bits_u0, + const uint32_t mf_bits_u1); + +/* Matches a dpcls rule against the incoming packet in 'target' */ +bool dpcls_rule_matches_key(const struct dpcls_rule *rule, + const struct netdev_flow_key *target); + +static inline uint32_t +dpif_netdev_packet_get_rss_hash_orig_pkt(struct dp_packet *packet, + const struct miniflow *mf) +{ + uint32_t hash; + + if (OVS_LIKELY(dp_packet_rss_valid(packet))) { + hash = dp_packet_get_rss_hash(packet); + } else { + hash = miniflow_hash_5tuple(mf, 0); + dp_packet_set_rss_hash(packet, hash); + } + + return hash; +} + +#ifdef __cplusplus +} +#endif + +#endif /* dpif-netdev-private-dpcls.h */ diff --git a/lib/dpif-netdev-private-flow.h b/lib/dpif-netdev-private-flow.h new file mode 100644 index 000000000..027d68e0b --- /dev/null +++ b/lib/dpif-netdev-private-flow.h @@ -0,0 +1,162 @@ +/* + * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2015 Nicira, Inc. + * Copyright (c) 2019, 2020, 2021 Intel Corporation. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef DPIF_NETDEV_PRIVATE_FLOW_H +#define DPIF_NETDEV_PRIVATE_FLOW_H 1 + +#include +#include + +#include "dpif.h" +#include "dpif-netdev-private-dpcls.h" +#include "cmap.h" +#include "openvswitch/thread.h" + +#ifdef __cplusplus +extern "C" { +#endif + +/* Contained by struct dp_netdev_flow's 'stats' member. */ +struct dp_netdev_flow_stats { + atomic_llong used; /* Last used time, in monotonic msecs. */ + atomic_ullong packet_count; /* Number of packets matched. */ + atomic_ullong byte_count; /* Number of bytes matched. */ + atomic_uint16_t tcp_flags; /* Bitwise-OR of seen tcp_flags values. */ +}; + +/* Contained by struct dp_netdev_flow's 'last_attrs' member. */ +struct dp_netdev_flow_attrs { + atomic_bool offloaded; /* True if flow is offloaded to HW. */ + ATOMIC(const char *) dp_layer; /* DP layer the flow is handled in. */ +}; + +/* A flow in 'dp_netdev_pmd_thread's 'flow_table'. + * + * + * Thread-safety + * ============= + * + * Except near the beginning or ending of its lifespan, rule 'rule' belongs to + * its pmd thread's classifier. The text below calls this classifier 'cls'. + * + * Motivation + * ---------- + * + * The thread safety rules described here for "struct dp_netdev_flow" are + * motivated by two goals: + * + * - Prevent threads that read members of "struct dp_netdev_flow" from + * reading bad data due to changes by some thread concurrently modifying + * those members. + * + * - Prevent two threads making changes to members of a given "struct + * dp_netdev_flow" from interfering with each other. + * + * + * Rules + * ----- + * + * A flow 'flow' may be accessed without a risk of being freed during an RCU + * grace period. Code that needs to hold onto a flow for a while + * should try incrementing 'flow->ref_cnt' with dp_netdev_flow_ref(). + * + * 'flow->ref_cnt' protects 'flow' from being freed. It doesn't protect the + * flow from being deleted from 'cls' and it doesn't protect members of 'flow' + * from modification. + * + * Some members, marked 'const', are immutable. Accessing other members + * requires synchronization, as noted in more detail below. + */ +struct dp_netdev_flow { + const struct flow flow; /* Unmasked flow that created this entry. */ + /* Hash table index by unmasked flow. */ + const struct cmap_node node; /* In owning dp_netdev_pmd_thread's */ + /* 'flow_table'. */ + const struct cmap_node mark_node; /* In owning flow_mark's mark_to_flow */ + const ovs_u128 ufid; /* Unique flow identifier. */ + const ovs_u128 mega_ufid; /* Unique mega flow identifier. */ + const unsigned pmd_id; /* The 'core_id' of pmd thread owning this */ + /* flow. */ + + /* Number of references. + * The classifier owns one reference. + * Any thread trying to keep a rule from being freed should hold its own + * reference. */ + struct ovs_refcount ref_cnt; + + bool dead; + uint32_t mark; /* Unique flow mark assigned to a flow */ + + /* Statistics. */ + struct dp_netdev_flow_stats stats; + + /* Statistics and attributes received from the netdev offload provider. */ + atomic_int netdev_flow_get_result; + struct dp_netdev_flow_stats last_stats; + struct dp_netdev_flow_attrs last_attrs; + + /* Actions. */ + OVSRCU_TYPE(struct dp_netdev_actions *) actions; + + /* While processing a group of input packets, the datapath uses the next + * member to store a pointer to the output batch for the flow. It is + * reset after the batch has been sent out (See dp_netdev_queue_batches(), + * packet_batch_per_flow_init() and packet_batch_per_flow_execute()). */ + struct packet_batch_per_flow *batch; + + /* Packet classification. */ + char *dp_extra_info; /* String to return in a flow dump/get. */ + struct dpcls_rule cr; /* In owning dp_netdev's 'cls'. */ + /* 'cr' must be the last member. */ +}; + +static inline uint32_t +dp_netdev_flow_hash(const ovs_u128 *ufid) +{ + return ufid->u32[0]; +} + +/* Given the number of bits set in miniflow's maps, returns the size of the + * 'netdev_flow_key.mf' */ +static inline size_t +netdev_flow_key_size(size_t flow_u64s) +{ + return sizeof(struct miniflow) + MINIFLOW_VALUES_SIZE(flow_u64s); +} + +/* forward declaration required for EMC to unref flows */ +void dp_netdev_flow_unref(struct dp_netdev_flow *); + +/* A set of datapath actions within a "struct dp_netdev_flow". + * + * + * Thread-safety + * ============= + * + * A struct dp_netdev_actions 'actions' is protected with RCU. */ +struct dp_netdev_actions { + /* These members are immutable: they do not change during the struct's + * lifetime. */ + unsigned int size; /* Size of 'actions', in bytes. */ + struct nlattr actions[]; /* Sequence of OVS_ACTION_ATTR_* attributes. */ +}; + +#ifdef __cplusplus +} +#endif + +#endif /* dpif-netdev-private-flow.h */ diff --git a/lib/dpif-netdev-private-thread.h b/lib/dpif-netdev-private-thread.h new file mode 100644 index 000000000..5e5308b96 --- /dev/null +++ b/lib/dpif-netdev-private-thread.h @@ -0,0 +1,206 @@ +/* + * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2015 Nicira, Inc. + * Copyright (c) 2019, 2020, 2021 Intel Corporation. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef DPIF_NETDEV_PRIVATE_THREAD_H +#define DPIF_NETDEV_PRIVATE_THREAD_H 1 + +#include +#include + +#include "dpif.h" +#include "cmap.h" + +#include "dpif-netdev-private-dfc.h" +#include "dpif-netdev-perf.h" +#include "openvswitch/thread.h" + +#ifdef __cplusplus +extern "C" { +#endif + +/* PMD Thread Structures */ + +/* A set of properties for the current processing loop that is not directly + * associated with the pmd thread itself, but with the packets being + * processed or the short-term system configuration (for example, time). + * Contained by struct dp_netdev_pmd_thread's 'ctx' member. */ +struct dp_netdev_pmd_thread_ctx { + /* Latest measured time. See 'pmd_thread_ctx_time_update()'. */ + long long now; + /* RX queue from which last packet was received. */ + struct dp_netdev_rxq *last_rxq; + /* EMC insertion probability context for the current processing cycle. */ + uint32_t emc_insert_min; +}; + +/* PMD: Poll modes drivers. PMD accesses devices via polling to eliminate + * the performance overhead of interrupt processing. Therefore netdev can + * not implement rx-wait for these devices. dpif-netdev needs to poll + * these device to check for recv buffer. pmd-thread does polling for + * devices assigned to itself. + * + * DPDK used PMD for accessing NIC. + * + * Note, instance with cpu core id NON_PMD_CORE_ID will be reserved for + * I/O of all non-pmd threads. There will be no actual thread created + * for the instance. + * + * Each struct has its own flow cache and classifier per managed ingress port. + * For packets received on ingress port, a look up is done on corresponding PMD + * thread's flow cache and in case of a miss, lookup is performed in the + * corresponding classifier of port. Packets are executed with the found + * actions in either case. + * */ +struct dp_netdev_pmd_thread { + struct dp_netdev *dp; + struct ovs_refcount ref_cnt; /* Every reference must be refcount'ed. */ + struct cmap_node node; /* In 'dp->poll_threads'. */ + + /* Per thread exact-match cache. Note, the instance for cpu core + * NON_PMD_CORE_ID can be accessed by multiple threads, and thusly + * need to be protected by 'non_pmd_mutex'. Every other instance + * will only be accessed by its own pmd thread. */ + OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct dfc_cache flow_cache; + + /* Flow-Table and classifiers + * + * Writers of 'flow_table' must take the 'flow_mutex'. Corresponding + * changes to 'classifiers' must be made while still holding the + * 'flow_mutex'. + */ + struct ovs_mutex flow_mutex; + struct cmap flow_table OVS_GUARDED; /* Flow table. */ + + /* One classifier per in_port polled by the pmd */ + struct cmap classifiers; + /* Periodically sort subtable vectors according to hit frequencies */ + long long int next_optimization; + /* End of the next time interval for which processing cycles + are stored for each polled rxq. */ + long long int rxq_next_cycle_store; + + /* Last interval timestamp. */ + uint64_t intrvl_tsc_prev; + /* Last interval cycles. */ + atomic_ullong intrvl_cycles; + + /* Current context of the PMD thread. */ + struct dp_netdev_pmd_thread_ctx ctx; + + struct seq *reload_seq; + uint64_t last_reload_seq; + + /* These are atomic variables used as a synchronization and configuration + * points for thread reload/exit. + * + * 'reload' atomic is the main one and it's used as a memory + * synchronization point for all other knobs and data. + * + * For a thread that requests PMD reload: + * + * * All changes that should be visible to the PMD thread must be made + * before setting the 'reload'. These changes could use any memory + * ordering model including 'relaxed'. + * * Setting the 'reload' atomic should occur in the same thread where + * all other PMD configuration options updated. + * * Setting the 'reload' atomic should be done with 'release' memory + * ordering model or stricter. This will guarantee that all previous + * changes (including non-atomic and 'relaxed') will be visible to + * the PMD thread. + * * To check that reload is done, thread should poll the 'reload' atomic + * to become 'false'. Polling should be done with 'acquire' memory + * ordering model or stricter. This ensures that PMD thread completed + * the reload process. + * + * For the PMD thread: + * + * * PMD thread should read 'reload' atomic with 'acquire' memory + * ordering model or stricter. This will guarantee that all changes + * made before setting the 'reload' in the requesting thread will be + * visible to the PMD thread. + * * All other configuration data could be read with any memory + * ordering model (including non-atomic and 'relaxed') but *only after* + * reading the 'reload' atomic set to 'true'. + * * When the PMD reload done, PMD should (optionally) set all the below + * knobs except the 'reload' to their default ('false') values and + * (mandatory), as the last step, set the 'reload' to 'false' using + * 'release' memory ordering model or stricter. This will inform the + * requesting thread that PMD has completed a reload cycle. + */ + atomic_bool reload; /* Do we need to reload ports? */ + atomic_bool wait_for_reload; /* Can we busy wait for the next reload? */ + atomic_bool reload_tx_qid; /* Do we need to reload static_tx_qid? */ + atomic_bool exit; /* For terminating the pmd thread. */ + + pthread_t thread; + unsigned core_id; /* CPU core id of this pmd thread. */ + int numa_id; /* numa node id of this pmd thread. */ + bool isolated; + + /* Queue id used by this pmd thread to send packets on all netdevs if + * XPS disabled for this netdev. All static_tx_qid's are unique and less + * than 'cmap_count(dp->poll_threads)'. */ + uint32_t static_tx_qid; + + /* Number of filled output batches. */ + int n_output_batches; + + struct ovs_mutex port_mutex; /* Mutex for 'poll_list' and 'tx_ports'. */ + /* List of rx queues to poll. */ + struct hmap poll_list OVS_GUARDED; + /* Map of 'tx_port's used for transmission. Written by the main thread, + * read by the pmd thread. */ + struct hmap tx_ports OVS_GUARDED; + + struct ovs_mutex bond_mutex; /* Protects updates of 'tx_bonds'. */ + /* Map of 'tx_bond's used for transmission. Written by the main thread + * and read by the pmd thread. */ + struct cmap tx_bonds; + + /* These are thread-local copies of 'tx_ports'. One contains only tunnel + * ports (that support push_tunnel/pop_tunnel), the other contains ports + * with at least one txq (that support send). A port can be in both. + * + * There are two separate maps to make sure that we don't try to execute + * OUTPUT on a device which has 0 txqs or PUSH/POP on a non-tunnel device. + * + * The instances for cpu core NON_PMD_CORE_ID can be accessed by multiple + * threads, and thusly need to be protected by 'non_pmd_mutex'. Every + * other instance will only be accessed by its own pmd thread. */ + struct hmap tnl_port_cache; + struct hmap send_port_cache; + + /* Keep track of detailed PMD performance statistics. */ + struct pmd_perf_stats perf_stats; + + /* Stats from previous iteration used by automatic pmd + * load balance logic. */ + uint64_t prev_stats[PMD_N_STATS]; + atomic_count pmd_overloaded; + + /* Set to true if the pmd thread needs to be reloaded. */ + bool need_reload; + + /* Next time when PMD should try RCU quiescing. */ + long long next_rcu_quiesce; +}; + +#ifdef __cplusplus +} +#endif + +#endif /* dpif-netdev-private-thread.h */ diff --git a/lib/dpif-netdev-private.h b/lib/dpif-netdev-private.h index 4fda1220b..d7b6fd7ec 100644 --- a/lib/dpif-netdev-private.h +++ b/lib/dpif-netdev-private.h @@ -18,95 +18,17 @@ #ifndef DPIF_NETDEV_PRIVATE_H #define DPIF_NETDEV_PRIVATE_H 1 -#include -#include - -#include "dpif.h" -#include "cmap.h" - -#ifdef __cplusplus -extern "C" { -#endif - -/* Forward declaration for lookup_func typedef. */ -struct dpcls_subtable; -struct dpcls_rule; - -/* Must be public as it is instantiated in subtable struct below. */ -struct netdev_flow_key { - uint32_t hash; /* Hash function differs for different users. */ - uint32_t len; /* Length of the following miniflow (incl. map). */ - struct miniflow mf; - uint64_t buf[FLOW_MAX_PACKET_U64S]; -}; - -/* A rule to be inserted to the classifier. */ -struct dpcls_rule { - struct cmap_node cmap_node; /* Within struct dpcls_subtable 'rules'. */ - struct netdev_flow_key *mask; /* Subtable's mask. */ - struct netdev_flow_key flow; /* Matching key. */ - /* 'flow' must be the last field, additional space is allocated here. */ -}; - -/* Lookup function for a subtable in the dpcls. This function is called - * by each subtable with an array of packets, and a bitmask of packets to - * perform the lookup on. Using a function pointer gives flexibility to - * optimize the lookup function based on subtable properties and the - * CPU instruction set available at runtime. +/* This header includes the various dpif-netdev components' header + * files in the appropriate order. Unfortunately there is a strict + * requirement in the include order due to dependences between components. + * E.g: + * DFC/EMC/SMC requires the netdev_flow_key struct + * PMD thread requires DFC_flow struct + * */ -typedef -uint32_t (*dpcls_subtable_lookup_func)(struct dpcls_subtable *subtable, - uint32_t keys_map, - const struct netdev_flow_key *keys[], - struct dpcls_rule **rules); - -/* A set of rules that all have the same fields wildcarded. */ -struct dpcls_subtable { - /* The fields are only used by writers. */ - struct cmap_node cmap_node OVS_GUARDED; /* Within dpcls 'subtables_map'. */ - - /* These fields are accessed by readers. */ - struct cmap rules; /* Contains "struct dpcls_rule"s. */ - uint32_t hit_cnt; /* Number of match hits in subtable in current - optimization interval. */ - - /* Miniflow fingerprint that the subtable matches on. The miniflow "bits" - * are used to select the actual dpcls lookup implementation at subtable - * creation time. - */ - uint8_t mf_bits_set_unit0; - uint8_t mf_bits_set_unit1; - - /* The lookup function to use for this subtable. If there is a known - * property of the subtable (eg: only 3 bits of miniflow metadata is - * used for the lookup) then this can point at an optimized version of - * the lookup function for this particular subtable. */ - dpcls_subtable_lookup_func lookup_func; - - /* Caches the masks to match a packet to, reducing runtime calculations. */ - uint64_t *mf_masks; - - struct netdev_flow_key mask; /* Wildcards for fields (const). */ - /* 'mask' must be the last field, additional space is allocated here. */ -}; - -/* Iterate through netdev_flow_key TNL u64 values specified by 'FLOWMAP'. */ -#define NETDEV_FLOW_KEY_FOR_EACH_IN_FLOWMAP(VALUE, KEY, FLOWMAP) \ - MINIFLOW_FOR_EACH_IN_FLOWMAP (VALUE, &(KEY)->mf, FLOWMAP) - -/* Generates a mask for each bit set in the subtable's miniflow. */ -void -netdev_flow_key_gen_masks(const struct netdev_flow_key *tbl, - uint64_t *mf_masks, - const uint32_t mf_bits_u0, - const uint32_t mf_bits_u1); - -/* Matches a dpcls rule against the incoming packet in 'target' */ -bool dpcls_rule_matches_key(const struct dpcls_rule *rule, - const struct netdev_flow_key *target); - -#ifdef __cplusplus -} -#endif +#include "dpif-netdev-private-flow.h" +#include "dpif-netdev-private-dpcls.h" +#include "dpif-netdev-private-dfc.h" +#include "dpif-netdev-private-thread.h" #endif /* netdev-private.h */ diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 251788b04..298bfe444 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -17,6 +17,7 @@ #include #include "dpif-netdev.h" #include "dpif-netdev-private.h" +#include "dpif-netdev-private-dfc.h" #include #include @@ -44,6 +45,7 @@ #include "dpif.h" #include "dpif-netdev-lookup.h" #include "dpif-netdev-perf.h" +#include "dpif-netdev-private-dfc.h" #include "dpif-provider.h" #include "dummy.h" #include "fat-rwlock.h" @@ -142,90 +144,6 @@ static struct odp_support dp_netdev_support = { .ct_orig_tuple6 = true, }; -/* EMC cache and SMC cache compose the datapath flow cache (DFC) - * - * Exact match cache for frequently used flows - * - * The cache uses a 32-bit hash of the packet (which can be the RSS hash) to - * search its entries for a miniflow that matches exactly the miniflow of the - * packet. It stores the 'dpcls_rule' (rule) that matches the miniflow. - * - * A cache entry holds a reference to its 'dp_netdev_flow'. - * - * A miniflow with a given hash can be in one of EM_FLOW_HASH_SEGS different - * entries. The 32-bit hash is split into EM_FLOW_HASH_SEGS values (each of - * them is EM_FLOW_HASH_SHIFT bits wide and the remainder is thrown away). Each - * value is the index of a cache entry where the miniflow could be. - * - * - * Signature match cache (SMC) - * - * This cache stores a 16-bit signature for each flow without storing keys, and - * stores the corresponding 16-bit flow_table index to the 'dp_netdev_flow'. - * Each flow thus occupies 32bit which is much more memory efficient than EMC. - * SMC uses a set-associative design that each bucket contains - * SMC_ENTRY_PER_BUCKET number of entries. - * Since 16-bit flow_table index is used, if there are more than 2^16 - * dp_netdev_flow, SMC will miss them that cannot be indexed by a 16-bit value. - * - * - * Thread-safety - * ============= - * - * Each pmd_thread has its own private exact match cache. - * If dp_netdev_input is not called from a pmd thread, a mutex is used. - */ - -#define EM_FLOW_HASH_SHIFT 13 -#define EM_FLOW_HASH_ENTRIES (1u << EM_FLOW_HASH_SHIFT) -#define EM_FLOW_HASH_MASK (EM_FLOW_HASH_ENTRIES - 1) -#define EM_FLOW_HASH_SEGS 2 - -/* SMC uses a set-associative design. A bucket contains a set of entries that - * a flow item can occupy. For now, it uses one hash function rather than two - * as for the EMC design. */ -#define SMC_ENTRY_PER_BUCKET 4 -#define SMC_ENTRIES (1u << 20) -#define SMC_BUCKET_CNT (SMC_ENTRIES / SMC_ENTRY_PER_BUCKET) -#define SMC_MASK (SMC_BUCKET_CNT - 1) - -/* Default EMC insert probability is 1 / DEFAULT_EM_FLOW_INSERT_INV_PROB */ -#define DEFAULT_EM_FLOW_INSERT_INV_PROB 100 -#define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX / \ - DEFAULT_EM_FLOW_INSERT_INV_PROB) - -struct emc_entry { - struct dp_netdev_flow *flow; - struct netdev_flow_key key; /* key.hash used for emc hash value. */ -}; - -struct emc_cache { - struct emc_entry entries[EM_FLOW_HASH_ENTRIES]; - int sweep_idx; /* For emc_cache_slow_sweep(). */ -}; - -struct smc_bucket { - uint16_t sig[SMC_ENTRY_PER_BUCKET]; - uint16_t flow_idx[SMC_ENTRY_PER_BUCKET]; -}; - -/* Signature match cache, differentiate from EMC cache */ -struct smc_cache { - struct smc_bucket buckets[SMC_BUCKET_CNT]; -}; - -struct dfc_cache { - struct emc_cache emc_cache; - struct smc_cache smc_cache; -}; - -/* Iterate in the exact match cache through every entry that might contain a - * miniflow with hash 'HASH'. */ -#define EMC_FOR_EACH_POS_WITH_HASH(EMC, CURRENT_ENTRY, HASH) \ - for (uint32_t i__ = 0, srch_hash__ = (HASH); \ - (CURRENT_ENTRY) = &(EMC)->entries[srch_hash__ & EM_FLOW_HASH_MASK], \ - i__ < EM_FLOW_HASH_SEGS; \ - i__++, srch_hash__ >>= EM_FLOW_HASH_SHIFT) /* Simple non-wildcarding single-priority classifier. */ @@ -487,119 +405,10 @@ struct dp_netdev_port { char *rxq_affinity_list; /* Requested affinity of rx queues. */ }; -/* Contained by struct dp_netdev_flow's 'stats' member. */ -struct dp_netdev_flow_stats { - atomic_llong used; /* Last used time, in monotonic msecs. */ - atomic_ullong packet_count; /* Number of packets matched. */ - atomic_ullong byte_count; /* Number of bytes matched. */ - atomic_uint16_t tcp_flags; /* Bitwise-OR of seen tcp_flags values. */ -}; - -/* Contained by struct dp_netdev_flow's 'last_attrs' member. */ -struct dp_netdev_flow_attrs { - atomic_bool offloaded; /* True if flow is offloaded to HW. */ - ATOMIC(const char *) dp_layer; /* DP layer the flow is handled in. */ -}; - -/* A flow in 'dp_netdev_pmd_thread's 'flow_table'. - * - * - * Thread-safety - * ============= - * - * Except near the beginning or ending of its lifespan, rule 'rule' belongs to - * its pmd thread's classifier. The text below calls this classifier 'cls'. - * - * Motivation - * ---------- - * - * The thread safety rules described here for "struct dp_netdev_flow" are - * motivated by two goals: - * - * - Prevent threads that read members of "struct dp_netdev_flow" from - * reading bad data due to changes by some thread concurrently modifying - * those members. - * - * - Prevent two threads making changes to members of a given "struct - * dp_netdev_flow" from interfering with each other. - * - * - * Rules - * ----- - * - * A flow 'flow' may be accessed without a risk of being freed during an RCU - * grace period. Code that needs to hold onto a flow for a while - * should try incrementing 'flow->ref_cnt' with dp_netdev_flow_ref(). - * - * 'flow->ref_cnt' protects 'flow' from being freed. It doesn't protect the - * flow from being deleted from 'cls' and it doesn't protect members of 'flow' - * from modification. - * - * Some members, marked 'const', are immutable. Accessing other members - * requires synchronization, as noted in more detail below. - */ -struct dp_netdev_flow { - const struct flow flow; /* Unmasked flow that created this entry. */ - /* Hash table index by unmasked flow. */ - const struct cmap_node node; /* In owning dp_netdev_pmd_thread's */ - /* 'flow_table'. */ - const struct cmap_node mark_node; /* In owning flow_mark's mark_to_flow */ - const ovs_u128 ufid; /* Unique flow identifier. */ - const ovs_u128 mega_ufid; /* Unique mega flow identifier. */ - const unsigned pmd_id; /* The 'core_id' of pmd thread owning this */ - /* flow. */ - - /* Number of references. - * The classifier owns one reference. - * Any thread trying to keep a rule from being freed should hold its own - * reference. */ - struct ovs_refcount ref_cnt; - - bool dead; - uint32_t mark; /* Unique flow mark assigned to a flow */ - - /* Statistics. */ - struct dp_netdev_flow_stats stats; - - /* Statistics and attributes received from the netdev offload provider. */ - atomic_int netdev_flow_get_result; - struct dp_netdev_flow_stats last_stats; - struct dp_netdev_flow_attrs last_attrs; - - /* Actions. */ - OVSRCU_TYPE(struct dp_netdev_actions *) actions; - - /* While processing a group of input packets, the datapath uses the next - * member to store a pointer to the output batch for the flow. It is - * reset after the batch has been sent out (See dp_netdev_queue_batches(), - * packet_batch_per_flow_init() and packet_batch_per_flow_execute()). */ - struct packet_batch_per_flow *batch; - - /* Packet classification. */ - char *dp_extra_info; /* String to return in a flow dump/get. */ - struct dpcls_rule cr; /* In owning dp_netdev's 'cls'. */ - /* 'cr' must be the last member. */ -}; - -static void dp_netdev_flow_unref(struct dp_netdev_flow *); static bool dp_netdev_flow_ref(struct dp_netdev_flow *); static int dpif_netdev_flow_from_nlattrs(const struct nlattr *, uint32_t, struct flow *, bool); -/* A set of datapath actions within a "struct dp_netdev_flow". - * - * - * Thread-safety - * ============= - * - * A struct dp_netdev_actions 'actions' is protected with RCU. */ -struct dp_netdev_actions { - /* These members are immutable: they do not change during the struct's - * lifetime. */ - unsigned int size; /* Size of 'actions', in bytes. */ - struct nlattr actions[]; /* Sequence of OVS_ACTION_ATTR_* attributes. */ -}; - struct dp_netdev_actions *dp_netdev_actions_create(const struct nlattr *, size_t); struct dp_netdev_actions *dp_netdev_flow_get_actions( @@ -646,171 +455,6 @@ struct tx_bond { struct member_entry member_buckets[BOND_BUCKETS]; }; -/* A set of properties for the current processing loop that is not directly - * associated with the pmd thread itself, but with the packets being - * processed or the short-term system configuration (for example, time). - * Contained by struct dp_netdev_pmd_thread's 'ctx' member. */ -struct dp_netdev_pmd_thread_ctx { - /* Latest measured time. See 'pmd_thread_ctx_time_update()'. */ - long long now; - /* RX queue from which last packet was received. */ - struct dp_netdev_rxq *last_rxq; - /* EMC insertion probability context for the current processing cycle. */ - uint32_t emc_insert_min; -}; - -/* PMD: Poll modes drivers. PMD accesses devices via polling to eliminate - * the performance overhead of interrupt processing. Therefore netdev can - * not implement rx-wait for these devices. dpif-netdev needs to poll - * these device to check for recv buffer. pmd-thread does polling for - * devices assigned to itself. - * - * DPDK used PMD for accessing NIC. - * - * Note, instance with cpu core id NON_PMD_CORE_ID will be reserved for - * I/O of all non-pmd threads. There will be no actual thread created - * for the instance. - * - * Each struct has its own flow cache and classifier per managed ingress port. - * For packets received on ingress port, a look up is done on corresponding PMD - * thread's flow cache and in case of a miss, lookup is performed in the - * corresponding classifier of port. Packets are executed with the found - * actions in either case. - * */ -struct dp_netdev_pmd_thread { - struct dp_netdev *dp; - struct ovs_refcount ref_cnt; /* Every reference must be refcount'ed. */ - struct cmap_node node; /* In 'dp->poll_threads'. */ - - /* Per thread exact-match cache. Note, the instance for cpu core - * NON_PMD_CORE_ID can be accessed by multiple threads, and thusly - * need to be protected by 'non_pmd_mutex'. Every other instance - * will only be accessed by its own pmd thread. */ - OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct dfc_cache flow_cache; - - /* Flow-Table and classifiers - * - * Writers of 'flow_table' must take the 'flow_mutex'. Corresponding - * changes to 'classifiers' must be made while still holding the - * 'flow_mutex'. - */ - struct ovs_mutex flow_mutex; - struct cmap flow_table OVS_GUARDED; /* Flow table. */ - - /* One classifier per in_port polled by the pmd */ - struct cmap classifiers; - /* Periodically sort subtable vectors according to hit frequencies */ - long long int next_optimization; - /* End of the next time interval for which processing cycles - are stored for each polled rxq. */ - long long int rxq_next_cycle_store; - - /* Last interval timestamp. */ - uint64_t intrvl_tsc_prev; - /* Last interval cycles. */ - atomic_ullong intrvl_cycles; - - /* Current context of the PMD thread. */ - struct dp_netdev_pmd_thread_ctx ctx; - - struct seq *reload_seq; - uint64_t last_reload_seq; - - /* These are atomic variables used as a synchronization and configuration - * points for thread reload/exit. - * - * 'reload' atomic is the main one and it's used as a memory - * synchronization point for all other knobs and data. - * - * For a thread that requests PMD reload: - * - * * All changes that should be visible to the PMD thread must be made - * before setting the 'reload'. These changes could use any memory - * ordering model including 'relaxed'. - * * Setting the 'reload' atomic should occur in the same thread where - * all other PMD configuration options updated. - * * Setting the 'reload' atomic should be done with 'release' memory - * ordering model or stricter. This will guarantee that all previous - * changes (including non-atomic and 'relaxed') will be visible to - * the PMD thread. - * * To check that reload is done, thread should poll the 'reload' atomic - * to become 'false'. Polling should be done with 'acquire' memory - * ordering model or stricter. This ensures that PMD thread completed - * the reload process. - * - * For the PMD thread: - * - * * PMD thread should read 'reload' atomic with 'acquire' memory - * ordering model or stricter. This will guarantee that all changes - * made before setting the 'reload' in the requesting thread will be - * visible to the PMD thread. - * * All other configuration data could be read with any memory - * ordering model (including non-atomic and 'relaxed') but *only after* - * reading the 'reload' atomic set to 'true'. - * * When the PMD reload done, PMD should (optionally) set all the below - * knobs except the 'reload' to their default ('false') values and - * (mandatory), as the last step, set the 'reload' to 'false' using - * 'release' memory ordering model or stricter. This will inform the - * requesting thread that PMD has completed a reload cycle. - */ - atomic_bool reload; /* Do we need to reload ports? */ - atomic_bool wait_for_reload; /* Can we busy wait for the next reload? */ - atomic_bool reload_tx_qid; /* Do we need to reload static_tx_qid? */ - atomic_bool exit; /* For terminating the pmd thread. */ - - pthread_t thread; - unsigned core_id; /* CPU core id of this pmd thread. */ - int numa_id; /* numa node id of this pmd thread. */ - bool isolated; - - /* Queue id used by this pmd thread to send packets on all netdevs if - * XPS disabled for this netdev. All static_tx_qid's are unique and less - * than 'cmap_count(dp->poll_threads)'. */ - uint32_t static_tx_qid; - - /* Number of filled output batches. */ - int n_output_batches; - - struct ovs_mutex port_mutex; /* Mutex for 'poll_list' and 'tx_ports'. */ - /* List of rx queues to poll. */ - struct hmap poll_list OVS_GUARDED; - /* Map of 'tx_port's used for transmission. Written by the main thread, - * read by the pmd thread. */ - struct hmap tx_ports OVS_GUARDED; - - struct ovs_mutex bond_mutex; /* Protects updates of 'tx_bonds'. */ - /* Map of 'tx_bond's used for transmission. Written by the main thread - * and read by the pmd thread. */ - struct cmap tx_bonds; - - /* These are thread-local copies of 'tx_ports'. One contains only tunnel - * ports (that support push_tunnel/pop_tunnel), the other contains ports - * with at least one txq (that support send). A port can be in both. - * - * There are two separate maps to make sure that we don't try to execute - * OUTPUT on a device which has 0 txqs or PUSH/POP on a non-tunnel device. - * - * The instances for cpu core NON_PMD_CORE_ID can be accessed by multiple - * threads, and thusly need to be protected by 'non_pmd_mutex'. Every - * other instance will only be accessed by its own pmd thread. */ - struct hmap tnl_port_cache; - struct hmap send_port_cache; - - /* Keep track of detailed PMD performance statistics. */ - struct pmd_perf_stats perf_stats; - - /* Stats from previous iteration used by automatic pmd - * load balance logic. */ - uint64_t prev_stats[PMD_N_STATS]; - atomic_count pmd_overloaded; - - /* Set to true if the pmd thread needs to be reloaded. */ - bool need_reload; - - /* Next time when PMD should try RCU quiescing. */ - long long next_rcu_quiesce; -}; - /* Interface to netdev-based datapath. */ struct dpif_netdev { struct dpif dpif; @@ -915,90 +559,12 @@ static inline struct dpcls * dp_netdev_pmd_lookup_dpcls(struct dp_netdev_pmd_thread *pmd, odp_port_t in_port); -static inline bool emc_entry_alive(struct emc_entry *ce); -static void emc_clear_entry(struct emc_entry *ce); -static void smc_clear_entry(struct smc_bucket *b, int idx); - static void dp_netdev_request_reconfigure(struct dp_netdev *dp); static inline bool pmd_perf_metrics_enabled(const struct dp_netdev_pmd_thread *pmd); static void queue_netdev_flow_del(struct dp_netdev_pmd_thread *pmd, struct dp_netdev_flow *flow); -static void -emc_cache_init(struct emc_cache *flow_cache) -{ - int i; - - flow_cache->sweep_idx = 0; - for (i = 0; i < ARRAY_SIZE(flow_cache->entries); i++) { - flow_cache->entries[i].flow = NULL; - flow_cache->entries[i].key.hash = 0; - flow_cache->entries[i].key.len = sizeof(struct miniflow); - flowmap_init(&flow_cache->entries[i].key.mf.map); - } -} - -static void -smc_cache_init(struct smc_cache *smc_cache) -{ - int i, j; - for (i = 0; i < SMC_BUCKET_CNT; i++) { - for (j = 0; j < SMC_ENTRY_PER_BUCKET; j++) { - smc_cache->buckets[i].flow_idx[j] = UINT16_MAX; - } - } -} - -static void -dfc_cache_init(struct dfc_cache *flow_cache) -{ - emc_cache_init(&flow_cache->emc_cache); - smc_cache_init(&flow_cache->smc_cache); -} - -static void -emc_cache_uninit(struct emc_cache *flow_cache) -{ - int i; - - for (i = 0; i < ARRAY_SIZE(flow_cache->entries); i++) { - emc_clear_entry(&flow_cache->entries[i]); - } -} - -static void -smc_cache_uninit(struct smc_cache *smc) -{ - int i, j; - - for (i = 0; i < SMC_BUCKET_CNT; i++) { - for (j = 0; j < SMC_ENTRY_PER_BUCKET; j++) { - smc_clear_entry(&(smc->buckets[i]), j); - } - } -} - -static void -dfc_cache_uninit(struct dfc_cache *flow_cache) -{ - smc_cache_uninit(&flow_cache->smc_cache); - emc_cache_uninit(&flow_cache->emc_cache); -} - -/* Check and clear dead flow references slowly (one entry at each - * invocation). */ -static void -emc_cache_slow_sweep(struct emc_cache *flow_cache) -{ - struct emc_entry *entry = &flow_cache->entries[flow_cache->sweep_idx]; - - if (!emc_entry_alive(entry)) { - emc_clear_entry(entry); - } - flow_cache->sweep_idx = (flow_cache->sweep_idx + 1) & EM_FLOW_HASH_MASK; -} - /* Updates the time in PMD threads context and should be called in three cases: * * 1. PMD structure initialization: @@ -2347,19 +1913,13 @@ dp_netdev_flow_free(struct dp_netdev_flow *flow) free(flow); } -static void dp_netdev_flow_unref(struct dp_netdev_flow *flow) +void dp_netdev_flow_unref(struct dp_netdev_flow *flow) { if (ovs_refcount_unref_relaxed(&flow->ref_cnt) == 1) { ovsrcu_postpone(dp_netdev_flow_free, flow); } } -static uint32_t -dp_netdev_flow_hash(const ovs_u128 *ufid) -{ - return ufid->u32[0]; -} - static inline struct dpcls * dp_netdev_pmd_lookup_dpcls(struct dp_netdev_pmd_thread *pmd, odp_port_t in_port) @@ -2976,14 +2536,6 @@ static bool dp_netdev_flow_ref(struct dp_netdev_flow *flow) * single memcmp(). * - These functions can be inlined by the compiler. */ -/* Given the number of bits set in miniflow's maps, returns the size of the - * 'netdev_flow_key.mf' */ -static inline size_t -netdev_flow_key_size(size_t flow_u64s) -{ - return sizeof(struct miniflow) + MINIFLOW_VALUES_SIZE(flow_u64s); -} - static inline bool netdev_flow_key_equal(const struct netdev_flow_key *a, const struct netdev_flow_key *b) @@ -2992,16 +2544,6 @@ netdev_flow_key_equal(const struct netdev_flow_key *a, return a->hash == b->hash && !memcmp(&a->mf, &b->mf, a->len); } -/* Used to compare 'netdev_flow_key' in the exact match cache to a miniflow. - * The maps are compared bitwise, so both 'key->mf' and 'mf' must have been - * generated by miniflow_extract. */ -static inline bool -netdev_flow_key_equal_mf(const struct netdev_flow_key *key, - const struct miniflow *mf) -{ - return !memcmp(&key->mf, mf, key->len); -} - static inline void netdev_flow_key_clone(struct netdev_flow_key *dst, const struct netdev_flow_key *src) @@ -3068,21 +2610,6 @@ netdev_flow_key_init_masked(struct netdev_flow_key *dst, (dst_u64 - miniflow_get_values(&dst->mf)) * 8); } -static inline bool -emc_entry_alive(struct emc_entry *ce) -{ - return ce->flow && !ce->flow->dead; -} - -static void -emc_clear_entry(struct emc_entry *ce) -{ - if (ce->flow) { - dp_netdev_flow_unref(ce->flow); - ce->flow = NULL; - } -} - static inline void emc_change_entry(struct emc_entry *ce, struct dp_netdev_flow *flow, const struct netdev_flow_key *key) @@ -3148,24 +2675,6 @@ emc_probabilistic_insert(struct dp_netdev_pmd_thread *pmd, } } -static inline struct dp_netdev_flow * -emc_lookup(struct emc_cache *cache, const struct netdev_flow_key *key) -{ - struct emc_entry *current_entry; - - EMC_FOR_EACH_POS_WITH_HASH(cache, current_entry, key->hash) { - if (current_entry->key.hash == key->hash - && emc_entry_alive(current_entry) - && netdev_flow_key_equal_mf(¤t_entry->key, &key->mf)) { - - /* We found the entry with the 'key->mf' miniflow */ - return current_entry->flow; - } - } - - return NULL; -} - static inline const struct cmap_node * smc_entry_get(struct dp_netdev_pmd_thread *pmd, const uint32_t hash) { @@ -3186,12 +2695,6 @@ smc_entry_get(struct dp_netdev_pmd_thread *pmd, const uint32_t hash) return NULL; } -static void -smc_clear_entry(struct smc_bucket *b, int idx) -{ - b->flow_idx[idx] = UINT16_MAX; -} - /* Insert the flow_table index into SMC. Insertion may fail when 1) SMC is * turned off, 2) the flow_table index is larger than uint16_t can handle. * If there is already an SMC entry having same signature, the index will be @@ -6875,22 +6378,6 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet_, actions, wc, put_actions, dp->upcall_aux); } -static inline uint32_t -dpif_netdev_packet_get_rss_hash_orig_pkt(struct dp_packet *packet, - const struct miniflow *mf) -{ - uint32_t hash; - - if (OVS_LIKELY(dp_packet_rss_valid(packet))) { - hash = dp_packet_get_rss_hash(packet); - } else { - hash = miniflow_hash_5tuple(mf, 0); - dp_packet_set_rss_hash(packet, hash); - } - - return hash; -} - static inline uint32_t dpif_netdev_packet_get_rss_hash(struct dp_packet *packet, const struct miniflow *mf) From patchwork Wed Apr 7 09:34:29 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ferriter, Cian" X-Patchwork-Id: 1463257 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::138; helo=smtp1.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp1.osuosl.org (smtp1.osuosl.org [IPv6:2605:bc80:3010::138]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4FFfLk6Ymnz9sV5 for ; Wed, 7 Apr 2021 19:32:30 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id 9857883F56; Wed, 7 Apr 2021 09:32:28 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Qmzl1QdQ7eTC; Wed, 7 Apr 2021 09:32:27 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp1.osuosl.org (Postfix) with ESMTP id BDA6E83F21; Wed, 7 Apr 2021 09:32:26 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 97DB7C000C; Wed, 7 Apr 2021 09:32:26 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [IPv6:2605:bc80:3010::136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 683C8C000A for ; Wed, 7 Apr 2021 09:32:25 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 5536A60BD4 for ; Wed, 7 Apr 2021 09:32:25 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id E7iTHarbGxwt for ; Wed, 7 Apr 2021 09:32:24 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by smtp3.osuosl.org (Postfix) with ESMTPS id 3D16860A4E for ; Wed, 7 Apr 2021 09:32:24 +0000 (UTC) IronPort-SDR: s66WVHb5zMaM0n+1ESBL4BD6rmgNCJkCSPYiT+OfdF1T0sQsBmyKeiaUTmOWquupQt4DAFEEie bovTctRFGwXg== X-IronPort-AV: E=McAfee;i="6000,8403,9946"; a="173344871" X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="173344871" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Apr 2021 02:32:23 -0700 IronPort-SDR: k2k+rwZepETX7bswvAWjJw3jkc1r02ED6HoFQ3ZT85+Q78bN470PNcgR261ye5r0l+LQ1U73+f +D0Vq24vOc1w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="381253811" Received: from silpixa00399779.ir.intel.com (HELO silpixa00399779.ger.corp.intel.com) ([10.237.223.175]) by orsmga006.jf.intel.com with ESMTP; 07 Apr 2021 02:32:22 -0700 From: Cian Ferriter To: ovs-dev@openvswitch.org Date: Wed, 7 Apr 2021 10:34:29 +0100 Message-Id: <20210407093442.41568-3-cian.ferriter@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210407093442.41568-1-cian.ferriter@intel.com> References: <20210407093442.41568-1-cian.ferriter@intel.com> Cc: i.maximets@ovn.org Subject: [ovs-dev] [v10 02/15] dpif-netdev: Split HWOL out to own header file. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Harry van Haaren This commit moves the datapath lookup functions required for hardware offload to a seperate file. This allows other DPIF implementations to access the lookup functions, encouraging code reuse. Signed-off-by: Harry van Haaren Signed-off-by: Cian Ferriter --- lib/automake.mk | 1 + lib/dpif-netdev-private-hwol.h | 63 ++++++++++++++++++++++++++++++++++ lib/dpif-netdev.c | 39 ++------------------- 3 files changed, 67 insertions(+), 36 deletions(-) create mode 100644 lib/dpif-netdev-private-hwol.h diff --git a/lib/automake.mk b/lib/automake.mk index 0e83145b5..9b3e06db6 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -114,6 +114,7 @@ lib_libopenvswitch_la_SOURCES = \ lib/dpif-netdev-private-dfc.h \ lib/dpif-netdev-private-dpcls.h \ lib/dpif-netdev-private-flow.h \ + lib/dpif-netdev-private-hwol.h \ lib/dpif-netdev-private-thread.h \ lib/dpif-netdev-private.h \ lib/dpif-netdev-perf.c \ diff --git a/lib/dpif-netdev-private-hwol.h b/lib/dpif-netdev-private-hwol.h new file mode 100644 index 000000000..b93297a74 --- /dev/null +++ b/lib/dpif-netdev-private-hwol.h @@ -0,0 +1,63 @@ +/* + * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2015 Nicira, Inc. + * Copyright (c) 2021 Intel Corporation. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef DPIF_NETDEV_PRIVATE_HWOL_H +#define DPIF_NETDEV_PRIVATE_HWOL_H 1 + +#include "dpif-netdev-private-flow.h" + +#define MAX_FLOW_MARK (UINT32_MAX - 1) +#define INVALID_FLOW_MARK 0 +/* Zero flow mark is used to indicate the HW to remove the mark. A packet + * marked with zero mark is received in SW without a mark at all, so it + * cannot be used as a valid mark. + */ + +struct megaflow_to_mark_data { + const struct cmap_node node; + ovs_u128 mega_ufid; + uint32_t mark; +}; + +struct flow_mark { + struct cmap megaflow_to_mark; + struct cmap mark_to_flow; + struct id_pool *pool; +}; + +/* allocated in dpif-netdev.c */ +extern struct flow_mark flow_mark; + +static inline struct dp_netdev_flow * +mark_to_flow_find(const struct dp_netdev_pmd_thread *pmd, + const uint32_t mark) +{ + struct dp_netdev_flow *flow; + + CMAP_FOR_EACH_WITH_HASH (flow, mark_node, hash_int(mark, 0), + &flow_mark.mark_to_flow) { + if (flow->mark == mark && flow->pmd_id == pmd->core_id && + flow->dead == false) { + return flow; + } + } + + return NULL; +} + + +#endif /* dpif-netdev-private-hwol.h */ diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 298bfe444..88f37c505 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -84,6 +84,8 @@ #include "util.h" #include "uuid.h" +#include "dpif-netdev-private-hwol.h" + VLOG_DEFINE_THIS_MODULE(dpif_netdev); /* Auto Load Balancing Defaults */ @@ -1954,26 +1956,8 @@ dp_netdev_pmd_find_dpcls(struct dp_netdev_pmd_thread *pmd, return cls; } -#define MAX_FLOW_MARK (UINT32_MAX - 1) -#define INVALID_FLOW_MARK 0 -/* Zero flow mark is used to indicate the HW to remove the mark. A packet - * marked with zero mark is received in SW without a mark at all, so it - * cannot be used as a valid mark. - */ - -struct megaflow_to_mark_data { - const struct cmap_node node; - ovs_u128 mega_ufid; - uint32_t mark; -}; - -struct flow_mark { - struct cmap megaflow_to_mark; - struct cmap mark_to_flow; - struct id_pool *pool; -}; -static struct flow_mark flow_mark = { +struct flow_mark flow_mark = { .megaflow_to_mark = CMAP_INITIALIZER, .mark_to_flow = CMAP_INITIALIZER, }; @@ -2142,23 +2126,6 @@ flow_mark_flush(struct dp_netdev_pmd_thread *pmd) } } -static struct dp_netdev_flow * -mark_to_flow_find(const struct dp_netdev_pmd_thread *pmd, - const uint32_t mark) -{ - struct dp_netdev_flow *flow; - - CMAP_FOR_EACH_WITH_HASH (flow, mark_node, hash_int(mark, 0), - &flow_mark.mark_to_flow) { - if (flow->mark == mark && flow->pmd_id == pmd->core_id && - flow->dead == false) { - return flow; - } - } - - return NULL; -} - static struct dp_flow_offload_item * dp_netdev_alloc_flow_offload(struct dp_netdev_pmd_thread *pmd, struct dp_netdev_flow *flow, From patchwork Wed Apr 7 09:34:30 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ferriter, Cian" X-Patchwork-Id: 1463258 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.133; helo=smtp2.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp2.osuosl.org (smtp2.osuosl.org [140.211.166.133]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4FFfLq5tGsz9sV5 for ; Wed, 7 Apr 2021 19:32:35 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id 92EC240607; Wed, 7 Apr 2021 09:32:32 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id kh9EFSrBIqSU; Wed, 7 Apr 2021 09:32:30 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp2.osuosl.org (Postfix) with ESMTP id 4AECE405B0; Wed, 7 Apr 2021 09:32:29 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id C4FFCC001A; Wed, 7 Apr 2021 09:32:28 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [IPv6:2605:bc80:3010::136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 51DA5C000A for ; Wed, 7 Apr 2021 09:32:26 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id A8A8160A4E for ; Wed, 7 Apr 2021 09:32:25 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id r-yl997RulVd for ; Wed, 7 Apr 2021 09:32:25 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by smtp3.osuosl.org (Postfix) with ESMTPS id 0F7FA60BC7 for ; Wed, 7 Apr 2021 09:32:25 +0000 (UTC) IronPort-SDR: cn36t9M3XxQ4cJsXgRAI2mBH2ukQptdojutwb4jex8/u6j3bt+oW1+ZPDnsRQ8BzPQb6wG2Nzx XfExfesBHUzA== X-IronPort-AV: E=McAfee;i="6000,8403,9946"; a="173344872" X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="173344872" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Apr 2021 02:32:24 -0700 IronPort-SDR: KYnivQ/nB1QNDVPVBiU4JXMyqAhzd/TzXkG1/Vpaj2SJvg0HqpAT4jjxoHbU/fWa7T91iUJFIf mxmMyutAgQbQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="381253820" Received: from silpixa00399779.ir.intel.com (HELO silpixa00399779.ger.corp.intel.com) ([10.237.223.175]) by orsmga006.jf.intel.com with ESMTP; 07 Apr 2021 02:32:23 -0700 From: Cian Ferriter To: ovs-dev@openvswitch.org Date: Wed, 7 Apr 2021 10:34:30 +0100 Message-Id: <20210407093442.41568-4-cian.ferriter@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210407093442.41568-1-cian.ferriter@intel.com> References: <20210407093442.41568-1-cian.ferriter@intel.com> Cc: i.maximets@ovn.org Subject: [ovs-dev] [v10 03/15] dpif-netdev: Add function pointer for netdev input. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Harry van Haaren This commit adds a function pointer to the pmd thread data structure, giving the pmd thread flexibility in its dpif-input function choice. This allows choosing of the implementation based on ISA capabilities of the runtime CPU, leading to optimizations and higher performance. Signed-off-by: Harry van Haaren Signed-off-by: Cian Ferriter --- lib/dpif-netdev-private-thread.h | 12 ++++++++++++ lib/dpif-netdev.c | 7 ++++++- 2 files changed, 18 insertions(+), 1 deletion(-) diff --git a/lib/dpif-netdev-private-thread.h b/lib/dpif-netdev-private-thread.h index 5e5308b96..a5f39d963 100644 --- a/lib/dpif-netdev-private-thread.h +++ b/lib/dpif-netdev-private-thread.h @@ -47,6 +47,13 @@ struct dp_netdev_pmd_thread_ctx { uint32_t emc_insert_min; }; +/* Forward declaration for typedef */ +struct dp_netdev_pmd_thread; + +typedef void (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd, + struct dp_packet_batch *packets, + odp_port_t port_no); + /* PMD: Poll modes drivers. PMD accesses devices via polling to eliminate * the performance overhead of interrupt processing. Therefore netdev can * not implement rx-wait for these devices. dpif-netdev needs to poll @@ -101,6 +108,11 @@ struct dp_netdev_pmd_thread { /* Current context of the PMD thread. */ struct dp_netdev_pmd_thread_ctx ctx; + /* Function pointer to call for dp_netdev_input() functionality. */ + dp_netdev_input_func netdev_input_func; + /* Pointer for per-DPIF implementation scratch space. */ + void *netdev_input_func_userdata; + struct seq *reload_seq; uint64_t last_reload_seq; diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 88f37c505..7486171de 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -4234,8 +4234,9 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd, } } } + /* Process packet batch. */ - dp_netdev_input(pmd, &batch, port_no); + pmd->netdev_input_func(pmd, &batch, port_no); /* Assign processing cycles to rx queue. */ cycles = cycle_timer_stop(&pmd->perf_stats, &timer); @@ -6033,6 +6034,10 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp, hmap_init(&pmd->tnl_port_cache); hmap_init(&pmd->send_port_cache); cmap_init(&pmd->tx_bonds); + + /* Initialize the DPIF function pointer to the default scalar version */ + pmd->netdev_input_func = dp_netdev_input; + /* init the 'flow_cache' since there is no * actual thread created for NON_PMD_CORE_ID. */ if (core_id == NON_PMD_CORE_ID) { From patchwork Wed Apr 7 09:34:31 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ferriter, Cian" X-Patchwork-Id: 1463262 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.133; helo=smtp2.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp2.osuosl.org (smtp2.osuosl.org [140.211.166.133]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4FFfM52bm5z9sVb for ; Wed, 7 Apr 2021 19:32:49 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id B733E41A3C; Wed, 7 Apr 2021 09:32:47 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id L2Hx4R2zXSX8; Wed, 7 Apr 2021 09:32:44 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp2.osuosl.org (Postfix) with ESMTP id 1052B405D2; Wed, 7 Apr 2021 09:32:34 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 192DBC0022; Wed, 7 Apr 2021 09:32:32 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [IPv6:2605:bc80:3010::136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 20E77C0016 for ; Wed, 7 Apr 2021 09:32:29 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 8565F60BDD for ; Wed, 7 Apr 2021 09:32:28 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id vlMsrz0m5CBy for ; Wed, 7 Apr 2021 09:32:26 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by smtp3.osuosl.org (Postfix) with ESMTPS id BBF5360595 for ; Wed, 7 Apr 2021 09:32:26 +0000 (UTC) IronPort-SDR: fhkNqgIX3oqLR+JnG/6wRxhdAGt6ar45LvkD8GGCbJPwXZfRZae0avO5AbMAdY816XZPd+Cgat kZhXOi+KsGBw== X-IronPort-AV: E=McAfee;i="6000,8403,9946"; a="173344877" X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="173344877" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Apr 2021 02:32:26 -0700 IronPort-SDR: 03F5ADEcS2l/Y3/Sd1bb9JVUxDObCmwkRFXhS25mxumrV4AcKZgF/CXrO7equ4SFMNN2HcgpXy vmFMMkEFxKEw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="381253829" Received: from silpixa00399779.ir.intel.com (HELO silpixa00399779.ger.corp.intel.com) ([10.237.223.175]) by orsmga006.jf.intel.com with ESMTP; 07 Apr 2021 02:32:24 -0700 From: Cian Ferriter To: ovs-dev@openvswitch.org Date: Wed, 7 Apr 2021 10:34:31 +0100 Message-Id: <20210407093442.41568-5-cian.ferriter@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210407093442.41568-1-cian.ferriter@intel.com> References: <20210407093442.41568-1-cian.ferriter@intel.com> Cc: i.maximets@ovn.org Subject: [ovs-dev] [v10 04/15] dpif-avx512: Add ISA implementation of dpif. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Harry van Haaren This commit adds the AVX512 implementation of DPIF functionality, specifically the dp_netdev_input_outer_avx512 function. This function only handles outer (no re-circulations), and is optimized to use the AVX512 ISA for packet batching and other DPIF work. Sparse is not able to handle the AVX512 intrinsics, causing compile time failures, so it is disabled for this file. Signed-off-by: Harry van Haaren Co-authored-by: Cian Ferriter Signed-off-by: Cian Ferriter --- v8: - Fixup AVX512 mask to uint32_t conversion compilation warning. --- lib/automake.mk | 5 +- lib/dpif-netdev-avx512.c | 264 +++++++++++++++++++++++++++++++ lib/dpif-netdev-private-dfc.h | 8 + lib/dpif-netdev-private-dpif.h | 32 ++++ lib/dpif-netdev-private-thread.h | 11 +- lib/dpif-netdev-private.h | 25 +++ lib/dpif-netdev.c | 70 ++++++-- 7 files changed, 399 insertions(+), 16 deletions(-) create mode 100644 lib/dpif-netdev-avx512.c create mode 100644 lib/dpif-netdev-private-dpif.h diff --git a/lib/automake.mk b/lib/automake.mk index 9b3e06db6..d945d935e 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -33,11 +33,13 @@ lib_libopenvswitchavx512_la_CFLAGS = \ -mavx512f \ -mavx512bw \ -mavx512dq \ + -mbmi \ -mbmi2 \ -fPIC \ $(AM_CFLAGS) lib_libopenvswitchavx512_la_SOURCES = \ - lib/dpif-netdev-lookup-avx512-gather.c + lib/dpif-netdev-lookup-avx512-gather.c \ + lib/dpif-netdev-avx512.c lib_libopenvswitchavx512_la_LDFLAGS = \ -static endif @@ -113,6 +115,7 @@ lib_libopenvswitch_la_SOURCES = \ lib/dpif-netdev.h \ lib/dpif-netdev-private-dfc.h \ lib/dpif-netdev-private-dpcls.h \ + lib/dpif-netdev-private-dpif.h \ lib/dpif-netdev-private-flow.h \ lib/dpif-netdev-private-hwol.h \ lib/dpif-netdev-private-thread.h \ diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c new file mode 100644 index 000000000..811d3cb86 --- /dev/null +++ b/lib/dpif-netdev-avx512.c @@ -0,0 +1,264 @@ +/* + * Copyright (c) 2021 Intel Corporation. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifdef __x86_64__ +/* Sparse cannot handle the AVX512 instructions */ +#if !defined(__CHECKER__) + +#include + +#include "dpif-netdev.h" +#include "dpif-netdev-perf.h" + +#include "dpif-netdev-private.h" +#include "dpif-netdev-private-dpcls.h" +#include "dpif-netdev-private-flow.h" +#include "dpif-netdev-private-thread.h" + +#include "dp-packet.h" +#include "netdev.h" + +#include "immintrin.h" + +/* Structure to contain per-packet metadata that must be attributed to the + * dp netdev flow. This is unfortunate to have to track per packet, however + * it's a bit awkward to maintain them in a performant way. This structure + * helps to keep two variables on a single cache line per packet. + */ +struct pkt_flow_meta { + uint16_t bytes; + uint16_t tcp_flags; +}; + +/* Structure of heap allocated memory for DPIF internals. */ +struct dpif_userdata { + OVS_ALIGNED_VAR(CACHE_LINE_SIZE) + struct netdev_flow_key keys[NETDEV_MAX_BURST]; + OVS_ALIGNED_VAR(CACHE_LINE_SIZE) + struct netdev_flow_key *key_ptrs[NETDEV_MAX_BURST]; + OVS_ALIGNED_VAR(CACHE_LINE_SIZE) + struct pkt_flow_meta pkt_meta[NETDEV_MAX_BURST]; +}; + +int32_t +dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, + struct dp_packet_batch *packets, + odp_port_t in_port) +{ + /* Allocate DPIF userdata. */ + if (OVS_UNLIKELY(!pmd->netdev_input_func_userdata)) { + pmd->netdev_input_func_userdata = + xmalloc_pagealign(sizeof(struct dpif_userdata)); + } + + struct dpif_userdata *ud = pmd->netdev_input_func_userdata; + struct netdev_flow_key *keys = ud->keys; + struct netdev_flow_key **key_ptrs = ud->key_ptrs; + struct pkt_flow_meta *pkt_meta = ud->pkt_meta; + + /* Stores the computed output: a rule pointer for each packet */ + /* The AVX512 DPIF implementation handles rules in a way that is optimized + * for reducing data-movement between HWOL/EMC/SMC and DPCLS. This is + * achieved by separating the rule arrays. Bitmasks are kept for each + * packet, indicating if it matched in the HWOL/EMC/SMC array or DPCLS + * array. Later the two arrays are merged by AVX-512 expand instructions. + */ + struct dpcls_rule *rules[NETDEV_MAX_BURST]; + struct dpcls_rule *dpcls_rules[NETDEV_MAX_BURST]; + uint32_t dpcls_key_idx = 0; + + for (uint32_t i = 0; i < NETDEV_MAX_BURST; i += 8) { + _mm512_storeu_si512(&rules[i], _mm512_setzero_si512()); + _mm512_storeu_si512(&dpcls_rules[i], _mm512_setzero_si512()); + } + + /* Prefetch each packet's metadata */ + const size_t batch_size = dp_packet_batch_size(packets); + for (int i = 0; i < batch_size; i++) { + struct dp_packet *packet = packets->packets[i]; + OVS_PREFETCH(dp_packet_data(packet)); + pkt_metadata_prefetch_init(&packet->md); + } + + /* Check if EMC or SMC are enabled */ + struct dfc_cache *cache = &pmd->flow_cache; + const uint32_t emc_enabled = pmd->ctx.emc_insert_min != 0; + const uint32_t smc_enabled = pmd->ctx.smc_enable_db; + + uint32_t emc_hits = 0; + uint32_t smc_hits = 0; + + /* a 1 bit in this mask indidcates a hit, so no DPCLS lookup on the pkt. */ + uint32_t hwol_emc_smc_hitmask = 0; + + /* Perform first packet interation */ + uint32_t lookup_pkts_bitmask = (1ULL << batch_size) - 1; + uint32_t iter = lookup_pkts_bitmask; + while (iter) { + uint32_t i = __builtin_ctz(iter); + iter = _blsr_u64(iter); + + /* Initialize packet md and do miniflow extract */ + struct dp_packet *packet = packets->packets[i]; + pkt_metadata_init(&packet->md, in_port); + struct netdev_flow_key *key = &keys[i]; + miniflow_extract(packet, &key->mf); + + /* Cache TCP and byte values for all packets */ + pkt_meta[i].bytes = dp_packet_size(packet); + pkt_meta[i].tcp_flags = miniflow_get_tcp_flags(&key->mf); + + key->len = netdev_flow_key_size(miniflow_n_values(&key->mf)); + key->hash = dpif_netdev_packet_get_rss_hash_orig_pkt(packet, &key->mf); + + struct dp_netdev_flow *f = NULL; + + if (emc_enabled) { + f = emc_lookup(&cache->emc_cache, key); + + if (f) { + rules[i] = &f->cr; + emc_hits++; + hwol_emc_smc_hitmask |= (1 << i); + continue; + } + }; + + if (smc_enabled && !f) { + f = smc_lookup_single(pmd, packet, key); + if (f) { + rules[i] = &f->cr; + smc_hits++; + hwol_emc_smc_hitmask |= (1 << i); + continue; + } + } + + /* The flow pointer was not found in HWOL/EMC/SMC, so add it to the + * dpcls input keys array for batch lookup later. + */ + key_ptrs[dpcls_key_idx] = &keys[i]; + dpcls_key_idx++; + } + + + /* DPCLS handles any packets missed by HWOL/EMC/SMC. It operates on the + * key_ptrs[] for input miniflows to match, storing results in the + * dpcls_rules[] array. + */ + if (dpcls_key_idx > 0) { + struct dpcls *cls = dp_netdev_pmd_lookup_dpcls(pmd, in_port); + if (OVS_UNLIKELY(!cls)) { + return -1; + } + int any_miss = !dpcls_lookup(cls, + (const struct netdev_flow_key **) key_ptrs, + dpcls_rules, dpcls_key_idx, NULL); + if (OVS_UNLIKELY(any_miss)) { + return -1; + } + + /* Merge DPCLS rules and HWOL/EMC/SMC rules. */ + uint32_t dpcls_idx = 0; + for (int i = 0; i < NETDEV_MAX_BURST; i += 8) { + /* Indexing here is somewhat complicated due to DPCLS output rule + * load index depending on the hitmask of HWOL/EMC/SMC. More + * packets from HWOL/EMC/SMC bitmask means less DPCLS rules are + * used. + */ + __m512i v_cache_rules = _mm512_loadu_si512(&rules[i]); + __m512i v_merged_rules = + _mm512_mask_expandloadu_epi64(v_cache_rules, + ~hwol_emc_smc_hitmask, + &dpcls_rules[dpcls_idx]); + _mm512_storeu_si512(&rules[i], v_merged_rules); + + /* Update DPCLS load index and bitmask for HWOL/EMC/SMC hits. + * There are 8 output pointer per register, subtract the + * HWOL/EMC/SMC lanes equals the number of DPCLS rules consumed. + */ + uint32_t hitmask_FF = (hwol_emc_smc_hitmask & 0xFF); + dpcls_idx += 8 - __builtin_popcountll(hitmask_FF); + hwol_emc_smc_hitmask = (hwol_emc_smc_hitmask >> 8); + } + } + + /* At this point we don't return error anymore, so commit stats here. */ + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_RECV, batch_size); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT, emc_hits); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SMC_HIT, smc_hits); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT, + dpcls_key_idx); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_LOOKUP, + dpcls_key_idx); + + /* Initialize the "Action Batch" for each flow handled below. */ + struct dp_packet_batch action_batch; + action_batch.trunc = 0; + action_batch.do_not_steal = false; + + while (lookup_pkts_bitmask) { + uint32_t rule_pkt_idx = __builtin_ctz(lookup_pkts_bitmask); + uint64_t needle = (uintptr_t) rules[rule_pkt_idx]; + + /* Parallel compare 8 flow* 's to the needle, create a bitmask. */ + uint32_t batch_bitmask = 0; + for (uint32_t j = 0; j < NETDEV_MAX_BURST; j += 8) { + /* Pre-calculate store addr */ + uint32_t num_pkts_in_batch = __builtin_popcountll(batch_bitmask); + void *store_addr = &action_batch.packets[num_pkts_in_batch]; + + /* Search for identical flow* in burst, update bitmask. */ + __m512i v_needle = _mm512_set1_epi64(needle); + __m512i v_hay = _mm512_loadu_si512(&rules[j]); + __mmask8 k_cmp_bits = _mm512_cmpeq_epi64_mask(v_needle, v_hay); + uint32_t cmp_bits = k_cmp_bits; + batch_bitmask |= cmp_bits << j; + + /* Compress & Store the batched packets. */ + struct dp_packet **packets_ptrs = &packets->packets[j]; + __m512i v_pkt_ptrs = _mm512_loadu_si512(packets_ptrs); + _mm512_mask_compressstoreu_epi64(store_addr, cmp_bits, v_pkt_ptrs); + } + + /* Strip all packets in this batch from the lookup_pkts_bitmask. */ + lookup_pkts_bitmask &= (~batch_bitmask); + action_batch.count = __builtin_popcountll(batch_bitmask); + + /* Loop over all packets in this batch, to gather the byte and tcp_flag + * values, and pass them to the execute function. It would be nice to + * optimize this away, however it is not easy to refactor in dpif. + */ + uint32_t bytes = 0; + uint16_t tcp_flags = 0; + uint32_t bitmask_iter = batch_bitmask; + for (int i = 0; i < action_batch.count; i++) { + uint32_t idx = __builtin_ctzll(bitmask_iter); + bitmask_iter = _blsr_u64(bitmask_iter); + + bytes += pkt_meta[idx].bytes; + tcp_flags |= pkt_meta[idx].tcp_flags; + } + + dp_netdev_batch_execute(pmd, &action_batch, rules[rule_pkt_idx], + bytes, tcp_flags); + } + + return 0; +} + +#endif +#endif diff --git a/lib/dpif-netdev-private-dfc.h b/lib/dpif-netdev-private-dfc.h index 52349a3fc..bd18bd3fd 100644 --- a/lib/dpif-netdev-private-dfc.h +++ b/lib/dpif-netdev-private-dfc.h @@ -81,6 +81,9 @@ extern "C" { #define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX / \ DEFAULT_EM_FLOW_INSERT_INV_PROB) +/* Forward declaration for SMC function prototype. */ +struct dp_netdev_pmd_thread; + struct emc_entry { struct dp_netdev_flow *flow; struct netdev_flow_key key; /* key.hash used for emc hash value. */ @@ -237,6 +240,11 @@ emc_lookup(struct emc_cache *cache, const struct netdev_flow_key *key) return NULL; } +struct dp_netdev_flow * +smc_lookup_single(struct dp_netdev_pmd_thread *pmd, + struct dp_packet *packet, + struct netdev_flow_key *key); + #ifdef __cplusplus } #endif diff --git a/lib/dpif-netdev-private-dpif.h b/lib/dpif-netdev-private-dpif.h new file mode 100644 index 000000000..27d58d19e --- /dev/null +++ b/lib/dpif-netdev-private-dpif.h @@ -0,0 +1,32 @@ +/* + * Copyright (c) 2021 Intel Corporation. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef DPIF_NETDEV_PRIVATE_DPIF_H +#define DPIF_NETDEV_PRIVATE_DPIF_H 1 + +#include "openvswitch/types.h" + +/* Forward declarations to avoid including files */ +struct dp_netdev_pmd_thread; +struct dp_packet_batch; + +/* Available implementations for dpif work */ +int32_t +dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, + struct dp_packet_batch *packets, + odp_port_t in_port); + +#endif /* netdev-private.h */ diff --git a/lib/dpif-netdev-private-thread.h b/lib/dpif-netdev-private-thread.h index a5f39d963..5572e0b42 100644 --- a/lib/dpif-netdev-private-thread.h +++ b/lib/dpif-netdev-private-thread.h @@ -45,14 +45,19 @@ struct dp_netdev_pmd_thread_ctx { struct dp_netdev_rxq *last_rxq; /* EMC insertion probability context for the current processing cycle. */ uint32_t emc_insert_min; + /* Enable the SMC cache from ovsdb config */ + bool smc_enable_db; }; /* Forward declaration for typedef */ struct dp_netdev_pmd_thread; -typedef void (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd, - struct dp_packet_batch *packets, - odp_port_t port_no); +/* Typedef for DPIF functions. + * Returns a bitmask of packets to handle, possibly including upcall/misses. + */ +typedef int32_t (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd, + struct dp_packet_batch *packets, + odp_port_t port_no); /* PMD: Poll modes drivers. PMD accesses devices via polling to eliminate * the performance overhead of interrupt processing. Therefore netdev can diff --git a/lib/dpif-netdev-private.h b/lib/dpif-netdev-private.h index d7b6fd7ec..9c2237fac 100644 --- a/lib/dpif-netdev-private.h +++ b/lib/dpif-netdev-private.h @@ -31,4 +31,29 @@ #include "dpif-netdev-private-dfc.h" #include "dpif-netdev-private-thread.h" +/* Allow other implementations to lookup the DPCLS instances */ +struct dpcls * +dp_netdev_pmd_lookup_dpcls(struct dp_netdev_pmd_thread *pmd, + odp_port_t in_port); + +/* Allow other implementations to call dpcls_lookup() for subtable search */ +bool +dpcls_lookup(struct dpcls *cls, const struct netdev_flow_key *keys[], + struct dpcls_rule **rules, const size_t cnt, + int *num_lookups_p); + +/* Allow other implementations to execute actions on a batch */ +void +dp_netdev_batch_execute(struct dp_netdev_pmd_thread *pmd, + struct dp_packet_batch *packets, + struct dpcls_rule *rule, + uint32_t bytes, + uint16_t tcp_flags); + +/* Available implementations for dpif work */ +int32_t +dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, + struct dp_packet_batch *packets, + odp_port_t in_port); + #endif /* netdev-private.h */ diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 7486171de..f202a6037 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -185,10 +185,6 @@ static uint32_t dpcls_subtable_lookup_reprobe(struct dpcls *cls); static void dpcls_insert(struct dpcls *, struct dpcls_rule *, const struct netdev_flow_key *mask); static void dpcls_remove(struct dpcls *, struct dpcls_rule *); -static bool dpcls_lookup(struct dpcls *cls, - const struct netdev_flow_key *keys[], - struct dpcls_rule **rules, size_t cnt, - int *num_lookups_p); /* Set of supported meter flags */ #define DP_SUPPORTED_METER_FLAGS_MASK \ @@ -485,7 +481,7 @@ static void dp_netdev_execute_actions(struct dp_netdev_pmd_thread *pmd, const struct flow *flow, const struct nlattr *actions, size_t actions_len); -static void dp_netdev_input(struct dp_netdev_pmd_thread *, +static int32_t dp_netdev_input(struct dp_netdev_pmd_thread *, struct dp_packet_batch *, odp_port_t port_no); static void dp_netdev_recirculate(struct dp_netdev_pmd_thread *, struct dp_packet_batch *); @@ -557,7 +553,7 @@ dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd, bool purge); static int dpif_netdev_xps_get_tx_qid(const struct dp_netdev_pmd_thread *pmd, struct tx_port *tx); -static inline struct dpcls * +inline struct dpcls * dp_netdev_pmd_lookup_dpcls(struct dp_netdev_pmd_thread *pmd, odp_port_t in_port); @@ -1922,7 +1918,7 @@ void dp_netdev_flow_unref(struct dp_netdev_flow *flow) } } -static inline struct dpcls * +inline struct dpcls * dp_netdev_pmd_lookup_dpcls(struct dp_netdev_pmd_thread *pmd, odp_port_t in_port) { @@ -2722,7 +2718,7 @@ dp_netdev_pmd_lookup_flow(struct dp_netdev_pmd_thread *pmd, int *lookup_num_p) { struct dpcls *cls; - struct dpcls_rule *rule; + struct dpcls_rule *rule = NULL; odp_port_t in_port = u32_to_odp(MINIFLOW_GET_U32(&key->mf, in_port.odp_port)); struct dp_netdev_flow *netdev_flow = NULL; @@ -4236,7 +4232,10 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd, } /* Process packet batch. */ - pmd->netdev_input_func(pmd, &batch, port_no); + int32_t ret = pmd->netdev_input_func(pmd, &batch, port_no); + if (ret) { + dp_netdev_input(pmd, &batch, port_no); + } /* Assign processing cycles to rx queue. */ cycles = cycle_timer_stop(&pmd->perf_stats, &timer); @@ -5254,6 +5253,8 @@ dpif_netdev_run(struct dpif *dpif) non_pmd->ctx.emc_insert_min = 0; } + non_pmd->ctx.smc_enable_db = dp->smc_enable_db; + for (i = 0; i < port->n_rxq; i++) { if (!netdev_rxq_enabled(port->rxqs[i].rx)) { @@ -5525,6 +5526,8 @@ reload: pmd->ctx.emc_insert_min = 0; } + pmd->ctx.smc_enable_db = pmd->dp->smc_enable_db; + process_packets = dp_netdev_process_rxq_port(pmd, poll_list[i].rxq, poll_list[i].port_no); @@ -6419,6 +6422,24 @@ packet_batch_per_flow_execute(struct packet_batch_per_flow *batch, actions->actions, actions->size); } +void +dp_netdev_batch_execute(struct dp_netdev_pmd_thread *pmd, + struct dp_packet_batch *packets, + struct dpcls_rule *rule, + uint32_t bytes, + uint16_t tcp_flags) +{ + /* Gets action* from the rule */ + struct dp_netdev_flow *flow = dp_netdev_flow_cast(rule); + struct dp_netdev_actions *actions = dp_netdev_flow_get_actions(flow); + + dp_netdev_flow_used(flow, dp_packet_batch_size(packets), bytes, + tcp_flags, pmd->ctx.now / 1000); + const uint32_t steal = 1; + dp_netdev_execute_actions(pmd, packets, steal, &flow->flow, + actions->actions, actions->size); +} + static inline void dp_netdev_queue_batches(struct dp_packet *pkt, struct dp_netdev_flow *flow, uint16_t tcp_flags, @@ -6523,6 +6544,30 @@ smc_lookup_batch(struct dp_netdev_pmd_thread *pmd, pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SMC_HIT, n_smc_hit); } +struct dp_netdev_flow * +smc_lookup_single(struct dp_netdev_pmd_thread *pmd, + struct dp_packet *packet, + struct netdev_flow_key *key) +{ + const struct cmap_node *flow_node = smc_entry_get(pmd, key->hash); + + if (OVS_LIKELY(flow_node != NULL)) { + struct dp_netdev_flow *flow = NULL; + + CMAP_NODE_FOR_EACH (flow, node, flow_node) { + /* Since we dont have per-port megaflow to check the port + * number, we need to verify that the input ports match. */ + if (OVS_LIKELY(dpcls_rule_matches_key(&flow->cr, key) && + flow->flow.in_port.odp_port == packet->md.in_port.odp_port)) { + + return (void *) flow; + } + } + } + + return NULL; +} + /* Try to process all ('cnt') the 'packets' using only the datapath flow cache * 'pmd->flow_cache'. If a flow is not found for a packet 'packets[i]', the * miniflow is copied into 'keys' and the packet pointer is moved at the @@ -6928,12 +6973,13 @@ dp_netdev_input__(struct dp_netdev_pmd_thread *pmd, } } -static void +static int32_t dp_netdev_input(struct dp_netdev_pmd_thread *pmd, struct dp_packet_batch *packets, odp_port_t port_no) { dp_netdev_input__(pmd, packets, false, port_no); + return 0; } static void @@ -8374,7 +8420,7 @@ netdev_flow_key_gen_masks(const struct netdev_flow_key *tbl, /* Returns true if 'target' satisfies 'key' in 'mask', that is, if each 1-bit * in 'mask' the values in 'key' and 'target' are the same. */ -bool +inline bool ALWAYS_INLINE dpcls_rule_matches_key(const struct dpcls_rule *rule, const struct netdev_flow_key *target) { @@ -8400,7 +8446,7 @@ dpcls_rule_matches_key(const struct dpcls_rule *rule, * priorities, instead returning any rule which matches the flow. * * Returns true if all miniflows found a corresponding rule. */ -static bool +bool dpcls_lookup(struct dpcls *cls, const struct netdev_flow_key *keys[], struct dpcls_rule **rules, const size_t cnt, int *num_lookups_p) From patchwork Wed Apr 7 09:34:32 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ferriter, Cian" X-Patchwork-Id: 1463261 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::137; helo=smtp4.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp4.osuosl.org (smtp4.osuosl.org [IPv6:2605:bc80:3010::137]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4FFfM44ZM8z9sV5 for ; Wed, 7 Apr 2021 19:32:48 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id 04F284187D; Wed, 7 Apr 2021 09:32:42 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 7jViHIY6P3NR; Wed, 7 Apr 2021 09:32:41 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp4.osuosl.org (Postfix) with ESMTP id D4D3D41873; Wed, 7 Apr 2021 09:32:35 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id DA61BC0018; Wed, 7 Apr 2021 09:32:32 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 3C9A4C0012 for ; Wed, 7 Apr 2021 09:32:30 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 0778360A4E for ; Wed, 7 Apr 2021 09:32:30 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id DbQ3lf2C4VTR for ; Wed, 7 Apr 2021 09:32:28 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by smtp3.osuosl.org (Postfix) with ESMTPS id DA4CA60BD4 for ; Wed, 7 Apr 2021 09:32:27 +0000 (UTC) IronPort-SDR: hhn7fUq5+EL0Vbe37upDdbRxUuRQzQZG2K2i918CiV/sLgw+VIk3omhO4epH/8Y7vFe2KX/qs3 NBK/aqw05B3g== X-IronPort-AV: E=McAfee;i="6000,8403,9946"; a="173344884" X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="173344884" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Apr 2021 02:32:27 -0700 IronPort-SDR: HcG7H2irXiljZzyuGCer6zBmV5+BUJp9nrgdDyWWM9jMI7cNvKxQZvNMXv2rJPdS95v/NXCFuT TxC4OvXn+rng== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="381253839" Received: from silpixa00399779.ir.intel.com (HELO silpixa00399779.ger.corp.intel.com) ([10.237.223.175]) by orsmga006.jf.intel.com with ESMTP; 07 Apr 2021 02:32:26 -0700 From: Cian Ferriter To: ovs-dev@openvswitch.org Date: Wed, 7 Apr 2021 10:34:32 +0100 Message-Id: <20210407093442.41568-6-cian.ferriter@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210407093442.41568-1-cian.ferriter@intel.com> References: <20210407093442.41568-1-cian.ferriter@intel.com> Cc: i.maximets@ovn.org Subject: [ovs-dev] [v10 05/15] dpif-avx512: Add HWOL support to avx512 dpif. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Harry van Haaren Partial hardware offload is implemented in a very similar way to the scalar dpif. Signed-off-by: Harry van Haaren Signed-off-by: Cian Ferriter --- lib/dpif-netdev-avx512.c | 28 +++++++++++++++++++++++++--- 1 file changed, 25 insertions(+), 3 deletions(-) diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c index 811d3cb86..a108417cc 100644 --- a/lib/dpif-netdev-avx512.c +++ b/lib/dpif-netdev-avx512.c @@ -27,6 +27,7 @@ #include "dpif-netdev-private-dpcls.h" #include "dpif-netdev-private-flow.h" #include "dpif-netdev-private-thread.h" +#include "dpif-netdev-private-hwol.h" #include "dp-packet.h" #include "netdev.h" @@ -111,9 +112,32 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, uint32_t i = __builtin_ctz(iter); iter = _blsr_u64(iter); - /* Initialize packet md and do miniflow extract */ + /* Get packet pointer from bitmask and packet md */ struct dp_packet *packet = packets->packets[i]; pkt_metadata_init(&packet->md, in_port); + + struct dp_netdev_flow *f = NULL; + + /* Check for partial hardware offload mark */ + uint32_t mark; + if (dp_packet_has_flow_mark(packet, &mark)) { + f = mark_to_flow_find(pmd, mark); + if (f) { + rules[i] = &f->cr; + + /* This is nasty - instead of using the HWOL provided flow, + * parse the packet data anyway to find the location of the TCP + * header to extract the TCP flags for the rule. + */ + pkt_meta[i].tcp_flags = parse_tcp_flags(packet); + + pkt_meta[i].bytes = dp_packet_size(packet); + hwol_emc_smc_hitmask |= (1 << i); + continue; + } + } + + /* Do miniflow extract into keys */ struct netdev_flow_key *key = &keys[i]; miniflow_extract(packet, &key->mf); @@ -124,8 +148,6 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, key->len = netdev_flow_key_size(miniflow_n_values(&key->mf)); key->hash = dpif_netdev_packet_get_rss_hash_orig_pkt(packet, &key->mf); - struct dp_netdev_flow *f = NULL; - if (emc_enabled) { f = emc_lookup(&cache->emc_cache, key); From patchwork Wed Apr 7 09:34:33 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ferriter, Cian" X-Patchwork-Id: 1463266 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::133; helo=smtp2.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp2.osuosl.org (smtp2.osuosl.org [IPv6:2605:bc80:3010::133]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4FFfMK1lf8z9sVb for ; Wed, 7 Apr 2021 19:33:01 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id 6882941A9B; Wed, 7 Apr 2021 09:32:58 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id YCD6H4RmjyW9; Wed, 7 Apr 2021 09:32:55 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp2.osuosl.org (Postfix) with ESMTP id 4AAEE405E8; Wed, 7 Apr 2021 09:32:39 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 02622C001C; Wed, 7 Apr 2021 09:32:36 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 14D0DC0028 for ; Wed, 7 Apr 2021 09:32:34 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 889A360BF6 for ; Wed, 7 Apr 2021 09:32:32 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 3OOwlyDsTIJG for ; Wed, 7 Apr 2021 09:32:29 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by smtp3.osuosl.org (Postfix) with ESMTPS id 7834760BE6 for ; Wed, 7 Apr 2021 09:32:29 +0000 (UTC) IronPort-SDR: yLP29DbSLUQSF1emb3tAYSdGOdiND+P1fuwQF/bk/NnkUnzRH+Sw3E6hbn+43zZMlE2L12GsiM vpuWb/k/E67A== X-IronPort-AV: E=McAfee;i="6000,8403,9946"; a="173344886" X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="173344886" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Apr 2021 02:32:29 -0700 IronPort-SDR: trCaq5maAlCDfuRlBUTKC5R9OW7A9sOJ99tgF9ASJlCBECTCo5MMoLSt4GzAFa3zCGXWr/TIez I6szfls0CpSw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="381253846" Received: from silpixa00399779.ir.intel.com (HELO silpixa00399779.ger.corp.intel.com) ([10.237.223.175]) by orsmga006.jf.intel.com with ESMTP; 07 Apr 2021 02:32:27 -0700 From: Cian Ferriter To: ovs-dev@openvswitch.org Date: Wed, 7 Apr 2021 10:34:33 +0100 Message-Id: <20210407093442.41568-7-cian.ferriter@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210407093442.41568-1-cian.ferriter@intel.com> References: <20210407093442.41568-1-cian.ferriter@intel.com> Cc: i.maximets@ovn.org Subject: [ovs-dev] [v10 06/15] dpif-netdev: Add command to switch dpif implementation. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Harry van Haaren This commit adds a new command to allow the user to switch the active DPIF implementation at runtime. A probe function is executed before switching the DPIF implementation, to ensure the CPU is capable of running the ISA required. For example, the below code will switch to the AVX512 enabled DPIF assuming that the runtime CPU is capable of running AVX512 instructions: $ ovs-appctl dpif-netdev/dpif-set dpif_avx512 A new configuration flag is added to allow selection of the default DPIF. This is useful for running the unit-tests against the available DPIF implementations, without modifying each unit test. The design of the testing & validation for ISA optimized DPIF implementations is based around the work already upstream for DPCLS. Note however that a DPCLS lookup has no state or side-effects, allowing the auto-validator implementation to perform multiple lookups and provide consistent statistic counters. The DPIF component does have state, so running two implementations in parallel and comparing output is not a valid testing method, as there are changes in DPIF statistic counters (side effects). As a result, the DPIF is tested directly against the unit-tests. Signed-off-by: Harry van Haaren Co-authored-by: Cian Ferriter Signed-off-by: Cian Ferriter --- acinclude.m4 | 15 ++++++ configure.ac | 1 + lib/automake.mk | 1 + lib/dpif-netdev-avx512.c | 14 +++++ lib/dpif-netdev-private-dpif.c | 92 ++++++++++++++++++++++++++++++++ lib/dpif-netdev-private-dpif.h | 43 ++++++++++++++- lib/dpif-netdev-private-thread.h | 12 +---- lib/dpif-netdev.c | 86 +++++++++++++++++++++++++++-- 8 files changed, 248 insertions(+), 16 deletions(-) create mode 100644 lib/dpif-netdev-private-dpif.c diff --git a/acinclude.m4 b/acinclude.m4 index 15a54d636..5fbcd9872 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -30,6 +30,21 @@ AC_DEFUN([OVS_CHECK_DPCLS_AUTOVALIDATOR], [ fi ]) +dnl Set OVS DPIF default implementation at configure time for running the unit +dnl tests on the whole codebase without modifying tests per DPIF impl +AC_DEFUN([OVS_CHECK_DPIF_AVX512_DEFAULT], [ + AC_ARG_ENABLE([dpif-default-avx512], + [AC_HELP_STRING([--enable-dpif-default-avx512], [Enable DPIF AVX512 implementation as default.])], + [dpifavx512=yes],[dpifavx512=no]) + AC_MSG_CHECKING([whether DPIF AVX512 is default implementation]) + if test "$dpifavx512" != yes; then + AC_MSG_RESULT([no]) + else + OVS_CFLAGS="$OVS_CFLAGS -DDPIF_AVX512_DEFAULT" + AC_MSG_RESULT([yes]) + fi +]) + dnl OVS_ENABLE_WERROR AC_DEFUN([OVS_ENABLE_WERROR], [AC_ARG_ENABLE( diff --git a/configure.ac b/configure.ac index c077034d4..e45685a6c 100644 --- a/configure.ac +++ b/configure.ac @@ -185,6 +185,7 @@ OVS_ENABLE_WERROR OVS_ENABLE_SPARSE OVS_CTAGS_IDENTIFIERS OVS_CHECK_DPCLS_AUTOVALIDATOR +OVS_CHECK_DPIF_AVX512_DEFAULT OVS_CHECK_BINUTILS_AVX512 AC_ARG_VAR(KARCH, [Kernel Architecture String]) diff --git a/lib/automake.mk b/lib/automake.mk index d945d935e..5e493ebaf 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -115,6 +115,7 @@ lib_libopenvswitch_la_SOURCES = \ lib/dpif-netdev.h \ lib/dpif-netdev-private-dfc.h \ lib/dpif-netdev-private-dpcls.h \ + lib/dpif-netdev-private-dpif.c \ lib/dpif-netdev-private-dpif.h \ lib/dpif-netdev-private-flow.h \ lib/dpif-netdev-private-hwol.h \ diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c index a108417cc..391563d4e 100644 --- a/lib/dpif-netdev-avx512.c +++ b/lib/dpif-netdev-avx512.c @@ -19,6 +19,7 @@ #if !defined(__CHECKER__) #include +#include #include "dpif-netdev.h" #include "dpif-netdev-perf.h" @@ -54,6 +55,19 @@ struct dpif_userdata { struct pkt_flow_meta pkt_meta[NETDEV_MAX_BURST]; }; +int32_t +dp_netdev_input_outer_avx512_probe(void) +{ + int avx512f_available = dpdk_get_cpu_has_isa("x86_64", "avx512f"); + int bmi2_available = dpdk_get_cpu_has_isa("x86_64", "bmi2"); + + if (!avx512f_available || !bmi2_available) { + return -ENOTSUP; + } + + return 0; +} + int32_t dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, struct dp_packet_batch *packets, diff --git a/lib/dpif-netdev-private-dpif.c b/lib/dpif-netdev-private-dpif.c new file mode 100644 index 000000000..28143c82c --- /dev/null +++ b/lib/dpif-netdev-private-dpif.c @@ -0,0 +1,92 @@ +/* + * Copyright (c) 2021 Intel Corporation. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include +#include + +#include "dpif-netdev-private-dpif.h" +#include "util.h" +#include "openvswitch/vlog.h" + +VLOG_DEFINE_THIS_MODULE(dpif_netdev_impl); + +/* Actual list of implementations goes here. */ +static struct dpif_netdev_impl_info_t dpif_impls[] = { + /* The default scalar C code implementation. */ + { .func = dp_netdev_input, + .probe = NULL, + .name = "dpif_scalar", }, + +#if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD && __SSE4_2__) + /* Only available on x86_64 bit builds with SSE 4.2 used for OVS core. */ + { .func = dp_netdev_input_outer_avx512, + .probe = dp_netdev_input_outer_avx512_probe, + .name = "dpif_avx512", }, +#endif +}; + +dp_netdev_input_func +dp_netdev_impl_get_default(void) +{ + int dpif_idx = 0; + +/* Configure time overriding to run test suite on all implementations. */ +#if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD && __SSE4_2__) +#ifdef DPIF_AVX512_DEFAULT + ovs_assert(dpif_impls[1].func == dp_netdev_input_outer_avx512); + if (!dp_netdev_input_outer_avx512_probe()) { + dpif_idx = 1; + }; +#endif +#endif + + VLOG_INFO("Default DPIF implementation is %s.\n", + dpif_impls[dpif_idx].name); + dp_netdev_input_func func = dpif_impls[dpif_idx].func; + + return func; +} + + +/* This function checks all available DPIF implementations, and selects the + * returns the function pointer to the one requested by "name". + */ +int32_t +dp_netdev_impl_get_by_name(const char *name, dp_netdev_input_func *out_func) +{ + ovs_assert(name); + ovs_assert(out_func); + + uint32_t i; + + for (i = 0; i < ARRAY_SIZE(dpif_impls); i++) { + if (strcmp(dpif_impls[i].name, name) == 0) { + /* Probe function is optional - so check it is set before exec. */ + if (dpif_impls[i].probe) { + int probe_ok = dpif_impls[i].probe(); + if (probe_ok) { + *out_func = NULL; + return probe_ok; + } + } + *out_func = dpif_impls[i].func; + return 0; + } + } + + return -EINVAL; +} diff --git a/lib/dpif-netdev-private-dpif.h b/lib/dpif-netdev-private-dpif.h index 27d58d19e..e6793364c 100644 --- a/lib/dpif-netdev-private-dpif.h +++ b/lib/dpif-netdev-private-dpif.h @@ -23,7 +23,48 @@ struct dp_netdev_pmd_thread; struct dp_packet_batch; -/* Available implementations for dpif work */ +/* Typedef for DPIF functions. + * Returns a bitmask of packets to handle, possibly including upcall/misses. + */ +typedef int32_t (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd, + struct dp_packet_batch *packets, + odp_port_t port_no); + +/* Probe a DPIF implementation. This allows the implementation to validate CPU + * ISA availability. Returns 0 if not available, returns 1 is valid to use. + */ +typedef int32_t (*dp_netdev_input_func_probe)(void); + +/* Structure describing each available DPIF implmeentation. */ +struct dpif_netdev_impl_info_t { + /* Function pointer to execute to have this DPIF implementation run. */ + dp_netdev_input_func func; + /* Function pointer to execute to check the CPU ISA is available to run. + * May be NULL, which implies that it is always valid to use. + */ + dp_netdev_input_func_probe probe; + /* Name used to select this DPIF implementation. */ + const char *name; +}; + +/* This function checks all available DPIF implementations, and selects the + * returns the function pointer to the one requested by "name". + */ +int32_t +dp_netdev_impl_get_by_name(const char *name, dp_netdev_input_func *out_func); + +/* Returns the ./configure selected DPIF as default, used to initialize. */ +dp_netdev_input_func dp_netdev_impl_get_default(void); + +/* Available implementations of DPIF below */ +int32_t +dp_netdev_input(struct dp_netdev_pmd_thread *pmd, + struct dp_packet_batch *packets, + odp_port_t in_port); + +/* AVX512 enabled DPIF implementation and probe functions */ +int32_t +dp_netdev_input_outer_avx512_probe(void); int32_t dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, struct dp_packet_batch *packets, diff --git a/lib/dpif-netdev-private-thread.h b/lib/dpif-netdev-private-thread.h index 5572e0b42..13b9d46ac 100644 --- a/lib/dpif-netdev-private-thread.h +++ b/lib/dpif-netdev-private-thread.h @@ -28,6 +28,8 @@ #include "dpif-netdev-perf.h" #include "openvswitch/thread.h" +#include "dpif-netdev-private-dpif.h" + #ifdef __cplusplus extern "C" { #endif @@ -49,16 +51,6 @@ struct dp_netdev_pmd_thread_ctx { bool smc_enable_db; }; -/* Forward declaration for typedef */ -struct dp_netdev_pmd_thread; - -/* Typedef for DPIF functions. - * Returns a bitmask of packets to handle, possibly including upcall/misses. - */ -typedef int32_t (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd, - struct dp_packet_batch *packets, - odp_port_t port_no); - /* PMD: Poll modes drivers. PMD accesses devices via polling to eliminate * the performance overhead of interrupt processing. Therefore netdev can * not implement rx-wait for these devices. dpif-netdev needs to poll diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index f202a6037..8e3d773d4 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -481,8 +481,8 @@ static void dp_netdev_execute_actions(struct dp_netdev_pmd_thread *pmd, const struct flow *flow, const struct nlattr *actions, size_t actions_len); -static int32_t dp_netdev_input(struct dp_netdev_pmd_thread *, - struct dp_packet_batch *, odp_port_t port_no); +int32_t dp_netdev_input(struct dp_netdev_pmd_thread *, + struct dp_packet_batch *, odp_port_t port_no); static void dp_netdev_recirculate(struct dp_netdev_pmd_thread *, struct dp_packet_batch *); @@ -993,6 +993,78 @@ dpif_netdev_subtable_lookup_set(struct unixctl_conn *conn, int argc, ds_destroy(&reply); } +static void +dpif_netdev_impl_set(struct unixctl_conn *conn, int argc, + const char *argv[], void *aux OVS_UNUSED) +{ + /* This function requires just one parameter, the DPIF name. + * A second optional parameter can identify the datapath instance. + */ + const char *dpif_name = argv[1]; + + static const char *error_description[2] = { + "Unknown DPIF implementation", + "CPU doesn't support the required instruction for", + }; + + dp_netdev_input_func new_func; + int32_t err = dp_netdev_impl_get_by_name(dpif_name, &new_func); + if (err) { + struct ds reply = DS_EMPTY_INITIALIZER; + ds_put_format(&reply, "DPIF implementation not available: %s %s.\n", + error_description[ (err == -ENOTSUP) ], dpif_name); + const char *reply_str = ds_cstr(&reply); + unixctl_command_reply(conn, reply_str); + VLOG_INFO("%s", reply_str); + ds_destroy(&reply); + return; + } + + /* argv[2] is optional datapath instance. If no datapath name is provided + * and only one datapath exists, the one existing datapath is reprobed. + */ + ovs_mutex_lock(&dp_netdev_mutex); + struct dp_netdev *dp = NULL; + + if (argc == 3) { + dp = shash_find_data(&dp_netdevs, argv[2]); + } else if (shash_count(&dp_netdevs) == 1) { + dp = shash_first(&dp_netdevs)->data; + } + + if (!dp) { + ovs_mutex_unlock(&dp_netdev_mutex); + unixctl_command_reply_error(conn, + "please specify an existing datapath"); + return; + } + + /* Get PMD threads list */ + size_t n; + struct dp_netdev_pmd_thread **pmd_list; + sorted_poll_thread_list(dp, &pmd_list, &n); + + for (size_t i = 0; i < n; i++) { + struct dp_netdev_pmd_thread *pmd = pmd_list[i]; + if (pmd->core_id == NON_PMD_CORE_ID) { + continue; + } + + /* Set PMD threads DPIF implementation to requested one */ + pmd->netdev_input_func = *new_func; + }; + + ovs_mutex_unlock(&dp_netdev_mutex); + + /* Reply with success to command */ + struct ds reply = DS_EMPTY_INITIALIZER; + ds_put_format(&reply, "DPIF implementation set to %s.\n", dpif_name); + const char *reply_str = ds_cstr(&reply); + unixctl_command_reply(conn, reply_str); + VLOG_INFO("%s", reply_str); + ds_destroy(&reply); +} + static void dpif_netdev_pmd_rebalance(struct unixctl_conn *conn, int argc, const char *argv[], void *aux OVS_UNUSED) @@ -1215,6 +1287,10 @@ dpif_netdev_init(void) unixctl_command_register("dpif-netdev/subtable-lookup-prio-get", "", 0, 0, dpif_netdev_subtable_lookup_get, NULL); + unixctl_command_register("dpif-netdev/dpif-set", + "[dpif implementation name] [dp]", + 1, 2, dpif_netdev_impl_set, + NULL); return 0; } @@ -6038,8 +6114,8 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp, hmap_init(&pmd->send_port_cache); cmap_init(&pmd->tx_bonds); - /* Initialize the DPIF function pointer to the default scalar version */ - pmd->netdev_input_func = dp_netdev_input; + /* Initialize DPIF function pointer to the default configured version. */ + pmd->netdev_input_func = dp_netdev_impl_get_default(); /* init the 'flow_cache' since there is no * actual thread created for NON_PMD_CORE_ID. */ @@ -6973,7 +7049,7 @@ dp_netdev_input__(struct dp_netdev_pmd_thread *pmd, } } -static int32_t +int32_t dp_netdev_input(struct dp_netdev_pmd_thread *pmd, struct dp_packet_batch *packets, odp_port_t port_no) From patchwork Wed Apr 7 09:34:34 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ferriter, Cian" X-Patchwork-Id: 1463263 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::133; helo=smtp2.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp2.osuosl.org (smtp2.osuosl.org [IPv6:2605:bc80:3010::133]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4FFfMF56p4z9sV5 for ; Wed, 7 Apr 2021 19:32:57 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id 8FA4441A7B; Wed, 7 Apr 2021 09:32:54 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ElzwRFVmjTvL; Wed, 7 Apr 2021 09:32:52 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp2.osuosl.org (Postfix) with ESMTP id A843840637; Wed, 7 Apr 2021 09:32:37 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 06469C0026; Wed, 7 Apr 2021 09:32:34 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 22166C0023 for ; Wed, 7 Apr 2021 09:32:32 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 9A4D260BE0 for ; Wed, 7 Apr 2021 09:32:31 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 7R9usHm-nez0 for ; Wed, 7 Apr 2021 09:32:30 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by smtp3.osuosl.org (Postfix) with ESMTPS id 9B77B60BDC for ; Wed, 7 Apr 2021 09:32:30 +0000 (UTC) IronPort-SDR: w3tMdLOtreHtZYZvq6kNjldV+p/lL+PdthYS6DLy6QobC8wMyLS4wZ16qiHWLFwLt3oJhnDVuE Zr3H2wizL72Q== X-IronPort-AV: E=McAfee;i="6000,8403,9946"; a="173344887" X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="173344887" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Apr 2021 02:32:30 -0700 IronPort-SDR: ZqxCaocFPrU7f20Vy5gE9hvfupqhySJ747oK58B3PNJuSziW8RD6DUqEtiLhx8vcF64nAa0v/2 qk0vHEDi7Okw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="381253862" Received: from silpixa00399779.ir.intel.com (HELO silpixa00399779.ger.corp.intel.com) ([10.237.223.175]) by orsmga006.jf.intel.com with ESMTP; 07 Apr 2021 02:32:29 -0700 From: Cian Ferriter To: ovs-dev@openvswitch.org Date: Wed, 7 Apr 2021 10:34:34 +0100 Message-Id: <20210407093442.41568-8-cian.ferriter@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210407093442.41568-1-cian.ferriter@intel.com> References: <20210407093442.41568-1-cian.ferriter@intel.com> Cc: i.maximets@ovn.org Subject: [ovs-dev] [v10 07/15] dpif-netdev: Add command to get dpif implementations. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Harry van Haaren This commit adds a new command to retrieve the list of available DPIF implementations. This can be used by to check what implementations of the DPIF are available in any given OVS binary. Usage: $ ovs-appctl dpif-netdev/dpif-get Signed-off-by: Harry van Haaren Signed-off-by: Cian Ferriter --- lib/dpif-netdev-private-dpif.c | 7 +++++++ lib/dpif-netdev-private-dpif.h | 6 ++++++ lib/dpif-netdev.c | 24 ++++++++++++++++++++++++ 3 files changed, 37 insertions(+) diff --git a/lib/dpif-netdev-private-dpif.c b/lib/dpif-netdev-private-dpif.c index 28143c82c..545b36654 100644 --- a/lib/dpif-netdev-private-dpif.c +++ b/lib/dpif-netdev-private-dpif.c @@ -61,6 +61,13 @@ dp_netdev_impl_get_default(void) return func; } +uint32_t +dp_netdev_impl_get(const struct dpif_netdev_impl_info_t **out_impls) +{ + ovs_assert(out_impls); + *out_impls = dpif_impls; + return ARRAY_SIZE(dpif_impls); +} /* This function checks all available DPIF implementations, and selects the * returns the function pointer to the one requested by "name". diff --git a/lib/dpif-netdev-private-dpif.h b/lib/dpif-netdev-private-dpif.h index e6793364c..d1dae3d58 100644 --- a/lib/dpif-netdev-private-dpif.h +++ b/lib/dpif-netdev-private-dpif.h @@ -47,6 +47,12 @@ struct dpif_netdev_impl_info_t { const char *name; }; +/* This function returns all available implementations to the caller. The + * quantity of implementations is returned by the int return value. + */ +uint32_t +dp_netdev_impl_get(const struct dpif_netdev_impl_info_t **out_impls); + /* This function checks all available DPIF implementations, and selects the * returns the function pointer to the one requested by "name". */ diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 8e3d773d4..21c097b10 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -993,6 +993,27 @@ dpif_netdev_subtable_lookup_set(struct unixctl_conn *conn, int argc, ds_destroy(&reply); } +static void +dpif_netdev_impl_get(struct unixctl_conn *conn, int argc OVS_UNUSED, + const char *argv[] OVS_UNUSED, void *aux OVS_UNUSED) +{ + const struct dpif_netdev_impl_info_t *dpif_impls; + uint32_t count = dp_netdev_impl_get(&dpif_impls); + if (count == 0) { + unixctl_command_reply_error(conn, "error getting dpif names"); + return; + } + + /* Add all dpif functions to reply string. */ + struct ds reply = DS_EMPTY_INITIALIZER; + ds_put_cstr(&reply, "Available DPIF implementations:\n"); + for (uint32_t i = 0; i < count; i++) { + ds_put_format(&reply, " %s\n", dpif_impls[i].name); + } + unixctl_command_reply(conn, ds_cstr(&reply)); + ds_destroy(&reply); +} + static void dpif_netdev_impl_set(struct unixctl_conn *conn, int argc, const char *argv[], void *aux OVS_UNUSED) @@ -1291,6 +1312,9 @@ dpif_netdev_init(void) "[dpif implementation name] [dp]", 1, 2, dpif_netdev_impl_set, NULL); + unixctl_command_register("dpif-netdev/dpif-get", "", + 0, 0, dpif_netdev_impl_get, + NULL); return 0; } From patchwork Wed Apr 7 09:34:35 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ferriter, Cian" X-Patchwork-Id: 1463267 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::133; helo=smtp2.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp2.osuosl.org (smtp2.osuosl.org [IPv6:2605:bc80:3010::133]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4FFfMN40mYz9sV5 for ; Wed, 7 Apr 2021 19:33:04 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id 6319C41AB0; Wed, 7 Apr 2021 09:33:02 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id IFkTBrku4lGb; Wed, 7 Apr 2021 09:33:00 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp2.osuosl.org (Postfix) with ESMTP id B1CE740E6E; Wed, 7 Apr 2021 09:32:40 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 0EF00C0024; Wed, 7 Apr 2021 09:32:37 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 01648C0017 for ; Wed, 7 Apr 2021 09:32:36 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 3EE6760BE5 for ; Wed, 7 Apr 2021 09:32:33 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id PVFP9o_hRNMe for ; Wed, 7 Apr 2021 09:32:32 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by smtp3.osuosl.org (Postfix) with ESMTPS id ECCC260BF5 for ; Wed, 7 Apr 2021 09:32:31 +0000 (UTC) IronPort-SDR: KdYhe0da/+gqBlyWg+sprQh5Q7QWsH4TGtwhuc/H+M/31OCn15lxsprkXl2X+E5B46t27DKVMH i9k6BHpbd6Yw== X-IronPort-AV: E=McAfee;i="6000,8403,9946"; a="173344888" X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="173344888" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Apr 2021 02:32:31 -0700 IronPort-SDR: rT/xG0UeLHva8vjq1GZQS2+JkXMI2i6Nn2x1nsTRIdBVf2qq4iwvZTMJD2L0NRg5z2cfsBb4qP pDtp0PdPzqgw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="381253871" Received: from silpixa00399779.ir.intel.com (HELO silpixa00399779.ger.corp.intel.com) ([10.237.223.175]) by orsmga006.jf.intel.com with ESMTP; 07 Apr 2021 02:32:30 -0700 From: Cian Ferriter To: ovs-dev@openvswitch.org Date: Wed, 7 Apr 2021 10:34:35 +0100 Message-Id: <20210407093442.41568-9-cian.ferriter@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210407093442.41568-1-cian.ferriter@intel.com> References: <20210407093442.41568-1-cian.ferriter@intel.com> Cc: i.maximets@ovn.org Subject: [ovs-dev] [v10 08/15] docs/dpdk/bridge: Add dpif performance section. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" This section details how two new commands can be used to list and select the different dpif implementations. It also details how a non default dpif implementation can be tested with the OVS unit test suite. Add NEWS updates for the dpif-netdev.c refactor and the new dpif implementations/commands. Signed-off-by: Cian Ferriter --- v8: - Merge NEWS file items into one Userspace Datapath: heading --- Documentation/topics/dpdk/bridge.rst | 37 ++++++++++++++++++++++++++++ NEWS | 4 +++ 2 files changed, 41 insertions(+) diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst index 526d5c959..ca90d7bdb 100644 --- a/Documentation/topics/dpdk/bridge.rst +++ b/Documentation/topics/dpdk/bridge.rst @@ -214,3 +214,40 @@ implementation :: Compile OVS in debug mode to have `ovs_assert` statements error out if there is a mis-match in the DPCLS lookup implementation. + +Datapath Interface Performance +------------------------------ + +The datapath interface (DPIF) or dp_netdev_input() is responsible for taking +packets through the major components of the userspace datapath; such as +miniflow_extract, EMC, SMC and DPCLS lookups, and a lot of the performance +stats associated with the datapath. + +Just like with the SIMD DPCLS work above, SIMD can be applied to the DPIF to +improve performance. + +OVS provides multiple implementations of the DPIF. These can be listed with the +following command :: + + $ ovs-appctl dpif-netdev/dpif-get + Available DPIF implementations: + dpif_scalar + dpif_avx512 + +By default, dpif_scalar is used. The DPIF implementation can be selected by +name :: + + $ ovs-appctl dpif-netdev/dpif-set dpif_avx512 + DPIF implementation set to dpif_avx512. + + $ ovs-appctl dpif-netdev/dpif-set dpif_scalar + DPIF implementation set to dpif_scalar. + +Running Unit Tests with AVX512 DPIF +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Since the AVX512 DPIF is disabled by default, a compile time option is +available in order to test it with the OVS unit test suite. When building with +a CPU that supports AVX512, use the following configure option :: + + $ ./configure --enable-dpif-default-avx512 diff --git a/NEWS b/NEWS index 95cf922aa..71e7b9047 100644 --- a/NEWS +++ b/NEWS @@ -9,6 +9,10 @@ Post-v2.15.0 * New option '--no-record-hostname' to disable hostname configuration in ovsdb on startup. * New command 'record-hostname-if-not-set' to update hostname in ovsdb. + * Refactor lib/dpif-netdev.c to multiple header files. + * Add avx512 implementation of dpif which can process non recirculated + packets. It supports partial HWOL, EMC, SMC and DPCLS lookups. + * Add commands to get and set the dpif implementations. v2.15.0 - 15 Feb 2021 From patchwork Wed Apr 7 09:34:36 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ferriter, Cian" X-Patchwork-Id: 1463265 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::136; helo=smtp3.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp3.osuosl.org (smtp3.osuosl.org [IPv6:2605:bc80:3010::136]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4FFfMK0C2Cz9sV5 for ; Wed, 7 Apr 2021 19:33:00 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 353EF60D58; Wed, 7 Apr 2021 09:32:58 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id i0QHB1L4uzBc; Wed, 7 Apr 2021 09:32:54 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp3.osuosl.org (Postfix) with ESMTP id 8E8D460D59; Wed, 7 Apr 2021 09:32:42 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 8051AC0012; Wed, 7 Apr 2021 09:32:40 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 182DAC0012 for ; Wed, 7 Apr 2021 09:32:39 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id C45F560BFE for ; Wed, 7 Apr 2021 09:32:34 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id uAdxHDnUWpjf for ; Wed, 7 Apr 2021 09:32:33 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by smtp3.osuosl.org (Postfix) with ESMTPS id B4A6960BE3 for ; Wed, 7 Apr 2021 09:32:33 +0000 (UTC) IronPort-SDR: Jq0zFjwBeidedUJnyc44p4+BEh25eLyoGdGd2X0H/rlISEOX3YG+h+PNX45c3N+BOH1x/ZcHqA duhgVVzvqsmQ== X-IronPort-AV: E=McAfee;i="6000,8403,9946"; a="173344892" X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="173344892" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Apr 2021 02:32:33 -0700 IronPort-SDR: rLxIFfMdvjdR7IhI8t81YD8omVpSKCDPuM+X0XLMUFCtDmnteg/ETc6OaAZd7CFuvnE/7W9sDn QpaGoyLliURQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="381253882" Received: from silpixa00399779.ir.intel.com (HELO silpixa00399779.ger.corp.intel.com) ([10.237.223.175]) by orsmga006.jf.intel.com with ESMTP; 07 Apr 2021 02:32:31 -0700 From: Cian Ferriter To: ovs-dev@openvswitch.org Date: Wed, 7 Apr 2021 10:34:36 +0100 Message-Id: <20210407093442.41568-10-cian.ferriter@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210407093442.41568-1-cian.ferriter@intel.com> References: <20210407093442.41568-1-cian.ferriter@intel.com> Cc: i.maximets@ovn.org Subject: [ovs-dev] [v10 09/15] dpif-netdev/dpcls: Refactor function names to dpcls. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Harry van Haaren This commit refactors the function names from netdev_* namespace to the dpcls_* namespace, as they are only used by dpcls code. With the name change, it becomes more obvious that the functions belong to dpcls functionality, and in the dpif-netdev-private-dpcls.h header file. Signed-off-by: Harry van Haaren Signed-off-by: Cian Ferriter --- lib/dpif-netdev-private-dpcls.h | 6 ++---- lib/dpif-netdev.c | 21 ++++++++++----------- 2 files changed, 12 insertions(+), 15 deletions(-) diff --git a/lib/dpif-netdev-private-dpcls.h b/lib/dpif-netdev-private-dpcls.h index f223a93e4..28c6a10ff 100644 --- a/lib/dpif-netdev-private-dpcls.h +++ b/lib/dpif-netdev-private-dpcls.h @@ -97,10 +97,8 @@ struct dpcls_subtable { /* Generates a mask for each bit set in the subtable's miniflow. */ void -netdev_flow_key_gen_masks(const struct netdev_flow_key *tbl, - uint64_t *mf_masks, - const uint32_t mf_bits_u0, - const uint32_t mf_bits_u1); +dpcls_flow_key_gen_masks(const struct netdev_flow_key *tbl, uint64_t *mf_masks, + const uint32_t mf_bits_u0, const uint32_t mf_bits_u1); /* Matches a dpcls rule against the incoming packet in 'target' */ bool dpcls_rule_matches_key(const struct dpcls_rule *rule, diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 21c097b10..6e3e67a21 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -8306,7 +8306,7 @@ dpcls_create_subtable(struct dpcls *cls, const struct netdev_flow_key *mask) subtable->mf_bits_set_unit0 = unit0; subtable->mf_bits_set_unit1 = unit1; subtable->mf_masks = xmalloc(sizeof(uint64_t) * (unit0 + unit1)); - netdev_flow_key_gen_masks(mask, subtable->mf_masks, unit0, unit1); + dpcls_flow_key_gen_masks(mask, subtable->mf_masks, unit0, unit1); /* Get the preferred subtable search function for this (u0,u1) subtable. * The function is guaranteed to always return a valid implementation, and @@ -8481,11 +8481,10 @@ dpcls_remove(struct dpcls *cls, struct dpcls_rule *rule) } } -/* Inner loop for mask generation of a unit, see netdev_flow_key_gen_masks. */ +/* Inner loop for mask generation of a unit, see dpcls_flow_key_gen_masks. */ static inline void -netdev_flow_key_gen_mask_unit(uint64_t iter, - const uint64_t count, - uint64_t *mf_masks) +dpcls_flow_key_gen_mask_unit(uint64_t iter, const uint64_t count, + uint64_t *mf_masks) { int i; for (i = 0; i < count; i++) { @@ -8506,16 +8505,16 @@ netdev_flow_key_gen_mask_unit(uint64_t iter, * @param mf_bits_unit0 Number of bits set in unit0 of the miniflow */ void -netdev_flow_key_gen_masks(const struct netdev_flow_key *tbl, - uint64_t *mf_masks, - const uint32_t mf_bits_u0, - const uint32_t mf_bits_u1) +dpcls_flow_key_gen_masks(const struct netdev_flow_key *tbl, + uint64_t *mf_masks, + const uint32_t mf_bits_u0, + const uint32_t mf_bits_u1) { uint64_t iter_u0 = tbl->mf.map.bits[0]; uint64_t iter_u1 = tbl->mf.map.bits[1]; - netdev_flow_key_gen_mask_unit(iter_u0, mf_bits_u0, &mf_masks[0]); - netdev_flow_key_gen_mask_unit(iter_u1, mf_bits_u1, &mf_masks[mf_bits_u0]); + dpcls_flow_key_gen_mask_unit(iter_u0, mf_bits_u0, &mf_masks[0]); + dpcls_flow_key_gen_mask_unit(iter_u1, mf_bits_u1, &mf_masks[mf_bits_u0]); } /* Returns true if 'target' satisfies 'key' in 'mask', that is, if each 1-bit From patchwork Wed Apr 7 09:34:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ferriter, Cian" X-Patchwork-Id: 1463264 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::137; helo=smtp4.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp4.osuosl.org (smtp4.osuosl.org [IPv6:2605:bc80:3010::137]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4FFfMH2sqYz9sV5 for ; Wed, 7 Apr 2021 19:32:59 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id 28394418DB; Wed, 7 Apr 2021 09:32:57 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id iy3ECWkpgVO4; Wed, 7 Apr 2021 09:32:54 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp4.osuosl.org (Postfix) with ESMTP id B165E418E7; Wed, 7 Apr 2021 09:32:47 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 7CC01C000C; Wed, 7 Apr 2021 09:32:47 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 10763C000A for ; Wed, 7 Apr 2021 09:32:46 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 0235260BFD for ; Wed, 7 Apr 2021 09:32:40 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 7UaTGoyyvUVi for ; Wed, 7 Apr 2021 09:32:35 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by smtp3.osuosl.org (Postfix) with ESMTPS id 442A360BF4 for ; Wed, 7 Apr 2021 09:32:35 +0000 (UTC) IronPort-SDR: 8o9CcilGLpfmLXjA9ruZsbYbrmnOkG10pCJZ96A/RUBixur1Tq2qUTHXCh6zxzZ7RJO4byVMsa IzCPMSqeSXFA== X-IronPort-AV: E=McAfee;i="6000,8403,9946"; a="173344896" X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="173344896" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Apr 2021 02:32:34 -0700 IronPort-SDR: nR6pQNJFEUVBm4T8dBI7Q1vPxtvggCLzv0elfcTIABH4QM2SuHhuCSXqbMmnE9mje1Wlt7KICe MIU82q/SCLMg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="381253890" Received: from silpixa00399779.ir.intel.com (HELO silpixa00399779.ger.corp.intel.com) ([10.237.223.175]) by orsmga006.jf.intel.com with ESMTP; 07 Apr 2021 02:32:33 -0700 From: Cian Ferriter To: ovs-dev@openvswitch.org Date: Wed, 7 Apr 2021 10:34:37 +0100 Message-Id: <20210407093442.41568-11-cian.ferriter@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210407093442.41568-1-cian.ferriter@intel.com> References: <20210407093442.41568-1-cian.ferriter@intel.com> Cc: i.maximets@ovn.org Subject: [ovs-dev] [v10 10/15] dpif-netdev/dpcls-avx512: enable 16 block processing. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Harry van Haaren This commit implements larger subtable searches in avx512. A limitation of the previous implementation was that up to 8 blocks of miniflow data could be matched on (so a subtable with 8 blocks was handled in avx, but 9 blocks or more would fall back to scalar/generic). This limitation is removed in this patch, where up to 16 blocks of subtable can be matched on. From an implementation perspective, the key to enabling 16 blocks over 8 blocks was to do bitmask calculation up front, and then use the pre-calculated bitmasks for 2x passes of the "blocks gather" routine. The bitmasks need to be shifted for k-mask usage in the upper (8-15) block range, but it is relatively trivial. This also helps in case expanding to 24 blocks is desired in future. The implementation of the 2nd iteration to handle > 8 blocks is behind a conditional branch which checks the total number of bits. This helps the specialized versions of the function that have a miniflow fingerprint of less-than-or-equal 8 blocks, as the code can be statically stripped out of those functions. Specialized functions that do require more than 8 blocks will have the branch removed and unconditionally execute the 2nd blocks gather routine. Lastly, the _any() flavour will have the conditional branch, and the branch predictor may mispredict a bit, but per burst will likely get most packets correct (particularly towards the middle and end of a burst). The code has been run with unit tests under autovalidation and passes all cases, and unit test coverage has been checked to ensure the 16 block code paths are executing. Signed-off-by: Harry van Haaren --- v9: Fixup post 2.15 rebase on NEWS v8: Add NEWS entry Signed-off-by: Cian Ferriter --- NEWS | 1 + lib/dpif-netdev-lookup-avx512-gather.c | 203 ++++++++++++++++++------- 2 files changed, 147 insertions(+), 57 deletions(-) diff --git a/NEWS b/NEWS index 71e7b9047..26cfae908 100644 --- a/NEWS +++ b/NEWS @@ -13,6 +13,7 @@ Post-v2.15.0 * Add avx512 implementation of dpif which can process non recirculated packets. It supports partial HWOL, EMC, SMC and DPCLS lookups. * Add commands to get and set the dpif implementations. + * Enable AVX512 optimized DPCLS to search subtables with larger miniflows. v2.15.0 - 15 Feb 2021 diff --git a/lib/dpif-netdev-lookup-avx512-gather.c b/lib/dpif-netdev-lookup-avx512-gather.c index 8fc1cdfa5..1f27c0536 100644 --- a/lib/dpif-netdev-lookup-avx512-gather.c +++ b/lib/dpif-netdev-lookup-avx512-gather.c @@ -34,7 +34,21 @@ * AVX512 code at a time. */ #define NUM_U64_IN_ZMM_REG (8) -#define BLOCKS_CACHE_SIZE (NETDEV_MAX_BURST * NUM_U64_IN_ZMM_REG) + +/* This implementation of AVX512 gather allows up to 16 blocks of MF data to be + * present in the blocks_cache, hence the multiply by 2 in the blocks count. + */ +#define MF_BLOCKS_PER_PACKET (NUM_U64_IN_ZMM_REG * 2) + +/* Blocks cache size is the maximum number of miniflow blocks that this + * implementation of lookup can handle. + */ +#define BLOCKS_CACHE_SIZE (NETDEV_MAX_BURST * MF_BLOCKS_PER_PACKET) + +/* The gather instruction can handle a scale for the size of the items to + * gather. For uint64_t data, this scale is 8. + */ +#define GATHER_SCALE_8 (8) VLOG_DEFINE_THIS_MODULE(dpif_lookup_avx512_gather); @@ -69,22 +83,83 @@ netdev_rule_matches_key(const struct dpcls_rule *rule, { const uint64_t *keyp = miniflow_get_values(&rule->flow.mf); const uint64_t *maskp = miniflow_get_values(&rule->mask->mf); - const uint32_t lane_mask = (1 << mf_bits_total) - 1; + const uint32_t lane_mask = (1ULL << mf_bits_total) - 1; /* Always load a full cache line from blocks_cache. Other loads must be * trimmed to the amount of data required for mf_bits_total blocks. */ - __m512i v_blocks = _mm512_loadu_si512(&block_cache[0]); - __m512i v_mask = _mm512_maskz_loadu_epi64(lane_mask, &maskp[0]); - __m512i v_key = _mm512_maskz_loadu_epi64(lane_mask, &keyp[0]); + uint32_t res_mask; + + { + __m512i v_blocks = _mm512_loadu_si512(&block_cache[0]); + __m512i v_mask = _mm512_maskz_loadu_epi64(lane_mask, &maskp[0]); + __m512i v_key = _mm512_maskz_loadu_epi64(lane_mask, &keyp[0]); + __m512i v_data = _mm512_and_si512(v_blocks, v_mask); + res_mask = _mm512_mask_cmpeq_epi64_mask(lane_mask, v_data, v_key); + } - __m512i v_data = _mm512_and_si512(v_blocks, v_mask); - uint32_t res_mask = _mm512_mask_cmpeq_epi64_mask(lane_mask, v_data, v_key); + if (mf_bits_total > 8) { + uint32_t lane_mask_gt8 = lane_mask >> 8; + __m512i v_blocks = _mm512_loadu_si512(&block_cache[8]); + __m512i v_mask = _mm512_maskz_loadu_epi64(lane_mask_gt8, &maskp[8]); + __m512i v_key = _mm512_maskz_loadu_epi64(lane_mask_gt8, &keyp[8]); + __m512i v_data = _mm512_and_si512(v_blocks, v_mask); + uint32_t c = _mm512_mask_cmpeq_epi64_mask(lane_mask_gt8, v_data, + v_key); + res_mask |= (c << 8); + } - /* returns 1 assuming result of SIMD compare is all blocks. */ + /* returns 1 assuming result of SIMD compare is all blocks matching. */ return res_mask == lane_mask; } +/* Takes u0 and u1 inputs, and gathers the next 8 blocks to be stored + * contigously into the blocks cache. Note that the pointers and bitmasks + * passed into this function must be incremented for handling next 8 blocks. + */ +static inline ALWAYS_INLINE __m512i +avx512_blocks_gather(__m512i v_u0, /* reg of u64 of all u0 bits */ + __m512i v_u1, /* reg of u64 of all u1 bits */ + const uint64_t *pkt_blocks, /* ptr pkt blocks to load */ + const void *tbl_blocks, /* ptr to blocks in tbl */ + const void *tbl_mf_masks, /* ptr to subtable mf masks */ + __mmask64 u1_bcast_msk, /* mask of u1 lanes */ + const uint64_t pkt_mf_u0_pop, /* num bits in u0 of pkt */ + __mmask64 zero_mask, /* maskz if pkt not have mf bit */ + __mmask64 u64_lanes_mask) /* total lane count to use */ +{ + /* Suggest to compiler to load tbl blocks ahead of gather() */ + __m512i v_tbl_blocks = _mm512_maskz_loadu_epi64(u64_lanes_mask, + tbl_blocks); + + /* Blend u0 and u1 bits together for these 8 blocks */ + __m512i v_pkt_bits = _mm512_mask_blend_epi64(u1_bcast_msk, v_u0, v_u1); + + /* Load pre-created tbl miniflow bitmasks, bitwise AND with them */ + __m512i v_tbl_masks = _mm512_maskz_loadu_epi64(u64_lanes_mask, + tbl_mf_masks); + __m512i v_masks = _mm512_and_si512(v_pkt_bits, v_tbl_masks); + + /* Manual AVX512 popcount for u64 lanes. */ + __m512i v_popcnts = _mm512_popcnt_epi64_manual(v_masks); + + /* Add popcounts and offset for u1 bits. */ + __m512i v_idx_u0_offset = _mm512_maskz_set1_epi64(u1_bcast_msk, + pkt_mf_u0_pop); + __m512i v_indexes = _mm512_add_epi64(v_popcnts, v_idx_u0_offset); + + /* Gather u64 blocks from packet miniflow. */ + __m512i v_zeros = _mm512_setzero_si512(); + __m512i v_blocks = _mm512_mask_i64gather_epi64(v_zeros, u64_lanes_mask, + v_indexes, pkt_blocks, + GATHER_SCALE_8); + + /* Mask pkt blocks with subtable blocks, k-mask to zero lanes */ + __m512i v_masked_blocks = _mm512_maskz_and_epi64(zero_mask, v_blocks, + v_tbl_blocks); + return v_masked_blocks; +} + static inline uint32_t ALWAYS_INLINE avx512_lookup_impl(struct dpcls_subtable *subtable, uint32_t keys_map, @@ -94,76 +169,86 @@ avx512_lookup_impl(struct dpcls_subtable *subtable, const uint32_t bit_count_u1) { OVS_ALIGNED_VAR(CACHE_LINE_SIZE)uint64_t block_cache[BLOCKS_CACHE_SIZE]; - - const uint32_t bit_count_total = bit_count_u0 + bit_count_u1; - int i; uint32_t hashes[NETDEV_MAX_BURST]; + const uint32_t n_pkts = __builtin_popcountll(keys_map); ovs_assert(NETDEV_MAX_BURST >= n_pkts); + const uint32_t bit_count_total = bit_count_u0 + bit_count_u1; + const uint64_t bit_count_total_mask = (1ULL << bit_count_total) - 1; + const uint64_t tbl_u0 = subtable->mask.mf.map.bits[0]; const uint64_t tbl_u1 = subtable->mask.mf.map.bits[1]; - /* Load subtable blocks for masking later. */ const uint64_t *tbl_blocks = miniflow_get_values(&subtable->mask.mf); - const __m512i v_tbl_blocks = _mm512_loadu_si512(&tbl_blocks[0]); - - /* Load pre-created subtable masks for each block in subtable. */ - const __mmask8 bit_count_total_mask = (1 << bit_count_total) - 1; - const __m512i v_mf_masks = _mm512_maskz_loadu_epi64(bit_count_total_mask, - subtable->mf_masks); + const uint64_t *tbl_mf_masks = subtable->mf_masks; + int i; ULLONG_FOR_EACH_1 (i, keys_map) { + /* Create mask register with packet-specific u0 offset. + * Note that as 16 blocks can be handled in total, the width of the + * mask register must be >=16. + */ const uint64_t pkt_mf_u0_bits = keys[i]->mf.map.bits[0]; const uint64_t pkt_mf_u0_pop = __builtin_popcountll(pkt_mf_u0_bits); - - /* Pre-create register with *PER PACKET* u0 offset. */ - const __mmask8 u1_bcast_mask = (UINT8_MAX << bit_count_u0); - const __m512i v_idx_u0_offset = _mm512_maskz_set1_epi64(u1_bcast_mask, - pkt_mf_u0_pop); + const __mmask64 u1_bcast_mask = (UINT64_MAX << bit_count_u0); /* Broadcast u0, u1 bitmasks to 8x u64 lanes. */ - __m512i v_u0 = _mm512_set1_epi64(pkt_mf_u0_bits); - __m512i v_pkt_bits = _mm512_mask_set1_epi64(v_u0, u1_bcast_mask, - keys[i]->mf.map.bits[1]); - - /* Bitmask by pre-created masks. */ - __m512i v_masks = _mm512_and_si512(v_pkt_bits, v_mf_masks); - - /* Manual AVX512 popcount for u64 lanes. */ - __m512i v_popcnts = _mm512_popcnt_epi64_manual(v_masks); - - /* Offset popcounts for u1 with pre-created offset register. */ - __m512i v_indexes = _mm512_add_epi64(v_popcnts, v_idx_u0_offset); - - /* Gather u64 blocks from packet miniflow. */ - const __m512i v_zeros = _mm512_setzero_si512(); - const void *pkt_data = miniflow_get_values(&keys[i]->mf); - __m512i v_all_blocks = _mm512_mask_i64gather_epi64(v_zeros, - bit_count_total_mask, v_indexes, - pkt_data, 8); + __m512i v_u0 = _mm512_set1_epi64(keys[i]->mf.map.bits[0]); + __m512i v_u1 = _mm512_set1_epi64(keys[i]->mf.map.bits[1]); /* Zero out bits that pkt doesn't have: * - 2x pext() to extract bits from packet miniflow as needed by TBL * - Shift u1 over by bit_count of u0, OR to create zero bitmask */ - uint64_t u0_to_zero = _pext_u64(keys[i]->mf.map.bits[0], tbl_u0); - uint64_t u1_to_zero = _pext_u64(keys[i]->mf.map.bits[1], tbl_u1); - uint64_t zero_mask = (u1_to_zero << bit_count_u0) | u0_to_zero; - - /* Mask blocks using AND with subtable blocks, use k-mask to zero - * where lanes as required for this packet. - */ - __m512i v_masked_blocks = _mm512_maskz_and_epi64(zero_mask, - v_all_blocks, v_tbl_blocks); + uint64_t u0_to_zero = _pext_u64(keys[i]->mf.map.bits[0], tbl_u0); + uint64_t u1_to_zero = _pext_u64(keys[i]->mf.map.bits[1], tbl_u1); + const uint64_t zero_mask_wip = (u1_to_zero << bit_count_u0) | + u0_to_zero; + const uint64_t zero_mask = zero_mask_wip & bit_count_total_mask; + + /* Get ptr to packet data blocks */ + const uint64_t *pkt_blocks = miniflow_get_values(&keys[i]->mf); + + /* Store first 8 blocks cache, full cache line aligned. */ + __m512i v_blocks = avx512_blocks_gather(v_u0, v_u1, + &pkt_blocks[0], + &tbl_blocks[0], + &tbl_mf_masks[0], + u1_bcast_mask, + pkt_mf_u0_pop, + zero_mask, + bit_count_total_mask); + _mm512_storeu_si512(&block_cache[i * MF_BLOCKS_PER_PACKET], v_blocks); + + if (bit_count_total > 8) { + /* Shift masks over by 8. + * Pkt blocks pointer remains 0, it is incremented by popcount. + * Move tbl and mf masks pointers forward. + * Increase offsets by 8. + * Re-run same gather code. + */ + uint64_t zero_mask_gt8 = (zero_mask >> 8); + uint64_t u1_bcast_mask_gt8 = (u1_bcast_mask >> 8); + uint64_t bit_count_gt8_mask = bit_count_total_mask >> 8; + + __m512i v_blocks_gt8 = avx512_blocks_gather(v_u0, v_u1, + &pkt_blocks[0], + &tbl_blocks[8], + &tbl_mf_masks[8], + u1_bcast_mask_gt8, + pkt_mf_u0_pop, + zero_mask_gt8, + bit_count_gt8_mask); + _mm512_storeu_si512(&block_cache[(i * MF_BLOCKS_PER_PACKET) + 8], + v_blocks_gt8); + } - /* Store to blocks cache, full cache line aligned. */ - _mm512_storeu_si512(&block_cache[i * 8], v_masked_blocks); } /* Hash the now linearized blocks of packet metadata. */ ULLONG_FOR_EACH_1 (i, keys_map) { - uint64_t *block_ptr = &block_cache[i * 8]; + uint64_t *block_ptr = &block_cache[i * MF_BLOCKS_PER_PACKET]; uint32_t hash = hash_add_words64(0, block_ptr, bit_count_total); hashes[i] = hash_finish(hash, bit_count_total * 8); } @@ -183,7 +268,7 @@ avx512_lookup_impl(struct dpcls_subtable *subtable, struct dpcls_rule *rule; CMAP_NODE_FOR_EACH (rule, cmap_node, nodes[i]) { - const uint32_t cidx = i * 8; + const uint32_t cidx = i * MF_BLOCKS_PER_PACKET; uint32_t match = netdev_rule_matches_key(rule, bit_count_total, &block_cache[cidx]); if (OVS_LIKELY(match)) { @@ -220,7 +305,7 @@ DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 0) /* Check if a specialized function is valid for the required subtable. */ #define CHECK_LOOKUP_FUNCTION(U0, U1) \ - ovs_assert((U0 + U1) <= NUM_U64_IN_ZMM_REG); \ + ovs_assert((U0 + U1) <= (NUM_U64_IN_ZMM_REG * 2)); \ if (!f && u0_bits == U0 && u1_bits == U1) { \ f = dpcls_avx512_gather_mf_##U0##_##U1; \ } @@ -250,7 +335,11 @@ dpcls_subtable_avx512_gather_probe(uint32_t u0_bits, uint32_t u1_bits) CHECK_LOOKUP_FUNCTION(4, 1); CHECK_LOOKUP_FUNCTION(4, 0); - if (!f && (u0_bits + u1_bits) < NUM_U64_IN_ZMM_REG) { + /* Check if the _any looping version of the code can perform this miniflow + * lookup. Performance gain may be less pronounced due to non-specialized + * hashing, however there is usually a good performance win overall. + */ + if (!f && (u0_bits + u1_bits) < (NUM_U64_IN_ZMM_REG * 2)) { f = dpcls_avx512_gather_mf_any; VLOG_INFO("Using avx512_gather_mf_any for subtable (%d,%d)\n", u0_bits, u1_bits); From patchwork Wed Apr 7 09:34:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ferriter, Cian" X-Patchwork-Id: 1463268 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::136; helo=smtp3.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp3.osuosl.org (smtp3.osuosl.org [IPv6:2605:bc80:3010::136]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4FFfMV1tMGz9sV5 for ; Wed, 7 Apr 2021 19:33:10 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 6A4356164E; Wed, 7 Apr 2021 09:33:07 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id XCiqfMJbPSkn; Wed, 7 Apr 2021 09:33:05 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp3.osuosl.org (Postfix) with ESMTP id 2F5B960D55; Wed, 7 Apr 2021 09:32:51 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 34EFCC0019; Wed, 7 Apr 2021 09:32:50 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 1BA5CC0017 for ; Wed, 7 Apr 2021 09:32:49 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id B3A4760C32 for ; Wed, 7 Apr 2021 09:32:40 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id p73JBMaA34wU for ; Wed, 7 Apr 2021 09:32:38 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by smtp3.osuosl.org (Postfix) with ESMTPS id A8B4860C08 for ; Wed, 7 Apr 2021 09:32:36 +0000 (UTC) IronPort-SDR: KJUHbK2vlGfN3I4MZ+g2iWJuW1Q893J9adsKvmz+FyvQXpwOYjFPky7u/uXXKh/oGHjju7tRp4 TPTT2/YbSUDQ== X-IronPort-AV: E=McAfee;i="6000,8403,9946"; a="173344899" X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="173344899" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Apr 2021 02:32:36 -0700 IronPort-SDR: 4RkiNcHYRyr98jHVe1CAD8Rp5rjxBTixx/qY+hLXjT/Kkr2Wh0QMlsx6/L8rs4j4mIgUWhQYBY rIwBiiEgLblA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="381253899" Received: from silpixa00399779.ir.intel.com (HELO silpixa00399779.ger.corp.intel.com) ([10.237.223.175]) by orsmga006.jf.intel.com with ESMTP; 07 Apr 2021 02:32:34 -0700 From: Cian Ferriter To: ovs-dev@openvswitch.org Date: Wed, 7 Apr 2021 10:34:38 +0100 Message-Id: <20210407093442.41568-12-cian.ferriter@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210407093442.41568-1-cian.ferriter@intel.com> References: <20210407093442.41568-1-cian.ferriter@intel.com> Cc: i.maximets@ovn.org Subject: [ovs-dev] [v10 11/15] dpif-netdev/dpcls: specialize more subtable signatures. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Harry van Haaren This commit adds more subtables to be specialized. The traffic pattern here being matched is VXLAN traffic subtables, which commonly have (5,3), (9,1) and (9,4) subtable fingerprints. Signed-off-by: Harry van Haaren --- v8: Add NEWS entry. Signed-off-by: Cian Ferriter --- NEWS | 2 ++ lib/dpif-netdev-lookup-avx512-gather.c | 6 ++++++ lib/dpif-netdev-lookup-generic.c | 6 ++++++ 3 files changed, 14 insertions(+) diff --git a/NEWS b/NEWS index 26cfae908..31def36b3 100644 --- a/NEWS +++ b/NEWS @@ -14,6 +14,8 @@ Post-v2.15.0 packets. It supports partial HWOL, EMC, SMC and DPCLS lookups. * Add commands to get and set the dpif implementations. * Enable AVX512 optimized DPCLS to search subtables with larger miniflows. + * Add more specialized DPCLS subtables to cover common rules, enhancing + the lookup performance. v2.15.0 - 15 Feb 2021 diff --git a/lib/dpif-netdev-lookup-avx512-gather.c b/lib/dpif-netdev-lookup-avx512-gather.c index 1f27c0536..3a684fadf 100644 --- a/lib/dpif-netdev-lookup-avx512-gather.c +++ b/lib/dpif-netdev-lookup-avx512-gather.c @@ -299,6 +299,9 @@ avx512_lookup_impl(struct dpcls_subtable *subtable, return avx512_lookup_impl(subtable, keys_map, keys, rules, U0, U1); \ } \ +DECLARE_OPTIMIZED_LOOKUP_FUNCTION(9, 4) +DECLARE_OPTIMIZED_LOOKUP_FUNCTION(9, 1) +DECLARE_OPTIMIZED_LOOKUP_FUNCTION(5, 3) DECLARE_OPTIMIZED_LOOKUP_FUNCTION(5, 1) DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 1) DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 0) @@ -331,6 +334,9 @@ dpcls_subtable_avx512_gather_probe(uint32_t u0_bits, uint32_t u1_bits) return NULL; } + CHECK_LOOKUP_FUNCTION(9, 4); + CHECK_LOOKUP_FUNCTION(9, 1); + CHECK_LOOKUP_FUNCTION(5, 3); CHECK_LOOKUP_FUNCTION(5, 1); CHECK_LOOKUP_FUNCTION(4, 1); CHECK_LOOKUP_FUNCTION(4, 0); diff --git a/lib/dpif-netdev-lookup-generic.c b/lib/dpif-netdev-lookup-generic.c index e3b6be4b6..6c74ac3a1 100644 --- a/lib/dpif-netdev-lookup-generic.c +++ b/lib/dpif-netdev-lookup-generic.c @@ -282,6 +282,9 @@ dpcls_subtable_lookup_generic(struct dpcls_subtable *subtable, return lookup_generic_impl(subtable, keys_map, keys, rules, U0, U1); \ } \ +DECLARE_OPTIMIZED_LOOKUP_FUNCTION(9, 4) +DECLARE_OPTIMIZED_LOOKUP_FUNCTION(9, 1) +DECLARE_OPTIMIZED_LOOKUP_FUNCTION(5, 3) DECLARE_OPTIMIZED_LOOKUP_FUNCTION(5, 1) DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 1) DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 0) @@ -303,6 +306,9 @@ dpcls_subtable_generic_probe(uint32_t u0_bits, uint32_t u1_bits) { dpcls_subtable_lookup_func f = NULL; + CHECK_LOOKUP_FUNCTION(9, 4); + CHECK_LOOKUP_FUNCTION(9, 1); + CHECK_LOOKUP_FUNCTION(5, 3); CHECK_LOOKUP_FUNCTION(5, 1); CHECK_LOOKUP_FUNCTION(4, 1); CHECK_LOOKUP_FUNCTION(4, 0); From patchwork Wed Apr 7 09:34:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ferriter, Cian" X-Patchwork-Id: 1463269 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::138; helo=smtp1.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp1.osuosl.org (smtp1.osuosl.org [IPv6:2605:bc80:3010::138]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4FFfMY0jbqz9sV5 for ; Wed, 7 Apr 2021 19:33:13 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id 56FAC84AE6; Wed, 7 Apr 2021 09:33:10 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id paO5godI7Nbg; Wed, 7 Apr 2021 09:33:08 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp1.osuosl.org (Postfix) with ESMTP id DF61E84BCD; Wed, 7 Apr 2021 09:33:02 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id B8061C000C; Wed, 7 Apr 2021 09:33:02 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 1BD93C000F for ; Wed, 7 Apr 2021 09:33:01 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 81FB160C0E for ; Wed, 7 Apr 2021 09:32:45 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ThkDcZL8oGRh for ; Wed, 7 Apr 2021 09:32:43 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by smtp3.osuosl.org (Postfix) with ESMTPS id 0158460C1A for ; Wed, 7 Apr 2021 09:32:37 +0000 (UTC) IronPort-SDR: 38o54/xcL/ERaImbPS58DZzPETpHtBqA5RQPgDSpeIatihTCQHzMpg8A99Y5E6e/Xp0uiBcv9o 283lJO7iPSMA== X-IronPort-AV: E=McAfee;i="6000,8403,9946"; a="173344902" X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="173344902" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Apr 2021 02:32:37 -0700 IronPort-SDR: fdUyWNwdijf8os3TgevJ1MCSSTKQ+N7FMQLrZ9Lc4zaaW96jmLrh3fRY4RrofqlDM3zmcrE3zN h6jDlOp4UohQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="381253909" Received: from silpixa00399779.ir.intel.com (HELO silpixa00399779.ger.corp.intel.com) ([10.237.223.175]) by orsmga006.jf.intel.com with ESMTP; 07 Apr 2021 02:32:36 -0700 From: Cian Ferriter To: ovs-dev@openvswitch.org Date: Wed, 7 Apr 2021 10:34:39 +0100 Message-Id: <20210407093442.41568-13-cian.ferriter@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210407093442.41568-1-cian.ferriter@intel.com> References: <20210407093442.41568-1-cian.ferriter@intel.com> Cc: i.maximets@ovn.org Subject: [ovs-dev] [v10 12/15] dpdk: Cache result of CPU ISA checks. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Harry van Haaren As a small optimization, this patch caches the result of a CPU ISA check from DPDK. Particularly in the case of running the DPCLS autovalidator (which repeatedly probes subtables) this reduces the amount of CPU ISA lookups from the DPDK level. By caching them at the OVS/dpdk.c level, the ISA checks remain runtime for the CPU where they are executed, but subsequent checks for the same ISA feature become much cheaper. Signed-off-by: Harry van Haaren Co-authored-by: Cian Ferriter Signed-off-by: Cian Ferriter --- v8: Add NEWS entry. --- NEWS | 1 + lib/dpdk.c | 28 ++++++++++++++++++++++++---- 2 files changed, 25 insertions(+), 4 deletions(-) diff --git a/NEWS b/NEWS index 31def36b3..c3102427e 100644 --- a/NEWS +++ b/NEWS @@ -40,6 +40,7 @@ v2.15.0 - 15 Feb 2021 - DPDK: * Removed support for vhost-user dequeue zero-copy. * Add support for DPDK 20.11. + * Cache results for CPU ISA checks, reduces overhead on repeated lookups. - Userspace datapath: * Add the 'pmd' option to "ovs-appctl dpctl/dump-flows", which restricts a flow dump to a single PMD thread if set. diff --git a/lib/dpdk.c b/lib/dpdk.c index 319540394..c883a4b8b 100644 --- a/lib/dpdk.c +++ b/lib/dpdk.c @@ -614,13 +614,33 @@ print_dpdk_version(void) puts(rte_version()); } +/* Avoid calling rte_cpu_get_flag_enabled() excessively, by caching the + * result of the call for each CPU flag in a static variable. To avoid + * allocating large numbers of static variables, use a uint8 as a bitfield. + * Note the macro must only return if the ISA check is done and available. + */ +#define ISA_CHECK_DONE_BIT (1 << 0) +#define ISA_AVAILABLE_BIT (1 << 1) + #define CHECK_CPU_FEATURE(feature, name_str, RTE_CPUFLAG) \ do { \ if (strncmp(feature, name_str, strlen(name_str)) == 0) { \ - int has_isa = rte_cpu_get_flag_enabled(RTE_CPUFLAG); \ - VLOG_DBG("CPU flag %s, available %s\n", name_str, \ - has_isa ? "yes" : "no"); \ - return true; \ + static uint8_t isa_check_##RTE_CPUFLAG; \ + int check = isa_check_##RTE_CPUFLAG & ISA_CHECK_DONE_BIT; \ + if (OVS_UNLIKELY(!check)) { \ + int has_isa = rte_cpu_get_flag_enabled(RTE_CPUFLAG); \ + VLOG_DBG("CPU flag %s, available %s\n", \ + name_str, has_isa ? "yes" : "no"); \ + isa_check_##RTE_CPUFLAG = ISA_CHECK_DONE_BIT; \ + if (has_isa) { \ + isa_check_##RTE_CPUFLAG |= ISA_AVAILABLE_BIT; \ + } \ + } \ + if (isa_check_##RTE_CPUFLAG & ISA_AVAILABLE_BIT) { \ + return true; \ + } else { \ + return false; \ + } \ } \ } while (0) From patchwork Wed Apr 7 09:34:40 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ferriter, Cian" X-Patchwork-Id: 1463270 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.138; helo=smtp1.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp1.osuosl.org (smtp1.osuosl.org [140.211.166.138]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4FFfMy0QLfz9sV5 for ; Wed, 7 Apr 2021 19:33:34 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id AC55384BB0; Wed, 7 Apr 2021 09:33:31 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id b9lW_GXdTXAw; Wed, 7 Apr 2021 09:33:27 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp1.osuosl.org (Postfix) with ESMTP id 0242184B52; Wed, 7 Apr 2021 09:33:18 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id B6CFFC000C; Wed, 7 Apr 2021 09:33:18 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 0E79BC000C for ; Wed, 7 Apr 2021 09:33:17 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 977A260D92 for ; Wed, 7 Apr 2021 09:32:49 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id cAFnWkwcUy58 for ; Wed, 7 Apr 2021 09:32:47 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by smtp3.osuosl.org (Postfix) with ESMTPS id 7771E60C29 for ; Wed, 7 Apr 2021 09:32:39 +0000 (UTC) IronPort-SDR: chqDymX12EMyenrwSt+Ja2Vg0USKdQW5FyIvm/dYA/7MHCyj7xr54rUlyMVA5LmGlXbNDUHgQ6 tPhyvrff9L8A== X-IronPort-AV: E=McAfee;i="6000,8403,9946"; a="173344904" X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="173344904" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Apr 2021 02:32:39 -0700 IronPort-SDR: ObZC6giAkDSWL58DrhdLoT0RjqhRljaYBmB8uKTtu+7DonHANp6R5awdFz3CPF3Tk/EzeRdLZU L34URCq/TgAA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="381253915" Received: from silpixa00399779.ir.intel.com (HELO silpixa00399779.ger.corp.intel.com) ([10.237.223.175]) by orsmga006.jf.intel.com with ESMTP; 07 Apr 2021 02:32:37 -0700 From: Cian Ferriter To: ovs-dev@openvswitch.org Date: Wed, 7 Apr 2021 10:34:40 +0100 Message-Id: <20210407093442.41568-14-cian.ferriter@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210407093442.41568-1-cian.ferriter@intel.com> References: <20210407093442.41568-1-cian.ferriter@intel.com> Cc: i.maximets@ovn.org Subject: [ovs-dev] [v10 13/15] dpcls-avx512: enabling avx512 vector popcount instruction. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Harry van Haaren This commit enables the AVX512-VPOPCNTDQ Vector Popcount instruction. This instruction is not available on every CPU that supports the AVX512-F Foundation ISA, hence it is enabled only when the additional VPOPCNTDQ ISA check is passed. The vector popcount instruction is used instead of the AVX512 popcount emulation code present in the avx512 optimized DPCLS today. It provides higher performance in the SIMD miniflow processing as that requires the popcount to calculate the miniflow block indexes. Signed-off-by: Harry van Haaren --- v8: Add NEWS entry. Signed-off-by: Cian Ferriter --- NEWS | 3 + lib/dpdk.c | 1 + lib/dpif-netdev-lookup-avx512-gather.c | 84 ++++++++++++++++++++------ 3 files changed, 70 insertions(+), 18 deletions(-) diff --git a/NEWS b/NEWS index c3102427e..61f34ffc1 100644 --- a/NEWS +++ b/NEWS @@ -16,6 +16,9 @@ Post-v2.15.0 * Enable AVX512 optimized DPCLS to search subtables with larger miniflows. * Add more specialized DPCLS subtables to cover common rules, enhancing the lookup performance. + * Enable the AVX512 DPCLS implementation to use VPOPCNT instruction if the + CPU supports it. This enhances performance by using the native vpopcount + instructions, instead of the emulated version of vpopcount. v2.15.0 - 15 Feb 2021 diff --git a/lib/dpdk.c b/lib/dpdk.c index c883a4b8b..a9494a40f 100644 --- a/lib/dpdk.c +++ b/lib/dpdk.c @@ -655,6 +655,7 @@ dpdk_get_cpu_has_isa(const char *arch, const char *feature) #if __x86_64__ /* CPU flags only defined for the architecture that support it. */ CHECK_CPU_FEATURE(feature, "avx512f", RTE_CPUFLAG_AVX512F); + CHECK_CPU_FEATURE(feature, "avx512vpopcntdq", RTE_CPUFLAG_AVX512VPOPCNTDQ); CHECK_CPU_FEATURE(feature, "bmi2", RTE_CPUFLAG_BMI2); #endif diff --git a/lib/dpif-netdev-lookup-avx512-gather.c b/lib/dpif-netdev-lookup-avx512-gather.c index 3a684fadf..9a3273dc6 100644 --- a/lib/dpif-netdev-lookup-avx512-gather.c +++ b/lib/dpif-netdev-lookup-avx512-gather.c @@ -53,6 +53,15 @@ VLOG_DEFINE_THIS_MODULE(dpif_lookup_avx512_gather); + +/* Wrapper function required to enable ISA. */ +static inline __m512i +__attribute__((__target__("avx512vpopcntdq"))) +_mm512_popcnt_epi64_wrapper(__m512i v_in) +{ + return _mm512_popcnt_epi64(v_in); +} + static inline __m512i _mm512_popcnt_epi64_manual(__m512i v_in) { @@ -126,7 +135,8 @@ avx512_blocks_gather(__m512i v_u0, /* reg of u64 of all u0 bits */ __mmask64 u1_bcast_msk, /* mask of u1 lanes */ const uint64_t pkt_mf_u0_pop, /* num bits in u0 of pkt */ __mmask64 zero_mask, /* maskz if pkt not have mf bit */ - __mmask64 u64_lanes_mask) /* total lane count to use */ + __mmask64 u64_lanes_mask, /* total lane count to use */ + const uint32_t use_vpop) /* use AVX512 vpopcntdq */ { /* Suggest to compiler to load tbl blocks ahead of gather() */ __m512i v_tbl_blocks = _mm512_maskz_loadu_epi64(u64_lanes_mask, @@ -140,8 +150,15 @@ avx512_blocks_gather(__m512i v_u0, /* reg of u64 of all u0 bits */ tbl_mf_masks); __m512i v_masks = _mm512_and_si512(v_pkt_bits, v_tbl_masks); - /* Manual AVX512 popcount for u64 lanes. */ - __m512i v_popcnts = _mm512_popcnt_epi64_manual(v_masks); + /* Calculate AVX512 popcount for u64 lanes using the native instruction + * if available, or using emulation if not available. + */ + __m512i v_popcnts; + if (use_vpop) { + v_popcnts = _mm512_popcnt_epi64_wrapper(v_masks); + } else { + v_popcnts = _mm512_popcnt_epi64_manual(v_masks); + } /* Add popcounts and offset for u1 bits. */ __m512i v_idx_u0_offset = _mm512_maskz_set1_epi64(u1_bcast_msk, @@ -166,7 +183,8 @@ avx512_lookup_impl(struct dpcls_subtable *subtable, const struct netdev_flow_key *keys[], struct dpcls_rule **rules, const uint32_t bit_count_u0, - const uint32_t bit_count_u1) + const uint32_t bit_count_u1, + const uint32_t use_vpop) { OVS_ALIGNED_VAR(CACHE_LINE_SIZE)uint64_t block_cache[BLOCKS_CACHE_SIZE]; uint32_t hashes[NETDEV_MAX_BURST]; @@ -218,7 +236,8 @@ avx512_lookup_impl(struct dpcls_subtable *subtable, u1_bcast_mask, pkt_mf_u0_pop, zero_mask, - bit_count_total_mask); + bit_count_total_mask, + use_vpop); _mm512_storeu_si512(&block_cache[i * MF_BLOCKS_PER_PACKET], v_blocks); if (bit_count_total > 8) { @@ -239,7 +258,8 @@ avx512_lookup_impl(struct dpcls_subtable *subtable, u1_bcast_mask_gt8, pkt_mf_u0_pop, zero_mask_gt8, - bit_count_gt8_mask); + bit_count_gt8_mask, + use_vpop); _mm512_storeu_si512(&block_cache[(i * MF_BLOCKS_PER_PACKET) + 8], v_blocks_gt8); } @@ -288,7 +308,11 @@ avx512_lookup_impl(struct dpcls_subtable *subtable, return found_map; } -/* Expand out specialized functions with U0 and U1 bit attributes. */ +/* Expand out specialized functions with U0 and U1 bit attributes. As the + * AVX512 vpopcnt instruction is not supported on all AVX512 capable CPUs, + * create two functions for each miniflow signature. This allows the runtime + * CPU detection in probe() to select the ideal implementation. + */ #define DECLARE_OPTIMIZED_LOOKUP_FUNCTION(U0, U1) \ static uint32_t \ dpcls_avx512_gather_mf_##U0##_##U1(struct dpcls_subtable *subtable, \ @@ -296,7 +320,20 @@ avx512_lookup_impl(struct dpcls_subtable *subtable, const struct netdev_flow_key *keys[], \ struct dpcls_rule **rules) \ { \ - return avx512_lookup_impl(subtable, keys_map, keys, rules, U0, U1); \ + const uint32_t use_vpop = 0; \ + return avx512_lookup_impl(subtable, keys_map, keys, rules, \ + U0, U1, use_vpop); \ + } \ + \ + static uint32_t __attribute__((__target__("avx512vpopcntdq"))) \ + dpcls_avx512_gather_mf_##U0##_##U1##_vpop(struct dpcls_subtable *subtable,\ + uint32_t keys_map, \ + const struct netdev_flow_key *keys[], \ + struct dpcls_rule **rules) \ + { \ + const uint32_t use_vpop = 1; \ + return avx512_lookup_impl(subtable, keys_map, keys, rules, \ + U0, U1, use_vpop); \ } \ DECLARE_OPTIMIZED_LOOKUP_FUNCTION(9, 4) @@ -306,11 +343,18 @@ DECLARE_OPTIMIZED_LOOKUP_FUNCTION(5, 1) DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 1) DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 0) -/* Check if a specialized function is valid for the required subtable. */ -#define CHECK_LOOKUP_FUNCTION(U0, U1) \ +/* Check if a specialized function is valid for the required subtable. + * The use_vpop variable is used to decide if the VPOPCNT instruction can be + * used or not. + */ +#define CHECK_LOOKUP_FUNCTION(U0, U1, use_vpop) \ ovs_assert((U0 + U1) <= (NUM_U64_IN_ZMM_REG * 2)); \ if (!f && u0_bits == U0 && u1_bits == U1) { \ - f = dpcls_avx512_gather_mf_##U0##_##U1; \ + if (use_vpop) { \ + f = dpcls_avx512_gather_mf_##U0##_##U1##_vpop; \ + } else { \ + f = dpcls_avx512_gather_mf_##U0##_##U1; \ + } \ } static uint32_t @@ -318,9 +362,11 @@ dpcls_avx512_gather_mf_any(struct dpcls_subtable *subtable, uint32_t keys_map, const struct netdev_flow_key *keys[], struct dpcls_rule **rules) { + const uint32_t use_vpop = 0; return avx512_lookup_impl(subtable, keys_map, keys, rules, subtable->mf_bits_set_unit0, - subtable->mf_bits_set_unit1); + subtable->mf_bits_set_unit1, + use_vpop); } dpcls_subtable_lookup_func @@ -334,12 +380,14 @@ dpcls_subtable_avx512_gather_probe(uint32_t u0_bits, uint32_t u1_bits) return NULL; } - CHECK_LOOKUP_FUNCTION(9, 4); - CHECK_LOOKUP_FUNCTION(9, 1); - CHECK_LOOKUP_FUNCTION(5, 3); - CHECK_LOOKUP_FUNCTION(5, 1); - CHECK_LOOKUP_FUNCTION(4, 1); - CHECK_LOOKUP_FUNCTION(4, 0); + int use_vpop = dpdk_get_cpu_has_isa("x86_64", "avx512vpopcntdq"); + + CHECK_LOOKUP_FUNCTION(9, 4, use_vpop); + CHECK_LOOKUP_FUNCTION(9, 1, use_vpop); + CHECK_LOOKUP_FUNCTION(5, 3, use_vpop); + CHECK_LOOKUP_FUNCTION(5, 1, use_vpop); + CHECK_LOOKUP_FUNCTION(4, 1, use_vpop); + CHECK_LOOKUP_FUNCTION(4, 0, use_vpop); /* Check if the _any looping version of the code can perform this miniflow * lookup. Performance gain may be less pronounced due to non-specialized From patchwork Wed Apr 7 09:34:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ferriter, Cian" X-Patchwork-Id: 1463271 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::137; helo=smtp4.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp4.osuosl.org (smtp4.osuosl.org [IPv6:2605:bc80:3010::137]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4FFfN35QKKz9sV5 for ; Wed, 7 Apr 2021 19:33:39 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id 29FCD418E9; Wed, 7 Apr 2021 09:33:36 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id WaQYQKhLCnrK; Wed, 7 Apr 2021 09:33:34 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp4.osuosl.org (Postfix) with ESMTP id 40AC44185B; Wed, 7 Apr 2021 09:33:24 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 0F674C000F; Wed, 7 Apr 2021 09:33:24 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 26208C0017 for ; Wed, 7 Apr 2021 09:33:22 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id D21DF60BFC for ; Wed, 7 Apr 2021 09:32:50 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id TEtqSEd1ze7n for ; Wed, 7 Apr 2021 09:32:48 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by smtp3.osuosl.org (Postfix) with ESMTPS id 9F9E460BF3 for ; Wed, 7 Apr 2021 09:32:40 +0000 (UTC) IronPort-SDR: ur63I4ExG7ljbTSZE9j1qbDq0I+AlTDvKgT/LxOcFwfnH1gvmdEco2rTMveN8PgECnjTk4+9AK 7z7uwES25jJQ== X-IronPort-AV: E=McAfee;i="6000,8403,9946"; a="173344905" X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="173344905" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Apr 2021 02:32:40 -0700 IronPort-SDR: yGHtAfrDmG9l/MI5Oz9nbvf9JMu2U7OgIa6cI0cllweq8k1ptyWM1+DQbQ1blZ+zopIUDfPaXS 5foowr0hVHVA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="381253921" Received: from silpixa00399779.ir.intel.com (HELO silpixa00399779.ger.corp.intel.com) ([10.237.223.175]) by orsmga006.jf.intel.com with ESMTP; 07 Apr 2021 02:32:39 -0700 From: Cian Ferriter To: ovs-dev@openvswitch.org Date: Wed, 7 Apr 2021 10:34:41 +0100 Message-Id: <20210407093442.41568-15-cian.ferriter@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210407093442.41568-1-cian.ferriter@intel.com> References: <20210407093442.41568-1-cian.ferriter@intel.com> Cc: i.maximets@ovn.org Subject: [ovs-dev] [v10 14/15] dpif-netdev: Optimize dp output action X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Harry van Haaren This commit optimizes the output action, by enabling the compiler to optimize the code better through reducing code complexity. The core concept of this optimization is that the array-length checks have already been performed above the copying code, so can be removed. Removing of the per-packet length checks allows the compiler to auto-vectorize the stores using SIMD registers. Signed-off-by: Harry van Haaren --- v8: Add NEWS entry. Signed-off-by: Cian Ferriter --- NEWS | 1 + lib/dpif-netdev.c | 23 ++++++++++++++++++----- 2 files changed, 19 insertions(+), 5 deletions(-) diff --git a/NEWS b/NEWS index 61f34ffc1..0fa195acd 100644 --- a/NEWS +++ b/NEWS @@ -19,6 +19,7 @@ Post-v2.15.0 * Enable the AVX512 DPCLS implementation to use VPOPCNT instruction if the CPU supports it. This enhances performance by using the native vpopcount instructions, instead of the emulated version of vpopcount. + * Optimize dp_netdev_output by enhancing compiler optimization potential. v2.15.0 - 15 Feb 2021 diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 6e3e67a21..6f0202f99 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -7282,12 +7282,25 @@ dp_execute_output_action(struct dp_netdev_pmd_thread *pmd, pmd->n_output_batches++; } - struct dp_packet *packet; - DP_PACKET_BATCH_FOR_EACH (i, packet, packets_) { - p->output_pkts_rxqs[dp_packet_batch_size(&p->output_pkts)] = - pmd->ctx.last_rxq; - dp_packet_batch_add(&p->output_pkts, packet); + /* The above checks ensure that there is enough space in the output batch. + * Using dp_packet_batch_add() has a branch to check if the batch is full. + * This branch reduces the compiler's ability to optimize efficiently. The + * below code implements packet movement between batches without checks, + * with the required semantics of output batch perhaps containing packets. + */ + int batch_size = dp_packet_batch_size(packets_); + int out_batch_idx = dp_packet_batch_size(&p->output_pkts); + struct dp_netdev_rxq *rxq = pmd->ctx.last_rxq; + struct dp_packet_batch *output_batch = &p->output_pkts; + + for (int i = 0; i < batch_size; i++) { + struct dp_packet *packet = packets_->packets[i]; + p->output_pkts_rxqs[out_batch_idx] = rxq; + output_batch->packets[out_batch_idx] = packet; + out_batch_idx++; } + output_batch->count += batch_size; + return true; } From patchwork Wed Apr 7 09:34:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ferriter, Cian" X-Patchwork-Id: 1463272 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.138; helo=smtp1.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp1.osuosl.org (smtp1.osuosl.org [140.211.166.138]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4FFfN701v1z9sWX for ; Wed, 7 Apr 2021 19:33:42 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id 4FECC84A8A; Wed, 7 Apr 2021 09:33:41 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id nhCijbSBFMuv; Wed, 7 Apr 2021 09:33:37 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp1.osuosl.org (Postfix) with ESMTP id AD6F684C53; Wed, 7 Apr 2021 09:33:31 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 7BD02C000A; Wed, 7 Apr 2021 09:33:31 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 3EF1EC000A for ; Wed, 7 Apr 2021 09:33:30 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id BFEC160C31 for ; Wed, 7 Apr 2021 09:32:53 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id yNA26193ys3D for ; Wed, 7 Apr 2021 09:32:52 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by smtp3.osuosl.org (Postfix) with ESMTPS id 33ED260D54 for ; Wed, 7 Apr 2021 09:32:42 +0000 (UTC) IronPort-SDR: vKaxzTBTqIbtfsKeDIC4yTz2g5EI2Trx5oaPI1/uU1Ut69epBS4/qGL4TYYLQyFOUiKa1CCSG6 jCMKrNyqbIGg== X-IronPort-AV: E=McAfee;i="6000,8403,9946"; a="173344909" X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="173344909" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Apr 2021 02:32:41 -0700 IronPort-SDR: wRl7T1ssuVxtZh5WrOl54W+sK+SmZAM2m/c750/MmhxVytnHX3lNHOfkllJ/3YF5gWcMgi3viy 2nUpMpuFgv6A== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,203,1613462400"; d="scan'208";a="381253934" Received: from silpixa00399779.ir.intel.com (HELO silpixa00399779.ger.corp.intel.com) ([10.237.223.175]) by orsmga006.jf.intel.com with ESMTP; 07 Apr 2021 02:32:40 -0700 From: Cian Ferriter To: ovs-dev@openvswitch.org Date: Wed, 7 Apr 2021 10:34:42 +0100 Message-Id: <20210407093442.41568-16-cian.ferriter@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210407093442.41568-1-cian.ferriter@intel.com> References: <20210407093442.41568-1-cian.ferriter@intel.com> Cc: i.maximets@ovn.org Subject: [ovs-dev] [v10 15/15] netdev: Optimize netdev_send_prepare_batch X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Harry van Haaren Optimize for the best case here where all packets will be compatible with 'netdev_flags'. Signed-off-by: Harry van Haaren Co-authored-by: Cian Ferriter Signed-off-by: Cian Ferriter --- NEWS | 2 ++ lib/netdev.c | 31 ++++++++++++++++++++++--------- 2 files changed, 24 insertions(+), 9 deletions(-) diff --git a/NEWS b/NEWS index 0fa195acd..34a42250a 100644 --- a/NEWS +++ b/NEWS @@ -20,6 +20,8 @@ Post-v2.15.0 CPU supports it. This enhances performance by using the native vpopcount instructions, instead of the emulated version of vpopcount. * Optimize dp_netdev_output by enhancing compiler optimization potential. + * Optimize netdev sending by assuming the happy case, and using fallback + for if the netdev doesnt meet the required HWOL needs of a packet. v2.15.0 - 15 Feb 2021 diff --git a/lib/netdev.c b/lib/netdev.c index 91e91955c..29a5f1aa9 100644 --- a/lib/netdev.c +++ b/lib/netdev.c @@ -837,20 +837,33 @@ static void netdev_send_prepare_batch(const struct netdev *netdev, struct dp_packet_batch *batch) { - struct dp_packet *packet; - size_t i, size = dp_packet_batch_size(batch); + struct dp_packet *p; + uint32_t i, size = dp_packet_batch_size(batch); + char *err_msg = NULL; - DP_PACKET_BATCH_REFILL_FOR_EACH (i, size, packet, batch) { - char *errormsg = NULL; + for (i = 0; i < size; i++) { + p = batch->packets[i]; + int pkt_ok = netdev_send_prepare_packet(netdev->ol_flags, p, &err_msg); - if (netdev_send_prepare_packet(netdev->ol_flags, packet, &errormsg)) { - dp_packet_batch_refill(batch, packet, i); + if (OVS_UNLIKELY(!pkt_ok)) { + goto refill_loop; + } + } + + return; + +refill_loop: + /* Loop through packets from the start of the batch again. This is the + * exceptional case where packets aren't compatible with 'netdev_flags'. */ + DP_PACKET_BATCH_REFILL_FOR_EACH (i, size, p, batch) { + if (netdev_send_prepare_packet(netdev->ol_flags, p, &err_msg)) { + dp_packet_batch_refill(batch, p, i); } else { - dp_packet_delete(packet); + dp_packet_delete(p); COVERAGE_INC(netdev_send_prepare_drops); VLOG_WARN_RL(&rl, "%s: Packet dropped: %s", - netdev_get_name(netdev), errormsg); - free(errormsg); + netdev_get_name(netdev), err_msg); + free(err_msg); } } }