From patchwork Thu Jan 7 11:56:33 2021
From: anton.ivanov@cambridgegreys.com
To: ovs-dev@openvswitch.org
Date: Thu, 7 Jan 2021 11:56:33 +0000
Message-Id: <20210107115635.22782-2-anton.ivanov@cambridgegreys.com>
In-Reply-To: <20210107115635.22782-1-anton.ivanov@cambridgegreys.com>
Subject: [ovs-dev] [OVN Patch v8 1/3] ovn-libs: Add support for parallel processing

From: Anton Ivanov

This adds a set of functions and macros intended to process hashes in
parallel. The principles of operation are documented in fasthmap.h.

If these functions are ever merged into the OVS tree, the OVS tree
versions will be used in preference.
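The fragment-merge idea can be made concrete with a minimal, single-threaded C sketch. It does not use the OVS `struct hmap` API. `struct node`, `struct map`, `mask_for_size()`, `map_insert()`, `map_merge()` and `map_is_consistent()` are hypothetical names chosen for illustration.

The sketch mirrors two properties of the code in this patch:

- Results are collected in fragments that all share one mask. The mask is computed with the same bit smearing as `ovn_fast_hmap_size_for()`.
- Merging splices bucket chain N of a fragment in front of bucket chain N of the destination, as `ovn_fast_hmap_merge()` does, without rehashing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal chained hash map, loosely modelled on the layout of
 * the OVS struct hmap: 'mask' + 1 bucket heads, each a singly linked list. */
struct node {
    uint32_t hash;
    struct node *next;
};

#define MAX_BUCKETS 64

struct map {
    struct node *buckets[MAX_BUCKETS];  /* Only mask + 1 entries are used. */
    size_t mask;
    size_t n;
};

/* Same bit smearing as ovn_fast_hmap_size_for(): turn size / 2 into a mask
 * of the form 2^k - 1, using at least 4 buckets once more than one is
 * needed. */
static size_t
mask_for_size(size_t size)
{
    size_t mask = size / 2;
    mask |= mask >> 1;
    mask |= mask >> 2;
    mask |= mask >> 4;
    mask |= mask >> 8;
    mask |= mask >> 16;
#if SIZE_MAX > UINT32_MAX
    mask |= mask >> 32;
#endif
    mask |= (mask & 1) << 1;
    return mask;
}

static void
map_init(struct map *m, size_t mask)
{
    assert(mask < MAX_BUCKETS);
    for (size_t i = 0; i <= mask; i++) {
        m->buckets[i] = NULL;
    }
    m->mask = mask;
    m->n = 0;
}

/* Pushes 'node' onto the head of its bucket chain without resizing,
 * in the spirit of hmap_insert_fast(). */
static void
map_insert(struct map *m, struct node *node)
{
    struct node **bucket = &m->buckets[node->hash & m->mask];
    node->next = *bucket;
    *bucket = node;
    m->n++;
}

/* Moves every node of 'inc' into 'dest' by splicing chain i of 'inc' in
 * front of chain i of 'dest'. Only valid when both maps share one mask. */
static void
map_merge(struct map *dest, struct map *inc)
{
    assert(dest->mask == inc->mask);
    for (size_t i = 0; i <= dest->mask; i++) {
        struct node *head = inc->buckets[i];
        if (head) {
            struct node *last = head;
            while (last->next) {
                last = last->next;
            }
            last->next = dest->buckets[i];
            dest->buckets[i] = head;
            inc->buckets[i] = NULL;
        }
    }
    dest->n += inc->n;
    inc->n = 0;
}

/* Returns 1 if every node sits in the bucket its hash selects and the
 * number of nodes found matches 'n'. */
static int
map_is_consistent(const struct map *m)
{
    size_t count = 0;
    for (size_t i = 0; i <= m->mask; i++) {
        for (const struct node *p = m->buckets[i]; p; p = p->next) {
            if ((p->hash & m->mask) != i) {
                return 0;
            }
            count++;
        }
    }
    return count == m->n;
}
```

Because every fragment uses the same mask, a node's bucket index is identical in the fragment and in the destination. That is why the merge can skip rehashing entirely; its cost is one pass over the buckets plus a walk of each spliced fragment chain.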
Signed-off-by: Anton Ivanov
---
 lib/automake.mk |   2 +
 lib/fasthmap.c  | 281 ++++++++++++++++++++++++++++++++++++++++++++++++
 lib/fasthmap.h  | 206 +++++++++++++++++++++++++++++++++++
 3 files changed, 489 insertions(+)
 create mode 100644 lib/fasthmap.c
 create mode 100644 lib/fasthmap.h

diff --git a/lib/automake.mk b/lib/automake.mk
index 250c7aefa..d7e4b20cf 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -13,6 +13,8 @@ lib_libovn_la_SOURCES = \
 	lib/expr.c \
 	lib/extend-table.h \
 	lib/extend-table.c \
+	lib/fasthmap.h \
+	lib/fasthmap.c \
 	lib/ip-mcast-index.c \
 	lib/ip-mcast-index.h \
 	lib/mcast-group-index.c \
diff --git a/lib/fasthmap.c b/lib/fasthmap.c
new file mode 100644
index 000000000..3096c90d3
--- /dev/null
+++ b/lib/fasthmap.c
@@ -0,0 +1,281 @@
+/*
+ * Copyright (c) 2020 Red Hat, Inc.
+ * Copyright (c) 2008, 2009, 2010, 2012, 2013, 2015, 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include "fatal-signal.h"
+#include "util.h"
+#include "openvswitch/vlog.h"
+#include "openvswitch/hmap.h"
+#include "openvswitch/thread.h"
+#include "fasthmap.h"
+#include "ovs-atomic.h"
+#include "ovs-thread.h"
+#include "ovs-numa.h"
+
+VLOG_DEFINE_THIS_MODULE(fasthmap);
+
+
+static bool worker_pool_setup = false;
+static bool workers_must_exit = false;
+static bool can_parallelize = false;
+
+static struct ovs_list worker_pools = OVS_LIST_INITIALIZER(&worker_pools);
+
+static struct ovs_mutex init_mutex = OVS_MUTEX_INITIALIZER;
+
+static int pool_size;
+
+static void worker_pool_hook(void *aux OVS_UNUSED) {
+    int i;
+    struct worker_pool *pool;
+    workers_must_exit = true; /* All workers must honour this flag. */
+    atomic_thread_fence(memory_order_release);
+    LIST_FOR_EACH (pool, list_node, &worker_pools) {
+        for (i = 0; i < pool->size ; i++) {
+            sem_post(&pool->controls[i].fire);
+        }
+    }
+}
+
+static void setup_worker_pools(void) {
+    int cores, nodes;
+
+    nodes = ovs_numa_get_n_numas();
+    if (nodes == OVS_NUMA_UNSPEC || nodes <= 0) {
+        nodes = 1;
+    }
+    cores = ovs_numa_get_n_cores();
+
+    /* If there is no NUMA config, use all the cores reported by
+     * count_cpu_cores(). If there is a NUMA config, use as many threads
+     * as one node has cores, so that the OS does not start pushing
+     * threads to other nodes.
+     */
+    if (cores == OVS_CORE_UNSPEC || cores <= 0) {
+        /* If there is no NUMA we can try the ovs-threads routine.
+         * It falls back to sysconf and/or affinity mask.
+         */
+        cores = count_cpu_cores();
+        pool_size = cores;
+    } else {
+        pool_size = cores / nodes;
+    }
+    if (pool_size > 16) {
+        pool_size = 16;
+    }
+    can_parallelize = (pool_size >= 3);
+    fatal_signal_add_hook(worker_pool_hook, NULL, NULL, true);
+    worker_pool_setup = true;
+}
+
+bool ovn_cease_fire(void)
+{
+    return workers_must_exit;
+}
+
+struct worker_pool *ovn_add_worker_pool(void *(*start)(void *)){
+
+    struct worker_pool *new_pool = NULL;
+    struct worker_control *new_control;
+    int i;
+
+    ovs_mutex_lock(&init_mutex);
+
+    if (!worker_pool_setup) {
+        setup_worker_pools();
+    }
+
+    if (can_parallelize) {
+        new_pool = xmalloc(sizeof(struct worker_pool));
+        new_pool->size = pool_size;
+        sem_init(&new_pool->done, 0, 0);
+
+        ovs_list_push_back(&worker_pools, &new_pool->list_node);
+
+        new_pool->controls =
+            xmalloc(sizeof(struct worker_control) * new_pool->size);
+
+        for (i = 0; i < new_pool->size; i++) {
+            new_control = &new_pool->controls[i];
+            sem_init(&new_control->fire, 0, 0);
+            new_control->id = i;
+            new_control->done = &new_pool->done;
+            new_control->data = NULL;
+            ovs_mutex_init(&new_control->mutex);
+            atomic_init(&new_control->finished, false);
+        }
+
+        for (i = 0; i < pool_size; i++) {
+            ovs_thread_create("worker pool helper", start, &new_pool->controls[i]);
+        }
+    }
+    ovs_mutex_unlock(&init_mutex);
+    return new_pool;
+}
+
+
+/* Initializes 'hmap' as an empty hash table with 'mask' + 1 buckets. */
+void
+ovn_fast_hmap_init(struct hmap *hmap, ssize_t mask)
+{
+    size_t i;
+
+    hmap->buckets = xmalloc(sizeof (struct hmap_node *) * (mask + 1));
+    hmap->one = NULL;
+    hmap->mask = mask;
+    hmap->n = 0;
+    for (i = 0; i <= hmap->mask; i++) {
+        hmap->buckets[i] = NULL;
+    }
+}
+
+/* Initializes 'hmap' as an empty hash table sized for 'size' elements.
+ * Intended for use in parallel processing so that all
+ * fragments used to store results in a parallel job
+ * are the same size.
+ */
+void
+ovn_fast_hmap_size_for(struct hmap *hmap, int size)
+{
+    size_t mask;
+    mask = size / 2;
+    mask |= mask >> 1;
+    mask |= mask >> 2;
+    mask |= mask >> 4;
+    mask |= mask >> 8;
+    mask |= mask >> 16;
+#if SIZE_MAX > UINT32_MAX
+    mask |= mask >> 32;
+#endif
+
+    /* If we need to dynamically allocate buckets we might as well allocate at
+     * least 4 of them. */
+    mask |= (mask & 1) << 1;
+
+    fast_hmap_init(hmap, mask);
+}
+
+/* Run a thread pool which uses a callback function to process results.
+ */
+
+void ovn_run_pool_callback(
+        struct worker_pool *pool,
+        void *fin_result,
+        void *result_frags,
+        void (*helper_func)(struct worker_pool *pool,
+                            void *fin_result, void *result_frags, int index))
+{
+    int index, completed;
+
+    atomic_thread_fence(memory_order_release);
+
+    for (index = 0; index < pool->size; index++) {
+        sem_post(&pool->controls[index].fire);
+    }
+
+    completed = 0;
+
+    do {
+        bool test;
+        sem_wait(&pool->done);
+        for (index = 0; index < pool->size; index++) {
+            test = true;
+            if (atomic_compare_exchange_strong(
+                    &pool->controls[index].finished,
+                    &test,
+                    false)) {
+                if (helper_func) {
+                    (helper_func)(pool, fin_result, result_frags, index);
+                }
+                completed++;
+                pool->controls[index].data = NULL;
+            }
+        }
+    } while (completed < pool->size);
+}
+
+/* Run a thread pool - basic, does not do results processing.
+ */
+
+void ovn_run_pool(struct worker_pool *pool)
+{
+    ovn_run_pool_callback(pool, NULL, NULL, NULL);
+}
+
+/* Brute force merge of a hashmap into another hashmap.
+ * Intended for use in parallel processing. The destination
+ * hashmap MUST be the same size as the one being merged.
+ *
+ * This can be achieved by pre-allocating them to the correct size
+ * and using hmap_insert_fast() instead of hmap_insert().
+ */
+
+void ovn_fast_hmap_merge(struct hmap *dest, struct hmap *inc)
+{
+    size_t i;
+
+    ovs_assert(inc->mask == dest->mask);
+
+    if (!inc->n) {
+        /* Request to merge an empty frag, nothing to do. */
+        return;
+    }
+
+    for (i = 0; i <= dest->mask; i++) {
+        struct hmap_node **dest_bucket = &dest->buckets[i];
+        struct hmap_node **inc_bucket = &inc->buckets[i];
+        if (*inc_bucket != NULL) {
+            struct hmap_node *last_node = *inc_bucket;
+            while (last_node->next != NULL) {
+                last_node = last_node->next;
+            }
+            last_node->next = *dest_bucket;
+            *dest_bucket = *inc_bucket;
+            *inc_bucket = NULL;
+        }
+    }
+    dest->n += inc->n;
+    inc->n = 0;
+}
+
+/* Run a thread pool which gathers results in an array
+ * of hashes. Merge results.
+ */
+
+static void merge_hash_results(struct worker_pool *pool OVS_UNUSED,
+                               void *fin_result, void *result_frags, int index)
+{
+    struct hmap *result = (struct hmap *)fin_result;
+    struct hmap *res_frags = (struct hmap *)result_frags;
+
+    fast_hmap_merge(result, &res_frags[index]);
+    hmap_destroy(&res_frags[index]);
+}
+
+
+void ovn_run_pool_hash(
+        struct worker_pool *pool,
+        struct hmap *result,
+        struct hmap *result_frags)
+{
+    ovn_run_pool_callback(pool, result, result_frags, merge_hash_results);
+}
diff --git a/lib/fasthmap.h b/lib/fasthmap.h
new file mode 100644
index 000000000..2a28553d5
--- /dev/null
+++ b/lib/fasthmap.h
@@ -0,0 +1,206 @@
+/*
+ * Copyright (c) 2020 Red Hat, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef OVN_PARALLEL_HMAP
+#define OVN_PARALLEL_HMAP 1
+
+/* Process this include only if OVS does not supply parallel definitions.
+ */
+
+#ifndef OVS_HAS_PARALLEL_HMAP
+
+/* If the parallel macros are defined by hmap.h or any other OVS define,
+ * we skip over the OVN-specific definitions.
+ */
+
+#ifdef  __cplusplus
+extern "C" {
+#endif
+
+#include
+#include
+#include
+#include "openvswitch/util.h"
+#include "openvswitch/hmap.h"
+#include "openvswitch/thread.h"
+#include "ovs-atomic.h"
+
+/* A version of the HMAP_FOR_EACH macro intended for iterating as part
+ * of parallel processing.
+ * Each worker thread has a different ThreadID in the range 0..POOL_SIZE-1
+ * and will iterate hash buckets ThreadID, ThreadID + step,
+ * ThreadID + step * 2, etc. The macro itself walks one bucket; the caller
+ * passes ThreadID + step * i as the JOBID parameter.
+ */
+
+#define HMAP_FOR_EACH_IN_PARALLEL(NODE, MEMBER, JOBID, HMAP) \
+   for (INIT_CONTAINER(NODE, hmap_first_in_bucket_num(HMAP, JOBID), MEMBER); \
+        (NODE != OBJECT_CONTAINING(NULL, NODE, MEMBER)) \
+       || ((NODE = NULL), false); \
+       ASSIGN_CONTAINER(NODE, hmap_next_in_bucket(&(NODE)->MEMBER), MEMBER))
+
+/* We do not have a SAFE version of the macro, because the hash size is not
+ * atomic and hash removal operations would need to be wrapped with
+ * locks. This will defeat most of the benefits from doing anything in
+ * parallel.
+ * If the code block inside HMAP_FOR_EACH_IN_PARALLEL needs to remove
+ * elements, each thread should store them in a temporary list result instead,
+ * merging the lists into a combined result at the end. */
+
+/* Work "Handle" */
+
+struct worker_control {
+    int id; /* Used as a modulo when iterating over a hash. */
+    atomic_bool finished; /* Set to true after a chunk of work is complete. */
+    sem_t fire; /* Work start semaphore - sem_post starts the worker. */
+    sem_t *done; /* Work completion semaphore - sem_post on completion. */
+    struct ovs_mutex mutex; /* Guards the data. */
+    void *data; /* Pointer to data to be processed. */
+    void *workload; /* Back-pointer to the worker pool structure. */
+};
+
+struct worker_pool {
+    int size;   /* Number of threads in the pool. */
+    struct ovs_list list_node; /* List of pools - used in cleanup/exit. */
+    struct worker_control *controls; /* "Handles" in this pool. */
+    sem_t done; /* Work completion semaphore. */
+};
+
+/* Add a worker pool for thread function start() which expects a pointer to
+ * a worker_control structure as an argument. */
+
+struct worker_pool *ovn_add_worker_pool(void *(*start)(void *));
+
+/* Returns true once all processing threads have been told to exit. */
+
+bool ovn_cease_fire(void);
+
+/* Initialize 'hmap' pre-sized for 'size' elements. */
+
+void ovn_fast_hmap_size_for(struct hmap *hmap, int size);
+
+/* Initialize 'hmap' with the given 'mask', i.e. with 'mask' + 1 buckets. */
+
+void ovn_fast_hmap_init(struct hmap *hmap, ssize_t mask);
+
+/* Brute-force merge hmap 'inc' into hmap 'dest'.
+ * Dest and inc have to have the same mask. The merge is performed
+ * by extending the element list for bucket N in the dest hmap with the list
+ * from bucket N in inc.
+ */
+
+void ovn_fast_hmap_merge(struct hmap *dest, struct hmap *inc);
+
+/* Run a pool, without any default processing of results.
+ */
+
+void ovn_run_pool(struct worker_pool *pool);
+
+/* Run a pool, merge results from hash frags into a final hash result.
+ * The hash frags must be pre-sized to the same size.
+ */
+
+void ovn_run_pool_hash(struct worker_pool *pool,
+                       struct hmap *result, struct hmap *result_frags);
+
+/* Run a pool, call a callback function to perform processing of results.
+ */
+
+void ovn_run_pool_callback(struct worker_pool *pool, void *fin_result,
+                           void *result_frags,
+                           void (*helper_func)(struct worker_pool *pool,
+                           void *fin_result, void *result_frags, int index));
+
+
+/* Returns the first node in bucket number 'num' of 'hmap', or a null
+ * pointer if that bucket is empty. */
+
+static inline struct hmap_node *
+hmap_first_in_bucket_num(const struct hmap *hmap, size_t num)
+{
+    return hmap->buckets[num];
+}
+
+static inline struct hmap_node *
+parallel_hmap_next__(const struct hmap *hmap, size_t start, size_t pool_size)
+{
+    size_t i;
+    for (i = start; i <= hmap->mask; i += pool_size) {
+        struct hmap_node *node = hmap->buckets[i];
+        if (node) {
+            return node;
+        }
+    }
+    return NULL;
+}
+
+/* Returns the first node in 'hmap', as expected by thread with job_id
+ * for parallel processing in arbitrary order, or a null pointer if
+ * the slice of 'hmap' for that job_id is empty. */
+static inline struct hmap_node *
+parallel_hmap_first(const struct hmap *hmap, size_t job_id, size_t pool_size)
+{
+    return parallel_hmap_next__(hmap, job_id, pool_size);
+}
+
+/* Returns the next node in the slice of 'hmap' following 'node',
+ * in arbitrary order, or a null pointer if 'node' is the last node in
+ * the 'hmap' slice.
+ *
+ */
+static inline struct hmap_node *
+parallel_hmap_next(const struct hmap *hmap,
+                   const struct hmap_node *node, ssize_t pool_size)
+{
+    return (node->next
+            ? node->next
+            : parallel_hmap_next__(hmap,
+                (node->hash & hmap->mask) + pool_size, pool_size));
+}
+
+/* Use the OVN library functions for stuff which OVS has not defined.
+ * If OVS has defined these, they will still compile using the OVN
+ * local names, but will be dropped by the linker in favour of the OVS
+ * supplied functions.
+ */
+
+#define cease_fire() ovn_cease_fire()
+
+#define add_worker_pool(start) ovn_add_worker_pool(start)
+
+#define fast_hmap_size_for(hmap, size) ovn_fast_hmap_size_for(hmap, size)
+
+#define fast_hmap_init(hmap, size) ovn_fast_hmap_init(hmap, size)
+
+#define fast_hmap_merge(dest, inc) ovn_fast_hmap_merge(dest, inc)
+
+#define hmap_merge(dest, inc) ovn_hmap_merge(dest, inc)
+
+#define run_pool(pool) ovn_run_pool(pool)
+
+#define run_pool_hash(pool, result, result_frags) \
+    ovn_run_pool_hash(pool, result, result_frags)
+
+#define run_pool_callback(pool, fin_result, result_frags, helper_func) \
+    ovn_run_pool_callback(pool, fin_result, result_frags, helper_func)
+
+#ifdef  __cplusplus
+}
+#endif
+
+#endif
+
+#endif /* lib/fasthmap.h */

From patchwork Thu Jan 7 11:56:34 2021
From: anton.ivanov@cambridgegreys.com
To: ovs-dev@openvswitch.org
Date: Thu, 7 Jan 2021 11:56:34 +0000
Message-Id: <20210107115635.22782-3-anton.ivanov@cambridgegreys.com>
In-Reply-To: <20210107115635.22782-1-anton.ivanov@cambridgegreys.com>
Subject: [ovs-dev] [OVN Patch v8 2/3] ovn-northd: Introduce parallel lflow build

From: Anton Ivanov

Datapaths, ports, IGMP groups and load balancers can now be iterated
over in parallel in order to speed up lflow generation.

Without datapath groups, this decreases the time needed to generate
the logical flows by a factor of 4+ on a 6 core/12 thread CPU: from
0.8-1 microseconds per flow down to 0.2-0.3 microseconds per flow on
average.

With datapath groups enabled, the time to compute lflows decreases by
a factor of ~2 on the same hardware: from an average of 2.4
microseconds per flow to 1.2 microseconds per flow.

Tested on an 8-node, 400-pod K8s simulation producing more than 6K
flows.
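The work split used by the worker threads can be checked in isolation. The C sketch below is hypothetical: `visit_slice()`, `partition_is_exact()` and `lock_index()` are illustrative names, not OVN functions.

Worker `id` of a pool of `pool_size` threads visits buckets `id, id + pool_size, ...` up to the mask, following the loops in `build_lflows_thread()`. Across the pool, every bucket is therefore visited exactly once.

The sketch also shows a property of per-bucket locking. For a mask of the form 2^k - 1, `hash & mask` is exactly the bucket index, so a lock chosen that way is always shared by flows that land in the same bucket. `hash % mask` gives no such guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Marks the buckets that worker 'id' visits, using the same stride as the
 * loops in build_lflows_thread(): bnum = id, id + pool_size, ... <= mask.
 * Returns the number of buckets visited. */
static size_t
visit_slice(size_t id, size_t pool_size, size_t mask, unsigned char *seen)
{
    size_t visited = 0;

    for (size_t bnum = id; bnum <= mask; bnum += pool_size) {
        seen[bnum]++;
        visited++;
    }
    return visited;
}

/* Returns 1 if the slices of all 'pool_size' workers together cover every
 * bucket 0..mask exactly once, 0 otherwise. Supports masks below 256. */
static int
partition_is_exact(size_t pool_size, size_t mask)
{
    unsigned char seen[256] = {0};
    size_t total = 0;

    assert(pool_size > 0 && mask < 256);
    for (size_t id = 0; id < pool_size; id++) {
        total += visit_slice(id, pool_size, mask, seen);
    }
    for (size_t i = 0; i <= mask; i++) {
        if (seen[i] != 1) {
            return 0;
        }
    }
    return total == mask + 1;
}

/* Per-bucket lock selection: with mask == 2^k - 1, hash & mask is exactly
 * the bucket index, so flows that share a bucket share a lock. */
static size_t
lock_index(uint32_t hash, size_t mask)
{
    return hash & mask;
}
```

With mask 7, hashes 9 and 17 both land in bucket 1 and get lock 1 from `lock_index()`. Taking `9 % 7` and `17 % 7` instead would give 2 and 3, letting two threads modify the same chain under different locks.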
Signed-off-by: Anton Ivanov
---
 lib/fasthmap.c      |   3 +
 northd/ovn-northd.c | 301 +++++++++++++++++++++++++++++++++++++-------
 2 files changed, 262 insertions(+), 42 deletions(-)

diff --git a/lib/fasthmap.c b/lib/fasthmap.c
index 3096c90d3..e70ac4553 100644
--- a/lib/fasthmap.c
+++ b/lib/fasthmap.c
@@ -33,6 +33,7 @@
 
 VLOG_DEFINE_THIS_MODULE(fasthmap);
 
+#ifndef OVS_HAS_PARALLEL_HMAP
 
 static bool worker_pool_setup = false;
 static bool workers_must_exit = false;
@@ -279,3 +280,5 @@ void ovn_run_pool_hash(
 {
     ovn_run_pool_callback(pool, result, result_frags, merge_hash_results);
 }
+
+#endif
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index cd13d9fdf..7bf19a5f5 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -37,6 +37,7 @@
 #include "lib/ovn-sb-idl.h"
 #include "lib/ovn-util.h"
 #include "lib/lb.h"
+#include "lib/fasthmap.h"
 #include "ovn/actions.h"
 #include "ovn/logical-fields.h"
 #include "packets.h"
@@ -4173,6 +4174,8 @@ ovn_lflow_init(struct ovn_lflow *lflow, struct ovn_datapath *od,
  * logical datapath only by creating a datapah group. */
 static bool use_logical_dp_groups = false;
 
+static struct ovs_mutex *slice_locks = NULL;
+
 /* Adds a row with the specified contents to the Logical_Flow table. */
 static void
 ovn_lflow_add_at(struct hmap *lflow_map, struct ovn_datapath *od,
@@ -4195,16 +4198,25 @@ ovn_lflow_add_at(struct hmap *lflow_map, struct ovn_datapath *od,
 
     hash = ovn_lflow_hash(lflow);
     if (shared && use_logical_dp_groups) {
+        if (slice_locks) {
+            ovs_mutex_lock(&slice_locks[hash & lflow_map->mask]);
+        }
         old_lflow = ovn_lflow_find_by_lflow(lflow_map, lflow, hash);
         if (old_lflow) {
             ovn_lflow_destroy(NULL, lflow);
             hmapx_add(&old_lflow->od_group, od);
+            if (slice_locks) {
+                ovs_mutex_unlock(&slice_locks[hash & lflow_map->mask]);
+            }
             return;
         }
     }
 
     hmapx_add(&lflow->od_group, od);
-    hmap_insert(lflow_map, &lflow->hmap_node, hash);
+    hmap_insert_fast(lflow_map, &lflow->hmap_node, hash);
+    if (shared && use_logical_dp_groups && slice_locks) {
+        ovs_mutex_unlock(&slice_locks[hash & lflow_map->mask]);
+    }
 }
 
 /* Adds a row with the specified contents to the Logical_Flow table. */
@@ -11400,6 +11412,127 @@ build_lswitch_and_lrouter_iterate_by_op(struct ovn_port *op,
                                 &lsi->match, &lsi->actions);
 }
 
+struct lflows_thread_pool {
+    struct worker_pool *pool;
+};
+
+static void *build_lflows_thread(void *arg) {
+    struct worker_control *control = (struct worker_control *) arg;
+    struct lflows_thread_pool *workload;
+    struct lswitch_flow_build_info *lsi;
+
+    struct ovn_datapath *od;
+    struct ovn_port *op;
+    struct ovn_northd_lb *lb;
+    struct ovn_igmp_group *igmp_group;
+    int bnum;
+
+    while (!cease_fire()) {
+        sem_wait(&control->fire);
+        workload = (struct lflows_thread_pool *) control->workload;
+        lsi = (struct lswitch_flow_build_info *) control->data;
+        if (lsi && workload) {
+            /* Iterate over bucket ThreadID, ThreadID+size, ... */
+            for (bnum = control->id;
+                    bnum <= lsi->datapaths->mask;
+                    bnum += workload->pool->size)
+            {
+                HMAP_FOR_EACH_IN_PARALLEL (
+                        od, key_node, bnum, lsi->datapaths) {
+                    if (cease_fire()) {
+                        return NULL;
+                    }
+                    build_lswitch_and_lrouter_iterate_by_od(od, lsi);
+                }
+            }
+            for (bnum = control->id;
+                    bnum <= lsi->ports->mask;
+                    bnum += workload->pool->size)
+            {
+                HMAP_FOR_EACH_IN_PARALLEL (
+                        op, key_node, bnum, lsi->ports) {
+                    if (cease_fire()) {
+                        return NULL;
+                    }
+                    build_lswitch_and_lrouter_iterate_by_op(op, lsi);
+                }
+            }
+            for (bnum = control->id;
+                    bnum <= lsi->lbs->mask;
+                    bnum += workload->pool->size)
+            {
+                HMAP_FOR_EACH_IN_PARALLEL (
+                        lb, hmap_node, bnum, lsi->lbs) {
+                    if (cease_fire()) {
+                        return NULL;
+                    }
+                    build_lswitch_arp_nd_service_monitor(lb, lsi->lflows,
+                                                         &lsi->actions,
+                                                         &lsi->match);
+                }
+            }
+            for (bnum = control->id;
+                    bnum <= lsi->igmp_groups->mask;
+                    bnum += workload->pool->size)
+            {
+                HMAP_FOR_EACH_IN_PARALLEL (
+                        igmp_group, hmap_node, bnum, lsi->igmp_groups) {
+                    if (cease_fire()) {
+                        return NULL;
+                    }
+                    build_lswitch_ip_mcast_igmp_mld(igmp_group, lsi->lflows,
+                                                    &lsi->actions,
+                                                    &lsi->match);
+                }
+            }
+            atomic_thread_fence(memory_order_release);
+            atomic_store_relaxed(&control->finished, true);
+        }
+        sem_post(control->done);
+    }
+    return NULL;
+}
+
+static bool pool_init_done = false;
+static struct lflows_thread_pool *build_lflows_pool = NULL;
+
+static void init_lflows_thread_pool(void)
+{
+    int index;
+
+    if (!pool_init_done) {
+        struct worker_pool *pool = add_worker_pool(build_lflows_thread);
+        pool_init_done = true;
+        if (pool) {
+            build_lflows_pool =
+                xmalloc(sizeof(struct lflows_thread_pool));
+            build_lflows_pool->pool = pool;
+            for (index = 0; index < build_lflows_pool->pool->size; index++) {
+                build_lflows_pool->pool->controls[index].workload =
+                    build_lflows_pool;
+            }
+        }
+    }
+}
+
+/* TODO: replace the hard-coded choice between single-threaded and
+ * multi-threaded lflow processing with a setting that is configurable
+ * via commands. Until then, use_parallel_build below selects the
+ * "all parallel" lflow build.
+ */
+
+static void
+noop_callback(struct worker_pool *pool OVS_UNUSED,
+              void *fin_result OVS_UNUSED,
+              void *result_frags OVS_UNUSED,
+              int index OVS_UNUSED)
+{
+    /* Do nothing */
+}
+
+
+static bool use_parallel_build = true;
+
 static void
 build_lswitch_and_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
                                 struct hmap *port_groups, struct hmap *lflows,
@@ -11407,52 +11540,109 @@ build_lswitch_and_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
                                 struct hmap *igmp_groups,
                                 struct shash *meter_groups, struct hmap *lbs)
 {
-    struct ovn_datapath *od;
-    struct ovn_port *op;
-    struct ovn_northd_lb *lb;
-    struct ovn_igmp_group *igmp_group;
 
     char *svc_check_match = xasprintf("eth.dst == %s", svc_monitor_mac);
 
-    struct lswitch_flow_build_info lsi = {
-        .datapaths = datapaths,
-        .ports = ports,
-        .port_groups = port_groups,
-        .lflows = lflows,
-        .mcgroups = mcgroups,
-        .igmp_groups = igmp_groups,
-        .meter_groups = meter_groups,
-        .lbs = lbs,
-        .svc_check_match = svc_check_match,
-        .match = DS_EMPTY_INITIALIZER,
-        .actions = DS_EMPTY_INITIALIZER,
-    };
+    init_lflows_thread_pool();
+    if (use_parallel_build && build_lflows_pool) {
+        struct hmap *lflow_segs;
+        struct lswitch_flow_build_info *lsiv;
+        int index;
+
+        lsiv = xmalloc(
+            sizeof(struct lswitch_flow_build_info) *
+                    build_lflows_pool->pool->size);
+        if (use_logical_dp_groups) {
+            lflow_segs = NULL;
+        } else {
+            lflow_segs = xmalloc(
+                sizeof(struct hmap) * build_lflows_pool->pool->size);
+        }
 
-    /* Combined build - all lflow generation from lswitch and lrouter
-     * will move here and will be reogranized by iterator type.
-     */
-    HMAP_FOR_EACH (od, key_node, datapaths) {
-        build_lswitch_and_lrouter_iterate_by_od(od, &lsi);
-    }
-    HMAP_FOR_EACH (op, key_node, ports) {
-        build_lswitch_and_lrouter_iterate_by_op(op, &lsi);
-    }
-    HMAP_FOR_EACH (lb, hmap_node, lbs) {
-        build_lswitch_arp_nd_service_monitor(lb, lsi.lflows,
-                                             &lsi.actions,
-                                             &lsi.match);
-    }
-    HMAP_FOR_EACH (igmp_group, hmap_node, igmp_groups) {
-        build_lswitch_ip_mcast_igmp_mld(igmp_group,
-                                        lsi.lflows,
-                                        &lsi.actions,
-                                        &lsi.match);
-    }
-    free(svc_check_match);
+        /* Set up "work chunks" for each thread to work on. */
+
+        for (index = 0; index < build_lflows_pool->pool->size; index++) {
+            if (use_logical_dp_groups) {
+                /* If dp_groups are in use, we lock a shared lflows hash
+                 * on a per-bucket level instead of merging hash frags. */
+                lsiv[index].lflows = lflows;
+            } else {
+                fast_hmap_init(&lflow_segs[index], lflows->mask);
+                lsiv[index].lflows = &lflow_segs[index];
+            }
+
+            lsiv[index].datapaths = datapaths;
+            lsiv[index].ports = ports;
+            lsiv[index].port_groups = port_groups;
+            lsiv[index].mcgroups = mcgroups;
+            lsiv[index].igmp_groups = igmp_groups;
+            lsiv[index].meter_groups = meter_groups;
+            lsiv[index].lbs = lbs;
+            lsiv[index].svc_check_match = svc_check_match;
+            ds_init(&lsiv[index].match);
+            ds_init(&lsiv[index].actions);
+
+            build_lflows_pool->pool->controls[index].data = &lsiv[index];
+        }
 
-    ds_destroy(&lsi.match);
-    ds_destroy(&lsi.actions);
+        /* Run thread pool. */
+        if (use_logical_dp_groups) {
+            run_pool_callback(build_lflows_pool->pool, NULL, NULL, noop_callback);
+        } else {
+            run_pool_hash(build_lflows_pool->pool, lflows, lflow_segs);
+        }
 
+        for (index = 0; index < build_lflows_pool->pool->size; index++) {
+            ds_destroy(&lsiv[index].match);
+            ds_destroy(&lsiv[index].actions);
+        }
+        free(lflow_segs);
+        free(lsiv);
+    } else {
+        struct ovn_datapath *od;
+        struct ovn_port *op;
+        struct ovn_northd_lb *lb;
+        struct ovn_igmp_group *igmp_group;
+        struct lswitch_flow_build_info lsi = {
+            .datapaths = datapaths,
+            .ports = ports,
+            .port_groups = port_groups,
+            .lflows = lflows,
+            .mcgroups = mcgroups,
+            .igmp_groups = igmp_groups,
+            .meter_groups = meter_groups,
+            .lbs = lbs,
+            .svc_check_match = svc_check_match,
+            .match = DS_EMPTY_INITIALIZER,
+            .actions = DS_EMPTY_INITIALIZER,
+        };
+
+        /* Combined build - all lflow generation from lswitch and lrouter
+         * will move here and will be reorganized by iterator type.
+         */
+        HMAP_FOR_EACH (od, key_node, datapaths) {
+            build_lswitch_and_lrouter_iterate_by_od(od, &lsi);
+        }
+        HMAP_FOR_EACH (op, key_node, ports) {
+            build_lswitch_and_lrouter_iterate_by_op(op, &lsi);
+        }
+        HMAP_FOR_EACH (lb, hmap_node, lbs) {
+            build_lswitch_arp_nd_service_monitor(lb, lsi.lflows,
+                                                 &lsi.actions,
+                                                 &lsi.match);
+        }
+        HMAP_FOR_EACH (igmp_group, hmap_node, igmp_groups) {
+            build_lswitch_ip_mcast_igmp_mld(igmp_group,
+                                            lsi.lflows,
+                                            &lsi.actions,
+                                            &lsi.match);
+        }
+
+        ds_destroy(&lsi.match);
+        ds_destroy(&lsi.actions);
+    }
+
+    free(svc_check_match);
     build_lswitch_flows(datapaths, lflows);
 }
 
@@ -11523,6 +11713,25 @@ ovn_sb_set_lflow_logical_dp_group(
     sbrec_logical_flow_set_logical_dp_group(sbflow, dpg->dp_group);
 }
 
+static ssize_t max_seen_lflow_size = 128;
+
+static ssize_t recent_lflow_map_mask = 0;
+
+static void update_lock_array(struct hmap *lflows)
+{
+    int i;
+    if (recent_lflow_map_mask != lflows->mask) {
+        if (slice_locks) {
+            free(slice_locks);
+        }
+        slice_locks = xcalloc(lflows->mask + 1, sizeof *slice_locks);
+        recent_lflow_map_mask = lflows->mask;
+        for (i = 0; i <= lflows->mask; i++) {
+            ovs_mutex_init(&slice_locks[i]);
+        }
+    }
+}
+
 /* Updates the Logical_Flow and Multicast_Group tables in the OVN_SB database,
  * constructing their contents based on the OVN_NB database. */
 static void
@@ -11532,12 +11741,20 @@ build_lflows(struct northd_context *ctx, struct hmap *datapaths,
              struct shash *meter_groups,
              struct hmap *lbs)
 {
-    struct hmap lflows = HMAP_INITIALIZER(&lflows);
+    struct hmap lflows;
 
+    fast_hmap_size_for(&lflows, max_seen_lflow_size);
+    if (use_parallel_build) {
+        update_lock_array(&lflows);
+    }
     build_lswitch_and_lrouter_flows(datapaths, ports,
                                     port_groups, &lflows, mcgroups,
                                     igmp_groups, meter_groups, lbs);
 
+    if (hmap_count(&lflows) > max_seen_lflow_size) {
+        max_seen_lflow_size = hmap_count(&lflows);
+    }
+
     /* Collecting all unique datapath groups. */
     struct hmap dp_groups = HMAP_INITIALIZER(&dp_groups);
     struct hmapx single_dp_lflows = HMAPX_INITIALIZER(&single_dp_lflows);

From patchwork Thu Jan 7 11:56:35 2021
3510986ADB; Thu, 7 Jan 2021 11:57:00 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from whitealder.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id BIxOd0jlNk3r; Thu, 7 Jan 2021 11:56:56 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by whitealder.osuosl.org (Postfix) with ESMTP id BA1AE86AA3; Thu, 7 Jan 2021 11:56:56 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 968E6C0FA8; Thu, 7 Jan 2021 11:56:56 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from silver.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 506A0C0891 for ; Thu, 7 Jan 2021 11:56:53 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by silver.osuosl.org (Postfix) with ESMTP id 33DAB204F0 for ; Thu, 7 Jan 2021 11:56:53 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from silver.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 9CQ9ssTBaT7V for ; Thu, 7 Jan 2021 11:56:52 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from www.kot-begemot.co.uk (ivanoab7.miniserver.com [37.128.132.42]) by silver.osuosl.org (Postfix) with ESMTPS id DDFA6233A6 for ; Thu, 7 Jan 2021 11:56:51 +0000 (UTC) Received: from tun252.jain.kot-begemot.co.uk ([192.168.18.6] helo=jain.kot-begemot.co.uk) by www.kot-begemot.co.uk with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1kxTuM-0006MF-C1; Thu, 07 Jan 2021 11:56:50 +0000 Received: from jain.kot-begemot.co.uk ([192.168.3.3]) by jain.kot-begemot.co.uk with esmtp (Exim 4.92) (envelope-from ) id 1kxTuI-0006bX-Cw; Thu, 07 Jan 2021 11:56:48 +0000 From: anton.ivanov@cambridgegreys.com To: ovs-dev@openvswitch.org Date: Thu, 7 Jan 2021 11:56:35 
+0000 Message-Id: <20210107115635.22782-4-anton.ivanov@cambridgegreys.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210107115635.22782-1-anton.ivanov@cambridgegreys.com> References: <20210107115635.22782-1-anton.ivanov@cambridgegreys.com> MIME-Version: 1.0 X-Clacks-Overhead: GNU Terry Pratchett Cc: Anton Ivanov Subject: [ovs-dev] [OVN Patch v8 3/3] ovn-northd: Add configuration option for parallel lflow build X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Anton Ivanov Signed-off-by: Anton Ivanov --- northd/ovn-northd.c | 2 ++ ovn-nb.xml | 13 +++++++++++++ 2 files changed, 15 insertions(+) diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index 7bf19a5f5..13217b04c 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -12743,6 +12743,8 @@ ovnnb_db_run(struct northd_context *ctx, northd_probe_interval_nb = get_probe_interval(ovnnb_db, nb); northd_probe_interval_sb = get_probe_interval(ovnsb_db, nb); + use_parallel_build = smap_get_bool(&nb->options, + "use_parallel_build", false); use_logical_dp_groups = smap_get_bool(&nb->options, "use_logical_dp_groups", false); controller_event_en = smap_get_bool(&nb->options, diff --git a/ovn-nb.xml b/ovn-nb.xml index ec6405ff5..e2f5f6cd1 100644 --- a/ovn-nb.xml +++ b/ovn-nb.xml @@ -212,6 +212,19 @@ The default value is false.

+ +

+ If set to true, ovn-northd will attempt + to compute logical flows in parallel. +

+

+ Parallel computation is enabled only if the system has 4 or more + cores/threads available to be used by ovn-northd. +

+

+ The default value is false. +

+
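The update_lock_array() hunk sizes slice_locks to one mutex per bucket of the shared lflows map (lflows->mask + 1 entries), and the worker setup comment says that with dp_groups enabled the shared hash is locked "on a per-bucket level instead of merging hash frags". The sketch below shows that locking pattern in isolation, using POSIX threads and a hypothetical chained hash table. struct demo_table, demo_insert_locked() and demo_parallel_insert() are illustrative names, not OVS or OVN APIs, and the sketch does not claim to reproduce how ovn-northd actually takes slice_locks. It assumes the bucket count stays fixed while workers run, which is what makes hash & mask a stable lock index.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical chained hash table; NOT the OVS struct hmap. */
struct demo_node {
    struct demo_node *next;
    uint32_t hash;
};

struct demo_table {
    struct demo_node **buckets; /* mask + 1 chain heads. */
    size_t mask;                /* Bucket count minus one. */
    pthread_mutex_t *locks;     /* One mutex per bucket, like slice_locks. */
};

static int demo_table_init(struct demo_table *t, size_t n_buckets)
{
    assert(n_buckets && (n_buckets & (n_buckets - 1)) == 0); /* Power of 2. */
    t->mask = n_buckets - 1;
    t->buckets = calloc(n_buckets, sizeof *t->buckets);
    t->locks = calloc(n_buckets, sizeof *t->locks);
    if (!t->buckets || !t->locks) {
        free(t->buckets);
        free(t->locks);
        return -1;
    }
    for (size_t i = 0; i < n_buckets; i++) {
        pthread_mutex_init(&t->locks[i], NULL);
    }
    return 0;
}

/* Locks only the target bucket. The table is never resized while workers
 * run, so hash & mask always maps an element to the same bucket and lock. */
static void demo_insert_locked(struct demo_table *t, struct demo_node *n)
{
    size_t idx = n->hash & t->mask;
    pthread_mutex_lock(&t->locks[idx]);
    n->next = t->buckets[idx];
    t->buckets[idx] = n;
    pthread_mutex_unlock(&t->locks[idx]);
}
struct demo_worker {
    struct demo_table *t;
    struct demo_node *nodes;    /* This worker's slice of the node array. */
    size_t n_nodes;
};
static void *demo_worker_main(void *arg)
{
    struct demo_worker *w = arg;
    for (size_t i = 0; i < w->n_nodes; i++) {
        demo_insert_locked(w->t, &w->nodes[i]);
    }
    return NULL;
}

/* Runs 4 workers that each insert per_thread nodes into one shared table
 * and returns how many nodes are reachable from the buckets afterwards. */
static size_t demo_parallel_insert(size_t per_thread)
{
    enum { N_THREADS = 4 };
    struct demo_worker workers[N_THREADS];
    pthread_t tids[N_THREADS];
    size_t total = N_THREADS * per_thread, found = 0;
    struct demo_table t;
    int started = 0;
    if (demo_table_init(&t, 64)) {
        return 0;
    }
    struct demo_node *nodes = calloc(total ? total : 1, sizeof *nodes);
    for (size_t i = 0; nodes && i < total; i++) {
        nodes[i].hash = (uint32_t) (i * 2654435761u); /* Spread hashes. */
    }
    for (; nodes && started < N_THREADS; started++) {
        workers[started].t = &t;
        workers[started].nodes = nodes + started * per_thread;
        workers[started].n_nodes = per_thread;
        if (pthread_create(&tids[started], NULL, demo_worker_main,
                           &workers[started])) {
            break;
        }
    }
    for (int w = 0; w < started; w++) {
        pthread_join(tids[w], NULL);
    }
    /* Count after the joins, then tear down each bucket's mutex. */
    for (size_t i = 0; i <= t.mask; i++) {
        for (const struct demo_node *n = t.buckets[i]; n; n = n->next) {
            found++;
        }
        pthread_mutex_destroy(&t.locks[i]);
    }
    free(t.locks);
    free(t.buckets);
    free(nodes);
    return found;
}
```

Two threads contend only when their elements land in the same bucket. The cost is one mutex per bucket, and the lock array must be rebuilt whenever the mask changes, which is what the recent_lflow_map_mask check in update_lock_array() does. OVS's struct hmap also keeps a table-wide element count (n), which per-bucket locks alone do not protect; this sketch sidesteps that by counting only after the workers are joined.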
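When use_logical_dp_groups is off, each worker instead gets a private fragment initialized with fast_hmap_init(&lflow_segs[index], lflows->mask), and run_pool_hash() later combines the fragments into lflows. Because every fragment uses the destination's mask, an element occupies the same bucket index in its fragment and in the final map, so fragments can be combined bucket by bucket without rehashing. The sketch below illustrates that property with a hypothetical fixed-size table. struct frag_table, frag_merge() and demo_build_and_merge() are made-up names, and this is not the merge code behind run_pool_hash(), which presumably comes from the fasthmap helpers added earlier in this series and is not reproduced here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fixed-size chained hash table; NOT the OVS struct hmap. */
struct frag_node {
    struct frag_node *next;
    uint32_t hash;
};
struct frag_table {
    struct frag_node **buckets; /* mask + 1 chain heads. */
    size_t mask;                /* Identical for all fragments and result. */
};
static int frag_init(struct frag_table *t, size_t mask)
{
    t->buckets = calloc(mask + 1, sizeof *t->buckets);
    t->mask = mask;
    return t->buckets ? 0 : -1;
}
/* No lock: each fragment is written by exactly one worker thread. */
static void frag_insert(struct frag_table *t, struct frag_node *node)
{
    size_t idx = node->hash & t->mask;
    node->next = t->buckets[idx];
    t->buckets[idx] = node;
}
/* Splices each chain of src in front of the same bucket of dst. Nothing is
 * rehashed, which is correct only because both tables share one mask. */
static void frag_merge(struct frag_table *dst, struct frag_table *src)
{
    assert(dst->mask == src->mask);
    for (size_t i = 0; i <= src->mask; i++) {
        struct frag_node **tailp = &src->buckets[i];
        while (*tailp) {
            tailp = &(*tailp)->next;
        }
        *tailp = dst->buckets[i];   /* Append dst's chain after src's. */
        dst->buckets[i] = src->buckets[i];
        src->buckets[i] = NULL;
    }
}

struct frag_job {
    struct frag_table frag;     /* Private to one worker until joined. */
    struct frag_node *nodes;
    size_t n_nodes;
};
static void *frag_worker(void *arg)
{
    struct frag_job *job = arg;
    for (size_t i = 0; i < job->n_nodes; i++) {
        frag_insert(&job->frag, &job->nodes[i]);
    }
    return NULL;
}

/* Fills 4 fragments in parallel, merges them after joining, and returns the
 * element count found by walking the result. mask must be 2^k - 1. */
static size_t demo_build_and_merge(size_t per_thread, size_t mask)
{
    enum { N_THREADS = 4 };
    struct frag_job jobs[N_THREADS] = { 0 };
    struct frag_table result = { 0 };
    pthread_t tids[N_THREADS];
    size_t total = N_THREADS * per_thread, found = 0;
    struct frag_node *nodes = calloc(total ? total : 1, sizeof *nodes);
    int ok = nodes && !frag_init(&result, mask), started = 0;
    for (int w = 0; ok && w < N_THREADS; w++) {
        ok = !frag_init(&jobs[w].frag, mask);
        jobs[w].nodes = nodes + w * per_thread;
        jobs[w].n_nodes = per_thread;
    }
    for (size_t i = 0; ok && i < total; i++) {
        nodes[i].hash = (uint32_t) (i * 2654435761u);
    }
    for (; ok && started < N_THREADS; started++) {
        if (pthread_create(&tids[started], NULL, frag_worker, &jobs[started])) {
            break;
        }
    }
    for (int w = 0; w < started; w++) {
        pthread_join(tids[w], NULL);    /* Fragment w is now quiescent. */
        frag_merge(&result, &jobs[w].frag);
    }
    for (size_t i = 0; ok && i <= mask; i++) {
        for (struct frag_node *n = result.buckets[i]; n; n = n->next) {
            assert((n->hash & mask) == i);    /* Still in its own bucket. */
            found++;
        }
    }
    for (int w = 0; w < N_THREADS; w++) {
        free(jobs[w].frag.buckets);
    }
    free(result.buckets);
    free(nodes);
    return found;
}
```

In this sketch, merging costs one walk per non-empty source chain rather than one rehash per element. Compared with the locked variant, the trade-off is extra memory for the per-thread fragments plus a merge step that runs after the workers finish.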