diff mbox series

[ovs-dev,v15,1/3] ovn-libs: Add support for parallel processing

Message ID 20210301130427.21069-1-anton.ivanov@cambridgegreys.com
State Changes Requested
Headers show
Series [ovs-dev,v15,1/3] ovn-libs: Add support for parallel processing | expand

Commit Message

Anton Ivanov March 1, 2021, 1:04 p.m. UTC
From: Anton Ivanov <anton.ivanov@cambridgegreys.com>

This adds a set of functions and macros intended to process
hashes in parallel.

The principles of operation are documented in the ovn-parallel-hmap.h

If these one day go into the OVS tree, the OVS tree versions
would be used in preference.

Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
---
 lib/automake.mk         |   2 +
 lib/ovn-parallel-hmap.c | 455 ++++++++++++++++++++++++++++++++++++++++
 lib/ovn-parallel-hmap.h | 301 ++++++++++++++++++++++++++
 3 files changed, 758 insertions(+)
 create mode 100644 lib/ovn-parallel-hmap.c
 create mode 100644 lib/ovn-parallel-hmap.h
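
For orientation, a rough sketch of the calling pattern this API is built
around (not part of the patch; my_worker, struct my_job, struct my_elem,
frags and expected_n are hypothetical names, while the library calls are
the ones the patch adds):

    /* Worker thread function, one instance per pool member. */
    static void *my_worker(void *arg)
    {
        struct worker_control *control = (struct worker_control *) arg;

        while (!stop_parallel_processing()) {
            wait_for_work(control);          /* Block on control->fire. */
            if (stop_parallel_processing()) {
                return NULL;                 /* Woken up for exit. */
            }
            struct my_job *job = (struct my_job *) control->data;
            if (job) {
                struct my_elem *el;
                size_t bnum;

                /* Visit buckets id, id + size, id + 2 * size, ... */
                for (bnum = control->id; bnum <= job->input->mask;
                     bnum += job->pool->size) {
                    HMAP_FOR_EACH_IN_PARALLEL (el, node, bnum, job->input) {
                        /* Process 'el'; hmap_insert_fast() any results
                         * into job->frags[control->id]. */
                    }
                }
            }
            post_completed_work(control);    /* Fence, finished, sem_post. */
        }
        return NULL;
    }

and on the main thread, once per run:

    struct worker_pool *pool = add_worker_pool(my_worker); /* NULL if
                                                     * not parallelizable. */
    struct hmap result, *frags = xmalloc(sizeof *frags * pool->size);

    fast_hmap_size_for(&result, expected_n);
    for (i = 0; i < pool->size; i++) {
        fast_hmap_size_for(&frags[i], expected_n); /* Same mask as result. */
        pool->controls[i].data = &job;
    }
    run_pool_hash(pool, &result, frags);     /* Fire, wait, merge frags. */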

Comments

0-day Robot March 1, 2021, 1:58 p.m. UTC | #1
Bleep bloop.  Greetings Anton Ivanov, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line has trailing whitespace
#466 FILE: lib/ovn-parallel-hmap.c:425:
    } 

ERROR: Improper whitespace around control block
#563 FILE: lib/ovn-parallel-hmap.h:61:
#define HMAP_FOR_EACH_IN_PARALLEL(NODE, MEMBER, JOBID, HMAP) \

WARNING: Line has trailing whitespace
#752 FILE: lib/ovn-parallel-hmap.h:250:
    hrl->row_locks = NULL;   

Lines checked: 806, Warnings: 2, Errors: 1


Please check this out.  If you feel there has been an error, please email aconole@redhat.com

Thanks,
0-day Robot
Numan Siddique March 24, 2021, 3:31 p.m. UTC | #2
On Mon, Mar 1, 2021 at 6:35 PM <anton.ivanov@cambridgegreys.com> wrote:
>
> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>
> This adds a set of functions and macros intended to process
> hashes in parallel.
>
> The principles of operation are documented in the ovn-parallel-hmap.h
>
> If these one day go into the OVS tree, the OVS tree versions
> would be used in preference.
>
> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>

Hi Anton,

I tested the first 2 patches of this series and it crashes again for me.

This time I ran the tests on a 4-core machine - Intel(R) Xeon(R) CPU
E3-1220 v5 @ 3.00GHz

The below trace is seen for both gcc and clang.

----
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `ovn-northd -vjsonrpc
--ovnnb-db=unix:/mnt/mydisk/myhome/numan_alt/work/ovs_ovn/'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f27594ae212 in __new_sem_wait_slow.constprop.0 () from
/lib64/libpthread.so.0
[Current thread is 1 (Thread 0x7f2758c68640 (LWP 347378))]
Missing separate debuginfos, use: dnf debuginfo-install
glibc-2.32-3.fc33.x86_64 libcap-ng-0.8-1.fc33.x86_64
libevent-2.1.8-10.fc33.x86_64 openssl-libs-1.1.1i-1.fc33.x86_64
python3-libs-3.9.1-2.fc33.x86_64 unbound-libs-1.10.1-4.fc33.x86_64
zlib-1.2.11-23.fc33.x86_64
(gdb) bt
#0  0x00007f27594ae212 in __new_sem_wait_slow.constprop.0 () from
/lib64/libpthread.so.0
#1  0x0000000000422184 in wait_for_work (control=<optimized out>) at
../lib/ovn-parallel-hmap.h:203
#2  build_lflows_thread (arg=0x2538420) at ../northd/ovn-northd.c:11855
#3  0x000000000049cd12 in ovsthread_wrapper (aux_=<optimized out>) at
../lib/ovs-thread.c:383
#4  0x00007f27594a53f9 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f2759142903 in clone () from /lib64/libc.so.6
-----

I'm not sure why you're not able to reproduce this issue.

All the test cases passed for me. So maybe something's wrong when
ovn-northd exits.
IMHO, these crashes should be addressed before these patches can be considered.

Thanks
Numan

> ---
>  lib/automake.mk         |   2 +
>  lib/ovn-parallel-hmap.c | 455 ++++++++++++++++++++++++++++++++++++++++
>  lib/ovn-parallel-hmap.h | 301 ++++++++++++++++++++++++++
>  3 files changed, 758 insertions(+)
>  create mode 100644 lib/ovn-parallel-hmap.c
>  create mode 100644 lib/ovn-parallel-hmap.h
>
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 250c7aefa..781be2109 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -13,6 +13,8 @@ lib_libovn_la_SOURCES = \
>         lib/expr.c \
>         lib/extend-table.h \
>         lib/extend-table.c \
> +       lib/ovn-parallel-hmap.h \
> +       lib/ovn-parallel-hmap.c \
>         lib/ip-mcast-index.c \
>         lib/ip-mcast-index.h \
>         lib/mcast-group-index.c \
> diff --git a/lib/ovn-parallel-hmap.c b/lib/ovn-parallel-hmap.c
> new file mode 100644
> index 000000000..e83ae23cb
> --- /dev/null
> +++ b/lib/ovn-parallel-hmap.c
> @@ -0,0 +1,455 @@
> +/*
> + * Copyright (c) 2020 Red Hat, Inc.
> + * Copyright (c) 2008, 2009, 2010, 2012, 2013, 2015, 2019 Nicira, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + *     http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include <config.h>
> +#include <stdint.h>
> +#include <string.h>
> +#include <stdlib.h>
> +#include <fcntl.h>
> +#include <unistd.h>
> +#include <errno.h>
> +#include <semaphore.h>
> +#include "fatal-signal.h"
> +#include "util.h"
> +#include "openvswitch/vlog.h"
> +#include "openvswitch/hmap.h"
> +#include "openvswitch/thread.h"
> +#include "ovn-parallel-hmap.h"
> +#include "ovs-atomic.h"
> +#include "ovs-thread.h"
> +#include "ovs-numa.h"
> +#include "random.h"
> +
> +VLOG_DEFINE_THIS_MODULE(ovn_parallel_hmap);
> +
> +#ifndef OVS_HAS_PARALLEL_HMAP
> +
> +#define WORKER_SEM_NAME "%x-%p-%x"
> +#define MAIN_SEM_NAME "%x-%p-main"
> +
> +/* These are accessed under mutex inside add_worker_pool().
> + * They do not need to be atomic.
> + */
> +
> +static atomic_bool initial_pool_setup = ATOMIC_VAR_INIT(false);
> +static bool can_parallelize = false;
> +
> +/* This is set only in the process of exit and the set is
> + * accompanied by a fence. It does not need to be atomic or be
> + * accessed under a lock.
> + */
> +
> +static bool workers_must_exit = false;
> +
> +static struct ovs_list worker_pools = OVS_LIST_INITIALIZER(&worker_pools);
> +
> +static struct ovs_mutex init_mutex = OVS_MUTEX_INITIALIZER;
> +
> +static int pool_size;
> +
> +static int sembase;
> +
> +static void worker_pool_hook(void *aux OVS_UNUSED);
> +static void setup_worker_pools(bool force);
> +static void merge_list_results(struct worker_pool *pool OVS_UNUSED,
> +                               void *fin_result, void *result_frags,
> +                               int index);
> +static void merge_hash_results(struct worker_pool *pool OVS_UNUSED,
> +                               void *fin_result, void *result_frags,
> +                               int index);
> +
> +bool ovn_stop_parallel_processing(void)
> +{
> +    return workers_must_exit;
> +}
> +
> +bool ovn_can_parallelize_hashes(bool force_parallel)
> +{
> +    bool test = false;
> +
> +    if (atomic_compare_exchange_strong(
> +            &initial_pool_setup,
> +            &test,
> +            true)) {
> +        ovs_mutex_lock(&init_mutex);
> +        setup_worker_pools(force_parallel);
> +        ovs_mutex_unlock(&init_mutex);
> +    }
> +    return can_parallelize;
> +}
> +
> +struct worker_pool *ovn_add_worker_pool(void *(*start)(void *)) {
> +    struct worker_pool *new_pool = NULL;
> +    struct worker_control *new_control;
> +    bool test = false;
> +    int i;
> +    char sem_name[256];
> +
> +
> +    /* Belt and braces - initialize the pool system just in case
> +     * it is not yet initialized.
> +     */
> +
> +    if (atomic_compare_exchange_strong(
> +            &initial_pool_setup,
> +            &test,
> +            true)) {
> +        ovs_mutex_lock(&init_mutex);
> +        setup_worker_pools(false);
> +        ovs_mutex_unlock(&init_mutex);
> +    }
> +
> +    ovs_mutex_lock(&init_mutex);
> +    if (can_parallelize) {
> +        new_pool = xmalloc(sizeof(struct worker_pool));
> +        new_pool->size = pool_size;
> +        new_pool->controls = NULL;
> +        sprintf(sem_name, MAIN_SEM_NAME, sembase, new_pool);
> +        new_pool->done = sem_open(sem_name, O_CREAT, S_IRWXU, 0);
> +        if (new_pool->done == SEM_FAILED) {
> +            goto cleanup;
> +        }
> +
> +        new_pool->controls =
> +            xmalloc(sizeof(struct worker_control) * new_pool->size);
> +
> +        for (i = 0; i < new_pool->size; i++) {
> +            new_control = &new_pool->controls[i];
> +            new_control->id = i;
> +            new_control->done = new_pool->done;
> +            new_control->data = NULL;
> +            ovs_mutex_init(&new_control->mutex);
> +            new_control->finished = ATOMIC_VAR_INIT(false);
> +            sprintf(sem_name, WORKER_SEM_NAME, sembase, new_pool, i);
> +            new_control->fire = sem_open(sem_name, O_CREAT, S_IRWXU, 0);
> +            if (new_control->fire == SEM_FAILED) {
> +                goto cleanup;
> +            }
> +        }
> +
> +        for (i = 0; i < pool_size; i++) {
> +            ovs_thread_create("worker pool helper", start, &new_pool->controls[i]);
> +        }
> +        ovs_list_push_back(&worker_pools, &new_pool->list_node);
> +    }
> +    ovs_mutex_unlock(&init_mutex);
> +    return new_pool;
> +cleanup:
> +
> +    /* Something went wrong when opening semaphores. In this case
> +     * it is better to shut off parallel processing altogether.
> +     */
> +
> +    VLOG_INFO("Failed to initialize parallel processing, error %d", errno);
> +    can_parallelize = false;
> +    if (new_pool->controls) {
> +        for (i = 0; i < new_pool->size; i++) {
> +            if (new_pool->controls[i].fire == SEM_FAILED) {
> +                break; /* Semaphores past this one are uninitialized. */
> +            }
> +            sem_close(new_pool->controls[i].fire);
> +            sprintf(sem_name, WORKER_SEM_NAME, sembase, new_pool, i);
> +            sem_unlink(sem_name);
> +        }
> +    }
> +    if (new_pool->done != SEM_FAILED) {
> +        sem_close(new_pool->done);
> +        sprintf(sem_name, MAIN_SEM_NAME, sembase, new_pool);
> +        sem_unlink(sem_name);
> +    }
> +    ovs_mutex_unlock(&init_mutex);
> +    return NULL;
> +}
> +
> +
> +/* Initializes 'hmap' as an empty hash table with the given 'mask'. */
> +void
> +ovn_fast_hmap_init(struct hmap *hmap, ssize_t mask)
> +{
> +    size_t i;
> +
> +    hmap->buckets = xmalloc(sizeof (struct hmap_node *) * (mask + 1));
> +    hmap->one = NULL;
> +    hmap->mask = mask;
> +    hmap->n = 0;
> +    for (i = 0; i <= hmap->mask; i++) {
> +        hmap->buckets[i] = NULL;
> +    }
> +}
> +
> +/* Initializes 'hmap' as an empty hash table sized for 'size' elements.
> + * Intended for use in parallel processing so that all
> + * fragments used to store results in a parallel job
> + * are the same size.
> + */
> +void
> +ovn_fast_hmap_size_for(struct hmap *hmap, int size)
> +{
> +    size_t mask;
> +    mask = size / 2;
> +    mask |= mask >> 1;
> +    mask |= mask >> 2;
> +    mask |= mask >> 4;
> +    mask |= mask >> 8;
> +    mask |= mask >> 16;
> +#if SIZE_MAX > UINT32_MAX
> +    mask |= mask >> 32;
> +#endif
> +
> +    /* If we need to dynamically allocate buckets we might as well allocate at
> +     * least 4 of them. */
> +    mask |= (mask & 1) << 1;
> +
> +    fast_hmap_init(hmap, mask);
> +}
> +
> +/* Run a thread pool which uses a callback function to process results
> + */
> +
> +void ovn_run_pool_callback(struct worker_pool *pool,
> +                           void *fin_result, void *result_frags,
> +                           void (*helper_func)(struct worker_pool *pool,
> +                                               void *fin_result,
> +                                               void *result_frags, int index))
> +{
> +    int index, completed;
> +
> +    /* Ensure that all worker threads see the same data as the
> +     * main thread.
> +     */
> +
> +    atomic_thread_fence(memory_order_acq_rel);
> +
> +    /* Start workers */
> +
> +    for (index = 0; index < pool->size; index++) {
> +        sem_post(pool->controls[index].fire);
> +    }
> +
> +    completed = 0;
> +
> +    do {
> +        bool test;
> +        /* Note - we do not loop on semaphore until it reaches
> +         * zero, but on pool size/remaining workers.
> +         * This is by design. If the inner loop can handle
> +         * completion for more than one worker within an iteration
> +         * it will do so to ensure no additional iterations and
> +         * waits once all of them are done.
> +         *
> +         * This may result in us having an initial positive value
> +         * of the semaphore when the pool is invoked the next time.
> +         * This is harmless - the loop will spin up a couple of times
> +         * doing nothing while the workers are processing their data
> +         * slices.
> +         */
> +        wait_for_work_completion(pool);
> +        for (index = 0; index < pool->size; index++) {
> +            test = true;
> +            /* If the worker has marked its data chunk as complete,
> +             * invoke the helper function to combine the results of
> +             * this worker into the main result.
> +             *
> +             * The worker must invoke an appropriate memory fence
> +             * (most likely acq_rel) to ensure that the main thread
> +             * sees all of the results produced by the worker.
> +             */
> +            if (atomic_compare_exchange_weak(
> +                    &pool->controls[index].finished,
> +                    &test,
> +                    false)) {
> +                if (helper_func) {
> +                    (helper_func)(pool, fin_result, result_frags, index);
> +                }
> +                completed++;
> +                pool->controls[index].data = NULL;
> +            }
> +        }
> +    } while (completed < pool->size);
> +}
> +
> +/* Run a thread pool - basic, does not do results processing.
> + */
> +
> +void ovn_run_pool(struct worker_pool *pool)
> +{
> +    run_pool_callback(pool, NULL, NULL, NULL);
> +}
> +
> +/* Brute force merge of a hashmap into another hashmap.
> + * Intended for use in parallel processing. The destination
> + * hashmap MUST be the same size as the one being merged.
> + *
> + * This can be achieved by pre-allocating them to correct size
> + * and using hmap_insert_fast() instead of hmap_insert()
> + */
> +
> +void ovn_fast_hmap_merge(struct hmap *dest, struct hmap *inc)
> +{
> +    size_t i;
> +
> +    ovs_assert(inc->mask == dest->mask);
> +
> +    if (!inc->n) {
> +        /* Request to merge an empty frag, nothing to do */
> +        return;
> +    }
> +
> +    for (i = 0; i <= dest->mask; i++) {
> +        struct hmap_node **dest_bucket = &dest->buckets[i];
> +        struct hmap_node **inc_bucket = &inc->buckets[i];
> +        if (*inc_bucket != NULL) {
> +            struct hmap_node *last_node = *inc_bucket;
> +            while (last_node->next != NULL) {
> +                last_node = last_node->next;
> +            }
> +            last_node->next = *dest_bucket;
> +            *dest_bucket = *inc_bucket;
> +            *inc_bucket = NULL;
> +        }
> +    }
> +    dest->n += inc->n;
> +    inc->n = 0;
> +}
> +
> +/* Run a thread pool which gathers results in an array
> + * of hashes. Merge results.
> + */
> +
> +
> +void ovn_run_pool_hash(
> +        struct worker_pool *pool,
> +        struct hmap *result,
> +        struct hmap *result_frags)
> +{
> +    run_pool_callback(pool, result, result_frags, merge_hash_results);
> +}
> +
> +/* Run a thread pool which gathers results in an array of lists.
> + * Merge results.
> + */
> +void ovn_run_pool_list(
> +        struct worker_pool *pool,
> +        struct ovs_list *result,
> +        struct ovs_list *result_frags)
> +{
> +    run_pool_callback(pool, result, result_frags, merge_list_results);
> +}
> +
> +void ovn_update_hashrow_locks(struct hmap *lflows, struct hashrow_locks *hrl)
> +{
> +    int i;
> +    if (hrl->mask != lflows->mask) {
> +        if (hrl->row_locks) {
> +            free(hrl->row_locks);
> +        }
> +        hrl->row_locks = xcalloc(lflows->mask + 1, sizeof(struct ovs_mutex));
> +        hrl->mask = lflows->mask;
> +        for (i = 0; i <= lflows->mask; i++) {
> +            ovs_mutex_init(&hrl->row_locks[i]);
> +        }
> +    }
> +}
> +
> +static void worker_pool_hook(void *aux OVS_UNUSED) {
> +    int i;
> +    static struct worker_pool *pool;
> +    char sem_name[256];
> +
> +    workers_must_exit = true;
> +
> +    /* All workers must honour the must_exit flag and check for it regularly.
> +     * We can make it atomic and check it via atomics in workers, but that
> +     * is not really necessary as it is set just once - when the program
> +     * terminates. So we use a fence which is invoked before exiting instead.
> +     */
> +    atomic_thread_fence(memory_order_acq_rel);
> +
> +    /* Wake up the workers after the must_exit flag has been set */
> +
> +    LIST_FOR_EACH (pool, list_node, &worker_pools) {
> +        for (i = 0; i < pool->size ; i++) {
> +            sem_post(pool->controls[i].fire);
> +        }
> +        for (i = 0; i < pool->size ; i++) {
> +            sem_close(pool->controls[i].fire);
> +            sprintf(sem_name, WORKER_SEM_NAME, sembase, pool, i);
> +            sem_unlink(sem_name);
> +        }
> +        sem_close(pool->done);
> +        sprintf(sem_name, MAIN_SEM_NAME, sembase, pool);
> +        sem_unlink(sem_name);
> +    }
> +}
> +
> +static void setup_worker_pools(bool force) {
> +    int cores, nodes;
> +
> +    nodes = ovs_numa_get_n_numas();
> +    if (nodes == OVS_NUMA_UNSPEC || nodes <= 0) {
> +        nodes = 1;
> +    }
> +    cores = ovs_numa_get_n_cores();
> +
> +    /* If there is no NUMA config, use the core count reported
> +     * by the OS. If there is NUMA config, use half the cores on
> +     * one node so that the OS does not start pushing
> +     * threads to other nodes.
> +     */
> +    if (cores == OVS_CORE_UNSPEC || cores <= 0) {
> +        /* If there is no NUMA we can try the ovs-threads routine.
> +         * It falls back to sysconf and/or affinity mask.
> +         */
> +        cores = count_cpu_cores();
> +        pool_size = cores;
> +    } else {
> +        pool_size = cores / nodes;
> +    }
> +    if ((pool_size < 4) && force) {
> +        pool_size = 4;
> +    }
> +    can_parallelize = (pool_size >= 3);
> +    fatal_signal_add_hook(worker_pool_hook, NULL, NULL, true);
> +    sembase = random_uint32();
> +}
> +
> +static void merge_list_results(struct worker_pool *pool OVS_UNUSED,
> +                               void *fin_result, void *result_frags,
> +                               int index)
> +{
> +    struct ovs_list *result = (struct ovs_list *)fin_result;
> +    struct ovs_list *res_frags = (struct ovs_list *)result_frags;
> +
> +    if (!ovs_list_is_empty(&res_frags[index])) {
> +        ovs_list_splice(result->next,
> +                ovs_list_front(&res_frags[index]), &res_frags[index]);
> +    }
> +}
> +
> +static void merge_hash_results(struct worker_pool *pool OVS_UNUSED,
> +                               void *fin_result, void *result_frags,
> +                               int index)
> +{
> +    struct hmap *result = (struct hmap *)fin_result;
> +    struct hmap *res_frags = (struct hmap *)result_frags;
> +
> +    fast_hmap_merge(result, &res_frags[index]);
> +    hmap_destroy(&res_frags[index]);
> +}
> +
> +#endif
> diff --git a/lib/ovn-parallel-hmap.h b/lib/ovn-parallel-hmap.h
> new file mode 100644
> index 000000000..8db61eaba
> --- /dev/null
> +++ b/lib/ovn-parallel-hmap.h
> @@ -0,0 +1,301 @@
> +/*
> + * Copyright (c) 2020 Red Hat, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + *     http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#ifndef OVN_PARALLEL_HMAP
> +#define OVN_PARALLEL_HMAP 1
> +
> +/* If the parallel macros are defined by hmap.h or any other OVS
> + * include, we skip over the OVN-specific definitions.
> + */
> +
> +#ifdef  __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <stdbool.h>
> +#include <stdlib.h>
> +#include <semaphore.h>
> +#include <errno.h>
> +#include "openvswitch/util.h"
> +#include "openvswitch/hmap.h"
> +#include "openvswitch/thread.h"
> +#include "ovs-atomic.h"
> +
> +/* Process this include only if OVS does not supply parallel definitions
> + */
> +
> +#ifdef OVS_HAS_PARALLEL_HMAP
> +
> +#include "parallel-hmap.h"
> +
> +#else
> +
> +
> +#ifdef __clang__
> +#pragma clang diagnostic push
> +#pragma clang diagnostic ignored "-Wthread-safety"
> +#endif
> +
> +
> +/* A version of the HMAP_FOR_EACH macro intended for iterating as part
> + * of parallel processing.
> + * Each worker thread has a different ThreadID in the range 0..POOL_SIZE - 1
> + * and will iterate hash buckets ThreadID, ThreadID + step,
> + * ThreadID + step * 2, etc. The actual macro accepts
> + * ThreadID + step * i as the JOBID parameter.
> + */
> +
> +#define HMAP_FOR_EACH_IN_PARALLEL(NODE, MEMBER, JOBID, HMAP) \
> +   for (INIT_CONTAINER(NODE, hmap_first_in_bucket_num(HMAP, JOBID), MEMBER); \
> +        (NODE != OBJECT_CONTAINING(NULL, NODE, MEMBER)) \
> +       || ((NODE = NULL), false); \
> +       ASSIGN_CONTAINER(NODE, hmap_next_in_bucket(&(NODE)->MEMBER), MEMBER))
> +
> +/* We do not have a SAFE version of the macro, because the hash size is not
> + * atomic and hash removal operations would need to be wrapped with
> + * locks. That would defeat most of the benefits of doing anything in
> + * parallel.
> + * If the code block inside FOR_EACH_IN_PARALLEL needs to remove elements,
> + * each thread should store them in a temporary list result instead, merging
> + * the lists into a combined result at the end. */
> +
> +/* Work "Handle" */
> +
> +struct worker_control {
> +    int id; /* Used as a modulo when iterating over a hash. */
> +    atomic_bool finished; /* Set to true after a chunk of work is complete. */
> +    sem_t *fire; /* Work start semaphore - sem_post starts the worker. */
> +    sem_t *done; /* Work completion semaphore - sem_post on completion. */
> +    struct ovs_mutex mutex; /* Guards the data. */
> +    void *data; /* Pointer to data to be processed. */
> +    void *workload; /* Back-pointer to the worker pool structure. */
> +};
> +
> +struct worker_pool {
> +    int size;   /* Number of threads in the pool. */
> +    struct ovs_list list_node; /* List of pools - used in cleanup/exit. */
> +    struct worker_control *controls; /* "Handles" in this pool. */
> +    sem_t *done; /* Work completion semaphore. */
> +};
> +
> +/* Add a worker pool for thread function start() which expects a pointer to
> + * a worker_control structure as an argument. */
> +
> +struct worker_pool *ovn_add_worker_pool(void *(*start)(void *));
> +
> +/* Returns true when all processing threads must exit. */
> +
> +bool ovn_stop_parallel_processing(void);
> +
> +/* Build a hmap pre-sized for 'size' elements. */
> +
> +void ovn_fast_hmap_size_for(struct hmap *hmap, int size);
> +
> +/* Build a hmap with a mask equal to 'size'. */
> +
> +void ovn_fast_hmap_init(struct hmap *hmap, ssize_t size);
> +
> +/* Brute-force merge one hmap into another.
> + * 'dest' and 'inc' must have the same mask. The merge is performed
> + * by extending the element list for bucket N in the dest hmap with the list
> + * from bucket N in inc.
> + */
> +
> +void ovn_fast_hmap_merge(struct hmap *dest, struct hmap *inc);
> +
> +/* Run a pool, without any default processing of results.
> + */
> +
> +void ovn_run_pool(struct worker_pool *pool);
> +
> +/* Run a pool, merge results from hash frags into a final hash result.
> + * The hash frags must be pre-sized to the same size.
> + */
> +
> +void ovn_run_pool_hash(struct worker_pool *pool,
> +                       struct hmap *result, struct hmap *result_frags);
> +/* Run a pool, merge results from list frags into a final list result.
> + */
> +
> +void ovn_run_pool_list(struct worker_pool *pool,
> +                       struct ovs_list *result, struct ovs_list *result_frags);
> +
> +/* Run a pool, call a callback function to perform processing of results.
> + */
> +
> +void ovn_run_pool_callback(struct worker_pool *pool, void *fin_result,
> +                    void *result_frags,
> +                    void (*helper_func)(struct worker_pool *pool,
> +                        void *fin_result, void *result_frags, int index));
> +
> +
> +/* Returns the first node in 'hmap' in bucket number 'num', or a
> + * null pointer if that bucket is empty. */
> +
> +static inline struct hmap_node *
> +hmap_first_in_bucket_num(const struct hmap *hmap, size_t num)
> +{
> +    return hmap->buckets[num];
> +}
> +
> +/* Returns the first node in the first non-empty bucket of 'hmap' at
> + * or after bucket 'start', stepping by 'pool_size', or a null pointer
> + * if all such buckets are empty. */
> +static inline struct hmap_node *
> +parallel_hmap_next__(const struct hmap *hmap, size_t start, size_t pool_size)
> +{
> +    size_t i;
> +    for (i = start; i <= hmap->mask; i += pool_size) {
> +        struct hmap_node *node = hmap->buckets[i];
> +        if (node) {
> +            return node;
> +        }
> +    }
> +    return NULL;
> +}
> +
> +/* Returns the first node in 'hmap', as expected by thread with job_id
> + * for parallel processing in arbitrary order, or a null pointer if
> + * the slice of 'hmap' for that job_id is empty. */
> +static inline struct hmap_node *
> +parallel_hmap_first(const struct hmap *hmap, size_t job_id, size_t pool_size)
> +{
> +    return parallel_hmap_next__(hmap, job_id, pool_size);
> +}
> +
> +/* Returns the next node in the slice of 'hmap' following 'node',
> + * in arbitrary order, or a null pointer if 'node' is the last node in
> + * the 'hmap' slice. */
> +static inline struct hmap_node *
> +parallel_hmap_next(const struct hmap *hmap,
> +                   const struct hmap_node *node, ssize_t pool_size)
> +{
> +    return (node->next
> +            ? node->next
> +            : parallel_hmap_next__(hmap,
> +                (node->hash & hmap->mask) + pool_size, pool_size));
> +}
> +
> +/* Publishes this worker's results (fence), marks the chunk finished
> + * and wakes the main thread via the 'done' semaphore. */
> +static inline void post_completed_work(struct worker_control *control)
> +{
> +    atomic_thread_fence(memory_order_acq_rel);
> +    atomic_store_relaxed(&control->finished, true);
> +    sem_post(control->done);
> +}
> +
> +/* Blocks until the main thread posts this worker's 'fire' semaphore,
> + * retrying on EINTR. */
> +static inline void wait_for_work(struct worker_control *control)
> +{
> +    int ret;
> +
> +    do {
> +        ret = sem_wait(control->fire);
> +    } while ((ret == -1) && (errno == EINTR));
> +    ovs_assert(ret == 0);
> +}
> +
> +/* Blocks until a worker posts the pool's 'done' semaphore,
> + * retrying on EINTR. */
> +static inline void wait_for_work_completion(struct worker_pool *pool)
> +{
> +    int ret;
> +
> +    do {
> +        ret = sem_wait(pool->done);
> +    } while ((ret == -1) && (errno == EINTR));
> +    ovs_assert(ret == 0);
> +}
> +
> +
> +/* Hash per-row locking support - to be used only in conjunction
> + * with fast hash inserts. Normal hash inserts may resize the hash,
> + * rendering the locking invalid.
> + */
> +
> +struct hashrow_locks {
> +    ssize_t mask;
> +    struct ovs_mutex *row_locks;
> +};
> +
> +/* Update a hash row locks structure to match the current hash size. */
> +
> +void ovn_update_hashrow_locks(struct hmap *lflows, struct hashrow_locks *hrl);
> +
> +/* Lock a hash row */
> +
> +static inline void lock_hash_row(struct hashrow_locks *hrl, uint32_t hash)
> +{
> +    ovs_mutex_lock(&hrl->row_locks[hash & hrl->mask]);
> +}
> +
> +/* Unlock a hash row */
> +
> +static inline void unlock_hash_row(struct hashrow_locks *hrl, uint32_t hash)
> +{
> +    ovs_mutex_unlock(&hrl->row_locks[hash & hrl->mask]);
> +}
> +
> +/* Init the row locks structure. */
> +
> +static inline void init_hash_row_locks(struct hashrow_locks *hrl)
> +{
> +    hrl->mask = 0;
> +    hrl->row_locks = NULL;
> +}
> +
> +bool ovn_can_parallelize_hashes(bool force_parallel);
> +
> +/* Use the OVN library functions for stuff which OVS has not defined
> + * If OVS has defined these, they will still compile using the OVN
> + * local names, but will be dropped by the linker in favour of the OVS
> + * supplied functions.
> + */
> +
> +#define update_hashrow_locks(lflows, hrl) ovn_update_hashrow_locks(lflows, hrl)
> +
> +#define can_parallelize_hashes(force) ovn_can_parallelize_hashes(force)
> +
> +#define stop_parallel_processing() ovn_stop_parallel_processing()
> +
> +#define add_worker_pool(start) ovn_add_worker_pool(start)
> +
> +#define fast_hmap_size_for(hmap, size) ovn_fast_hmap_size_for(hmap, size)
> +
> +#define fast_hmap_init(hmap, size) ovn_fast_hmap_init(hmap, size)
> +
> +#define fast_hmap_merge(dest, inc) ovn_fast_hmap_merge(dest, inc)
> +
> +#define hmap_merge(dest, inc) ovn_hmap_merge(dest, inc)
> +
> +#define run_pool(pool) ovn_run_pool(pool)
> +
> +#define run_pool_hash(pool, result, result_frags) \
> +    ovn_run_pool_hash(pool, result, result_frags)
> +
> +#define run_pool_list(pool, result, result_frags) \
> +    ovn_run_pool_list(pool, result, result_frags)
> +
> +#define run_pool_callback(pool, fin_result, result_frags, helper_func) \
> +    ovn_run_pool_callback(pool, fin_result, result_frags, helper_func)
> +
> +
> +
> +#ifdef __clang__
> +#pragma clang diagnostic pop
> +#endif
> +
> +#endif
> +
> +#ifdef  __cplusplus
> +}
> +#endif
> +
> +
> +#endif /* lib/ovn-parallel-hmap.h */
> --
> 2.20.1
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Anton Ivanov March 25, 2021, 9:30 a.m. UTC | #3
On 24/03/2021 15:31, Numan Siddique wrote:
> On Mon, Mar 1, 2021 at 6:35 PM <anton.ivanov@cambridgegreys.com> wrote:
>>
>> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>>
>> This adds a set of functions and macros intended to process
>> hashes in parallel.
>>
>> The principles of operation are documented in the ovn-parallel-hmap.h
>>
>> If these one day go into the OVS tree, the OVS tree versions
>> would be used in preference.
>>
>> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> 
> Hi Anton,
> 
> I tested the first 2 patches of this series and it crashes again for me.
> 
> This time I ran the tests on a 4-core machine - Intel(R) Xeon(R) CPU
> E3-1220 v5 @ 3.00GHz
> 
> The below trace is seen for both gcc and clang.
> 
> ----
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `ovn-northd -vjsonrpc
> --ovnnb-db=unix:/mnt/mydisk/myhome/numan_alt/work/ovs_ovn/'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x00007f27594ae212 in __new_sem_wait_slow.constprop.0 () from
> /lib64/libpthread.so.0
> [Current thread is 1 (Thread 0x7f2758c68640 (LWP 347378))]
> Missing separate debuginfos, use: dnf debuginfo-install
> glibc-2.32-3.fc33.x86_64 libcap-ng-0.8-1.fc33.x86_64
> libevent-2.1.8-10.fc33.x86_64 openssl-libs-1.1.1i-1.fc33.x86_64
> python3-libs-3.9.1-2.fc33.x86_64 unbound-libs-1.10.1-4.fc33.x86_64
> zlib-1.2.11-23.fc33.x86_64
> (gdb) bt
> #0  0x00007f27594ae212 in __new_sem_wait_slow.constprop.0 () from
> /lib64/libpthread.so.0
> #1  0x0000000000422184 in wait_for_work (control=<optimized out>) at
> ../lib/ovn-parallel-hmap.h:203
> #2  build_lflows_thread (arg=0x2538420) at ../northd/ovn-northd.c:11855
> #3  0x000000000049cd12 in ovsthread_wrapper (aux_=<optimized out>) at
> ../lib/ovs-thread.c:383
> #4  0x00007f27594a53f9 in start_thread () from /lib64/libpthread.so.0
> #5  0x00007f2759142903 in clone () from /lib64/libc.so.6
> -----
> 
> I'm not sure why you're not able to reproduce this issue.

I can't. I have run it for days in a loop.

One possibility is that for whatever reason your machine has slower IPC speeds compared to linear execution speeds. Thread debugging? AMD vs Intel? No idea.

There is an on-exit race in the current code which I found by inspection but have never been able to trigger: on my machines the workers always exit before the main thread has finished.
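
Schematically, the interleaving I suspect (reconstructed from the
backtrace above, not an observed trace):

    main thread (fatal signal hook)      worker thread
    -------------------------------      ----------------------------
    workers_must_exit = true;
    sem_post(control->fire);             (not yet scheduled)
    sem_close(control->fire);
    sem_unlink(sem_name);
    exit                                 sem_wait() in wait_for_work()
                                         touches the closed semaphore
                                         -> SIGSEGV

Keeping each worker's pthread_t and joining the workers after waking
them, before the semaphores are closed, removes that window.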

Can you try the incremental fix below to see if it resolves the problem for you? If it works, I will incorporate it and reissue the patch. If not, I will continue digging.

diff --git a/lib/ovn-parallel-hmap.c b/lib/ovn-parallel-hmap.c
index e83ae23cb..3597f896f 100644
--- a/lib/ovn-parallel-hmap.c
+++ b/lib/ovn-parallel-hmap.c
@@ -143,7 +143,8 @@ struct worker_pool *ovn_add_worker_pool(void *(*start)(void *)){
          }

          for (i = 0; i < pool_size; i++) {
-            ovs_thread_create("worker pool helper", start, &new_pool->controls[i]);
+            new_pool->controls[i].worker =
+                ovs_thread_create("worker pool helper", start, &new_pool->controls[i]);
          }
          ovs_list_push_back(&worker_pools, &new_pool->list_node);
      }
@@ -386,6 +387,9 @@ static void worker_pool_hook(void *aux OVS_UNUSED) {
          for (i = 0; i < pool->size ; i++) {
              sem_post(pool->controls[i].fire);
          }
+        for (i = 0; i < pool->size ; i++) {
+            pthread_join(pool->controls[i].worker, NULL);
+        }
          for (i = 0; i < pool->size ; i++) {
              sem_close(pool->controls[i].fire);
              sprintf(sem_name, WORKER_SEM_NAME, sembase, pool, i);
diff --git a/lib/ovn-parallel-hmap.h b/lib/ovn-parallel-hmap.h
index 8db61eaba..d62ca3da5 100644
--- a/lib/ovn-parallel-hmap.h
+++ b/lib/ovn-parallel-hmap.h
@@ -82,6 +82,7 @@ struct worker_control {
      struct ovs_mutex mutex; /* Guards the data. */
      void *data; /* Pointer to data to be processed. */
      void *workload; /* back-pointer to the worker pool structure. */
+    pthread_t worker; /* Thread handle, so the exit hook can join it. */
  };

  struct worker_pool {


> 
> All the test cases passed for me. So maybe something's wrong when
> ovn-northd exits.
> IMHO, these crashes should be addressed before these patches can be considered.
> 
> Thanks
> Numan
> 
>> ---
>>   lib/automake.mk         |   2 +
>>   lib/ovn-parallel-hmap.c | 455 ++++++++++++++++++++++++++++++++++++++++
>>   lib/ovn-parallel-hmap.h | 301 ++++++++++++++++++++++++++
>>   3 files changed, 758 insertions(+)
>>   create mode 100644 lib/ovn-parallel-hmap.c
>>   create mode 100644 lib/ovn-parallel-hmap.h
>>
>> diff --git a/lib/automake.mk b/lib/automake.mk
>> index 250c7aefa..781be2109 100644
>> --- a/lib/automake.mk
>> +++ b/lib/automake.mk
>> @@ -13,6 +13,8 @@ lib_libovn_la_SOURCES = \
>>          lib/expr.c \
>>          lib/extend-table.h \
>>          lib/extend-table.c \
>> +       lib/ovn-parallel-hmap.h \
>> +       lib/ovn-parallel-hmap.c \
>>          lib/ip-mcast-index.c \
>>          lib/ip-mcast-index.h \
>>          lib/mcast-group-index.c \
>> diff --git a/lib/ovn-parallel-hmap.c b/lib/ovn-parallel-hmap.c
>> new file mode 100644
>> index 000000000..e83ae23cb
>> --- /dev/null
>> +++ b/lib/ovn-parallel-hmap.c
>> @@ -0,0 +1,455 @@
>> +/*
>> + * Copyright (c) 2020 Red Hat, Inc.
>> + * Copyright (c) 2008, 2009, 2010, 2012, 2013, 2015, 2019 Nicira, Inc.
>> + *
>> + * Licensed under the Apache License, Version 2.0 (the "License");
>> + * you may not use this file except in compliance with the License.
>> + * You may obtain a copy of the License at:
>> + *
>> + *     http://www.apache.org/licenses/LICENSE-2.0
>> + *
>> + * Unless required by applicable law or agreed to in writing, software
>> + * distributed under the License is distributed on an "AS IS" BASIS,
>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>> + * See the License for the specific language governing permissions and
>> + * limitations under the License.
>> + */
>> +
>> +#include <config.h>
>> +#include <stdint.h>
>> +#include <string.h>
>> +#include <stdlib.h>
>> +#include <fcntl.h>
>> +#include <unistd.h>
>> +#include <errno.h>
>> +#include <semaphore.h>
>> +#include "fatal-signal.h"
>> +#include "util.h"
>> +#include "openvswitch/vlog.h"
>> +#include "openvswitch/hmap.h"
>> +#include "openvswitch/thread.h"
>> +#include "ovn-parallel-hmap.h"
>> +#include "ovs-atomic.h"
>> +#include "ovs-thread.h"
>> +#include "ovs-numa.h"
>> +#include "random.h"
>> +
>> +VLOG_DEFINE_THIS_MODULE(ovn_parallel_hmap);
>> +
>> +#ifndef OVS_HAS_PARALLEL_HMAP
>> +
>> +#define WORKER_SEM_NAME "%x-%p-%x"
>> +#define MAIN_SEM_NAME "%x-%p-main"
>> +
>> +/* These are accessed under mutex inside add_worker_pool().
>> + * They do not need to be atomic.
>> + */
>> +
>> +static atomic_bool initial_pool_setup = ATOMIC_VAR_INIT(false);
>> +static bool can_parallelize = false;
>> +
>> +/* This is set only in the process of exit and the set is
>> + * accompanied by a fence. It does not need to be atomic or be
>> + * accessed under a lock.
>> + */
>> +
>> +static bool workers_must_exit = false;
>> +
>> +static struct ovs_list worker_pools = OVS_LIST_INITIALIZER(&worker_pools);
>> +
>> +static struct ovs_mutex init_mutex = OVS_MUTEX_INITIALIZER;
>> +
>> +static int pool_size;
>> +
>> +static int sembase;
>> +
>> +static void worker_pool_hook(void *aux OVS_UNUSED);
>> +static void setup_worker_pools(bool force);
>> +static void merge_list_results(struct worker_pool *pool OVS_UNUSED,
>> +                               void *fin_result, void *result_frags,
>> +                               int index);
>> +static void merge_hash_results(struct worker_pool *pool OVS_UNUSED,
>> +                               void *fin_result, void *result_frags,
>> +                               int index);
>> +
>> +bool ovn_stop_parallel_processing(void)
>> +{
>> +    return workers_must_exit;
>> +}
>> +
>> +bool ovn_can_parallelize_hashes(bool force_parallel)
>> +{
>> +    bool test = false;
>> +
>> +    if (atomic_compare_exchange_strong(
>> +            &initial_pool_setup,
>> +            &test,
>> +            true)) {
>> +        ovs_mutex_lock(&init_mutex);
>> +        setup_worker_pools(force_parallel);
>> +        ovs_mutex_unlock(&init_mutex);
>> +    }
>> +    return can_parallelize;
>> +}
>> +
>> +struct worker_pool *ovn_add_worker_pool(void *(*start)(void *)){
>> +
>> +    struct worker_pool *new_pool = NULL;
>> +    struct worker_control *new_control;
>> +    bool test = false;
>> +    int i;
>> +    char sem_name[256];
>> +
>> +
>> +    /* Belt and braces - initialize the pool system just in case if
>> +     * if it is not yet initialized.
>> +     */
>> +
>> +    if (atomic_compare_exchange_strong(
>> +            &initial_pool_setup,
>> +            &test,
>> +            true)) {
>> +        ovs_mutex_lock(&init_mutex);
>> +        setup_worker_pools(false);
>> +        ovs_mutex_unlock(&init_mutex);
>> +    }
>> +
>> +    ovs_mutex_lock(&init_mutex);
>> +    if (can_parallelize) {
>> +        new_pool = xmalloc(sizeof(struct worker_pool));
>> +        new_pool->size = pool_size;
>> +        new_pool->controls = NULL;
>> +        sprintf(sem_name, MAIN_SEM_NAME, sembase, new_pool);
>> +        new_pool->done = sem_open(sem_name, O_CREAT, S_IRWXU, 0);
>> +        if (new_pool->done == SEM_FAILED) {
>> +            goto cleanup;
>> +        }
>> +
>> +        new_pool->controls =
>> +            xmalloc(sizeof(struct worker_control) * new_pool->size);
>> +
>> +        for (i = 0; i < new_pool->size; i++) {
>> +            new_control = &new_pool->controls[i];
>> +            new_control->id = i;
>> +            new_control->done = new_pool->done;
>> +            new_control->data = NULL;
>> +            ovs_mutex_init(&new_control->mutex);
>> +            new_control->finished = ATOMIC_VAR_INIT(false);
>> +            sprintf(sem_name, WORKER_SEM_NAME, sembase, new_pool, i);
>> +            new_control->fire = sem_open(sem_name, O_CREAT, S_IRWXU, 0);
>> +            if (new_control->fire == SEM_FAILED) {
>> +                goto cleanup;
>> +            }
>> +        }
>> +
>> +        for (i = 0; i < pool_size; i++) {
>> +            ovs_thread_create("worker pool helper", start, &new_pool->controls[i]);
>> +        }
>> +        ovs_list_push_back(&worker_pools, &new_pool->list_node);
>> +    }
>> +    ovs_mutex_unlock(&init_mutex);
>> +    return new_pool;
>> +cleanup:
>> +
>> +    /* Something went wrong when opening semaphores. In this case
>> +     * it is better to shut off parallel procesing altogether
>> +     */
>> +
>> +    VLOG_INFO("Failed to initialize parallel processing, error %d", errno);
>> +    can_parallelize = false;
>> +    if (new_pool->controls) {
>> +        for (i = 0; i < new_pool->size; i++) {
>> +            if (new_pool->controls[i].fire != SEM_FAILED) {
>> +                sem_close(new_pool->controls[i].fire);
>> +                sprintf(sem_name, WORKER_SEM_NAME, sembase, new_pool, i);
>> +                sem_unlink(sem_name);
>> +                break; /* semaphores past this one are uninitialized */
>> +            }
>> +        }
>> +    }
>> +    if (new_pool->done != SEM_FAILED) {
>> +        sem_close(new_pool->done);
>> +        sprintf(sem_name, MAIN_SEM_NAME, sembase, new_pool);
>> +        sem_unlink(sem_name);
>> +    }
>> +    ovs_mutex_unlock(&init_mutex);
>> +    return NULL;
>> +}
>> +
>> +
>> +/* Initializes 'hmap' as an empty hash table with mask N. */
>> +void
>> +ovn_fast_hmap_init(struct hmap *hmap, ssize_t mask)
>> +{
>> +    size_t i;
>> +
>> +    hmap->buckets = xmalloc(sizeof (struct hmap_node *) * (mask + 1));
>> +    hmap->one = NULL;
>> +    hmap->mask = mask;
>> +    hmap->n = 0;
>> +    for (i = 0; i <= hmap->mask; i++) {
>> +        hmap->buckets[i] = NULL;
>> +    }
>> +}
>> +
>> +/* Initializes 'hmap' as an empty hash table of size X.
>> + * Intended for use in parallel processing so that all
>> + * fragments used to store results in a parallel job
>> + * are the same size.
>> + */
>> +void
>> +ovn_fast_hmap_size_for(struct hmap *hmap, int size)
>> +{
>> +    size_t mask;
>> +    mask = size / 2;
>> +    mask |= mask >> 1;
>> +    mask |= mask >> 2;
>> +    mask |= mask >> 4;
>> +    mask |= mask >> 8;
>> +    mask |= mask >> 16;
>> +#if SIZE_MAX > UINT32_MAX
>> +    mask |= mask >> 32;
>> +#endif
>> +
>> +    /* If we need to dynamically allocate buckets we might as well allocate at
>> +     * least 4 of them. */
>> +    mask |= (mask & 1) << 1;
>> +
>> +    fast_hmap_init(hmap, mask);
>> +}
>> +
>> +/* Run a thread pool which uses a callback function to process results
>> + */
>> +
>> +void ovn_run_pool_callback(struct worker_pool *pool,
>> +                           void *fin_result, void *result_frags,
>> +                           void (*helper_func)(struct worker_pool *pool,
>> +                                               void *fin_result,
>> +                                               void *result_frags, int index))
>> +{
>> +    int index, completed;
>> +
>> +    /* Ensure that all worker threads see the same data as the
>> +     * main thread.
>> +     */
>> +
>> +    atomic_thread_fence(memory_order_acq_rel);
>> +
>> +    /* Start workers */
>> +
>> +    for (index = 0; index < pool->size; index++) {
>> +        sem_post(pool->controls[index].fire);
>> +    }
>> +
>> +    completed = 0;
>> +
>> +    do {
>> +        bool test;
>> +        /* Note - we do not loop on semaphore until it reaches
>> +         * zero, but on pool size/remaining workers.
>> +         * This is by design. If the inner loop can handle
>> +         * completion for more than one worker within an iteration
>> +         * it will do so to ensure no additional iterations and
>> +         * waits once all of them are done.
>> +         *
>> +         * This may result in us having an initial positive value
>> +         * of the semaphore when the pool is invoked the next time.
>> +         * This is harmless - the loop will spin up a couple of times
>> +         * doing nothing while the workers are processing their data
>> +         * slices.
>> +         */
>> +        wait_for_work_completion(pool);
>> +        for (index = 0; index < pool->size; index++) {
>> +            test = true;
>> +            /* If the worker has marked its data chunk as complete,
>> +             * invoke the helper function to combine the results of
>> +             * this worker into the main result.
>> +             *
>> +             * The worker must invoke an appropriate memory fence
>> +             * (most likely acq_rel) to ensure that the main thread
>> +             * sees all of the results produced by the worker.
>> +             */
>> +            if (atomic_compare_exchange_weak(
>> +                    &pool->controls[index].finished,
>> +                    &test,
>> +                    false)) {
>> +                if (helper_func) {
>> +                    (helper_func)(pool, fin_result, result_frags, index);
>> +                }
>> +                completed++;
>> +                pool->controls[index].data = NULL;
>> +            }
>> +        }
>> +    } while (completed < pool->size);
>> +}
>> +
>> +/* Run a thread pool - basic, does not do results processing.
>> + */
>> +
>> +void ovn_run_pool(struct worker_pool *pool)
>> +{
>> +    run_pool_callback(pool, NULL, NULL, NULL);
>> +}
>> +
>> +/* Brute force merge of a hashmap into another hashmap.
>> + * Intended for use in parallel processing. The destination
>> + * hashmap MUST be the same size as the one being merged.
>> + *
>> + * This can be achieved by pre-allocating them to correct size
>> + * and using hmap_insert_fast() instead of hmap_insert()
>> + */
>> +
>> +void ovn_fast_hmap_merge(struct hmap *dest, struct hmap *inc)
>> +{
>> +    size_t i;
>> +
>> +    ovs_assert(inc->mask == dest->mask);
>> +
>> +    if (!inc->n) {
>> +        /* Request to merge an empty frag, nothing to do */
>> +        return;
>> +    }
>> +
>> +    for (i = 0; i <= dest->mask; i++) {
>> +        struct hmap_node **dest_bucket = &dest->buckets[i];
>> +        struct hmap_node **inc_bucket = &inc->buckets[i];
>> +        if (*inc_bucket != NULL) {
>> +            struct hmap_node *last_node = *inc_bucket;
>> +            while (last_node->next != NULL) {
>> +                last_node = last_node->next;
>> +            }
>> +            last_node->next = *dest_bucket;
>> +            *dest_bucket = *inc_bucket;
>> +            *inc_bucket = NULL;
>> +        }
>> +    }
>> +    dest->n += inc->n;
>> +    inc->n = 0;
>> +}
>> +
>> +/* Run a thread pool which gathers results in an array
>> + * of hashes. Merge results.
>> + */
>> +
>> +
>> +void ovn_run_pool_hash(
>> +        struct worker_pool *pool,
>> +        struct hmap *result,
>> +        struct hmap *result_frags)
>> +{
>> +    run_pool_callback(pool, result, result_frags, merge_hash_results);
>> +}
>> +
>> +/* Run a thread pool which gathers results in an array of lists.
>> + * Merge results.
>> + */
>> +void ovn_run_pool_list(
>> +        struct worker_pool *pool,
>> +        struct ovs_list *result,
>> +        struct ovs_list *result_frags)
>> +{
>> +    run_pool_callback(pool, result, result_frags, merge_list_results);
>> +}
>> +
>> +void ovn_update_hashrow_locks(struct hmap *lflows, struct hashrow_locks *hrl)
>> +{
>> +    int i;
>> +    if (hrl->mask != lflows->mask) {
>> +        if (hrl->row_locks) {
>> +            free(hrl->row_locks);
>> +        }
>> +        hrl->row_locks = xcalloc(sizeof(struct ovs_mutex), lflows->mask + 1);
>> +        hrl->mask = lflows->mask;
>> +        for (i = 0; i <= lflows->mask; i++) {
>> +            ovs_mutex_init(&hrl->row_locks[i]);
>> +        }
>> +    }
>> +}
>> +
>> +static void worker_pool_hook(void *aux OVS_UNUSED) {
>> +    int i;
>> +    static struct worker_pool *pool;
>> +    char sem_name[256];
>> +
>> +    workers_must_exit = true;
>> +
>> +    /* All workers must honour the must_exit flag and check for it regularly.
>> +     * We can make it atomic and check it via atomics in workers, but that
>> +     * is not really necessary as it is set just once - when the program
>> +     * terminates. So we use a fence which is invoked before exiting instead.
>> +     */
>> +    atomic_thread_fence(memory_order_acq_rel);
>> +
>> +    /* Wake up the workers after the must_exit flag has been set */
>> +
>> +    LIST_FOR_EACH (pool, list_node, &worker_pools) {
>> +        for (i = 0; i < pool->size ; i++) {
>> +            sem_post(pool->controls[i].fire);
>> +        }
>> +        for (i = 0; i < pool->size ; i++) {
>> +            sem_close(pool->controls[i].fire);
>> +            sprintf(sem_name, WORKER_SEM_NAME, sembase, pool, i);
>> +            sem_unlink(sem_name);
>> +        }
>> +        sem_close(pool->done);
>> +        sprintf(sem_name, MAIN_SEM_NAME, sembase, pool);
>> +        sem_unlink(sem_name);
>> +    }
>> +}
>> +
>> +static void setup_worker_pools(bool force) {
>> +    int cores, nodes;
>> +
>> +    nodes = ovs_numa_get_n_numas();
>> +    if (nodes == OVS_NUMA_UNSPEC || nodes <= 0) {
>> +        nodes = 1;
>> +    }
>> +    cores = ovs_numa_get_n_cores();
>> +
>> +    /* If there is no NUMA config, use 4 cores.
>> +     * If there is NUMA config use half the cores on
>> +     * one node so that the OS does not start pushing
>> +     * threads to other nodes.
>> +     */
>> +    if (cores == OVS_CORE_UNSPEC || cores <= 0) {
>> +        /* If there is no NUMA we can try the ovs-threads routine.
>> +         * It falls back to sysconf and/or affinity mask.
>> +         */
>> +        cores = count_cpu_cores();
>> +        pool_size = cores;
>> +    } else {
>> +        pool_size = cores / nodes;
>> +    }
>> +    if ((pool_size < 4) && force) {
>> +        pool_size = 4;
>> +    }
>> +    can_parallelize = (pool_size >= 3);
>> +    fatal_signal_add_hook(worker_pool_hook, NULL, NULL, true);
>> +    sembase = random_uint32();
>> +}
>> +
>> +static void merge_list_results(struct worker_pool *pool OVS_UNUSED,
>> +                               void *fin_result, void *result_frags,
>> +                               int index)
>> +{
>> +    struct ovs_list *result = (struct ovs_list *)fin_result;
>> +    struct ovs_list *res_frags = (struct ovs_list *)result_frags;
>> +
>> +    if (!ovs_list_is_empty(&res_frags[index])) {
>> +        ovs_list_splice(result->next,
>> +                ovs_list_front(&res_frags[index]), &res_frags[index]);
>> +    }
>> +}
>> +
>> +static void merge_hash_results(struct worker_pool *pool OVS_UNUSED,
>> +                               void *fin_result, void *result_frags,
>> +                               int index)
>> +{
>> +    struct hmap *result = (struct hmap *)fin_result;
>> +    struct hmap *res_frags = (struct hmap *)result_frags;
>> +
>> +    fast_hmap_merge(result, &res_frags[index]);
>> +    hmap_destroy(&res_frags[index]);
>> +}
>> +
>> +#endif
>> diff --git a/lib/ovn-parallel-hmap.h b/lib/ovn-parallel-hmap.h
>> new file mode 100644
>> index 000000000..8db61eaba
>> --- /dev/null
>> +++ b/lib/ovn-parallel-hmap.h
>> @@ -0,0 +1,301 @@
>> +/*
>> + * Copyright (c) 2020 Red Hat, Inc.
>> + *
>> + * Licensed under the Apache License, Version 2.0 (the "License");
>> + * you may not use this file except in compliance with the License.
>> + * You may obtain a copy of the License at:
>> + *
>> + *     http://www.apache.org/licenses/LICENSE-2.0
>> + *
>> + * Unless required by applicable law or agreed to in writing, software
>> + * distributed under the License is distributed on an "AS IS" BASIS,
>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>> + * See the License for the specific language governing permissions and
>> + * limitations under the License.
>> + */
>> +
>> +#ifndef OVN_PARALLEL_HMAP
>> +#define OVN_PARALLEL_HMAP 1
>> +
>> +/* if the parallel macros are defined by hmap.h or any other ovs define
>> + * we skip over the ovn specific definitions.
>> + */
>> +
>> +#ifdef  __cplusplus
>> +extern "C" {
>> +#endif
>> +
>> +#include <stdbool.h>
>> +#include <stdlib.h>
>> +#include <semaphore.h>
>> +#include <errno.h>
>> +#include "openvswitch/util.h"
>> +#include "openvswitch/hmap.h"
>> +#include "openvswitch/thread.h"
>> +#include "ovs-atomic.h"
>> +
>> +/* Process this include only if OVS does not supply parallel definitions
>> + */
>> +
>> +#ifdef OVS_HAS_PARALLEL_HMAP
>> +
>> +#include "parallel-hmap.h"
>> +
>> +#else
>> +
>> +
>> +#ifdef __clang__
>> +#pragma clang diagnostic push
>> +#pragma clang diagnostic ignored "-Wthread-safety"
>> +#endif
>> +
>> +
>> +/* A version of the HMAP_FOR_EACH macro intended for iterating as part
>> + * of parallel processing.
>> + * Each worker thread has a different ThreadID in the range of 0..POOL_SIZE
>> + * and will iterate hash buckets ThreadID, ThreadID + step,
>> + * ThreadID + step * 2, etc. The actual macro accepts
>> + * ThreadID + step * i as the JOBID parameter.
>> + */
>> +
>> +#define HMAP_FOR_EACH_IN_PARALLEL(NODE, MEMBER, JOBID, HMAP) \
>> +   for (INIT_CONTAINER(NODE, hmap_first_in_bucket_num(HMAP, JOBID), MEMBER); \
>> +        (NODE != OBJECT_CONTAINING(NULL, NODE, MEMBER)) \
>> +       || ((NODE = NULL), false); \
>> +       ASSIGN_CONTAINER(NODE, hmap_next_in_bucket(&(NODE)->MEMBER), MEMBER))
>> +
>> +/* We do not have a SAFE version of the macro, because the hash size is not
>> + * atomic and hash removal operations would need to be wrapped with
>> + * locks. This will defeat most of the benefits from doing anything in
>> + * parallel.
>> + * If the code block inside FOR_EACH_IN_PARALLEL needs to remove elements,
>> + * each thread should store them in a temporary list result instead, merging
>> + * the lists into a combined result at the end */
>> +
>> +/* Work "Handle" */
>> +
>> +struct worker_control {
>> +    int id; /* Used as a modulo when iterating over a hash. */
>> +    atomic_bool finished; /* Set to true after achunk of work is complete. */
>> +    sem_t *fire; /* Work start semaphore - sem_post starts the worker. */
>> +    sem_t *done; /* Work completion semaphore - sem_post on completion. */
>> +    struct ovs_mutex mutex; /* Guards the data. */
>> +    void *data; /* Pointer to data to be processed. */
>> +    void *workload; /* back-pointer to the worker pool structure. */
>> +};
>> +
>> +struct worker_pool {
>> +    int size;   /* Number of threads in the pool. */
>> +    struct ovs_list list_node; /* List of pools - used in cleanup/exit. */
>> +    struct worker_control *controls; /* "Handles" in this pool. */
>> +    sem_t *done; /* Work completion semaphorew. */
>> +};
>> +
>> +/* Add a worker pool for thread function start() which expects a pointer to
>> + * a worker_control structure as an argument. */
>> +
>> +struct worker_pool *ovn_add_worker_pool(void *(*start)(void *));
>> +
>> +/* Setting this to true will make all processing threads exit */
>> +
>> +bool ovn_stop_parallel_processing(void);
>> +
>> +/* Build a hmap pre-sized for size elements */
>> +
>> +void ovn_fast_hmap_size_for(struct hmap *hmap, int size);
>> +
>> +/* Build a hmap with a mask equals to size */
>> +
>> +void ovn_fast_hmap_init(struct hmap *hmap, ssize_t size);
>> +
>> +/* Brute-force merge a hmap into hmap.
>> + * Dest and inc have to have the same mask. The merge is performed
>> + * by extending the element list for bucket N in the dest hmap with the list
>> + * from bucket N in inc.
>> + */
>> +
>> +void ovn_fast_hmap_merge(struct hmap *dest, struct hmap *inc);
>> +
>> +/* Run a pool, without any default processing of results.
>> + */
>> +
>> +void ovn_run_pool(struct worker_pool *pool);
>> +
>> +/* Run a pool, merge results from hash frags into a final hash result.
>> + * The hash frags must be pre-sized to the same size.
>> + */
>> +
>> +void ovn_run_pool_hash(struct worker_pool *pool,
>> +                       struct hmap *result, struct hmap *result_frags);
>> +/* Run a pool, merge results from list frags into a final list result.
>> + */
>> +
>> +void ovn_run_pool_list(struct worker_pool *pool,
>> +                       struct ovs_list *result, struct ovs_list *result_frags);
>> +
>> +/* Run a pool, call a callback function to perform processing of results.
>> + */
>> +
>> +void ovn_run_pool_callback(struct worker_pool *pool, void *fin_result,
>> +                    void *result_frags,
>> +                    void (*helper_func)(struct worker_pool *pool,
>> +                        void *fin_result, void *result_frags, int index));
>> +
>> +
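As a usage illustration (a sketch under assumptions, not code from the patch): the main thread sizes one result fragment per worker to the same mask as the final result, points each worker at the shared input, and lets run_pool_hash() fire the pool and merge the fragments; do_work(), 'input' and 'n_expected' are hypothetical:

    struct worker_pool *pool = add_worker_pool(do_work);

    if (pool) {
        struct hmap result;
        struct hmap *frags = xmalloc(sizeof *frags * pool->size);
        int i;

        fast_hmap_size_for(&result, n_expected);
        for (i = 0; i < pool->size; i++) {
            /* Same mask as 'result', so fast_hmap_merge() can combine
             * them; workers fill frags[id] with hmap_insert_fast(). */
            fast_hmap_size_for(&frags[i], n_expected);
            pool->controls[i].data = input;  /* Shared, read-only input. */
        }
        run_pool_hash(pool, &result, frags); /* Fire, wait, merge. */
        free(frags);
    }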
>> +/* Returns the first node in 'hmap' in bucket number 'num',
>> + * or a null pointer if that bucket is empty. */
>> +
>> +static inline struct hmap_node *
>> +hmap_first_in_bucket_num(const struct hmap *hmap, size_t num)
>> +{
>> +    return hmap->buckets[num];
>> +}
>> +
>> +static inline struct hmap_node *
>> +parallel_hmap_next__(const struct hmap *hmap, size_t start, size_t pool_size)
>> +{
>> +    size_t i;
>> +    for (i = start; i <= hmap->mask; i += pool_size) {
>> +        struct hmap_node *node = hmap->buckets[i];
>> +        if (node) {
>> +            return node;
>> +        }
>> +    }
>> +    return NULL;
>> +}
>> +
>> +/* Returns the first node in 'hmap', as expected by thread with job_id
>> + * for parallel processing in arbitrary order, or a null pointer if
>> + * the slice of 'hmap' for that job_id is empty. */
>> +static inline struct hmap_node *
>> +parallel_hmap_first(const struct hmap *hmap, size_t job_id, size_t pool_size)
>> +{
>> +    return parallel_hmap_next__(hmap, job_id, pool_size);
>> +}
>> +
>> +/* Returns the next node in the slice of 'hmap' following 'node',
>> + * in arbitrary order, or a null pointer if 'node' is the last node in
>> + * the 'hmap' slice. */
>> +static inline struct hmap_node *
>> +parallel_hmap_next(const struct hmap *hmap,
>> +                   const struct hmap_node *node, ssize_t pool_size)
>> +{
>> +    return (node->next
>> +            ? node->next
>> +            : parallel_hmap_next__(hmap,
>> +                (node->hash & hmap->mask) + pool_size, pool_size));
>> +}
>> +
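These helpers allow walking an entire slice node by node, as an alternative to driving HMAP_FOR_EACH_IN_PARALLEL one bucket at a time. A sketch, reusing the hypothetical 'struct my_row' from above:

    struct hmap_node *hn;

    for (hn = parallel_hmap_first(rows, id, pool_size); hn;
         hn = parallel_hmap_next(rows, hn, pool_size)) {
        struct my_row *row = CONTAINER_OF(hn, struct my_row, hmap_node);
        /* ... process 'row' ... */
    }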
>> +static inline void post_completed_work(struct worker_control *control)
>> +{
>> +    atomic_thread_fence(memory_order_acq_rel);
>> +    atomic_store_relaxed(&control->finished, true);
>> +    sem_post(control->done);
>> +}
>> +
>> +static inline void wait_for_work(struct worker_control *control)
>> +{
>> +    int ret;
>> +
>> +    do {
>> +        ret = sem_wait(control->fire);
>> +    } while ((ret == -1) && (errno == EINTR));
>> +    ovs_assert(ret == 0);
>> +}
>> +
>> +static inline void wait_for_work_completion(struct worker_pool *pool)
>> +{
>> +    int ret;
>> +
>> +    do {
>> +        ret = sem_wait(pool->done);
>> +    } while ((ret == -1) && (errno == EINTR));
>> +    ovs_assert(ret == 0);
>> +}
>> +
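Together, wait_for_work() and post_completed_work() frame a worker's main loop. A minimal sketch of a thread function suitable for add_worker_pool() (the real consumer in this series is build_lflows_thread() in ovn-northd.c):

    static void *
    do_work(void *arg)
    {
        struct worker_control *control = (struct worker_control *) arg;

        while (!stop_parallel_processing()) {
            wait_for_work(control);           /* Block until fired. */
            if (stop_parallel_processing()) {
                return NULL;                  /* Woken up only to exit. */
            }
            if (control->data) {
                /* ... process this thread's slice of control->data ... */
            }
            post_completed_work(control);     /* Fence, mark, sem_post. */
        }
        return NULL;
    }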
>> +
>> +/* Hash per-row locking support - to be used only in conjunction
>> + * with fast hash inserts. Normal hash inserts may resize the hash
>> + * rendering the locking invalid.
>> + */
>> +
>> +struct hashrow_locks {
>> +    ssize_t mask;
>> +    struct ovs_mutex *row_locks;
>> +};
>> +
>> +/* Update a hash row locks structure to match the current hash size */
>> +
>> +void ovn_update_hashrow_locks(struct hmap *lflows, struct hashrow_locks *hrl);
>> +
>> +/* Lock a hash row */
>> +
>> +static inline void lock_hash_row(struct hashrow_locks *hrl, uint32_t hash)
>> +{
>> +    ovs_mutex_lock(&hrl->row_locks[hash % hrl->mask]);
>> +}
>> +
>> +/* Unlock a hash row */
>> +
>> +static inline void unlock_hash_row(struct hashrow_locks *hrl, uint32_t hash)
>> +{
>> +    ovs_mutex_unlock(&hrl->row_locks[hash % hrl->mask]);
>> +}
>> +
>> +/* Initialize the row locks structure */
>> +
>> +static inline void init_hash_row_locks(struct hashrow_locks *hrl)
>> +{
>> +    hrl->mask = 0;
>> +    hrl->row_locks = NULL;
>> +}
>> +
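A sketch of the intended locking pattern (assuming 'lflows' is the shared hmap and all concurrent inserts use hmap_insert_fast(), so the mask cannot change while rows are locked; 'row' and 'hash' are hypothetical):

    static struct hashrow_locks hrl;     /* Shared between threads. */

    /* Main thread, before firing the workers: */
    init_hash_row_locks(&hrl);           /* Once, at startup. */
    update_hashrow_locks(lflows, &hrl);  /* Match locks to current mask. */

    /* In a worker, inserting into the shared hmap: */
    lock_hash_row(&hrl, hash);
    hmap_insert_fast(lflows, &row->hmap_node, hash);
    unlock_hash_row(&hrl, hash);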
>> +bool ovn_can_parallelize_hashes(bool force_parallel);
>> +
>> +/* Use the OVN library functions for anything which OVS has not defined.
>> + * If OVS has defined these, they will still compile using the OVN
>> + * local names, but will be dropped by the linker in favour of the OVS
>> + * supplied functions.
>> + */
>> +
>> +#define update_hashrow_locks(lflows, hrl) ovn_update_hashrow_locks(lflows, hrl)
>> +
>> +#define can_parallelize_hashes(force) ovn_can_parallelize_hashes(force)
>> +
>> +#define stop_parallel_processing() ovn_stop_parallel_processing()
>> +
>> +#define add_worker_pool(start) ovn_add_worker_pool(start)
>> +
>> +#define fast_hmap_size_for(hmap, size) ovn_fast_hmap_size_for(hmap, size)
>> +
>> +#define fast_hmap_init(hmap, size) ovn_fast_hmap_init(hmap, size)
>> +
>> +#define fast_hmap_merge(dest, inc) ovn_fast_hmap_merge(dest, inc)
>> +
>> +#define hmap_merge(dest, inc) ovn_hmap_merge(dest, inc)
>> +
>> +#define run_pool(pool) ovn_run_pool(pool)
>> +
>> +#define run_pool_hash(pool, result, result_frags) \
>> +    ovn_run_pool_hash(pool, result, result_frags)
>> +
>> +#define run_pool_list(pool, result, result_frags) \
>> +    ovn_run_pool_list(pool, result, result_frags)
>> +
>> +#define run_pool_callback(pool, fin_result, result_frags, helper_func) \
>> +    ovn_run_pool_callback(pool, fin_result, result_frags, helper_func)
>> +
>> +
>> +
>> +#ifdef __clang__
>> +#pragma clang diagnostic pop
>> +#endif
>> +
>> +#endif
>> +
>> +#ifdef  __cplusplus
>> +}
>> +#endif
>> +
>> +
>> +#endif /* lib/ovn-parallel-hmap.h */
>> --
>> 2.20.1
>>
Numan Siddique March 26, 2021, 3:25 a.m. UTC | #4
On Thu, Mar 25, 2021 at 3:01 PM Anton Ivanov
<anton.ivanov@cambridgegreys.com> wrote:
>
>
>
> On 24/03/2021 15:31, Numan Siddique wrote:
> > On Mon, Mar 1, 2021 at 6:35 PM <anton.ivanov@cambridgegreys.com> wrote:
> >>
> >> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> >>
> >> This adds a set of functions and macros intended to process
> >> hashes in parallel.
> >>
> >> The principles of operation are documented in the ovn-parallel-hmap.h
> >>
> >> If these one day go into the OVS tree, the OVS tree versions
> >> would be used in preference.
> >>
> >> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> >
> > Hi Anton,
> >
> > I tested the first 2 patches of this series and it crashes again for me.
> >
> > This time I ran tests on a 4 core  machine - Intel(R) Xeon(R) CPU
> > E3-1220 v5 @ 3.00GHz
> >
> > The below trace is seen for both gcc and clang.
> >
> > ----
> > [Thread debugging using libthread_db enabled]
> > Using host libthread_db library "/lib64/libthread_db.so.1".
> > Core was generated by `ovn-northd -vjsonrpc
> > --ovnnb-db=unix:/mnt/mydisk/myhome/numan_alt/work/ovs_ovn/'.
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  0x00007f27594ae212 in __new_sem_wait_slow.constprop.0 () from
> > /lib64/libpthread.so.0
> > [Current thread is 1 (Thread 0x7f2758c68640 (LWP 347378))]
> > Missing separate debuginfos, use: dnf debuginfo-install
> > glibc-2.32-3.fc33.x86_64 libcap-ng-0.8-1.fc33.x86_64
> > libevent-2.1.8-10.fc33.x86_64 openssl-libs-1.1.1i-1.fc33.x86_64
> > python3-libs-3.9.1-2.fc33.x86_64 unbound-libs-1.10.1-4.fc33.x86_64
> > zlib-1.2.11-23.fc33.x86_64
> > (gdb) bt
> > #0  0x00007f27594ae212 in __new_sem_wait_slow.constprop.0 () from
> > /lib64/libpthread.so.0
> > #1  0x0000000000422184 in wait_for_work (control=<optimized out>) at
> > ../lib/ovn-parallel-hmap.h:203
> > #2  build_lflows_thread (arg=0x2538420) at ../northd/ovn-northd.c:11855
> > #3  0x000000000049cd12 in ovsthread_wrapper (aux_=<optimized out>) at
> > ../lib/ovs-thread.c:383
> > #4  0x00007f27594a53f9 in start_thread () from /lib64/libpthread.so.0
> > #5  0x00007f2759142903 in clone () from /lib64/libc.so.6
> > -----
> >
> > I'm not sure why you're not able to reproduce this issue.
>
> I can't. I have run it for days in a loop.
>
> One possibility is that for whatever reason your machine has slower IPC speeds compared to linear execution speeds. Thread debugging? AMD vs Intel? No idea.
>
> There is a race on-exit in the current code which I have found by inspection and which I have never been able to trigger. On my machines the workers always exit in time before the main thread has finished, so I cannot trigger this.
>
> Can you try this incremental fix to see if it fixes the problem for you. If that works, I will incorporate it and reissue the patch. If not - I will continue digging.
>
> diff --git a/lib/ovn-parallel-hmap.c b/lib/ovn-parallel-hmap.c
> index e83ae23cb..3597f896f 100644
> --- a/lib/ovn-parallel-hmap.c
> +++ b/lib/ovn-parallel-hmap.c
> @@ -143,7 +143,8 @@ struct worker_pool *ovn_add_worker_pool(void *(*start)(void *)){
>           }
>
>           for (i = 0; i < pool_size; i++) {
> -            ovs_thread_create("worker pool helper", start, &new_pool->controls[i]);
> +            new_pool->controls[i].worker =
> +                ovs_thread_create("worker pool helper", start, &new_pool->controls[i]);
>           }
>           ovs_list_push_back(&worker_pools, &new_pool->list_node);
>       }
> @@ -386,6 +387,9 @@ static void worker_pool_hook(void *aux OVS_UNUSED) {
>           for (i = 0; i < pool->size ; i++) {
>               sem_post(pool->controls[i].fire);
>           }
> +        for (i = 0; i < pool->size ; i++) {
> +            pthread_join(pool->controls[i].worker, NULL);
> +        }
>           for (i = 0; i < pool->size ; i++) {
>               sem_close(pool->controls[i].fire);
>               sprintf(sem_name, WORKER_SEM_NAME, sembase, pool, i);
> diff --git a/lib/ovn-parallel-hmap.h b/lib/ovn-parallel-hmap.h
> index 8db61eaba..d62ca3da5 100644
> --- a/lib/ovn-parallel-hmap.h
> +++ b/lib/ovn-parallel-hmap.h
> @@ -82,6 +82,7 @@ struct worker_control {
>       struct ovs_mutex mutex; /* Guards the data. */
>       void *data; /* Pointer to data to be processed. */
>       void *workload; /* back-pointer to the worker pool structure. */
> +    pthread_t worker;
>   };
>
>   struct worker_pool {
>

I applied the above diff on top of patch 2 and did some tests.  I see
a big improvement with this.  On my "Intel(R) Xeon(R) CPU E3-1220 v5 @
3.00GHz" server, I saw just one crash when I ran the test suite
multiple times.

On my work laptop (in which the tests used to hang earlier), all the
tests are passing now.  But I see a lot more consistent crashes here.
For a single run of the whole testsuite (with make check -j5)
I observed around 7 crashes.  Definitely an improvement compared
to my previous runs with v14.

Here are the backtrace details of the core dumps I observed -
https://gist.github.com/numansiddique/5cab90ec4a1ee6e1adbfd3cd90eccf5a

Crash 1 and Crash 2 are frequent.  Let me know in case you want the core files.

Thanks
Numan

Anton Ivanov March 26, 2021, 8:07 a.m. UTC | #5
On 26/03/2021 03:25, Numan Siddique wrote:
> On Thu, Mar 25, 2021 at 3:01 PM Anton Ivanov
> <anton.ivanov@cambridgegreys.com> wrote:
>>
>>
>> On 24/03/2021 15:31, Numan Siddique wrote:
>>> On Mon, Mar 1, 2021 at 6:35 PM <anton.ivanov@cambridgegreys.com> wrote:
>>>> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>>>>
>>>> This adds a set of functions and macros intended to process
>>>> hashes in parallel.
>>>>
>>>> The principles of operation are documented in the ovn-parallel-hmap.h
>>>>
>>>> If these one day go into the OVS tree, the OVS tree versions
>>>> would be used in preference.
>>>>
>>>> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>>> Hi Anton,
>>>
>>> I tested the first 2 patches of this series and it crashes again for me.
>>>
>>> This time I ran tests on a 4 core  machine - Intel(R) Xeon(R) CPU
>>> E3-1220 v5 @ 3.00GHz
>>>
>>> The below trace is seen for both gcc and clang.
>>>
>>> ----
>>> [Thread debugging using libthread_db enabled]
>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>> Core was generated by `ovn-northd -vjsonrpc
>>> --ovnnb-db=unix:/mnt/mydisk/myhome/numan_alt/work/ovs_ovn/'.
>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>> #0  0x00007f27594ae212 in __new_sem_wait_slow.constprop.0 () from
>>> /lib64/libpthread.so.0
>>> [Current thread is 1 (Thread 0x7f2758c68640 (LWP 347378))]
>>> Missing separate debuginfos, use: dnf debuginfo-install
>>> glibc-2.32-3.fc33.x86_64 libcap-ng-0.8-1.fc33.x86_64
>>> libevent-2.1.8-10.fc33.x86_64 openssl-libs-1.1.1i-1.fc33.x86_64
>>> python3-libs-3.9.1-2.fc33.x86_64 unbound-libs-1.10.1-4.fc33.x86_64
>>> zlib-1.2.11-23.fc33.x86_64
>>> (gdb) bt
>>> #0  0x00007f27594ae212 in __new_sem_wait_slow.constprop.0 () from
>>> /lib64/libpthread.so.0
>>> #1  0x0000000000422184 in wait_for_work (control=<optimized out>) at
>>> ../lib/ovn-parallel-hmap.h:203
>>> #2  build_lflows_thread (arg=0x2538420) at ../northd/ovn-northd.c:11855
>>> #3  0x000000000049cd12 in ovsthread_wrapper (aux_=<optimized out>) at
>>> ../lib/ovs-thread.c:383
>>> #4  0x00007f27594a53f9 in start_thread () from /lib64/libpthread.so.0
>>> #5  0x00007f2759142903 in clone () from /lib64/libc.so.6
>>> -----
>>>
>>> I'm not sure why you're not able to reproduce this issue.
>> I can't. I have run it for days in a loop.
>>
>> One possibility is that for whatever reason your machine has slower IPC speeds compared to linear execution speeds. Thread debugging? AMD vs Intel? No idea.
>>
>> There is a race on-exit in the current code which I have found by inspection and which I have never been able to trigger. On my machines the workers always exit in time before the main thread has finished, so I cannot trigger this.
>>
>> Can you try this incremental fix to see if it fixes the problem for you. If that works, I will incorporate it and reissue the patch. If not - I will continue digging.
>>
>> diff --git a/lib/ovn-parallel-hmap.c b/lib/ovn-parallel-hmap.c
>> index e83ae23cb..3597f896f 100644
>> --- a/lib/ovn-parallel-hmap.c
>> +++ b/lib/ovn-parallel-hmap.c
>> @@ -143,7 +143,8 @@ struct worker_pool *ovn_add_worker_pool(void *(*start)(void *)){
>>            }
>>
>>            for (i = 0; i < pool_size; i++) {
>> -            ovs_thread_create("worker pool helper", start, &new_pool->controls[i]);
>> +            new_pool->controls[i].worker =
>> +                ovs_thread_create("worker pool helper", start, &new_pool->controls[i]);
>>            }
>>            ovs_list_push_back(&worker_pools, &new_pool->list_node);
>>        }
>> @@ -386,6 +387,9 @@ static void worker_pool_hook(void *aux OVS_UNUSED) {
>>            for (i = 0; i < pool->size ; i++) {
>>                sem_post(pool->controls[i].fire);
>>            }
>> +        for (i = 0; i < pool->size ; i++) {
>> +            pthread_join(pool->controls[i].worker, NULL);
>> +        }
>>            for (i = 0; i < pool->size ; i++) {
>>                sem_close(pool->controls[i].fire);
>>                sprintf(sem_name, WORKER_SEM_NAME, sembase, pool, i);
>> diff --git a/lib/ovn-parallel-hmap.h b/lib/ovn-parallel-hmap.h
>> index 8db61eaba..d62ca3da5 100644
>> --- a/lib/ovn-parallel-hmap.h
>> +++ b/lib/ovn-parallel-hmap.h
>> @@ -82,6 +82,7 @@ struct worker_control {
>>        struct ovs_mutex mutex; /* Guards the data. */
>>        void *data; /* Pointer to data to be processed. */
>>        void *workload; /* back-pointer to the worker pool structure. */
>> +    pthread_t worker;
>>    };
>>
>>    struct worker_pool {
>>
> I applied the above diff on top of patch 2 and did some tests.  I see
> a big improvement with this.  On my "Intel(R) Xeon(R) CPU E3-1220 v5
> @ 3.00GHz" server, I saw just one crash when I ran the test suite
> multiple times.
>
> On my work laptop (in which the tests used to hang earlier), all the
> tests are passing now.  But I see a lot more consistent crashes here.
> For a single run of the whole testsuite (with make check -j5)
> I observed around 7 crashes.  Definitely an improvement compared
> to my previous runs with v14.
>
> Here are the backtrace details of the core dumps I observed -
> https://gist.github.com/numansiddique/5cab90ec4a1ee6e1adbfd3cd90eccf5a
>
> Crash 1 and Crash 2 are frequent.  Let me know in case you want the core files.

Not really. Traces are informative.

I have no idea why I cannot reproduce these, but I can see (roughly)
where the problem is.

I can't see why (yet). The place where it crashes in 1 and 2 is the 
brute-force hash merge code which is trivial, runs on every iteration 
and has been tested quite thoroughly.

I will look at it later today.

Brgds,

>
> Thanks
> Numan
>
>>> All the test cases passed for me. So maybe something's wrong when
>>> ovn-northd exits.
>>> IMHO, these crashes should be addressed before these patches can be considered.
>>>
>>> Thanks
>>> Numan
>>>
>>>> ---
>>>>    lib/automake.mk         |   2 +
>>>>    lib/ovn-parallel-hmap.c | 455 ++++++++++++++++++++++++++++++++++++++++
>>>>    lib/ovn-parallel-hmap.h | 301 ++++++++++++++++++++++++++
>>>>    3 files changed, 758 insertions(+)
>>>>    create mode 100644 lib/ovn-parallel-hmap.c
>>>>    create mode 100644 lib/ovn-parallel-hmap.h
>>>>
>>>> diff --git a/lib/automake.mk b/lib/automake.mk
>>>> index 250c7aefa..781be2109 100644
>>>> --- a/lib/automake.mk
>>>> +++ b/lib/automake.mk
>>>> @@ -13,6 +13,8 @@ lib_libovn_la_SOURCES = \
>>>>           lib/expr.c \
>>>>           lib/extend-table.h \
>>>>           lib/extend-table.c \
>>>> +       lib/ovn-parallel-hmap.h \
>>>> +       lib/ovn-parallel-hmap.c \
>>>>           lib/ip-mcast-index.c \
>>>>           lib/ip-mcast-index.h \
>>>>           lib/mcast-group-index.c \
>>>> diff --git a/lib/ovn-parallel-hmap.c b/lib/ovn-parallel-hmap.c
>>>> new file mode 100644
>>>> index 000000000..e83ae23cb
>>>> --- /dev/null
>>>> +++ b/lib/ovn-parallel-hmap.c
>>>> @@ -0,0 +1,455 @@
>>>> +/*
>>>> + * Copyright (c) 2020 Red Hat, Inc.
>>>> + * Copyright (c) 2008, 2009, 2010, 2012, 2013, 2015, 2019 Nicira, Inc.
>>>> + *
>>>> + * Licensed under the Apache License, Version 2.0 (the "License");
>>>> + * you may not use this file except in compliance with the License.
>>>> + * You may obtain a copy of the License at:
>>>> + *
>>>> + *     http://www.apache.org/licenses/LICENSE-2.0
>>>> + *
>>>> + * Unless required by applicable law or agreed to in writing, software
>>>> + * distributed under the License is distributed on an "AS IS" BASIS,
>>>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>>> + * See the License for the specific language governing permissions and
>>>> + * limitations under the License.
>>>> + */
>>>> +
>>>> +#include <config.h>
>>>> +#include <stdint.h>
>>>> +#include <string.h>
>>>> +#include <stdlib.h>
>>>> +#include <fcntl.h>
>>>> +#include <unistd.h>
>>>> +#include <errno.h>
>>>> +#include <semaphore.h>
>>>> +#include "fatal-signal.h"
>>>> +#include "util.h"
>>>> +#include "openvswitch/vlog.h"
>>>> +#include "openvswitch/hmap.h"
>>>> +#include "openvswitch/thread.h"
>>>> +#include "ovn-parallel-hmap.h"
>>>> +#include "ovs-atomic.h"
>>>> +#include "ovs-thread.h"
>>>> +#include "ovs-numa.h"
>>>> +#include "random.h"
>>>> +
>>>> +VLOG_DEFINE_THIS_MODULE(ovn_parallel_hmap);
>>>> +
>>>> +#ifndef OVS_HAS_PARALLEL_HMAP
>>>> +
>>>> +#define WORKER_SEM_NAME "%x-%p-%x"
>>>> +#define MAIN_SEM_NAME "%x-%p-main"
>>>> +
>>>> +/* These are accessed under mutex inside add_worker_pool().
>>>> + * They do not need to be atomic.
>>>> + */
>>>> +
>>>> +static atomic_bool initial_pool_setup = ATOMIC_VAR_INIT(false);
>>>> +static bool can_parallelize = false;
>>>> +
>>>> +/* This is set only in the process of exit and the set is
>>>> + * accompanied by a fence. It does not need to be atomic or be
>>>> + * accessed under a lock.
>>>> + */
>>>> +
>>>> +static bool workers_must_exit = false;
>>>> +
>>>> +static struct ovs_list worker_pools = OVS_LIST_INITIALIZER(&worker_pools);
>>>> +
>>>> +static struct ovs_mutex init_mutex = OVS_MUTEX_INITIALIZER;
>>>> +
>>>> +static int pool_size;
>>>> +
>>>> +static int sembase;
>>>> +
>>>> +static void worker_pool_hook(void *aux OVS_UNUSED);
>>>> +static void setup_worker_pools(bool force);
>>>> +static void merge_list_results(struct worker_pool *pool OVS_UNUSED,
>>>> +                               void *fin_result, void *result_frags,
>>>> +                               int index);
>>>> +static void merge_hash_results(struct worker_pool *pool OVS_UNUSED,
>>>> +                               void *fin_result, void *result_frags,
>>>> +                               int index);
>>>> +
>>>> +bool ovn_stop_parallel_processing(void)
>>>> +{
>>>> +    return workers_must_exit;
>>>> +}
>>>> +
>>>> +bool ovn_can_parallelize_hashes(bool force_parallel)
>>>> +{
>>>> +    bool test = false;
>>>> +
>>>> +    if (atomic_compare_exchange_strong(
>>>> +            &initial_pool_setup,
>>>> +            &test,
>>>> +            true)) {
>>>> +        ovs_mutex_lock(&init_mutex);
>>>> +        setup_worker_pools(force_parallel);
>>>> +        ovs_mutex_unlock(&init_mutex);
>>>> +    }
>>>> +    return can_parallelize;
>>>> +}
>>>> +
>>>> +struct worker_pool *ovn_add_worker_pool(void *(*start)(void *)){
>>>> +
>>>> +    struct worker_pool *new_pool = NULL;
>>>> +    struct worker_control *new_control;
>>>> +    bool test = false;
>>>> +    int i;
>>>> +    char sem_name[256];
>>>> +
>>>> +
>>>> +    /* Belt and braces - initialize the pool system just in case if
>>>> +     * if it is not yet initialized.
>>>> +     */
>>>> +
>>>> +    if (atomic_compare_exchange_strong(
>>>> +            &initial_pool_setup,
>>>> +            &test,
>>>> +            true)) {
>>>> +        ovs_mutex_lock(&init_mutex);
>>>> +        setup_worker_pools(false);
>>>> +        ovs_mutex_unlock(&init_mutex);
>>>> +    }
>>>> +
>>>> +    ovs_mutex_lock(&init_mutex);
>>>> +    if (can_parallelize) {
>>>> +        new_pool = xmalloc(sizeof(struct worker_pool));
>>>> +        new_pool->size = pool_size;
>>>> +        new_pool->controls = NULL;
>>>> +        sprintf(sem_name, MAIN_SEM_NAME, sembase, new_pool);
>>>> +        new_pool->done = sem_open(sem_name, O_CREAT, S_IRWXU, 0);
>>>> +        if (new_pool->done == SEM_FAILED) {
>>>> +            goto cleanup;
>>>> +        }
>>>> +
>>>> +        new_pool->controls =
>>>> +            xmalloc(sizeof(struct worker_control) * new_pool->size);
>>>> +
>>>> +        for (i = 0; i < new_pool->size; i++) {
>>>> +            new_control = &new_pool->controls[i];
>>>> +            new_control->id = i;
>>>> +            new_control->done = new_pool->done;
>>>> +            new_control->data = NULL;
>>>> +            ovs_mutex_init(&new_control->mutex);
>>>> +            new_control->finished = ATOMIC_VAR_INIT(false);
>>>> +            sprintf(sem_name, WORKER_SEM_NAME, sembase, new_pool, i);
>>>> +            new_control->fire = sem_open(sem_name, O_CREAT, S_IRWXU, 0);
>>>> +            if (new_control->fire == SEM_FAILED) {
>>>> +                goto cleanup;
>>>> +            }
>>>> +        }
>>>> +
>>>> +        for (i = 0; i < pool_size; i++) {
>>>> +            ovs_thread_create("worker pool helper", start, &new_pool->controls[i]);
>>>> +        }
>>>> +        ovs_list_push_back(&worker_pools, &new_pool->list_node);
>>>> +    }
>>>> +    ovs_mutex_unlock(&init_mutex);
>>>> +    return new_pool;
>>>> +cleanup:
>>>> +
>>>> +    /* Something went wrong when opening semaphores. In this case
>>>> +     * it is better to shut off parallel procesing altogether
>>>> +     */
>>>> +
>>>> +    VLOG_INFO("Failed to initialize parallel processing, error %d", errno);
>>>> +    can_parallelize = false;
>>>> +    if (new_pool->controls) {
>>>> +        for (i = 0; i < new_pool->size; i++) {
>>>> +            if (new_pool->controls[i].fire != SEM_FAILED) {
>>>> +                sem_close(new_pool->controls[i].fire);
>>>> +                sprintf(sem_name, WORKER_SEM_NAME, sembase, new_pool, i);
>>>> +                sem_unlink(sem_name);
>>>> +                break; /* semaphores past this one are uninitialized */
>>>> +            }
>>>> +        }
>>>> +    }
>>>> +    if (new_pool->done != SEM_FAILED) {
>>>> +        sem_close(new_pool->done);
>>>> +        sprintf(sem_name, MAIN_SEM_NAME, sembase, new_pool);
>>>> +        sem_unlink(sem_name);
>>>> +    }
>>>> +    ovs_mutex_unlock(&init_mutex);
>>>> +    return NULL;
>>>> +}
>>>> +
>>>> +
>>>> +/* Initializes 'hmap' as an empty hash table with mask N. */
>>>> +void
>>>> +ovn_fast_hmap_init(struct hmap *hmap, ssize_t mask)
>>>> +{
>>>> +    size_t i;
>>>> +
>>>> +    hmap->buckets = xmalloc(sizeof (struct hmap_node *) * (mask + 1));
>>>> +    hmap->one = NULL;
>>>> +    hmap->mask = mask;
>>>> +    hmap->n = 0;
>>>> +    for (i = 0; i <= hmap->mask; i++) {
>>>> +        hmap->buckets[i] = NULL;
>>>> +    }
>>>> +}
>>>> +
>>>> +/* Initializes 'hmap' as an empty hash table of size X.
>>>> + * Intended for use in parallel processing so that all
>>>> + * fragments used to store results in a parallel job
>>>> + * are the same size.
>>>> + */
>>>> +void
>>>> +ovn_fast_hmap_size_for(struct hmap *hmap, int size)
>>>> +{
>>>> +    size_t mask;
>>>> +    mask = size / 2;
>>>> +    mask |= mask >> 1;
>>>> +    mask |= mask >> 2;
>>>> +    mask |= mask >> 4;
>>>> +    mask |= mask >> 8;
>>>> +    mask |= mask >> 16;
>>>> +#if SIZE_MAX > UINT32_MAX
>>>> +    mask |= mask >> 32;
>>>> +#endif
>>>> +
>>>> +    /* If we need to dynamically allocate buckets we might as well allocate at
>>>> +     * least 4 of them. */
>>>> +    mask |= (mask & 1) << 1;
>>>> +
>>>> +    fast_hmap_init(hmap, mask);
>>>> +}
>>>> +
>>>> +/* Run a thread pool which uses a callback function to process results
>>>> + */
>>>> +
>>>> +void ovn_run_pool_callback(struct worker_pool *pool,
>>>> +                           void *fin_result, void *result_frags,
>>>> +                           void (*helper_func)(struct worker_pool *pool,
>>>> +                                               void *fin_result,
>>>> +                                               void *result_frags, int index))
>>>> +{
>>>> +    int index, completed;
>>>> +
>>>> +    /* Ensure that all worker threads see the same data as the
>>>> +     * main thread.
>>>> +     */
>>>> +
>>>> +    atomic_thread_fence(memory_order_acq_rel);
>>>> +
>>>> +    /* Start workers */
>>>> +
>>>> +    for (index = 0; index < pool->size; index++) {
>>>> +        sem_post(pool->controls[index].fire);
>>>> +    }
>>>> +
>>>> +    completed = 0;
>>>> +
>>>> +    do {
>>>> +        bool test;
>>>> +        /* Note - we do not loop on semaphore until it reaches
>>>> +         * zero, but on pool size/remaining workers.
>>>> +         * This is by design. If the inner loop can handle
>>>> +         * completion for more than one worker within an iteration
>>>> +         * it will do so to ensure no additional iterations and
>>>> +         * waits once all of them are done.
>>>> +         *
>>>> +         * This may result in us having an initial positive value
>>>> +         * of the semaphore when the pool is invoked the next time.
>>>> +         * This is harmless - the loop will spin up a couple of times
>>>> +         * doing nothing while the workers are processing their data
>>>> +         * slices.
>>>> +         */
>>>> +        wait_for_work_completion(pool);
>>>> +        for (index = 0; index < pool->size; index++) {
>>>> +            test = true;
>>>> +            /* If the worker has marked its data chunk as complete,
>>>> +             * invoke the helper function to combine the results of
>>>> +             * this worker into the main result.
>>>> +             *
>>>> +             * The worker must invoke an appropriate memory fence
>>>> +             * (most likely acq_rel) to ensure that the main thread
>>>> +             * sees all of the results produced by the worker.
>>>> +             */
>>>> +            if (atomic_compare_exchange_weak(
>>>> +                    &pool->controls[index].finished,
>>>> +                    &test,
>>>> +                    false)) {
>>>> +                if (helper_func) {
>>>> +                    (helper_func)(pool, fin_result, result_frags, index);
>>>> +                }
>>>> +                completed++;
>>>> +                pool->controls[index].data = NULL;
>>>> +            }
>>>> +        }
>>>> +    } while (completed < pool->size);
>>>> +}
>>>> +
>>>> +/* Run a thread pool - basic, does not do results processing.
>>>> + */
>>>> +
>>>> +void ovn_run_pool(struct worker_pool *pool)
>>>> +{
>>>> +    run_pool_callback(pool, NULL, NULL, NULL);
>>>> +}
>>>> +
>>>> +/* Brute force merge of a hashmap into another hashmap.
>>>> + * Intended for use in parallel processing. The destination
>>>> + * hashmap MUST be the same size as the one being merged.
>>>> + *
>>>> + * This can be achieved by pre-allocating them to correct size
>>>> + * and using hmap_insert_fast() instead of hmap_insert()
>>>> + */
>>>> +
>>>> +void ovn_fast_hmap_merge(struct hmap *dest, struct hmap *inc)
>>>> +{
>>>> +    size_t i;
>>>> +
>>>> +    ovs_assert(inc->mask == dest->mask);
>>>> +
>>>> +    if (!inc->n) {
>>>> +        /* Request to merge an empty frag, nothing to do */
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    for (i = 0; i <= dest->mask; i++) {
>>>> +        struct hmap_node **dest_bucket = &dest->buckets[i];
>>>> +        struct hmap_node **inc_bucket = &inc->buckets[i];
>>>> +        if (*inc_bucket != NULL) {
>>>> +            struct hmap_node *last_node = *inc_bucket;
>>>> +            while (last_node->next != NULL) {
>>>> +                last_node = last_node->next;
>>>> +            }
>>>> +            last_node->next = *dest_bucket;
>>>> +            *dest_bucket = *inc_bucket;
>>>> +            *inc_bucket = NULL;
>>>> +        }
>>>> +    }
>>>> +    dest->n += inc->n;
>>>> +    inc->n = 0;
>>>> +}
>>>> +
>>>> +/* Run a thread pool which gathers results in an array
>>>> + * of hashes. Merge results.
>>>> + */
>>>> +
>>>> +
>>>> +void ovn_run_pool_hash(
>>>> +        struct worker_pool *pool,
>>>> +        struct hmap *result,
>>>> +        struct hmap *result_frags)
>>>> +{
>>>> +    run_pool_callback(pool, result, result_frags, merge_hash_results);
>>>> +}
>>>> +
>>>> +/* Run a thread pool which gathers results in an array of lists.
>>>> + * Merge results.
>>>> + */
>>>> +void ovn_run_pool_list(
>>>> +        struct worker_pool *pool,
>>>> +        struct ovs_list *result,
>>>> +        struct ovs_list *result_frags)
>>>> +{
>>>> +    run_pool_callback(pool, result, result_frags, merge_list_results);
>>>> +}
>>>> +
>>>> +void ovn_update_hashrow_locks(struct hmap *lflows, struct hashrow_locks *hrl)
>>>> +{
>>>> +    int i;
>>>> +    if (hrl->mask != lflows->mask) {
>>>> +        if (hrl->row_locks) {
>>>> +            free(hrl->row_locks);
>>>> +        }
>>>> +        hrl->row_locks = xcalloc(lflows->mask + 1, sizeof(struct ovs_mutex));
>>>> +        hrl->mask = lflows->mask;
>>>> +        for (i = 0; i <= lflows->mask; i++) {
>>>> +            ovs_mutex_init(&hrl->row_locks[i]);
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>> +static void worker_pool_hook(void *aux OVS_UNUSED) {
>>>> +    int i;
>>>> +    static struct worker_pool *pool;
>>>> +    char sem_name[256];
>>>> +
>>>> +    workers_must_exit = true;
>>>> +
>>>> +    /* All workers must honour the must_exit flag and check for it regularly.
>>>> +     * We can make it atomic and check it via atomics in workers, but that
>>>> +     * is not really necessary as it is set just once - when the program
>>>> +     * terminates. So we use a fence which is invoked before exiting instead.
>>>> +     */
>>>> +    atomic_thread_fence(memory_order_acq_rel);
>>>> +
>>>> +    /* Wake up the workers after the must_exit flag has been set */
>>>> +
>>>> +    LIST_FOR_EACH (pool, list_node, &worker_pools) {
>>>> +        for (i = 0; i < pool->size ; i++) {
>>>> +            sem_post(pool->controls[i].fire);
>>>> +        }
>>>> +        for (i = 0; i < pool->size ; i++) {
>>>> +            sem_close(pool->controls[i].fire);
>>>> +            sprintf(sem_name, WORKER_SEM_NAME, sembase, pool, i);
>>>> +            sem_unlink(sem_name);
>>>> +        }
>>>> +        sem_close(pool->done);
>>>> +        sprintf(sem_name, MAIN_SEM_NAME, sembase, pool);
>>>> +        sem_unlink(sem_name);
>>>> +    }
>>>> +}
>>>> +
>>>> +static void setup_worker_pools(bool force) {
>>>> +    int cores, nodes;
>>>> +
>>>> +    nodes = ovs_numa_get_n_numas();
>>>> +    if (nodes == OVS_NUMA_UNSPEC || nodes <= 0) {
>>>> +        nodes = 1;
>>>> +    }
>>>> +    cores = ovs_numa_get_n_cores();
>>>> +
>>>> +    /* If there is no NUMA config, use the core count reported
>>>> +     * by the ovs-threads helpers.  If there is NUMA config, use
>>>> +     * the cores of a single node so that the OS does not start
>>>> +     * pushing threads to other nodes.
>>>> +     */
>>>> +    if (cores == OVS_CORE_UNSPEC || cores <= 0) {
>>>> +        /* If there is no NUMA we can try the ovs-threads routine.
>>>> +         * It falls back to sysconf and/or affinity mask.
>>>> +         */
>>>> +        cores = count_cpu_cores();
>>>> +        pool_size = cores;
>>>> +    } else {
>>>> +        pool_size = cores / nodes;
>>>> +    }
>>>> +    if ((pool_size < 4) && force) {
>>>> +        pool_size = 4;
>>>> +    }
>>>> +    can_parallelize = (pool_size >= 3);
>>>> +    fatal_signal_add_hook(worker_pool_hook, NULL, NULL, true);
>>>> +    sembase = random_uint32();
>>>> +}
>>>> +
>>>> +static void merge_list_results(struct worker_pool *pool OVS_UNUSED,
>>>> +                               void *fin_result, void *result_frags,
>>>> +                               int index)
>>>> +{
>>>> +    struct ovs_list *result = (struct ovs_list *)fin_result;
>>>> +    struct ovs_list *res_frags = (struct ovs_list *)result_frags;
>>>> +
>>>> +    if (!ovs_list_is_empty(&res_frags[index])) {
>>>> +        ovs_list_splice(result->next,
>>>> +                ovs_list_front(&res_frags[index]), &res_frags[index]);
>>>> +    }
>>>> +}
>>>> +
>>>> +static void merge_hash_results(struct worker_pool *pool OVS_UNUSED,
>>>> +                               void *fin_result, void *result_frags,
>>>> +                               int index)
>>>> +{
>>>> +    struct hmap *result = (struct hmap *)fin_result;
>>>> +    struct hmap *res_frags = (struct hmap *)result_frags;
>>>> +
>>>> +    fast_hmap_merge(result, &res_frags[index]);
>>>> +    hmap_destroy(&res_frags[index]);
>>>> +}
>>>> +
>>>> +#endif
>>>> diff --git a/lib/ovn-parallel-hmap.h b/lib/ovn-parallel-hmap.h
>>>> new file mode 100644
>>>> index 000000000..8db61eaba
>>>> --- /dev/null
>>>> +++ b/lib/ovn-parallel-hmap.h
>>>> @@ -0,0 +1,301 @@
>>>> +/*
>>>> + * Copyright (c) 2020 Red Hat, Inc.
>>>> + *
>>>> + * Licensed under the Apache License, Version 2.0 (the "License");
>>>> + * you may not use this file except in compliance with the License.
>>>> + * You may obtain a copy of the License at:
>>>> + *
>>>> + *     http://www.apache.org/licenses/LICENSE-2.0
>>>> + *
>>>> + * Unless required by applicable law or agreed to in writing, software
>>>> + * distributed under the License is distributed on an "AS IS" BASIS,
>>>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>>> + * See the License for the specific language governing permissions and
>>>> + * limitations under the License.
>>>> + */
>>>> +
>>>> +#ifndef OVN_PARALLEL_HMAP
>>>> +#define OVN_PARALLEL_HMAP 1
>>>> +
>>>> +/* If the parallel macros are defined by hmap.h or any other OVS header,
>>>> + * we skip over the OVN-specific definitions.
>>>> + */
>>>> +
>>>> +#ifdef  __cplusplus
>>>> +extern "C" {
>>>> +#endif
>>>> +
>>>> +#include <stdbool.h>
>>>> +#include <stdlib.h>
>>>> +#include <semaphore.h>
>>>> +#include <errno.h>
>>>> +#include "openvswitch/util.h"
>>>> +#include "openvswitch/hmap.h"
>>>> +#include "openvswitch/thread.h"
>>>> +#include "ovs-atomic.h"
>>>> +
>>>> +/* Process this include only if OVS does not supply parallel definitions
>>>> + */
>>>> +
>>>> +#ifdef OVS_HAS_PARALLEL_HMAP
>>>> +
>>>> +#include "parallel-hmap.h"
>>>> +
>>>> +#else
>>>> +
>>>> +
>>>> +#ifdef __clang__
>>>> +#pragma clang diagnostic push
>>>> +#pragma clang diagnostic ignored "-Wthread-safety"
>>>> +#endif
>>>> +
>>>> +
>>>> +/* A version of the HMAP_FOR_EACH macro intended for iterating as part
>>>> + * of parallel processing.
>>>> + * Each worker thread has a different ThreadID in the range 0..POOL_SIZE-1
>>>> + * and will iterate hash buckets ThreadID, ThreadID + step,
>>>> + * ThreadID + step * 2, etc. The actual macro accepts
>>>> + * ThreadID + step * i as the JOBID parameter.
>>>> + */
>>>> +
>>>> +#define HMAP_FOR_EACH_IN_PARALLEL(NODE, MEMBER, JOBID, HMAP) \
>>>> +   for (INIT_CONTAINER(NODE, hmap_first_in_bucket_num(HMAP, JOBID), MEMBER); \
>>>> +        (NODE != OBJECT_CONTAINING(NULL, NODE, MEMBER)) \
>>>> +       || ((NODE = NULL), false); \
>>>> +       ASSIGN_CONTAINER(NODE, hmap_next_in_bucket(&(NODE)->MEMBER), MEMBER))
>>>> +
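>>>> +/* Illustrative slice iteration for a worker with the given 'id'
>>>> + * (sketch only - 'struct my_node' and its 'node' member are
>>>> + * placeholder names):
>>>> + *
>>>> + *     size_t bnum;
>>>> + *     struct my_node *n;
>>>> + *
>>>> + *     for (bnum = id; bnum <= map->mask; bnum += pool_size) {
>>>> + *         HMAP_FOR_EACH_IN_PARALLEL (n, node, bnum, map) {
>>>> + *             ...process n...
>>>> + *         }
>>>> + *     }
>>>> + */
>>>> +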
>>>> +/* We do not have a SAFE version of the macro, because the hash size is not
>>>> + * atomic and hash removal operations would need to be wrapped with
>>>> + * locks. This will defeat most of the benefits from doing anything in
>>>> + * parallel.
>>>> + * If the code block inside FOR_EACH_IN_PARALLEL needs to remove elements,
>>>> + * each thread should store them in a temporary list result instead, merging
>>>> + * the lists into a combined result at the end */
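>>>> +
>>>> +/* Sketch of that removal pattern ('frags' is a per-worker array of
>>>> + * ovs_list results and 'list_node' a placeholder member name):
>>>> + *
>>>> + *     HMAP_FOR_EACH_IN_PARALLEL (n, node, bnum, map) {
>>>> + *         if (...n must go...) {
>>>> + *             ovs_list_push_back(&frags[control->id], &n->list_node);
>>>> + *         }
>>>> + *     }
>>>> + */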
>>>> +
>>>> +/* Work "Handle" */
>>>> +
>>>> +struct worker_control {
>>>> +    int id; /* Used as a modulo when iterating over a hash. */
>>>> +    atomic_bool finished; /* Set to true after a chunk of work is complete. */
>>>> +    sem_t *fire; /* Work start semaphore - sem_post starts the worker. */
>>>> +    sem_t *done; /* Work completion semaphore - sem_post on completion. */
>>>> +    struct ovs_mutex mutex; /* Guards the data. */
>>>> +    void *data; /* Pointer to data to be processed. */
>>>> +    void *workload; /* Back-pointer to the worker pool structure. */
>>>> +};
>>>> +
>>>> +struct worker_pool {
>>>> +    int size;   /* Number of threads in the pool. */
>>>> +    struct ovs_list list_node; /* List of pools - used in cleanup/exit. */
>>>> +    struct worker_control *controls; /* "Handles" in this pool. */
>>>> +    sem_t *done; /* Work completion semaphore. */
>>>> +};
>>>> +
>>>> +/* Add a worker pool for thread function start() which expects a pointer to
>>>> + * a worker_control structure as an argument. */
>>>> +
>>>> +struct worker_pool *ovn_add_worker_pool(void *(*start)(void *));
>>>> +
>>>> +/* Returns true when all processing threads must exit */
>>>> +
>>>> +bool ovn_stop_parallel_processing(void);
>>>> +
>>>> +/* Build a hmap pre-sized for size elements */
>>>> +
>>>> +void ovn_fast_hmap_size_for(struct hmap *hmap, int size);
>>>> +
>>>> +/* Build a hmap with a mask equal to size */
>>>> +
>>>> +void ovn_fast_hmap_init(struct hmap *hmap, ssize_t size);
>>>> +
>>>> +/* Brute-force merge one hmap into another.
>>>> + * Dest and inc have to have the same mask. The merge is performed
>>>> + * by extending the element list for bucket N in the dest hmap with the list
>>>> + * from bucket N in inc.
>>>> + */
>>>> +
>>>> +void ovn_fast_hmap_merge(struct hmap *dest, struct hmap *inc);
>>>> +
>>>> +/* Run a pool, without any default processing of results.
>>>> + */
>>>> +
>>>> +void ovn_run_pool(struct worker_pool *pool);
>>>> +
>>>> +/* Run a pool, merge results from hash frags into a final hash result.
>>>> + * The hash frags must be pre-sized to the same size.
>>>> + */
>>>> +
>>>> +void ovn_run_pool_hash(struct worker_pool *pool,
>>>> +                       struct hmap *result, struct hmap *result_frags);
>>>> +/* Run a pool, merge results from list frags into a final list result.
>>>> + */
>>>> +
>>>> +void ovn_run_pool_list(struct worker_pool *pool,
>>>> +                       struct ovs_list *result, struct ovs_list *result_frags);
>>>> +
>>>> +/* Run a pool, call a callback function to perform processing of results.
>>>> + */
>>>> +
>>>> +void ovn_run_pool_callback(struct worker_pool *pool, void *fin_result,
>>>> +                    void *result_frags,
>>>> +                    void (*helper_func)(struct worker_pool *pool,
>>>> +                        void *fin_result, void *result_frags, int index));
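>>>> +
>>>> +/* Typical main-thread sequence (sketch only - 'worker_func' is the
>>>> + * application's thread body and RESULT_SIZE a placeholder; all frags
>>>> + * are sized identically so their masks match the final result, and
>>>> + * add_worker_pool() returns NULL when parallelization is unavailable):
>>>> + *
>>>> + *     struct worker_pool *pool = add_worker_pool(worker_func);
>>>> + *     struct hmap result, frags[pool->size];
>>>> + *     int i;
>>>> + *
>>>> + *     fast_hmap_size_for(&result, RESULT_SIZE);
>>>> + *     for (i = 0; i < pool->size; i++) {
>>>> + *         fast_hmap_size_for(&frags[i], RESULT_SIZE);
>>>> + *         pool->controls[i].data = ...slice to process...;
>>>> + *     }
>>>> + *     run_pool_hash(pool, &result, frags);
>>>> + */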
>>>> +
>>>> +
>>>> +/* Returns the first node in bucket number 'num' of 'hmap',
>>>> + * or a null pointer if that bucket is empty. */
>>>> +
>>>> +static inline struct hmap_node *
>>>> +hmap_first_in_bucket_num(const struct hmap *hmap, size_t num)
>>>> +{
>>>> +    return hmap->buckets[num];
>>>> +}
>>>> +
>>>> +static inline struct hmap_node *
>>>> +parallel_hmap_next__(const struct hmap *hmap, size_t start, size_t pool_size)
>>>> +{
>>>> +    size_t i;
>>>> +    for (i = start; i <= hmap->mask; i += pool_size) {
>>>> +        struct hmap_node *node = hmap->buckets[i];
>>>> +        if (node) {
>>>> +            return node;
>>>> +        }
>>>> +    }
>>>> +    return NULL;
>>>> +}
>>>> +
>>>> +/* Returns the first node in 'hmap', as expected by thread with job_id
>>>> + * for parallel processing in arbitrary order, or a null pointer if
>>>> + * the slice of 'hmap' for that job_id is empty. */
>>>> +static inline struct hmap_node *
>>>> +parallel_hmap_first(const struct hmap *hmap, size_t job_id, size_t pool_size)
>>>> +{
>>>> +    return parallel_hmap_next__(hmap, job_id, pool_size);
>>>> +}
>>>> +
>>>> +/* Returns the next node in the slice of 'hmap' following 'node',
>>>> + * in arbitrary order, or a null pointer if 'node' is the last node
>>>> + * in the 'hmap' slice.
>>>> + */
>>>> +static inline struct hmap_node *
>>>> +parallel_hmap_next(const struct hmap *hmap,
>>>> +                   const struct hmap_node *node, ssize_t pool_size)
>>>> +{
>>>> +    return (node->next
>>>> +            ? node->next
>>>> +            : parallel_hmap_next__(hmap,
>>>> +                (node->hash & hmap->mask) + pool_size, pool_size));
>>>> +}
>>>> +
>>>> +static inline void post_completed_work(struct worker_control *control)
>>>> +{
>>>> +    atomic_thread_fence(memory_order_acq_rel);
>>>> +    atomic_store_relaxed(&control->finished, true);
>>>> +    sem_post(control->done);
>>>> +}
>>>> +
>>>> +static inline void wait_for_work(struct worker_control *control)
>>>> +{
>>>> +    int ret;
>>>> +
>>>> +    do {
>>>> +        ret = sem_wait(control->fire);
>>>> +    } while ((ret == -1) && (errno == EINTR));
>>>> +    ovs_assert(ret == 0);
>>>> +}
>>>> +
>>>> +static inline void wait_for_work_completion(struct worker_pool *pool)
>>>> +{
>>>> +    int ret;
>>>> +
>>>> +    do {
>>>> +        ret = sem_wait(pool->done);
>>>> +    } while ((ret == -1) && (errno == EINTR));
>>>> +    ovs_assert(ret == 0);
>>>> +}
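>>>> +
>>>> +/* A worker thread body is expected to have roughly this shape
>>>> + * (sketch only - the slice processing is application-specific):
>>>> + *
>>>> + *     static void *worker(void *arg)
>>>> + *     {
>>>> + *         struct worker_control *control = arg;
>>>> + *
>>>> + *         while (!stop_parallel_processing()) {
>>>> + *             wait_for_work(control);
>>>> + *             if (stop_parallel_processing()) {
>>>> + *                 return NULL;
>>>> + *             }
>>>> + *             ...process the slice referenced by control->data...
>>>> + *             post_completed_work(control);
>>>> + *         }
>>>> + *         return NULL;
>>>> + *     }
>>>> + */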
>>>> +
>>>> +
>>>> +/* Hash per-row locking support - to be used only in conjunction
>>>> + * with fast hash inserts. Normal hash inserts may resize the hash
>>>> + * rendering the locking invalid.
>>>> + */
>>>> +
>>>> +struct hashrow_locks {
>>>> +    ssize_t mask;
>>>> +    struct ovs_mutex *row_locks;
>>>> +};
>>>> +
>>>> +/* Update a hash row locks structure to match the current hash size */
>>>> +
>>>> +void ovn_update_hashrow_locks(struct hmap *lflows, struct hashrow_locks *hrl);
>>>> +
>>>> +/* Lock a hash row */
>>>> +
>>>> +static inline void lock_hash_row(struct hashrow_locks *hrl, uint32_t hash)
>>>> +{
>>>> +    ovs_mutex_lock(&hrl->row_locks[hash % hrl->mask]);
>>>> +}
>>>> +
>>>> +/* Unlock a hash row */
>>>> +
>>>> +static inline void unlock_hash_row(struct hashrow_locks *hrl, uint32_t hash)
>>>> +{
>>>> +    ovs_mutex_unlock(&hrl->row_locks[hash % hrl->mask]);
>>>> +}
>>>> +
>>>> +/* Init the row locks structure */
>>>> +
>>>> +static inline void init_hash_row_locks(struct hashrow_locks *hrl)
>>>> +{
>>>> +    hrl->mask = 0;
>>>> +    hrl->row_locks = NULL;
>>>> +}
>>>> +
>>>> +bool ovn_can_parallelize_hashes(bool force_parallel);
>>>> +
>>>> +/* Use the OVN library functions for stuff which OVS has not defined
>>>> + * If OVS has defined these, they will still compile using the OVN
>>>> + * local names, but will be dropped by the linker in favour of the OVS
>>>> + * supplied functions.
>>>> + */
>>>> +
>>>> +#define update_hashrow_locks(lflows, hrl) ovn_update_hashrow_locks(lflows, hrl)
>>>> +
>>>> +#define can_parallelize_hashes(force) ovn_can_parallelize_hashes(force)
>>>> +
>>>> +#define stop_parallel_processing() ovn_stop_parallel_processing()
>>>> +
>>>> +#define add_worker_pool(start) ovn_add_worker_pool(start)
>>>> +
>>>> +#define fast_hmap_size_for(hmap, size) ovn_fast_hmap_size_for(hmap, size)
>>>> +
>>>> +#define fast_hmap_init(hmap, size) ovn_fast_hmap_init(hmap, size)
>>>> +
>>>> +#define fast_hmap_merge(dest, inc) ovn_fast_hmap_merge(dest, inc)
>>>> +
>>>> +#define hmap_merge(dest, inc) ovn_hmap_merge(dest, inc)
>>>> +
>>>> +#define ovn_run_pool(pool) ovn_run_pool(pool)
>>>> +
>>>> +#define run_pool_hash(pool, result, result_frags) \
>>>> +    ovn_run_pool_hash(pool, result, result_frags)
>>>> +
>>>> +#define run_pool_list(pool, result, result_frags) \
>>>> +    ovn_run_pool_list(pool, result, result_frags)
>>>> +
>>>> +#define run_pool_callback(pool, fin_result, result_frags, helper_func) \
>>>> +    ovn_run_pool_callback(pool, fin_result, result_frags, helper_func)
>>>> +
>>>> +
>>>> +
>>>> +#ifdef __clang__
>>>> +#pragma clang diagnostic pop
>>>> +#endif
>>>> +
>>>> +#endif
>>>> +
>>>> +#ifdef  __cplusplus
>>>> +}
>>>> +#endif
>>>> +
>>>> +
>>>> +#endif /* lib/ovn-parallel-hmap.h */
>>>> --
>>>> 2.20.1
>>>>
>>>> _______________________________________________
>>>> dev mailing list
>>>> dev@openvswitch.org
>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>
>> --
>> Anton R. Ivanov
>> Cambridgegreys Limited. Registered in England. Company Number 10273661
>> https://www.cambridgegreys.com/
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
Anton Ivanov March 26, 2021, 2:21 p.m. UTC | #6
On 26/03/2021 03:25, Numan Siddique wrote:
> On Thu, Mar 25, 2021 at 3:01 PM Anton Ivanov
> <anton.ivanov@cambridgegreys.com> wrote:
>>
>>
>> On 24/03/2021 15:31, Numan Siddique wrote:
>>> On Mon, Mar 1, 2021 at 6:35 PM <anton.ivanov@cambridgegreys.com> wrote:
>>>> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>>>>
>>>> This adds a set of functions and macros intended to process
>>>> hashes in parallel.
>>>>
>>>> The principles of operation are documented in the ovn-parallel-hmap.h
>>>>
>>>> If these one day go into the OVS tree, the OVS tree versions
>>>> would be used in preference.
>>>>
>>>> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>>> Hi Anton,
>>>
>>> I tested the first 2 patches of this series and it crashes again for me.
>>>
>>> This time I ran tests on a 4 core  machine - Intel(R) Xeon(R) CPU
>>> E3-1220 v5 @ 3.00GHz
>>>
>>> The below trace is seen for both gcc and clang.
>>>
>>> ----
>>> [Thread debugging using libthread_db enabled]
>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>> Core was generated by `ovn-northd -vjsonrpc
>>> --ovnnb-db=unix:/mnt/mydisk/myhome/numan_alt/work/ovs_ovn/'.
>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>> #0  0x00007f27594ae212 in __new_sem_wait_slow.constprop.0 () from
>>> /lib64/libpthread.so.0
>>> [Current thread is 1 (Thread 0x7f2758c68640 (LWP 347378))]
>>> Missing separate debuginfos, use: dnf debuginfo-install
>>> glibc-2.32-3.fc33.x86_64 libcap-ng-0.8-1.fc33.x86_64
>>> libevent-2.1.8-10.fc33.x86_64 openssl-libs-1.1.1i-1.fc33.x86_64
>>> python3-libs-3.9.1-2.fc33.x86_64 unbound-libs-1.10.1-4.fc33.x86_64
>>> zlib-1.2.11-23.fc33.x86_64
>>> (gdb) bt
>>> #0  0x00007f27594ae212 in __new_sem_wait_slow.constprop.0 () from
>>> /lib64/libpthread.so.0
>>> #1  0x0000000000422184 in wait_for_work (control=<optimized out>) at
>>> ../lib/ovn-parallel-hmap.h:203
>>> #2  build_lflows_thread (arg=0x2538420) at ../northd/ovn-northd.c:11855
>>> #3  0x000000000049cd12 in ovsthread_wrapper (aux_=<optimized out>) at
>>> ../lib/ovs-thread.c:383
>>> #4  0x00007f27594a53f9 in start_thread () from /lib64/libpthread.so.0
>>> #5  0x00007f2759142903 in clone () from /lib64/libc.so.6
>>> -----
>>>
>>> I'm not sure why you're not able to reproduce this issue.
>> I can't. I have run it for days in a loop.
>>
>> One possibility is that for whatever reason your machine has slower IPC speeds compared to linear execution speeds. Thread debugging? AMD vs Intel? No idea.
>>
>> There is a race on-exit in the current code which I have found by inspection and which I have never been able to trigger. On my machines the workers always exit in time before the main thread has finished, so I cannot trigger this.
>>
>> Can you try this incremental fix to see if it fixes the problem for you. If that works, I will incorporate it and reissue the patch. If not - I will continue digging.
>>
>> diff --git a/lib/ovn-parallel-hmap.c b/lib/ovn-parallel-hmap.c
>> index e83ae23cb..3597f896f 100644
>> --- a/lib/ovn-parallel-hmap.c
>> +++ b/lib/ovn-parallel-hmap.c
>> @@ -143,7 +143,8 @@ struct worker_pool *ovn_add_worker_pool(void *(*start)(void *)){
>>            }
>>
>>            for (i = 0; i < pool_size; i++) {
>> -            ovs_thread_create("worker pool helper", start, &new_pool->controls[i]);
>> +            new_pool->controls[i].worker =
>> +                ovs_thread_create("worker pool helper", start, &new_pool->controls[i]);
>>            }
>>            ovs_list_push_back(&worker_pools, &new_pool->list_node);
>>        }
>> @@ -386,6 +387,9 @@ static void worker_pool_hook(void *aux OVS_UNUSED) {
>>            for (i = 0; i < pool->size ; i++) {
>>                sem_post(pool->controls[i].fire);
>>            }
>> +        for (i = 0; i < pool->size ; i++) {
>> +            pthread_join(pool->controls[i].worker, NULL);
>> +        }
>>            for (i = 0; i < pool->size ; i++) {
>>                sem_close(pool->controls[i].fire);
>>                sprintf(sem_name, WORKER_SEM_NAME, sembase, pool, i);
>> diff --git a/lib/ovn-parallel-hmap.h b/lib/ovn-parallel-hmap.h
>> index 8db61eaba..d62ca3da5 100644
>> --- a/lib/ovn-parallel-hmap.h
>> +++ b/lib/ovn-parallel-hmap.h
>> @@ -82,6 +82,7 @@ struct worker_control {
>>        struct ovs_mutex mutex; /* Guards the data. */
>>        void *data; /* Pointer to data to be processed. */
>>        void *workload; /* back-pointer to the worker pool structure. */
>> +    pthread_t worker;
>>    };
>>
>>    struct worker_pool {
>>
> I applied the above diff on top of patch 2  and did some tests.  I see
> a big improvement
> with this.  On my "Intel(R) Xeon(R) CPU E3-1220 v5 @ 3.00GHz"  server,
> I saw just one
> crash only once when I ran the test suite multiple times.
>
> On my work laptop (in which the tests used to hang earlier), all the
> tests are passing now.
> But I see a lot more consistent crashes here.  For all single run of
> whole testsuite (with make check -j5)
> I observed around 7 crashes.  Definitely an improvement when compared
> to my previous runs with v14.
>
> Here are the back traces details of the core dumps I observed -
> https://gist.github.com/numansiddique/5cab90ec4a1ee6e1adbfd3cd90eccf5a
>
> Crash 1 and Crash 2 are frequent.  Let me know in case you want the core files.

All crashes are "indicator" crashes. There is nothing wrong with the block of code which crashes - e.g. the hash merge; by that point there is already some memory corruption. The best example is trace 3, which is outside the parallel code.

The problem is elsewhere - one of the parallel "work" chunks needs a lock somewhere. A contended resource is being modified without locking, resulting in corruption (the expected locking pattern is sketched below).

It is also something very rare, as I am not able to trigger it even when I loop the test suite for a few hours.

When I did the initial audit for parallelization I found only one point of contention for the dp-less case - an mcast group change in the port unicast flow buildout. It looks like there are one or more that I have missed. I am going through all of the steps at the moment, trying to find them.
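
For reference, the guarded-insert pattern every worker is expected to
follow when writing to the shared lflow hash (a minimal sketch using the
helpers from patch 1; 'lflow_locks', 'lflows', 'node' and 'hash' are
placeholder names, and the hash must be pre-sized so that fast inserts
never resize it and invalidate the row locks):

    lock_hash_row(&lflow_locks, hash);
    hmap_insert_fast(lflows, node, hash);
    unlock_hash_row(&lflow_locks, hash);

Anything a worker modifies outside such a guarded section is a candidate
for the corruption above.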

Brgds,

A.

Anton Ivanov March 26, 2021, 5:12 p.m. UTC | #7
On 26/03/2021 03:25, Numan Siddique wrote:
> On Thu, Mar 25, 2021 at 3:01 PM Anton Ivanov
> <anton.ivanov@cambridgegreys.com> wrote:
>>
>>
>> On 24/03/2021 15:31, Numan Siddique wrote:
>>> On Mon, Mar 1, 2021 at 6:35 PM <anton.ivanov@cambridgegreys.com> wrote:
>>>> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>>>>
>>>> This adds a set of functions and macros intended to process
>>>> hashes in parallel.
>>>>
>>>> The principles of operation are documented in the ovn-parallel-hmap.h
>>>>
>>>> If these one day go into the OVS tree, the OVS tree versions
>>>> would be used in preference.
>>>>
>>>> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>>> Hi Anton,
>>>
>>> I tested the first 2 patches of this series and it crashes again for me.
>>>
>>> This time I ran tests on a 4 core  machine - Intel(R) Xeon(R) CPU
>>> E3-1220 v5 @ 3.00GHz
>>>
>>> The below trace is seen for both gcc and clang.
>>>
>>> ----
>>> [Thread debugging using libthread_db enabled]
>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>> Core was generated by `ovn-northd -vjsonrpc
>>> --ovnnb-db=unix:/mnt/mydisk/myhome/numan_alt/work/ovs_ovn/'.
>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>> #0  0x00007f27594ae212 in __new_sem_wait_slow.constprop.0 () from
>>> /lib64/libpthread.so.0
>>> [Current thread is 1 (Thread 0x7f2758c68640 (LWP 347378))]
>>> Missing separate debuginfos, use: dnf debuginfo-install
>>> glibc-2.32-3.fc33.x86_64 libcap-ng-0.8-1.fc33.x86_64
>>> libevent-2.1.8-10.fc33.x86_64 openssl-libs-1.1.1i-1.fc33.x86_64
>>> python3-libs-3.9.1-2.fc33.x86_64 unbound-libs-1.10.1-4.fc33.x86_64
>>> zlib-1.2.11-23.fc33.x86_64
>>> (gdb) bt
>>> #0  0x00007f27594ae212 in __new_sem_wait_slow.constprop.0 () from
>>> /lib64/libpthread.so.0
>>> #1  0x0000000000422184 in wait_for_work (control=<optimized out>) at
>>> ../lib/ovn-parallel-hmap.h:203
>>> #2  build_lflows_thread (arg=0x2538420) at ../northd/ovn-northd.c:11855
>>> #3  0x000000000049cd12 in ovsthread_wrapper (aux_=<optimized out>) at
>>> ../lib/ovs-thread.c:383
>>> #4  0x00007f27594a53f9 in start_thread () from /lib64/libpthread.so.0
>>> #5  0x00007f2759142903 in clone () from /lib64/libc.so.6
>>> -----
>>>
>>> I'm not sure why you're not able to reproduce this issue.
>> I can't. I have run it for days in a loop.
>>
>> One possibility is that for whatever reason your machine has slower IPC speeds compared to linear execution speeds. Thread debugging? AMD vs Intel? No idea.
>>
>> There is a race on-exit in the current code which I have found by inspection and which I have never been able to trigger. On my machines the workers always exit in time before the main thread has finished, so I cannot trigger this.
>>
>> Can you try this incremental fix to see if it fixes the problem for you. If that works, I will incorporate it and reissue the patch. If not - I will continue digging.
>>
>> diff --git a/lib/ovn-parallel-hmap.c b/lib/ovn-parallel-hmap.c
>> index e83ae23cb..3597f896f 100644
>> --- a/lib/ovn-parallel-hmap.c
>> +++ b/lib/ovn-parallel-hmap.c
>> @@ -143,7 +143,8 @@ struct worker_pool *ovn_add_worker_pool(void *(*start)(void *)){
>>            }
>>
>>            for (i = 0; i < pool_size; i++) {
>> -            ovs_thread_create("worker pool helper", start, &new_pool->controls[i]);
>> +            new_pool->controls[i].worker =
>> +                ovs_thread_create("worker pool helper", start, &new_pool->controls[i]);
>>            }
>>            ovs_list_push_back(&worker_pools, &new_pool->list_node);
>>        }
>> @@ -386,6 +387,9 @@ static void worker_pool_hook(void *aux OVS_UNUSED) {
>>            for (i = 0; i < pool->size ; i++) {
>>                sem_post(pool->controls[i].fire);
>>            }
>> +        for (i = 0; i < pool->size ; i++) {
>> +            pthread_join(pool->controls[i].worker, NULL);
>> +        }
>>            for (i = 0; i < pool->size ; i++) {
>>                sem_close(pool->controls[i].fire);
>>                sprintf(sem_name, WORKER_SEM_NAME, sembase, pool, i);
>> diff --git a/lib/ovn-parallel-hmap.h b/lib/ovn-parallel-hmap.h
>> index 8db61eaba..d62ca3da5 100644
>> --- a/lib/ovn-parallel-hmap.h
>> +++ b/lib/ovn-parallel-hmap.h
>> @@ -82,6 +82,7 @@ struct worker_control {
>>        struct ovs_mutex mutex; /* Guards the data. */
>>        void *data; /* Pointer to data to be processed. */
>>        void *workload; /* back-pointer to the worker pool structure. */
>> +    pthread_t worker;
>>    };
>>
>>    struct worker_pool {
>>
> I applied the above diff on top of patch 2  and did some tests.  I see
> a big improvement
> with this.  On my "Intel(R) Xeon(R) CPU E3-1220 v5 @ 3.00GHz"  server,
> I saw just one
> crash only once when I ran the test suite multiple times.
>
> On my work laptop (in which the tests used to hang earlier), all the
> tests are passing now.
> But I see a lot more consistent crashes here.  For all single run of
> whole testsuite (with make check -j5)
> I observed around 7 crashes.  Definitely an improvement when compared
> to my previous runs with v14.
>
> Here are the back traces details of the core dumps I observed -
> https://gist.github.com/numansiddique/5cab90ec4a1ee6e1adbfd3cd90eccf5a
>
> Crash 1 and Crash 2 are frequent.  Let me know in case you want the core files.

The only suspect I can find so far is the bfd code. There is one place where it may end up doing a ds operation with a string re-alloc, and/or dereferencing pointers that are being changed in flight.

I would not expect that to be invoked in all tests, though, and I still cannot reproduce it here. Do you have the crashes on all tests, or only in specific tests?
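If the ds theory holds, the usual cure is to keep scratch buffers thread-private: ds_put_format() may xrealloc() the backing string, so a pointer another thread obtained via ds_cstr() can be left dangling. A minimal sketch of the safe pattern (illustrative only, not code from this series):

#include "openvswitch/dynamic-string.h"

static void
build_one_match(void)
{
    /* Thread-private buffer: no other worker can observe the realloc. */
    struct ds match = DS_EMPTY_INITIALIZER;

    ds_put_format(&match, "ip4.dst == %s", "10.0.0.1");
    /* ... use ds_cstr(&match) while the buffer is still private ... */
    ds_destroy(&match);
}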

A.

> Thanks
> Numan
>
>>> All the test cases passed for me. So maybe something's wrong when
>>> ovn-northd exits.
>>> IMHO, these crashes should be addressed before these patches can be considered.
>>>
>>> Thanks
>>> Numan
>>>
>>>> ---
>>>>    lib/automake.mk         |   2 +
>>>>    lib/ovn-parallel-hmap.c | 455 ++++++++++++++++++++++++++++++++++++++++
>>>>    lib/ovn-parallel-hmap.h | 301 ++++++++++++++++++++++++++
>>>>    3 files changed, 758 insertions(+)
>>>>    create mode 100644 lib/ovn-parallel-hmap.c
>>>>    create mode 100644 lib/ovn-parallel-hmap.h
>>>>
>>>> diff --git a/lib/automake.mk b/lib/automake.mk
>>>> index 250c7aefa..781be2109 100644
>>>> --- a/lib/automake.mk
>>>> +++ b/lib/automake.mk
>>>> @@ -13,6 +13,8 @@ lib_libovn_la_SOURCES = \
>>>>           lib/expr.c \
>>>>           lib/extend-table.h \
>>>>           lib/extend-table.c \
>>>> +       lib/ovn-parallel-hmap.h \
>>>> +       lib/ovn-parallel-hmap.c \
>>>>           lib/ip-mcast-index.c \
>>>>           lib/ip-mcast-index.h \
>>>>           lib/mcast-group-index.c \
>>>> diff --git a/lib/ovn-parallel-hmap.c b/lib/ovn-parallel-hmap.c
>>>> new file mode 100644
>>>> index 000000000..e83ae23cb
>>>> --- /dev/null
>>>> +++ b/lib/ovn-parallel-hmap.c
>>>> @@ -0,0 +1,455 @@
>>>> +/*
>>>> + * Copyright (c) 2020 Red Hat, Inc.
>>>> + * Copyright (c) 2008, 2009, 2010, 2012, 2013, 2015, 2019 Nicira, Inc.
>>>> + *
>>>> + * Licensed under the Apache License, Version 2.0 (the "License");
>>>> + * you may not use this file except in compliance with the License.
>>>> + * You may obtain a copy of the License at:
>>>> + *
>>>> + *     http://www.apache.org/licenses/LICENSE-2.0
>>>> + *
>>>> + * Unless required by applicable law or agreed to in writing, software
>>>> + * distributed under the License is distributed on an "AS IS" BASIS,
>>>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>>> + * See the License for the specific language governing permissions and
>>>> + * limitations under the License.
>>>> + */
>>>> +
>>>> +#include <config.h>
>>>> +#include <stdint.h>
>>>> +#include <string.h>
>>>> +#include <stdlib.h>
>>>> +#include <fcntl.h>
>>>> +#include <unistd.h>
>>>> +#include <errno.h>
>>>> +#include <semaphore.h>
>>>> +#include "fatal-signal.h"
>>>> +#include "util.h"
>>>> +#include "openvswitch/vlog.h"
>>>> +#include "openvswitch/hmap.h"
>>>> +#include "openvswitch/thread.h"
>>>> +#include "ovn-parallel-hmap.h"
>>>> +#include "ovs-atomic.h"
>>>> +#include "ovs-thread.h"
>>>> +#include "ovs-numa.h"
>>>> +#include "random.h"
>>>> +
>>>> +VLOG_DEFINE_THIS_MODULE(ovn_parallel_hmap);
>>>> +
>>>> +#ifndef OVS_HAS_PARALLEL_HMAP
>>>> +
>>>> +#define WORKER_SEM_NAME "%x-%p-%x"
>>>> +#define MAIN_SEM_NAME "%x-%p-main"
>>>> +
>>>> +/* These are accessed under mutex inside add_worker_pool().
>>>> + * They do not need to be atomic.
>>>> + */
>>>> +
>>>> +static atomic_bool initial_pool_setup = ATOMIC_VAR_INIT(false);
>>>> +static bool can_parallelize = false;
>>>> +
>>>> +/* This is set only in the process of exit and the store is
>>>> + * accompanied by a fence. It does not need to be atomic or be
>>>> + * accessed under a lock.
>>>> + */
>>>> +
>>>> +static bool workers_must_exit = false;
>>>> +
>>>> +static struct ovs_list worker_pools = OVS_LIST_INITIALIZER(&worker_pools);
>>>> +
>>>> +static struct ovs_mutex init_mutex = OVS_MUTEX_INITIALIZER;
>>>> +
>>>> +static int pool_size;
>>>> +
>>>> +static int sembase;
>>>> +
>>>> +static void worker_pool_hook(void *aux OVS_UNUSED);
>>>> +static void setup_worker_pools(bool force);
>>>> +static void merge_list_results(struct worker_pool *pool OVS_UNUSED,
>>>> +                               void *fin_result, void *result_frags,
>>>> +                               int index);
>>>> +static void merge_hash_results(struct worker_pool *pool OVS_UNUSED,
>>>> +                               void *fin_result, void *result_frags,
>>>> +                               int index);
>>>> +
>>>> +bool ovn_stop_parallel_processing(void)
>>>> +{
>>>> +    return workers_must_exit;
>>>> +}
>>>> +
>>>> +bool ovn_can_parallelize_hashes(bool force_parallel)
>>>> +{
>>>> +    bool test = false;
>>>> +
>>>> +    if (atomic_compare_exchange_strong(
>>>> +            &initial_pool_setup,
>>>> +            &test,
>>>> +            true)) {
>>>> +        ovs_mutex_lock(&init_mutex);
>>>> +        setup_worker_pools(force_parallel);
>>>> +        ovs_mutex_unlock(&init_mutex);
>>>> +    }
>>>> +    return can_parallelize;
>>>> +}
>>>> +
>>>> +struct worker_pool *ovn_add_worker_pool(void *(*start)(void *)){
>>>> +
>>>> +    struct worker_pool *new_pool = NULL;
>>>> +    struct worker_control *new_control;
>>>> +    bool test = false;
>>>> +    int i;
>>>> +    char sem_name[256];
>>>> +
>>>> +
>>>> +    /* Belt and braces - initialize the pool system just in case
>>>> +     * it is not yet initialized.
>>>> +     */
>>>> +
>>>> +    if (atomic_compare_exchange_strong(
>>>> +            &initial_pool_setup,
>>>> +            &test,
>>>> +            true)) {
>>>> +        ovs_mutex_lock(&init_mutex);
>>>> +        setup_worker_pools(false);
>>>> +        ovs_mutex_unlock(&init_mutex);
>>>> +    }
>>>> +
>>>> +    ovs_mutex_lock(&init_mutex);
>>>> +    if (can_parallelize) {
>>>> +        new_pool = xmalloc(sizeof(struct worker_pool));
>>>> +        new_pool->size = pool_size;
>>>> +        new_pool->controls = NULL;
>>>> +        sprintf(sem_name, MAIN_SEM_NAME, sembase, new_pool);
>>>> +        new_pool->done = sem_open(sem_name, O_CREAT, S_IRWXU, 0);
>>>> +        if (new_pool->done == SEM_FAILED) {
>>>> +            goto cleanup;
>>>> +        }
>>>> +
>>>> +        new_pool->controls =
>>>> +            xmalloc(sizeof(struct worker_control) * new_pool->size);
>>>> +
>>>> +        for (i = 0; i < new_pool->size; i++) {
>>>> +            new_control = &new_pool->controls[i];
>>>> +            new_control->id = i;
>>>> +            new_control->done = new_pool->done;
>>>> +            new_control->data = NULL;
>>>> +            ovs_mutex_init(&new_control->mutex);
>>>> +            new_control->finished = ATOMIC_VAR_INIT(false);
>>>> +            sprintf(sem_name, WORKER_SEM_NAME, sembase, new_pool, i);
>>>> +            new_control->fire = sem_open(sem_name, O_CREAT, S_IRWXU, 0);
>>>> +            if (new_control->fire == SEM_FAILED) {
>>>> +                goto cleanup;
>>>> +            }
>>>> +        }
>>>> +
>>>> +        for (i = 0; i < pool_size; i++) {
>>>> +            ovs_thread_create("worker pool helper", start, &new_pool->controls[i]);
>>>> +        }
>>>> +        ovs_list_push_back(&worker_pools, &new_pool->list_node);
>>>> +    }
>>>> +    ovs_mutex_unlock(&init_mutex);
>>>> +    return new_pool;
>>>> +cleanup:
>>>> +
>>>> +    /* Something went wrong when opening semaphores. In this case
>>>> +     * it is better to shut off parallel processing altogether
>>>> +     */
>>>> +
>>>> +    VLOG_INFO("Failed to initialize parallel processing, error %d", errno);
>>>> +    can_parallelize = false;
>>>> +    if (new_pool->controls) {
>>>> +        for (i = 0; i < new_pool->size; i++) {
>>>> +            if (new_pool->controls[i].fire != SEM_FAILED) {
>>>> +                sem_close(new_pool->controls[i].fire);
>>>> +                sprintf(sem_name, WORKER_SEM_NAME, sembase, new_pool, i);
>>>> +                sem_unlink(sem_name);
>>>> +                break; /* semaphores past this one are uninitialized */
>>>> +            }
>>>> +        }
>>>> +    }
>>>> +    if (new_pool->done != SEM_FAILED) {
>>>> +        sem_close(new_pool->done);
>>>> +        sprintf(sem_name, MAIN_SEM_NAME, sembase, new_pool);
>>>> +        sem_unlink(sem_name);
>>>> +    }
>>>> +    ovs_mutex_unlock(&init_mutex);
>>>> +    return NULL;
>>>> +}
>>>> +
>>>> +
>>>> +/* Initializes 'hmap' as an empty hash table with the given 'mask'. */
>>>> +void
>>>> +ovn_fast_hmap_init(struct hmap *hmap, ssize_t mask)
>>>> +{
>>>> +    size_t i;
>>>> +
>>>> +    hmap->buckets = xmalloc(sizeof (struct hmap_node *) * (mask + 1));
>>>> +    hmap->one = NULL;
>>>> +    hmap->mask = mask;
>>>> +    hmap->n = 0;
>>>> +    for (i = 0; i <= hmap->mask; i++) {
>>>> +        hmap->buckets[i] = NULL;
>>>> +    }
>>>> +}
>>>> +
>>>> +/* Initializes 'hmap' as an empty hash table sized to hold 'size' elements.
>>>> + * Intended for use in parallel processing so that all
>>>> + * fragments used to store results in a parallel job
>>>> + * are the same size.
>>>> + */
>>>> +void
>>>> +ovn_fast_hmap_size_for(struct hmap *hmap, int size)
>>>> +{
>>>> +    size_t mask;
>>>> +    mask = size / 2;
>>>> +    mask |= mask >> 1;
>>>> +    mask |= mask >> 2;
>>>> +    mask |= mask >> 4;
>>>> +    mask |= mask >> 8;
>>>> +    mask |= mask >> 16;
>>>> +#if SIZE_MAX > UINT32_MAX
>>>> +    mask |= mask >> 32;
>>>> +#endif
>>>> +
>>>> +    /* If we need to dynamically allocate buckets we might as well allocate at
>>>> +     * least 4 of them. */
>>>> +    mask |= (mask & 1) << 1;
>>>> +
>>>> +    fast_hmap_init(hmap, mask);
>>>> +}
>>>> +
>>>> +/* Run a thread pool which uses a callback function to process results
>>>> + */
>>>> +
>>>> +void ovn_run_pool_callback(struct worker_pool *pool,
>>>> +                           void *fin_result, void *result_frags,
>>>> +                           void (*helper_func)(struct worker_pool *pool,
>>>> +                                               void *fin_result,
>>>> +                                               void *result_frags, int index))
>>>> +{
>>>> +    int index, completed;
>>>> +
>>>> +    /* Ensure that all worker threads see the same data as the
>>>> +     * main thread.
>>>> +     */
>>>> +
>>>> +    atomic_thread_fence(memory_order_acq_rel);
>>>> +
>>>> +    /* Start workers */
>>>> +
>>>> +    for (index = 0; index < pool->size; index++) {
>>>> +        sem_post(pool->controls[index].fire);
>>>> +    }
>>>> +
>>>> +    completed = 0;
>>>> +
>>>> +    do {
>>>> +        bool test;
>>>> +        /* Note - we do not loop on semaphore until it reaches
>>>> +         * zero, but on pool size/remaining workers.
>>>> +         * This is by design. If the inner loop can handle
>>>> +         * completion for more than one worker within an iteration
>>>> +         * it will do so to ensure no additional iterations and
>>>> +         * waits once all of them are done.
>>>> +         *
>>>> +         * This may result in us having an initial positive value
>>>> +         * of the semaphore when the pool is invoked the next time.
>>>> +         * This is harmless - the loop will spin up a couple of times
>>>> +         * doing nothing while the workers are processing their data
>>>> +         * slices.
>>>> +         */
>>>> +        wait_for_work_completion(pool);
>>>> +        for (index = 0; index < pool->size; index++) {
>>>> +            test = true;
>>>> +            /* If the worker has marked its data chunk as complete,
>>>> +             * invoke the helper function to combine the results of
>>>> +             * this worker into the main result.
>>>> +             *
>>>> +             * The worker must invoke an appropriate memory fence
>>>> +             * (most likely acq_rel) to ensure that the main thread
>>>> +             * sees all of the results produced by the worker.
>>>> +             */
>>>> +            if (atomic_compare_exchange_weak(
>>>> +                    &pool->controls[index].finished,
>>>> +                    &test,
>>>> +                    false)) {
>>>> +                if (helper_func) {
>>>> +                    (helper_func)(pool, fin_result, result_frags, index);
>>>> +                }
>>>> +                completed++;
>>>> +                pool->controls[index].data = NULL;
>>>> +            }
>>>> +        }
>>>> +    } while (completed < pool->size);
>>>> +}
>>>> +
>>>> +/* Run a thread pool - basic, does not do results processing.
>>>> + */
>>>> +
>>>> +void ovn_run_pool(struct worker_pool *pool)
>>>> +{
>>>> +    run_pool_callback(pool, NULL, NULL, NULL);
>>>> +}
>>>> +
>>>> +/* Brute force merge of a hashmap into another hashmap.
>>>> + * Intended for use in parallel processing. The destination
>>>> + * hashmap MUST be the same size as the one being merged.
>>>> + *
>>>> + * This can be achieved by pre-allocating them to the correct size
>>>> + * and using hmap_insert_fast() instead of hmap_insert()
>>>> + */
>>>> +
>>>> +void ovn_fast_hmap_merge(struct hmap *dest, struct hmap *inc)
>>>> +{
>>>> +    size_t i;
>>>> +
>>>> +    ovs_assert(inc->mask == dest->mask);
>>>> +
>>>> +    if (!inc->n) {
>>>> +        /* Request to merge an empty frag, nothing to do */
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    for (i = 0; i <= dest->mask; i++) {
>>>> +        struct hmap_node **dest_bucket = &dest->buckets[i];
>>>> +        struct hmap_node **inc_bucket = &inc->buckets[i];
>>>> +        if (*inc_bucket != NULL) {
>>>> +            struct hmap_node *last_node = *inc_bucket;
>>>> +            while (last_node->next != NULL) {
>>>> +                last_node = last_node->next;
>>>> +            }
>>>> +            last_node->next = *dest_bucket;
>>>> +            *dest_bucket = *inc_bucket;
>>>> +            *inc_bucket = NULL;
>>>> +        }
>>>> +    }
>>>> +    dest->n += inc->n;
>>>> +    inc->n = 0;
>>>> +}
>>>> +
>>>> +/* Run a thread pool which gathers results in an array
>>>> + * of hashes. Merge results.
>>>> + */
>>>> +
>>>> +
>>>> +void ovn_run_pool_hash(
>>>> +        struct worker_pool *pool,
>>>> +        struct hmap *result,
>>>> +        struct hmap *result_frags)
>>>> +{
>>>> +    run_pool_callback(pool, result, result_frags, merge_hash_results);
>>>> +}
>>>> +
>>>> +/* Run a thread pool which gathers results in an array of lists.
>>>> + * Merge results.
>>>> + */
>>>> +void ovn_run_pool_list(
>>>> +        struct worker_pool *pool,
>>>> +        struct ovs_list *result,
>>>> +        struct ovs_list *result_frags)
>>>> +{
>>>> +    run_pool_callback(pool, result, result_frags, merge_list_results);
>>>> +}
>>>> +
>>>> +void ovn_update_hashrow_locks(struct hmap *lflows, struct hashrow_locks *hrl)
>>>> +{
>>>> +    int i;
>>>> +    if (hrl->mask != lflows->mask) {
>>>> +        if (hrl->row_locks) {
>>>> +            free(hrl->row_locks);
>>>> +        }
>>>> +        hrl->row_locks = xcalloc(sizeof(struct ovs_mutex), lflows->mask + 1);
>>>> +        hrl->mask = lflows->mask;
>>>> +        for (i = 0; i <= lflows->mask; i++) {
>>>> +            ovs_mutex_init(&hrl->row_locks[i]);
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>> +static void worker_pool_hook(void *aux OVS_UNUSED) {
>>>> +    int i;
>>>> +    static struct worker_pool *pool;
>>>> +    char sem_name[256];
>>>> +
>>>> +    workers_must_exit = true;
>>>> +
>>>> +    /* All workers must honour the must_exit flag and check for it regularly.
>>>> +     * We can make it atomic and check it via atomics in workers, but that
>>>> +     * is not really necessary as it is set just once - when the program
>>>> +     * terminates. So we use a fence which is invoked before exiting instead.
>>>> +     */
>>>> +    atomic_thread_fence(memory_order_acq_rel);
>>>> +
>>>> +    /* Wake up the workers after the must_exit flag has been set */
>>>> +
>>>> +    LIST_FOR_EACH (pool, list_node, &worker_pools) {
>>>> +        for (i = 0; i < pool->size ; i++) {
>>>> +            sem_post(pool->controls[i].fire);
>>>> +        }
>>>> +        for (i = 0; i < pool->size ; i++) {
>>>> +            sem_close(pool->controls[i].fire);
>>>> +            sprintf(sem_name, WORKER_SEM_NAME, sembase, pool, i);
>>>> +            sem_unlink(sem_name);
>>>> +        }
>>>> +        sem_close(pool->done);
>>>> +        sprintf(sem_name, MAIN_SEM_NAME, sembase, pool);
>>>> +        sem_unlink(sem_name);
>>>> +    }
>>>> +}
>>>> +
>>>> +static void setup_worker_pools(bool force) {
>>>> +    int cores, nodes;
>>>> +
>>>> +    nodes = ovs_numa_get_n_numas();
>>>> +    if (nodes == OVS_NUMA_UNSPEC || nodes <= 0) {
>>>> +        nodes = 1;
>>>> +    }
>>>> +    cores = ovs_numa_get_n_cores();
>>>> +
>>>> +    /* If there is no NUMA config, fall back to the ovs-threads
>>>> +     * core count. If there is NUMA config, use the cores of
>>>> +     * one node so that the OS does not start pushing
>>>> +     * threads to other nodes.
>>>> +     */
>>>> +    if (cores == OVS_CORE_UNSPEC || cores <= 0) {
>>>> +        /* If there is no NUMA we can try the ovs-threads routine.
>>>> +         * It falls back to sysconf and/or affinity mask.
>>>> +         */
>>>> +        cores = count_cpu_cores();
>>>> +        pool_size = cores;
>>>> +    } else {
>>>> +        pool_size = cores / nodes;
>>>> +    }
>>>> +    if ((pool_size < 4) && force) {
>>>> +        pool_size = 4;
>>>> +    }
>>>> +    can_parallelize = (pool_size >= 3);
>>>> +    fatal_signal_add_hook(worker_pool_hook, NULL, NULL, true);
>>>> +    sembase = random_uint32();
>>>> +}
>>>> +
>>>> +static void merge_list_results(struct worker_pool *pool OVS_UNUSED,
>>>> +                               void *fin_result, void *result_frags,
>>>> +                               int index)
>>>> +{
>>>> +    struct ovs_list *result = (struct ovs_list *)fin_result;
>>>> +    struct ovs_list *res_frags = (struct ovs_list *)result_frags;
>>>> +
>>>> +    if (!ovs_list_is_empty(&res_frags[index])) {
>>>> +        ovs_list_splice(result->next,
>>>> +                ovs_list_front(&res_frags[index]), &res_frags[index]);
>>>> +    }
>>>> +}
>>>> +
>>>> +static void merge_hash_results(struct worker_pool *pool OVS_UNUSED,
>>>> +                               void *fin_result, void *result_frags,
>>>> +                               int index)
>>>> +{
>>>> +    struct hmap *result = (struct hmap *)fin_result;
>>>> +    struct hmap *res_frags = (struct hmap *)result_frags;
>>>> +
>>>> +    fast_hmap_merge(result, &res_frags[index]);
>>>> +    hmap_destroy(&res_frags[index]);
>>>> +}
>>>> +
>>>> +#endif
>>>> diff --git a/lib/ovn-parallel-hmap.h b/lib/ovn-parallel-hmap.h
>>>> new file mode 100644
>>>> index 000000000..8db61eaba
>>>> --- /dev/null
>>>> +++ b/lib/ovn-parallel-hmap.h
>>>> @@ -0,0 +1,301 @@
>>>> +/*
>>>> + * Copyright (c) 2020 Red Hat, Inc.
>>>> + *
>>>> + * Licensed under the Apache License, Version 2.0 (the "License");
>>>> + * you may not use this file except in compliance with the License.
>>>> + * You may obtain a copy of the License at:
>>>> + *
>>>> + *     http://www.apache.org/licenses/LICENSE-2.0
>>>> + *
>>>> + * Unless required by applicable law or agreed to in writing, software
>>>> + * distributed under the License is distributed on an "AS IS" BASIS,
>>>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>>> + * See the License for the specific language governing permissions and
>>>> + * limitations under the License.
>>>> + */
>>>> +
>>>> +#ifndef OVN_PARALLEL_HMAP
>>>> +#define OVN_PARALLEL_HMAP 1
>>>> +
>>>> +/* if the parallel macros are defined by hmap.h or any other ovs define
>>>> + * we skip over the ovn specific definitions.
>>>> + */
>>>> +
>>>> +#ifdef  __cplusplus
>>>> +extern "C" {
>>>> +#endif
>>>> +
>>>> +#include <stdbool.h>
>>>> +#include <stdlib.h>
>>>> +#include <semaphore.h>
>>>> +#include <errno.h>
>>>> +#include "openvswitch/util.h"
>>>> +#include "openvswitch/hmap.h"
>>>> +#include "openvswitch/thread.h"
>>>> +#include "ovs-atomic.h"
>>>> +
>>>> +/* Process this include only if OVS does not supply parallel definitions
>>>> + */
>>>> +
>>>> +#ifdef OVS_HAS_PARALLEL_HMAP
>>>> +
>>>> +#include "parallel-hmap.h"
>>>> +
>>>> +#else
>>>> +
>>>> +
>>>> +#ifdef __clang__
>>>> +#pragma clang diagnostic push
>>>> +#pragma clang diagnostic ignored "-Wthread-safety"
>>>> +#endif
>>>> +
>>>> +
>>>> +/* A version of the HMAP_FOR_EACH macro intended for iterating as part
>>>> + * of parallel processing.
>>>> + * Each worker thread has a different ThreadID in the range 0..POOL_SIZE-1
>>>> + * and will iterate hash buckets ThreadID, ThreadID + step,
>>>> + * ThreadID + step * 2, etc. The actual macro accepts
>>>> + * ThreadID + step * i as the JOBID parameter.
>>>> + */
>>>> +
>>>> +#define HMAP_FOR_EACH_IN_PARALLEL(NODE, MEMBER, JOBID, HMAP) \
>>>> +   for (INIT_CONTAINER(NODE, hmap_first_in_bucket_num(HMAP, JOBID), MEMBER); \
>>>> +        (NODE != OBJECT_CONTAINING(NULL, NODE, MEMBER)) \
>>>> +       || ((NODE = NULL), false); \
>>>> +       ASSIGN_CONTAINER(NODE, hmap_next_in_bucket(&(NODE)->MEMBER), MEMBER))
>>>> +
>>>> +/* We do not have a SAFE version of the macro, because the hash size is not
>>>> + * atomic and hash removal operations would need to be wrapped with
>>>> + * locks. This will defeat most of the benefits from doing anything in
>>>> + * parallel.
>>>> + * If the code block inside FOR_EACH_IN_PARALLEL needs to remove elements,
>>>> + * each thread should store them in a temporary list result instead, merging
>>>> + * the lists into a combined result at the end */
>>>> +
>>>> +/* Work "Handle" */
>>>> +
>>>> +struct worker_control {
>>>> +    int id; /* Used as a modulo when iterating over a hash. */
>>>> +    atomic_bool finished; /* Set to true after a chunk of work is complete. */
>>>> +    sem_t *fire; /* Work start semaphore - sem_post starts the worker. */
>>>> +    sem_t *done; /* Work completion semaphore - sem_post on completion. */
>>>> +    struct ovs_mutex mutex; /* Guards the data. */
>>>> +    void *data; /* Pointer to data to be processed. */
>>>> +    void *workload; /* back-pointer to the worker pool structure. */
>>>> +};
>>>> +
>>>> +struct worker_pool {
>>>> +    int size;   /* Number of threads in the pool. */
>>>> +    struct ovs_list list_node; /* List of pools - used in cleanup/exit. */
>>>> +    struct worker_control *controls; /* "Handles" in this pool. */
>>>> +    sem_t *done; /* Work completion semaphore. */
>>>> +};
>>>> +
>>>> +/* Add a worker pool for thread function start() which expects a pointer to
>>>> + * a worker_control structure as an argument. */
>>>> +
>>>> +struct worker_pool *ovn_add_worker_pool(void *(*start)(void *));
>>>> +
>>>> +/* Setting this to true will make all processing threads exit */
>>>> +
>>>> +bool ovn_stop_parallel_processing(void);
>>>> +
>>>> +/* Build a hmap pre-sized for size elements */
>>>> +
>>>> +void ovn_fast_hmap_size_for(struct hmap *hmap, int size);
>>>> +
>>>> +/* Build a hmap with a mask equals to size */
>>>> +
>>>> +void ovn_fast_hmap_init(struct hmap *hmap, ssize_t size);
>>>> +
>>>> +/* Brute-force merge a hmap into hmap.
>>>> + * Dest and inc have to have the same mask. The merge is performed
>>>> + * by extending the element list for bucket N in the dest hmap with the list
>>>> + * from bucket N in inc.
>>>> + */
>>>> +
>>>> +void ovn_fast_hmap_merge(struct hmap *dest, struct hmap *inc);
>>>> +
>>>> +/* Run a pool, without any default processing of results.
>>>> + */
>>>> +
>>>> +void ovn_run_pool(struct worker_pool *pool);
>>>> +
>>>> +/* Run a pool, merge results from hash frags into a final hash result.
>>>> + * The hash frags must be pre-sized to the same size.
>>>> + */
>>>> +
>>>> +void ovn_run_pool_hash(struct worker_pool *pool,
>>>> +                       struct hmap *result, struct hmap *result_frags);
>>>> +/* Run a pool, merge results from list frags into a final list result.
>>>> + */
>>>> +
>>>> +void ovn_run_pool_list(struct worker_pool *pool,
>>>> +                       struct ovs_list *result, struct ovs_list *result_frags);
>>>> +
>>>> +/* Run a pool, call a callback function to perform processing of results.
>>>> + */
>>>> +
>>>> +void ovn_run_pool_callback(struct worker_pool *pool, void *fin_result,
>>>> +                    void *result_frags,
>>>> +                    void (*helper_func)(struct worker_pool *pool,
>>>> +                        void *fin_result, void *result_frags, int index));
>>>> +
>>>> +
>>>> +/* Returns the first node in bucket number 'num' of 'hmap',
>>>> + * or a null pointer if that bucket is empty. */
>>>> +
>>>> +static inline struct hmap_node *
>>>> +hmap_first_in_bucket_num(const struct hmap *hmap, size_t num)
>>>> +{
>>>> +    return hmap->buckets[num];
>>>> +}
>>>> +
>>>> +static inline struct hmap_node *
>>>> +parallel_hmap_next__(const struct hmap *hmap, size_t start, size_t pool_size)
>>>> +{
>>>> +    size_t i;
>>>> +    for (i = start; i <= hmap->mask; i+= pool_size) {
>>>> +        struct hmap_node *node = hmap->buckets[i];
>>>> +        if (node) {
>>>> +            return node;
>>>> +        }
>>>> +    }
>>>> +    return NULL;
>>>> +}
>>>> +
>>>> +/* Returns the first node in 'hmap', as expected by thread with job_id
>>>> + * for parallel processing in arbitrary order, or a null pointer if
>>>> + * the slice of 'hmap' for that job_id is empty. */
>>>> +static inline struct hmap_node *
>>>> +parallel_hmap_first(const struct hmap *hmap, size_t job_id, size_t pool_size)
>>>> +{
>>>> +    return parallel_hmap_next__(hmap, job_id, pool_size);
>>>> +}
>>>> +
>>>> +/* Returns the next node in the slice of 'hmap' following 'node',
>>>> + * in arbitrary order, or a null pointer if 'node' is the last
>>>> + * node in the 'hmap' slice. */
>>>> +static inline struct hmap_node *
>>>> +parallel_hmap_next(const struct hmap *hmap,
>>>> +                   const struct hmap_node *node, ssize_t pool_size)
>>>> +{
>>>> +    return (node->next
>>>> +            ? node->next
>>>> +            : parallel_hmap_next__(hmap,
>>>> +                (node->hash & hmap->mask) + pool_size, pool_size));
>>>> +}
>>>> +
>>>> +static inline void post_completed_work(struct worker_control *control)
>>>> +{
>>>> +    atomic_thread_fence(memory_order_acq_rel);
>>>> +    atomic_store_relaxed(&control->finished, true);
>>>> +    sem_post(control->done);
>>>> +}
>>>> +
>>>> +static inline void wait_for_work(struct worker_control *control)
>>>> +{
>>>> +    int ret;
>>>> +
>>>> +    do {
>>>> +        ret = sem_wait(control->fire);
>>>> +    } while ((ret == -1) && (errno == EINTR));
>>>> +    ovs_assert(ret == 0);
>>>> +}
>>>> +static inline void wait_for_work_completion(struct worker_pool *pool)
>>>> +{
>>>> +    int ret;
>>>> +
>>>> +    do {
>>>> +        ret = sem_wait(pool->done);
>>>> +    } while ((ret == -1) && (errno == EINTR));
>>>> +    ovs_assert(ret == 0);
>>>> +}
>>>> +
>>>> +
>>>> +/* Hash per-row locking support - to be used only in conjunction
>>>> + * with fast hash inserts. Normal hash inserts may resize the hash
>>>> + * rendering the locking invalid.
>>>> + */
>>>> +
>>>> +struct hashrow_locks {
>>>> +    ssize_t mask;
>>>> +    struct ovs_mutex *row_locks;
>>>> +};
>>>> +
>>>> +/* Update a hash row locks structure to match the current hash size. */
>>>> +
>>>> +void ovn_update_hashrow_locks(struct hmap *lflows, struct hashrow_locks *hrl);
>>>> +
>>>> +/* Lock a hash row */
>>>> +
>>>> +static inline void lock_hash_row(struct hashrow_locks *hrl, uint32_t hash)
>>>> +{
>>>> +    ovs_mutex_lock(&hrl->row_locks[hash % hrl->mask]);
>>>> +}
>>>> +
>>>> +/* Unlock a hash row */
>>>> +
>>>> +static inline void unlock_hash_row(struct hashrow_locks *hrl, uint32_t hash)
>>>> +{
>>>> +    ovs_mutex_unlock(&hrl->row_locks[hash % hrl->mask]);
>>>> +}
>>>> +/* Init the row locks structure */
>>>> +
>>>> +static inline void init_hash_row_locks(struct hashrow_locks *hrl)
>>>> +{
>>>> +    hrl->mask = 0;
>>>> +    hrl->row_locks = NULL;
>>>> +}
>>>> +
>>>> +bool ovn_can_parallelize_hashes(bool force_parallel);
>>>> +
>>>> +/* Use the OVN library functions for stuff which OVS has not defined
>>>> + * If OVS has defined these, they will still compile using the OVN
>>>> + * local names, but will be dropped by the linker in favour of the OVS
>>>> + * supplied functions.
>>>> + */
>>>> +
>>>> +#define update_hashrow_locks(lflows, hrl) ovn_update_hashrow_locks(lflows, hrl)
>>>> +
>>>> +#define can_parallelize_hashes(force) ovn_can_parallelize_hashes(force)
>>>> +
>>>> +#define stop_parallel_processing() ovn_stop_parallel_processing()
>>>> +
>>>> +#define add_worker_pool(start) ovn_add_worker_pool(start)
>>>> +
>>>> +#define fast_hmap_size_for(hmap, size) ovn_fast_hmap_size_for(hmap, size)
>>>> +
>>>> +#define fast_hmap_init(hmap, size) ovn_fast_hmap_init(hmap, size)
>>>> +
>>>> +#define fast_hmap_merge(dest, inc) ovn_fast_hmap_merge(dest, inc)
>>>> +
>>>> +#define hmap_merge(dest, inc) ovn_hmap_merge(dest, inc)
>>>> +
>>>> +#define run_pool(pool) ovn_run_pool(pool)
>>>> +
>>>> +#define run_pool_hash(pool, result, result_frags) \
>>>> +    ovn_run_pool_hash(pool, result, result_frags)
>>>> +
>>>> +#define run_pool_list(pool, result, result_frags) \
>>>> +    ovn_run_pool_list(pool, result, result_frags)
>>>> +
>>>> +#define run_pool_callback(pool, fin_result, result_frags, helper_func) \
>>>> +    ovn_run_pool_callback(pool, fin_result, result_frags, helper_func)
>>>> +
>>>> +
>>>> +
>>>> +#ifdef __clang__
>>>> +#pragma clang diagnostic pop
>>>> +#endif
>>>> +
>>>> +#endif
>>>> +
>>>> +#ifdef  __cplusplus
>>>> +}
>>>> +#endif
>>>> +
>>>> +
>>>> +#endif /* lib/fasthmap.h */
>>>> --
>>>> 2.20.1
>>>>
>>>> _______________________________________________
>>>> dev mailing list
>>>> dev@openvswitch.org
>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>
>> --
>> Anton R. Ivanov
>> Cambridgegreys Limited. Registered in England. Company Number 10273661
>> https://www.cambridgegreys.com/
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
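Taken together, the API in this patch is meant to be driven along the following lines (a sketch only; 'inputs', 'in_rec', 'do_work' and 'run_parallel_build' are illustrative names, while the shape - pre-sized fragments, fast inserts, merge via run_pool_hash() - follows the comments in ovn-parallel-hmap.h):

#include "ovn-parallel-hmap.h"
#include "util.h"

struct in_rec {
    struct hmap_node node;
    /* ... payload ... */
};

static struct hmap inputs;        /* Shared, read-only while the pool runs. */
static struct worker_pool *pool;  /* Set up before any work is fired. */

static void *
do_work(void *arg)
{
    struct worker_control *control = arg;

    while (!stop_parallel_processing()) {
        wait_for_work(control);
        if (stop_parallel_processing()) {
            break;
        }
        struct hmap *frags = control->data;
        struct hmap *my_frag = &frags[control->id];

        /* Walk buckets id, id + size, id + 2 * size, ... of the input. */
        for (size_t bucket = control->id; bucket <= inputs.mask;
             bucket += pool->size) {
            struct in_rec *rec;
            HMAP_FOR_EACH_IN_PARALLEL (rec, node, bucket, &inputs) {
                /* Compute a result for 'rec' and add it to 'my_frag'
                 * with hmap_insert_fast(); all fragments are pre-sized,
                 * so no resize (and no locking) can happen here. */
            }
        }
        post_completed_work(control);
    }
    return NULL;
}

static void
run_parallel_build(struct hmap *result)
{
    if (!can_parallelize_hashes(false)) {
        return;   /* Caller falls back to a serial loop. */
    }
    if (!pool) {
        pool = add_worker_pool(do_work);
        if (!pool) {
            return;   /* Pool creation failed; run serial. */
        }
    }

    struct hmap *frags = xcalloc(pool->size, sizeof *frags);
    for (int i = 0; i < pool->size; i++) {
        /* Fragments and the final result must share the same mask. */
        fast_hmap_size_for(&frags[i], hmap_count(&inputs));
        pool->controls[i].data = frags;
    }
    fast_hmap_size_for(result, hmap_count(&inputs));

    /* Fires every worker, then merges each fragment into 'result'. */
    run_pool_hash(pool, result, frags);
    free(frags);
}

Where a single shared hash must be written concurrently instead of merged afterwards, the hashrow_locks API above is the intended tool: update_hashrow_locks() sizes the lock array to the hmap's mask, and each writer brackets hmap_insert_fast() with lock_hash_row()/unlock_hash_row() on the row's hash.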
Anton Ivanov March 26, 2021, 6:19 p.m. UTC | #8
Hi Numan,

Can you try the following patch? It addresses the only other "suspicious" places I have found so far.

diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 7c1413116..0cc3bca40 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -540,10 +540,10 @@ struct mcast_switch_info {
                                   * be received for queries that were sent out.
                                   */

-    uint32_t active_v4_flows;   /* Current number of active IPv4 multicast
+    atomic_uint64_t active_v4_flows;   /* Current number of active IPv4 multicast
                                   * flows.
                                   */
-    uint32_t active_v6_flows;   /* Current number of active IPv6 multicast
+    atomic_uint64_t active_v6_flows;   /* Current number of active IPv6 multicast
                                   * flows.
                                   */
  };
@@ -1002,8 +1002,8 @@ init_mcast_info_for_switch_datapath(struct ovn_datapath *od)
          smap_get_ullong(&od->nbs->other_config, "mcast_query_max_response",
                          OVN_MCAST_DEFAULT_QUERY_MAX_RESPONSE_S);

-    mcast_sw_info->active_v4_flows = 0;
-    mcast_sw_info->active_v6_flows = 0;
+    mcast_sw_info->active_v4_flows = ATOMIC_VAR_INIT(0);
+    mcast_sw_info->active_v6_flows = ATOMIC_VAR_INIT(0);
  }

  static void
@@ -7311,6 +7311,8 @@ build_lswitch_ip_mcast_igmp_mld(struct ovn_igmp_group *igmp_group,
                                  struct ds *actions,
                                  struct ds *match)
  {
+    uint64_t dummy;
+
      if (igmp_group->datapath) {

          ds_clear(match);
@@ -7329,10 +7331,13 @@ build_lswitch_ip_mcast_igmp_mld(struct ovn_igmp_group *igmp_group,
                  return;
              }

-            if (mcast_sw_info->active_v4_flows >= mcast_sw_info->table_size) {
+            if (atomic_compare_exchange_strong(
+                        &mcast_sw_info->active_v4_flows,
+                        &mcast_sw_info->table_size,
+                        mcast_sw_info->table_size)) {
                  return;
              }
-            mcast_sw_info->active_v4_flows++;
+            atomic_add(&mcast_sw_info->active_v4_flows, 1, &dummy);
              ds_put_format(match, "eth.mcast && ip4 && ip4.dst == %s ",
                            igmp_group->mcgroup.name);
          } else {
@@ -7342,10 +7347,13 @@ build_lswitch_ip_mcast_igmp_mld(struct ovn_igmp_group *igmp_group,
              if (ipv6_is_all_hosts(&igmp_group->address)) {
                  return;
              }
-            if (mcast_sw_info->active_v6_flows >= mcast_sw_info->table_size) {
+            if (atomic_compare_exchange_strong(
+                        &mcast_sw_info->active_v6_flows,
+                        &mcast_sw_info->table_size,
+                        mcast_sw_info->table_size)) {
                  return;
              }
-            mcast_sw_info->active_v6_flows++;
+            atomic_add(&mcast_sw_info->active_v6_flows, 1, &dummy);
              ds_put_format(match, "eth.mcast && ip6 && ip6.dst == %s ",
                            igmp_group->mcgroup.name);
          }
@@ -7977,6 +7985,8 @@ route_hash(struct parsed_route *route)
                        (uint32_t)route->plen);
  }

+static struct ovs_mutex bfd_lock = OVS_MUTEX_INITIALIZER;
+
  /* Parse and validate the route. Return the parsed route if successful.
   * Otherwise return NULL. */
  static struct parsed_route *
@@ -8029,6 +8039,7 @@ parsed_routes_add(struct ovs_list *routes,

          bfd_e = bfd_port_lookup(bfd_connections, nb_bt->logical_port,
                                  nb_bt->dst_ip);
+        ovs_mutex_lock(&bfd_lock);
          if (bfd_e) {
              bfd_e->ref = true;
          }
@@ -8038,8 +8049,10 @@ parsed_routes_add(struct ovs_list *routes,
          }

          if (!strcmp(nb_bt->status, "down")) {
+            ovs_mutex_unlock(&bfd_lock);
              return NULL;
          }
+        ovs_mutex_unlock(&bfd_lock);
      }

      struct parsed_route *pr = xzalloc(sizeof *pr);
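On the counter side, the point of moving active_v4_flows/active_v6_flows to atomics is that a plain "counter++" from several workers is a non-atomic read-modify-write and can lose updates. A minimal sketch of a bounded atomic counter in the same style (a standalone illustration, not the patch's code; note that check-then-add can still overshoot the limit by a few entries under contention):

#include <stdbool.h>
#include <stdint.h>
#include "ovs-atomic.h"

static atomic_uint64_t active_flows = ATOMIC_VAR_INIT(0);

/* Returns true if the caller may install one more flow. */
static bool
account_flow(uint64_t limit)
{
    uint64_t cur, orig;

    atomic_read_relaxed(&active_flows, &cur);
    if (cur >= limit) {
        return false;   /* Table full: skip this flow. */
    }
    /* A single atomic read-modify-write: concurrent increments cannot
     * lose updates the way interleaved "read, add, store" can. */
    atomic_add(&active_flows, 1, &orig);
    return true;
}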



On 26/03/2021 03:25, Numan Siddique wrote:
> [snip: discussion trimmed here; it is quoted in full in #7 above]
>>>> ---
>>>>    lib/automake.mk         |   2 +
>>>>    lib/ovn-parallel-hmap.c | 455 ++++++++++++++++++++++++++++++++++++++++
>>>>    lib/ovn-parallel-hmap.h | 301 ++++++++++++++++++++++++++
>>>>    3 files changed, 758 insertions(+)
>>>>    create mode 100644 lib/ovn-parallel-hmap.c
>>>>    create mode 100644 lib/ovn-parallel-hmap.h
>>>>
>>>> diff --git a/lib/automake.mk b/lib/automake.mk
>>>> index 250c7aefa..781be2109 100644
>>>> --- a/lib/automake.mk
>>>> +++ b/lib/automake.mk
>>>> @@ -13,6 +13,8 @@ lib_libovn_la_SOURCES = \
>>>>           lib/expr.c \
>>>>           lib/extend-table.h \
>>>>           lib/extend-table.c \
>>>> +       lib/ovn-parallel-hmap.h \
>>>> +       lib/ovn-parallel-hmap.c \
>>>>           lib/ip-mcast-index.c \
>>>>           lib/ip-mcast-index.h \
>>>>           lib/mcast-group-index.c \
>>>> diff --git a/lib/ovn-parallel-hmap.c b/lib/ovn-parallel-hmap.c
>>>> new file mode 100644
>>>> index 000000000..e83ae23cb
>>>> --- /dev/null
>>>> +++ b/lib/ovn-parallel-hmap.c
>>>> @@ -0,0 +1,455 @@
>>>> +/*
>>>> + * Copyright (c) 2020 Red Hat, Inc.
>>>> + * Copyright (c) 2008, 2009, 2010, 2012, 2013, 2015, 2019 Nicira, Inc.
>>>> + *
>>>> + * Licensed under the Apache License, Version 2.0 (the "License");
>>>> + * you may not use this file except in compliance with the License.
>>>> + * You may obtain a copy of the License at:
>>>> + *
>>>> + *     http://www.apache.org/licenses/LICENSE-2.0
>>>> + *
>>>> + * Unless required by applicable law or agreed to in writing, software
>>>> + * distributed under the License is distributed on an "AS IS" BASIS,
>>>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>>> + * See the License for the specific language governing permissions and
>>>> + * limitations under the License.
>>>> + */
>>>> +
>>>> +#include <config.h>
>>>> +#include <stdint.h>
>>>> +#include <string.h>
>>>> +#include <stdlib.h>
>>>> +#include <fcntl.h>
>>>> +#include <unistd.h>
>>>> +#include <errno.h>
>>>> +#include <semaphore.h>
>>>> +#include "fatal-signal.h"
>>>> +#include "util.h"
>>>> +#include "openvswitch/vlog.h"
>>>> +#include "openvswitch/hmap.h"
>>>> +#include "openvswitch/thread.h"
>>>> +#include "ovn-parallel-hmap.h"
>>>> +#include "ovs-atomic.h"
>>>> +#include "ovs-thread.h"
>>>> +#include "ovs-numa.h"
>>>> +#include "random.h"
>>>> +
>>>> +VLOG_DEFINE_THIS_MODULE(ovn_parallel_hmap);
>>>> +
>>>> +#ifndef OVS_HAS_PARALLEL_HMAP
>>>> +
>>>> +#define WORKER_SEM_NAME "%x-%p-%x"
>>>> +#define MAIN_SEM_NAME "%x-%p-main"
>>>> +
>>>> +/* These are accessed under mutex inside add_worker_pool().
>>>> + * They do not need to be atomic.
>>>> + */
>>>> +
>>>> +static atomic_bool initial_pool_setup = ATOMIC_VAR_INIT(false);
>>>> +static bool can_parallelize = false;
>>>> +
>>>> +/* This is set only in the process of exit and the store is
>>>> + * accompanied by a fence. It does not need to be atomic or be
>>>> + * accessed under a lock.
>>>> + */
>>>> +
>>>> +static bool workers_must_exit = false;
>>>> +
>>>> +static struct ovs_list worker_pools = OVS_LIST_INITIALIZER(&worker_pools);
>>>> +
>>>> +static struct ovs_mutex init_mutex = OVS_MUTEX_INITIALIZER;
>>>> +
>>>> +static int pool_size;
>>>> +
>>>> +static int sembase;
>>>> +
>>>> +static void worker_pool_hook(void *aux OVS_UNUSED);
>>>> +static void setup_worker_pools(bool force);
>>>> +static void merge_list_results(struct worker_pool *pool OVS_UNUSED,
>>>> +                               void *fin_result, void *result_frags,
>>>> +                               int index);
>>>> +static void merge_hash_results(struct worker_pool *pool OVS_UNUSED,
>>>> +                               void *fin_result, void *result_frags,
>>>> +                               int index);
>>>> +
>>>> +bool ovn_stop_parallel_processing(void)
>>>> +{
>>>> +    return workers_must_exit;
>>>> +}
>>>> +
>>>> +bool ovn_can_parallelize_hashes(bool force_parallel)
>>>> +{
>>>> +    bool test = false;
>>>> +
>>>> +    if (atomic_compare_exchange_strong(
>>>> +            &initial_pool_setup,
>>>> +            &test,
>>>> +            true)) {
>>>> +        ovs_mutex_lock(&init_mutex);
>>>> +        setup_worker_pools(force_parallel);
>>>> +        ovs_mutex_unlock(&init_mutex);
>>>> +    }
>>>> +    return can_parallelize;
>>>> +}
>>>> +
>>>> +struct worker_pool *ovn_add_worker_pool(void *(*start)(void *)){
>>>> +
>>>> +    struct worker_pool *new_pool = NULL;
>>>> +    struct worker_control *new_control;
>>>> +    bool test = false;
>>>> +    int i;
>>>> +    char sem_name[256];
>>>> +
>>>> +
>>>> +    /* Belt and braces - initialize the pool system just in case
>>>> +     * it is not yet initialized.
>>>> +     */
>>>> +
>>>> +    if (atomic_compare_exchange_strong(
>>>> +            &initial_pool_setup,
>>>> +            &test,
>>>> +            true)) {
>>>> +        ovs_mutex_lock(&init_mutex);
>>>> +        setup_worker_pools(false);
>>>> +        ovs_mutex_unlock(&init_mutex);
>>>> +    }
>>>> +
>>>> +    ovs_mutex_lock(&init_mutex);
>>>> +    if (can_parallelize) {
>>>> +        new_pool = xmalloc(sizeof(struct worker_pool));
>>>> +        new_pool->size = pool_size;
>>>> +        new_pool->controls = NULL;
>>>> +        sprintf(sem_name, MAIN_SEM_NAME, sembase, new_pool);
>>>> +        new_pool->done = sem_open(sem_name, O_CREAT, S_IRWXU, 0);
>>>> +        if (new_pool->done == SEM_FAILED) {
>>>> +            goto cleanup;
>>>> +        }
>>>> +
>>>> +        new_pool->controls =
>>>> +            xmalloc(sizeof(struct worker_control) * new_pool->size);
>>>> +
>>>> +        for (i = 0; i < new_pool->size; i++) {
>>>> +            new_control = &new_pool->controls[i];
>>>> +            new_control->id = i;
>>>> +            new_control->done = new_pool->done;
>>>> +            new_control->data = NULL;
>>>> +            ovs_mutex_init(&new_control->mutex);
>>>> +            new_control->finished = ATOMIC_VAR_INIT(false);
>>>> +            sprintf(sem_name, WORKER_SEM_NAME, sembase, new_pool, i);
>>>> +            new_control->fire = sem_open(sem_name, O_CREAT, S_IRWXU, 0);
>>>> +            if (new_control->fire == SEM_FAILED) {
>>>> +                goto cleanup;
>>>> +            }
>>>> +        }
>>>> +
>>>> +        for (i = 0; i < pool_size; i++) {
>>>> +            ovs_thread_create("worker pool helper", start, &new_pool->controls[i]);
>>>> +        }
>>>> +        ovs_list_push_back(&worker_pools, &new_pool->list_node);
>>>> +    }
>>>> +    ovs_mutex_unlock(&init_mutex);
>>>> +    return new_pool;
>>>> +cleanup:
>>>> +
>>>> +    /* Something went wrong when opening semaphores. In this case
>>>> +     * it is better to shut off parallel processing altogether
>>>> +     */
>>>> +
>>>> +    VLOG_INFO("Failed to initialize parallel processing, error %d", errno);
>>>> +    can_parallelize = false;
>>>> +    if (new_pool->controls) {
>>>> +        for (i = 0; i < new_pool->size; i++) {
>>>> +            if (new_pool->controls[i].fire != SEM_FAILED) {
>>>> +                sem_close(new_pool->controls[i].fire);
>>>> +                sprintf(sem_name, WORKER_SEM_NAME, sembase, new_pool, i);
>>>> +                sem_unlink(sem_name);
>>>> +                break; /* semaphores past this one are uninitialized */
>>>> +            }
>>>> +        }
>>>> +    }
>>>> +    if (new_pool->done != SEM_FAILED) {
>>>> +        sem_close(new_pool->done);
>>>> +        sprintf(sem_name, MAIN_SEM_NAME, sembase, new_pool);
>>>> +        sem_unlink(sem_name);
>>>> +    }
>>>> +    ovs_mutex_unlock(&init_mutex);
>>>> +    return NULL;
>>>> +}
>>>> +
>>>> +
>>>> +/* Initializes 'hmap' as an empty hash table with the given 'mask'. */
>>>> +void
>>>> +ovn_fast_hmap_init(struct hmap *hmap, ssize_t mask)
>>>> +{
>>>> +    size_t i;
>>>> +
>>>> +    hmap->buckets = xmalloc(sizeof (struct hmap_node *) * (mask + 1));
>>>> +    hmap->one = NULL;
>>>> +    hmap->mask = mask;
>>>> +    hmap->n = 0;
>>>> +    for (i = 0; i <= hmap->mask; i++) {
>>>> +        hmap->buckets[i] = NULL;
>>>> +    }
>>>> +}
>>>> +
>>>> +/* Initializes 'hmap' as an empty hash table sized to hold 'size' elements.
>>>> + * Intended for use in parallel processing so that all
>>>> + * fragments used to store results in a parallel job
>>>> + * are the same size.
>>>> + */
>>>> +void
>>>> +ovn_fast_hmap_size_for(struct hmap *hmap, int size)
>>>> +{
>>>> +    size_t mask;
>>>> +    mask = size / 2;
>>>> +    mask |= mask >> 1;
>>>> +    mask |= mask >> 2;
>>>> +    mask |= mask >> 4;
>>>> +    mask |= mask >> 8;
>>>> +    mask |= mask >> 16;
>>>> +#if SIZE_MAX > UINT32_MAX
>>>> +    mask |= mask >> 32;
>>>> +#endif
>>>> +
>>>> +    /* If we need to dynamically allocate buckets we might as well allocate at
>>>> +     * least 4 of them. */
>>>> +    mask |= (mask & 1) << 1;
>>>> +
>>>> +    fast_hmap_init(hmap, mask);
>>>> +}
>>>> +
>>>> +/* Run a thread pool which uses a callback function to process results
>>>> + */
>>>> +
>>>> +void ovn_run_pool_callback(struct worker_pool *pool,
>>>> +                           void *fin_result, void *result_frags,
>>>> +                           void (*helper_func)(struct worker_pool *pool,
>>>> +                                               void *fin_result,
>>>> +                                               void *result_frags, int index))
>>>> +{
>>>> +    int index, completed;
>>>> +
>>>> +    /* Ensure that all worker threads see the same data as the
>>>> +     * main thread.
>>>> +     */
>>>> +
>>>> +    atomic_thread_fence(memory_order_acq_rel);
>>>> +
>>>> +    /* Start workers */
>>>> +
>>>> +    for (index = 0; index < pool->size; index++) {
>>>> +        sem_post(pool->controls[index].fire);
>>>> +    }
>>>> +
>>>> +    completed = 0;
>>>> +
>>>> +    do {
>>>> +        bool test;
>>>> +        /* Note - we do not loop on semaphore until it reaches
>>>> +         * zero, but on pool size/remaining workers.
>>>> +         * This is by design. If the inner loop can handle
>>>> +         * completion for more than one worker within an iteration
>>>> +         * it will do so to ensure no additional iterations and
>>>> +         * waits once all of them are done.
>>>> +         *
>>>> +         * This may result in us having an initial positive value
>>>> +         * of the semaphore when the pool is invoked the next time.
>>>> +         * This is harmless - the loop will spin up a couple of times
>>>> +         * doing nothing while the workers are processing their data
>>>> +         * slices.
>>>> +         */
>>>> +        wait_for_work_completion(pool);
>>>> +        for (index = 0; index < pool->size; index++) {
>>>> +            test = true;
>>>> +            /* If the worker has marked its data chunk as complete,
>>>> +             * invoke the helper function to combine the results of
>>>> +             * this worker into the main result.
>>>> +             *
>>>> +             * The worker must invoke an appropriate memory fence
>>>> +             * (most likely acq_rel) to ensure that the main thread
>>>> +             * sees all of the results produced by the worker.
>>>> +             */
>>>> +            if (atomic_compare_exchange_weak(
>>>> +                    &pool->controls[index].finished,
>>>> +                    &test,
>>>> +                    false)) {
>>>> +                if (helper_func) {
>>>> +                    (helper_func)(pool, fin_result, result_frags, index);
>>>> +                }
>>>> +                completed++;
>>>> +                pool->controls[index].data = NULL;
>>>> +            }
>>>> +        }
>>>> +    } while (completed < pool->size);
>>>> +}
>>>> +
>>>> +/* Run a thread pool - basic, does not do results processing.
>>>> + */
>>>> +
>>>> +void ovn_run_pool(struct worker_pool *pool)
>>>> +{
>>>> +    run_pool_callback(pool, NULL, NULL, NULL);
>>>> +}
>>>> +
>>>> +/* Brute force merge of a hashmap into another hashmap.
>>>> + * Intended for use in parallel processing. The destination
>>>> + * hashmap MUST be the same size as the one being merged.
>>>> + *
>>>> + * This can be achieved by pre-allocating them to the correct size
>>>> + * and using hmap_insert_fast() instead of hmap_insert()
>>>> + */
>>>> +
>>>> +void ovn_fast_hmap_merge(struct hmap *dest, struct hmap *inc)
>>>> +{
>>>> +    size_t i;
>>>> +
>>>> +    ovs_assert(inc->mask == dest->mask);
>>>> +
>>>> +    if (!inc->n) {
>>>> +        /* Request to merge an empty frag, nothing to do */
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    for (i = 0; i <= dest->mask; i++) {
>>>> +        struct hmap_node **dest_bucket = &dest->buckets[i];
>>>> +        struct hmap_node **inc_bucket = &inc->buckets[i];
>>>> +        if (*inc_bucket != NULL) {
>>>> +            struct hmap_node *last_node = *inc_bucket;
>>>> +            while (last_node->next != NULL) {
>>>> +                last_node = last_node->next;
>>>> +            }
>>>> +            last_node->next = *dest_bucket;
>>>> +            *dest_bucket = *inc_bucket;
>>>> +            *inc_bucket = NULL;
>>>> +        }
>>>> +    }
>>>> +    dest->n += inc->n;
>>>> +    inc->n = 0;
>>>> +}
>>>> +
>>>> +/* Run a thread pool which gathers results in an array
>>>> + * of hashes. Merge results.
>>>> + */
>>>> +
>>>> +
>>>> +void ovn_run_pool_hash(
>>>> +        struct worker_pool *pool,
>>>> +        struct hmap *result,
>>>> +        struct hmap *result_frags)
>>>> +{
>>>> +    run_pool_callback(pool, result, result_frags, merge_hash_results);
>>>> +}
>>>> +
>>>> +/* Run a thread pool which gathers results in an array of lists.
>>>> + * Merge results.
>>>> + */
>>>> +void ovn_run_pool_list(
>>>> +        struct worker_pool *pool,
>>>> +        struct ovs_list *result,
>>>> +        struct ovs_list *result_frags)
>>>> +{
>>>> +    run_pool_callback(pool, result, result_frags, merge_list_results);
>>>> +}
>>>> +
>>>> +void ovn_update_hashrow_locks(struct hmap *lflows, struct hashrow_locks *hrl)
>>>> +{
>>>> +    int i;
>>>> +    if (hrl->mask != lflows->mask) {
>>>> +        if (hrl->row_locks) {
>>>> +            free(hrl->row_locks);
>>>> +        }
>>>> +        hrl->row_locks = xcalloc(sizeof(struct ovs_mutex), lflows->mask + 1);
>>>> +        hrl->mask = lflows->mask;
>>>> +        for (i = 0; i <= lflows->mask; i++) {
>>>> +            ovs_mutex_init(&hrl->row_locks[i]);
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>> +static void worker_pool_hook(void *aux OVS_UNUSED) {
>>>> +    int i;
>>>> +    static struct worker_pool *pool;
>>>> +    char sem_name[256];
>>>> +
>>>> +    workers_must_exit = true;
>>>> +
>>>> +    /* All workers must honour the must_exit flag and check for it regularly.
>>>> +     * We can make it atomic and check it via atomics in workers, but that
>>>> +     * is not really necessary as it is set just once - when the program
>>>> +     * terminates. So we use a fence which is invoked before exiting instead.
>>>> +     */
>>>> +    atomic_thread_fence(memory_order_acq_rel);
>>>> +
>>>> +    /* Wake up the workers after the must_exit flag has been set */
>>>> +
>>>> +    LIST_FOR_EACH (pool, list_node, &worker_pools) {
>>>> +        for (i = 0; i < pool->size ; i++) {
>>>> +            sem_post(pool->controls[i].fire);
>>>> +        }
>>>> +        for (i = 0; i < pool->size ; i++) {
>>>> +            sem_close(pool->controls[i].fire);
>>>> +            sprintf(sem_name, WORKER_SEM_NAME, sembase, pool, i);
>>>> +            sem_unlink(sem_name);
>>>> +        }
>>>> +        sem_close(pool->done);
>>>> +        sprintf(sem_name, MAIN_SEM_NAME, sembase, pool);
>>>> +        sem_unlink(sem_name);
>>>> +    }
>>>> +}
>>>> +
>>>> +static void setup_worker_pools(bool force) {
>>>> +    int cores, nodes;
>>>> +
>>>> +    nodes = ovs_numa_get_n_numas();
>>>> +    if (nodes == OVS_NUMA_UNSPEC || nodes <= 0) {
>>>> +        nodes = 1;
>>>> +    }
>>>> +    cores = ovs_numa_get_n_cores();
>>>> +
>>>> +    /* If there is no NUMA config, fall back to the core count
>>>> +     * reported by the ovs-threads routines. If there is NUMA
>>>> +     * config, use the cores of one node so that the OS does not
>>>> +     * start pushing threads to other nodes.
>>>> +     */
>>>> +    if (cores == OVS_CORE_UNSPEC || cores <= 0) {
>>>> +        /* If there is no NUMA we can try the ovs-threads routine.
>>>> +         * It falls back to sysconf and/or affinity mask.
>>>> +         */
>>>> +        cores = count_cpu_cores();
>>>> +        pool_size = cores;
>>>> +    } else {
>>>> +        pool_size = cores / nodes;
>>>> +    }
>>>> +    if ((pool_size < 4) && force) {
>>>> +        pool_size = 4;
>>>> +    }
>>>> +    can_parallelize = (pool_size >= 3);
>>>> +    fatal_signal_add_hook(worker_pool_hook, NULL, NULL, true);
>>>> +    sembase = random_uint32();
>>>> +}
>>>> +
>>>> +static void merge_list_results(struct worker_pool *pool OVS_UNUSED,
>>>> +                               void *fin_result, void *result_frags,
>>>> +                               int index)
>>>> +{
>>>> +    struct ovs_list *result = (struct ovs_list *)fin_result;
>>>> +    struct ovs_list *res_frags = (struct ovs_list *)result_frags;
>>>> +
>>>> +    if (!ovs_list_is_empty(&res_frags[index])) {
>>>> +        ovs_list_splice(result->next,
>>>> +                ovs_list_front(&res_frags[index]), &res_frags[index]);
>>>> +    }
>>>> +}
>>>> +
>>>> +static void merge_hash_results(struct worker_pool *pool OVS_UNUSED,
>>>> +                               void *fin_result, void *result_frags,
>>>> +                               int index)
>>>> +{
>>>> +    struct hmap *result = (struct hmap *)fin_result;
>>>> +    struct hmap *res_frags = (struct hmap *)result_frags;
>>>> +
>>>> +    fast_hmap_merge(result, &res_frags[index]);
>>>> +    hmap_destroy(&res_frags[index]);
>>>> +}
>>>> +
>>>> +#endif
>>>> diff --git a/lib/ovn-parallel-hmap.h b/lib/ovn-parallel-hmap.h
>>>> new file mode 100644
>>>> index 000000000..8db61eaba
>>>> --- /dev/null
>>>> +++ b/lib/ovn-parallel-hmap.h
>>>> @@ -0,0 +1,301 @@
>>>> +/*
>>>> + * Copyright (c) 2020 Red Hat, Inc.
>>>> + *
>>>> + * Licensed under the Apache License, Version 2.0 (the "License");
>>>> + * you may not use this file except in compliance with the License.
>>>> + * You may obtain a copy of the License at:
>>>> + *
>>>> + *     http://www.apache.org/licenses/LICENSE-2.0
>>>> + *
>>>> + * Unless required by applicable law or agreed to in writing, software
>>>> + * distributed under the License is distributed on an "AS IS" BASIS,
>>>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>>> + * See the License for the specific language governing permissions and
>>>> + * limitations under the License.
>>>> + */
>>>> +
>>>> +#ifndef OVN_PARALLEL_HMAP
>>>> +#define OVN_PARALLEL_HMAP 1
>>>> +
>>>> +/* If the parallel macros are defined by hmap.h or any other OVS
>>>> + * header, we skip over the OVN-specific definitions.
>>>> + */
>>>> +
>>>> +#ifdef  __cplusplus
>>>> +extern "C" {
>>>> +#endif
>>>> +
>>>> +#include <stdbool.h>
>>>> +#include <stdlib.h>
>>>> +#include <semaphore.h>
>>>> +#include <errno.h>
>>>> +#include "openvswitch/util.h"
>>>> +#include "openvswitch/hmap.h"
>>>> +#include "openvswitch/thread.h"
>>>> +#include "ovs-atomic.h"
>>>> +
>>>> +/* Process this include only if OVS does not supply parallel definitions
>>>> + */
>>>> +
>>>> +#ifdef OVS_HAS_PARALLEL_HMAP
>>>> +
>>>> +#include "parallel-hmap.h"
>>>> +
>>>> +#else
>>>> +
>>>> +
>>>> +#ifdef __clang__
>>>> +#pragma clang diagnostic push
>>>> +#pragma clang diagnostic ignored "-Wthread-safety"
>>>> +#endif
>>>> +
>>>> +
>>>> +/* A version of the HMAP_FOR_EACH macro intended for iterating as part
>>>> + * of parallel processing.
>>>> + * Each worker thread has a different ThreadID in the range of 0..POOL_SIZE - 1
>>>> + * and will iterate hash buckets ThreadID, ThreadID + step,
>>>> + * ThreadID + step * 2, etc. The actual macro accepts
>>>> + * ThreadID + step * i as the JOBID parameter.
>>>> + */
>>>> +
>>>> +#define HMAP_FOR_EACH_IN_PARALLEL(NODE, MEMBER, JOBID, HMAP) \
>>>> +   for (INIT_CONTAINER(NODE, hmap_first_in_bucket_num(HMAP, JOBID), MEMBER); \
>>>> +        (NODE != OBJECT_CONTAINING(NULL, NODE, MEMBER)) \
>>>> +       || ((NODE = NULL), false); \
>>>> +       ASSIGN_CONTAINER(NODE, hmap_next_in_bucket(&(NODE)->MEMBER), MEMBER))
>>>> +
>>>> +/* We do not have a SAFE version of the macro, because the hash size is not
>>>> + * atomic and hash removal operations would need to be wrapped with
>>>> + * locks. This would defeat most of the benefits from doing anything in
>>>> + * parallel.
>>>> + * If the code block inside FOR_EACH_IN_PARALLEL needs to remove elements,
>>>> + * each thread should store them in a temporary list result instead, merging
>>>> + * the lists into a combined result at the end. */
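
As a sketch of the iteration pattern this enables (illustrative only;
'control', 'pool_size', 'lflows' and 'struct lflow' with an 'hmap_node
node' member are assumed):

    /* Thread 'control->id' visits buckets id, id + pool_size,
     * id + 2 * pool_size, ... and never touches other threads'
     * buckets. */
    struct lflow *lf;
    size_t bnum;

    for (bnum = control->id; bnum <= lflows->mask; bnum += pool_size) {
        HMAP_FOR_EACH_IN_PARALLEL (lf, node, bnum, lflows) {
            /* Process 'lf'; queue any removals on a thread-local
             * list instead of removing here, as noted above. */
        }
    }
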
>>>> +
>>>> +/* Work "Handle" */
>>>> +
>>>> +struct worker_control {
>>>> +    int id; /* Used as a modulo when iterating over a hash. */
>>>> +    atomic_bool finished; /* Set to true after a chunk of work is complete. */
>>>> +    sem_t *fire; /* Work start semaphore - sem_post starts the worker. */
>>>> +    sem_t *done; /* Work completion semaphore - sem_post on completion. */
>>>> +    struct ovs_mutex mutex; /* Guards the data. */
>>>> +    void *data; /* Pointer to data to be processed. */
>>>> +    void *workload; /* back-pointer to the worker pool structure. */
>>>> +};
>>>> +
>>>> +struct worker_pool {
>>>> +    int size;   /* Number of threads in the pool. */
>>>> +    struct ovs_list list_node; /* List of pools - used in cleanup/exit. */
>>>> +    struct worker_control *controls; /* "Handles" in this pool. */
>>>> +    sem_t *done; /* Work completion semaphore. */
>>>> +};
>>>> +
>>>> +/* Add a worker pool for thread function start() which expects a pointer to
>>>> + * a worker_control structure as an argument. */
>>>> +
>>>> +struct worker_pool *ovn_add_worker_pool(void *(*start)(void *));
>>>> +
>>>> +/* Returns true if all processing threads should exit. */
>>>> +
>>>> +bool ovn_stop_parallel_processing(void);
>>>> +
>>>> +/* Build a hmap pre-sized for size elements */
>>>> +
>>>> +void ovn_fast_hmap_size_for(struct hmap *hmap, int size);
>>>> +
>>>> +/* Build a hmap with a mask equal to 'size' */
>>>> +
>>>> +void ovn_fast_hmap_init(struct hmap *hmap, ssize_t size);
>>>> +
>>>> +/* Brute-force merge one hmap into another.
>>>> + * Dest and inc have to have the same mask. The merge is performed
>>>> + * by extending the element list for bucket N in the dest hmap with the list
>>>> + * from bucket N in inc.
>>>> + */
>>>> +
>>>> +void ovn_fast_hmap_merge(struct hmap *dest, struct hmap *inc);
>>>> +
>>>> +/* Run a pool, without any default processing of results.
>>>> + */
>>>> +
>>>> +void ovn_run_pool(struct worker_pool *pool);
>>>> +
>>>> +/* Run a pool, merge results from hash frags into a final hash result.
>>>> + * The hash frags must be pre-sized to the same size.
>>>> + */
>>>> +
>>>> +void ovn_run_pool_hash(struct worker_pool *pool,
>>>> +                       struct hmap *result, struct hmap *result_frags);
>>>> +/* Run a pool, merge results from list frags into a final list result.
>>>> + */
>>>> +
>>>> +void ovn_run_pool_list(struct worker_pool *pool,
>>>> +                       struct ovs_list *result, struct ovs_list *result_frags);
>>>> +
>>>> +/* Run a pool, call a callback function to perform processing of results.
>>>> + */
>>>> +
>>>> +void ovn_run_pool_callback(struct worker_pool *pool, void *fin_result,
>>>> +                    void *result_frags,
>>>> +                    void (*helper_func)(struct worker_pool *pool,
>>>> +                        void *fin_result, void *result_frags, int index));
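
A sketch of the main-thread side of this API, assuming each worker
appends its results to the ovs_list passed via control->data and then
signals completion (the worker function 'lflow_thread' is
hypothetical):

    struct worker_pool *pool = add_worker_pool(lflow_thread);

    if (pool) {
        struct ovs_list result = OVS_LIST_INITIALIZER(&result);
        struct ovs_list *frags = xmalloc(sizeof *frags * pool->size);
        int i;

        for (i = 0; i < pool->size; i++) {
            ovs_list_init(&frags[i]);
            pool->controls[i].data = &frags[i];
        }
        run_pool_list(pool, &result, frags); /* returns when all done */
        free(frags);
    }
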
>>>> +
>>>> +
>>>> +/* Returns the first node in 'hmap' in bucket number 'num', or a
>>>> + * null pointer if that bucket is empty. */
>>>> +
>>>> +static inline struct hmap_node *
>>>> +hmap_first_in_bucket_num(const struct hmap *hmap, size_t num)
>>>> +{
>>>> +    return hmap->buckets[num];
>>>> +}
>>>> +
>>>> +static inline struct hmap_node *
>>>> +parallel_hmap_next__(const struct hmap *hmap, size_t start, size_t pool_size)
>>>> +{
>>>> +    size_t i;
>>>> +    for (i = start; i <= hmap->mask; i += pool_size) {
>>>> +        struct hmap_node *node = hmap->buckets[i];
>>>> +        if (node) {
>>>> +            return node;
>>>> +        }
>>>> +    }
>>>> +    return NULL;
>>>> +}
>>>> +
>>>> +/* Returns the first node in 'hmap', as expected by thread with job_id
>>>> + * for parallel processing in arbitrary order, or a null pointer if
>>>> + * the slice of 'hmap' for that job_id is empty. */
>>>> +static inline struct hmap_node *
>>>> +parallel_hmap_first(const struct hmap *hmap, size_t job_id, size_t pool_size)
>>>> +{
>>>> +    return parallel_hmap_next__(hmap, job_id, pool_size);
>>>> +}
>>>> +
>>>> +/* Returns the next node in the slice of 'hmap' following 'node',
>>>> + * in arbitrary order, or a null pointer if 'node' is the last node in
>>>> + * the 'hmap' slice. */
>>>> +static inline struct hmap_node *
>>>> +parallel_hmap_next(const struct hmap *hmap,
>>>> +                   const struct hmap_node *node, ssize_t pool_size)
>>>> +{
>>>> +    return (node->next
>>>> +            ? node->next
>>>> +            : parallel_hmap_next__(hmap,
>>>> +                (node->hash & hmap->mask) + pool_size, pool_size));
>>>> +}
>>>> +
>>>> +static inline void post_completed_work(struct worker_control *control)
>>>> +{
>>>> +    atomic_thread_fence(memory_order_acq_rel);
>>>> +    atomic_store_relaxed(&control->finished, true);
>>>> +    sem_post(control->done);
>>>> +}
>>>> +
>>>> +static inline void wait_for_work(struct worker_control *control)
>>>> +{
>>>> +    int ret;
>>>> +
>>>> +    do {
>>>> +        ret = sem_wait(control->fire);
>>>> +    } while ((ret == -1) && (errno == EINTR));
>>>> +    ovs_assert(ret == 0);
>>>> +}
>>>> +static inline void wait_for_work_completion(struct worker_pool *pool)
>>>> +{
>>>> +    int ret;
>>>> +
>>>> +    do {
>>>> +        ret = sem_wait(pool->done);
>>>> +    } while ((ret == -1) && (errno == EINTR));
>>>> +    ovs_assert(ret == 0);
>>>> +}
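
Putting the two helpers above together, a worker body would follow
roughly this shape (a sketch; the processing step is elided and
'worker_main' is an illustrative name):

    static void *worker_main(void *arg)
    {
        struct worker_control *control = arg;

        while (!stop_parallel_processing()) {
            wait_for_work(control);
            if (stop_parallel_processing()) {
                break;
            }
            /* ... process the slice described by control->data ... */
            post_completed_work(control); /* fence + finished + done */
        }
        return NULL;
    }
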
>>>> +
>>>> +
>>>> +/* Hash per-row locking support - to be used only in conjunction
>>>> + * with fast hash inserts. Normal hash inserts may resize the hash
>>>> + * rendering the locking invalid.
>>>> + */
>>>> +
>>>> +struct hashrow_locks {
>>>> +    ssize_t mask;
>>>> +    struct ovs_mutex *row_locks;
>>>> +};
>>>> +
>>>> +/* Update a hash row locks structure to match the current hash size */
>>>> +
>>>> +void ovn_update_hashrow_locks(struct hmap *lflows, struct hashrow_locks *hrl);
>>>> +
>>>> +/* Lock a hash row */
>>>> +
>>>> +static inline void lock_hash_row(struct hashrow_locks *hrl, uint32_t hash)
>>>> +{
>>>> +    ovs_mutex_lock(&hrl->row_locks[hash & hrl->mask]);
>>>> +}
>>>> +
>>>> +/* Unlock a hash row */
>>>> +
>>>> +static inline void unlock_hash_row(struct hashrow_locks *hrl, uint32_t hash)
>>>> +{
>>>> +    ovs_mutex_unlock(&hrl->row_locks[hash & hrl->mask]);
>>>> +}
>>>> +/* Init the row locks structure */
>>>> +
>>>> +static inline void init_hash_row_locks(struct hashrow_locks *hrl)
>>>> +{
>>>> +    hrl->mask = 0;
>>>> +    hrl->row_locks = NULL;
>>>> +}
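
For illustration, the intended row-locked insert pattern is roughly as
follows (a sketch; 'lflows', 'hrl', 'elem' and 'hash' are assumed to
exist):

    init_hash_row_locks(&hrl);          /* once, at startup */
    update_hashrow_locks(lflows, &hrl); /* after (re)sizing 'lflows' */

    lock_hash_row(&hrl, hash);
    hmap_insert_fast(lflows, &elem->node, hash); /* never resizes */
    unlock_hash_row(&hrl, hash);
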
>>>> +
>>>> +bool ovn_can_parallelize_hashes(bool force_parallel);
>>>> +
>>>> +/* Use the OVN library functions for stuff which OVS has not defined
>>>> + * If OVS has defined these, they will still compile using the OVN
>>>> + * local names, but will be dropped by the linker in favour of the OVS
>>>> + * supplied functions.
>>>> + */
>>>> +
>>>> +#define update_hashrow_locks(lflows, hrl) ovn_update_hashrow_locks(lflows, hrl)
>>>> +
>>>> +#define can_parallelize_hashes(force) ovn_can_parallelize_hashes(force)
>>>> +
>>>> +#define stop_parallel_processing() ovn_stop_parallel_processing()
>>>> +
>>>> +#define add_worker_pool(start) ovn_add_worker_pool(start)
>>>> +
>>>> +#define fast_hmap_size_for(hmap, size) ovn_fast_hmap_size_for(hmap, size)
>>>> +
>>>> +#define fast_hmap_init(hmap, size) ovn_fast_hmap_init(hmap, size)
>>>> +
>>>> +#define fast_hmap_merge(dest, inc) ovn_fast_hmap_merge(dest, inc)
>>>> +
>>>> +#define hmap_merge(dest, inc) ovn_hmap_merge(dest, inc)
>>>> +
>>>> +#define run_pool(pool) ovn_run_pool(pool)
>>>> +
>>>> +#define run_pool_hash(pool, result, result_frags) \
>>>> +    ovn_run_pool_hash(pool, result, result_frags)
>>>> +
>>>> +#define run_pool_list(pool, result, result_frags) \
>>>> +    ovn_run_pool_list(pool, result, result_frags)
>>>> +
>>>> +#define run_pool_callback(pool, fin_result, result_frags, helper_func) \
>>>> +    ovn_run_pool_callback(pool, fin_result, result_frags, helper_func)
>>>> +
>>>> +
>>>> +
>>>> +#ifdef __clang__
>>>> +#pragma clang diagnostic pop
>>>> +#endif
>>>> +
>>>> +#endif
>>>> +
>>>> +#ifdef  __cplusplus
>>>> +}
>>>> +#endif
>>>> +
>>>> +
>>>> +#endif /* OVN_PARALLEL_HMAP */
>>>> --
>>>> 2.20.1
>>>>
>>>> _______________________________________________
>>>> dev mailing list
>>>> dev@openvswitch.org
>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>
>>>
>>
>> --
>> Anton R. Ivanov
>> Cambridgegreys Limited. Registered in England. Company Number 10273661
>> https://www.cambridgegreys.com/
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>
Anton Ivanov March 26, 2021, 7:02 p.m. UTC | #9
On 26/03/2021 03:25, Numan Siddique wrote:
> On Thu, Mar 25, 2021 at 3:01 PM Anton Ivanov
> <anton.ivanov@cambridgegreys.com> wrote:
>>
>>
>> On 24/03/2021 15:31, Numan Siddique wrote:
>>> On Mon, Mar 1, 2021 at 6:35 PM <anton.ivanov@cambridgegreys.com> wrote:
>>>> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>>>>
>>>> This adds a set of functions and macros intended to process
>>>> hashes in parallel.
>>>>
>>>> The principles of operation are documented in the ovn-parallel-hmap.h
>>>>
>>>> If these one day go into the OVS tree, the OVS tree versions
>>>> would be used in preference.
>>>>
>>>> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>>> Hi Anton,
>>>
>>> I tested the first 2 patches of this series and it crashes again for me.
>>>
>>> This time I ran tests on a 4 core  machine - Intel(R) Xeon(R) CPU
>>> E3-1220 v5 @ 3.00GHz
>>>
>>> The below trace is seen for both gcc and clang.
>>>
>>> ----
>>> [Thread debugging using libthread_db enabled]
>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>> Core was generated by `ovn-northd -vjsonrpc
>>> --ovnnb-db=unix:/mnt/mydisk/myhome/numan_alt/work/ovs_ovn/'.
>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>> #0  0x00007f27594ae212 in __new_sem_wait_slow.constprop.0 () from
>>> /lib64/libpthread.so.0
>>> [Current thread is 1 (Thread 0x7f2758c68640 (LWP 347378))]
>>> Missing separate debuginfos, use: dnf debuginfo-install
>>> glibc-2.32-3.fc33.x86_64 libcap-ng-0.8-1.fc33.x86_64
>>> libevent-2.1.8-10.fc33.x86_64 openssl-libs-1.1.1i-1.fc33.x86_64
>>> python3-libs-3.9.1-2.fc33.x86_64 unbound-libs-1.10.1-4.fc33.x86_64
>>> zlib-1.2.11-23.fc33.x86_64
>>> (gdb) bt
>>> #0  0x00007f27594ae212 in __new_sem_wait_slow.constprop.0 () from
>>> /lib64/libpthread.so.0
>>> #1  0x0000000000422184 in wait_for_work (control=<optimized out>) at
>>> ../lib/ovn-parallel-hmap.h:203
>>> #2  build_lflows_thread (arg=0x2538420) at ../northd/ovn-northd.c:11855
>>> #3  0x000000000049cd12 in ovsthread_wrapper (aux_=<optimized out>) at
>>> ../lib/ovs-thread.c:383
>>> #4  0x00007f27594a53f9 in start_thread () from /lib64/libpthread.so.0
>>> #5  0x00007f2759142903 in clone () from /lib64/libc.so.6
>>> -----
>>>
>>> I'm not sure why you're not able to reproduce this issue.

I found a machine (an old 4 core Athlon) on which I can reproduce it from time to time. Not as easily as you can, though - it takes an hour or so for it to show up.

Based on using that for tests - no, it's not the BFD race; there is at least one more race somewhere else as well.

Brgds,

A.

>> I can't. I have run it for days in a loop.
>>
>> One possibility is that for whatever reason your machine has slower IPC speeds compared to linear execution speeds. Thread debugging? AMD vs Intel? No idea.
>>
>> There is a race on-exit in the current code which I have found by inspection and which I have never been able to trigger. On my machines the workers always exit in time before the main thread has finished, so I cannot trigger this.
>>
>> Can you try this incremental fix to see if it fixes the problem for you? If that works, I will incorporate it and reissue the patch. If not - I will continue digging.
>>
>> diff --git a/lib/ovn-parallel-hmap.c b/lib/ovn-parallel-hmap.c
>> index e83ae23cb..3597f896f 100644
>> --- a/lib/ovn-parallel-hmap.c
>> +++ b/lib/ovn-parallel-hmap.c
>> @@ -143,7 +143,8 @@ struct worker_pool *ovn_add_worker_pool(void *(*start)(void *)){
>>            }
>>
>>            for (i = 0; i < pool_size; i++) {
>> -            ovs_thread_create("worker pool helper", start, &new_pool->controls[i]);
>> +            new_pool->controls[i].worker =
>> +                ovs_thread_create("worker pool helper", start, &new_pool->controls[i]);
>>            }
>>            ovs_list_push_back(&worker_pools, &new_pool->list_node);
>>        }
>> @@ -386,6 +387,9 @@ static void worker_pool_hook(void *aux OVS_UNUSED) {
>>            for (i = 0; i < pool->size ; i++) {
>>                sem_post(pool->controls[i].fire);
>>            }
>> +        for (i = 0; i < pool->size ; i++) {
>> +            pthread_join(pool->controls[i].worker, NULL);
>> +        }
>>            for (i = 0; i < pool->size ; i++) {
>>                sem_close(pool->controls[i].fire);
>>                sprintf(sem_name, WORKER_SEM_NAME, sembase, pool, i);
>> diff --git a/lib/ovn-parallel-hmap.h b/lib/ovn-parallel-hmap.h
>> index 8db61eaba..d62ca3da5 100644
>> --- a/lib/ovn-parallel-hmap.h
>> +++ b/lib/ovn-parallel-hmap.h
>> @@ -82,6 +82,7 @@ struct worker_control {
>>        struct ovs_mutex mutex; /* Guards the data. */
>>        void *data; /* Pointer to data to be processed. */
>>        void *workload; /* back-pointer to the worker pool structure. */
>> +    pthread_t worker;
>>    };
>>
>>    struct worker_pool {
>>
> I applied the above diff on top of patch 2 and did some tests. I see
> a big improvement with this. On my "Intel(R) Xeon(R) CPU E3-1220 v5
> @ 3.00GHz" server, I saw a crash only once when I ran the test suite
> multiple times.
>
> On my work laptop (in which the tests used to hang earlier), all the
> tests are passing now.
> But I see a lot more consistent crashes here. For a single run of the
> whole testsuite (with make check -j5)
> I observed around 7 crashes.  Definitely an improvement when compared
> to my previous runs with v14.
>
> Here are the backtrace details of the core dumps I observed -
> https://gist.github.com/numansiddique/5cab90ec4a1ee6e1adbfd3cd90eccf5a
>
> Crash 1 and Crash 2 are frequent.  Let me know in case you want the core files.
>
> Thanks
> Numan
>
>>> All the test cases passed for me. So maybe something's wrong when
>>> ovn-northd exits.
>>> IMHO, these crashes should be addressed before these patches can be considered.
>>>
>>> Thanks
>>> Numan
>>>
>>>> ---
>>>>    lib/automake.mk         |   2 +
>>>>    lib/ovn-parallel-hmap.c | 455 ++++++++++++++++++++++++++++++++++++++++
>>>>    lib/ovn-parallel-hmap.h | 301 ++++++++++++++++++++++++++
>>>>    3 files changed, 758 insertions(+)
>>>>    create mode 100644 lib/ovn-parallel-hmap.c
>>>>    create mode 100644 lib/ovn-parallel-hmap.h
>>>>
>>>> diff --git a/lib/automake.mk b/lib/automake.mk
>>>> index 250c7aefa..781be2109 100644
>>>> --- a/lib/automake.mk
>>>> +++ b/lib/automake.mk
>>>> @@ -13,6 +13,8 @@ lib_libovn_la_SOURCES = \
>>>>           lib/expr.c \
>>>>           lib/extend-table.h \
>>>>           lib/extend-table.c \
>>>> +       lib/ovn-parallel-hmap.h \
>>>> +       lib/ovn-parallel-hmap.c \
>>>>           lib/ip-mcast-index.c \
>>>>           lib/ip-mcast-index.h \
>>>>           lib/mcast-group-index.c \
>>>> diff --git a/lib/ovn-parallel-hmap.c b/lib/ovn-parallel-hmap.c
>>>> new file mode 100644
>>>> index 000000000..e83ae23cb
>>>> --- /dev/null
>>>> +++ b/lib/ovn-parallel-hmap.c
>>>> @@ -0,0 +1,455 @@
>>>> +/*
>>>> + * Copyright (c) 2020 Red Hat, Inc.
>>>> + * Copyright (c) 2008, 2009, 2010, 2012, 2013, 2015, 2019 Nicira, Inc.
>>>> + *
>>>> + * Licensed under the Apache License, Version 2.0 (the "License");
>>>> + * you may not use this file except in compliance with the License.
>>>> + * You may obtain a copy of the License at:
>>>> + *
>>>> + *     http://www.apache.org/licenses/LICENSE-2.0
>>>> + *
>>>> + * Unless required by applicable law or agreed to in writing, software
>>>> + * distributed under the License is distributed on an "AS IS" BASIS,
>>>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>>> + * See the License for the specific language governing permissions and
>>>> + * limitations under the License.
>>>> + */
>>>> +
>>>> +#include <config.h>
>>>> +#include <stdint.h>
>>>> +#include <string.h>
>>>> +#include <stdlib.h>
>>>> +#include <fcntl.h>
>>>> +#include <unistd.h>
>>>> +#include <errno.h>
>>>> +#include <semaphore.h>
>>>> +#include "fatal-signal.h"
>>>> +#include "util.h"
>>>> +#include "openvswitch/vlog.h"
>>>> +#include "openvswitch/hmap.h"
>>>> +#include "openvswitch/thread.h"
>>>> +#include "ovn-parallel-hmap.h"
>>>> +#include "ovs-atomic.h"
>>>> +#include "ovs-thread.h"
>>>> +#include "ovs-numa.h"
>>>> +#include "random.h"
>>>> +
>>>> +VLOG_DEFINE_THIS_MODULE(ovn_parallel_hmap);
>>>> +
>>>> +#ifndef OVS_HAS_PARALLEL_HMAP
>>>> +
>>>> +#define WORKER_SEM_NAME "%x-%p-%x"
>>>> +#define MAIN_SEM_NAME "%x-%p-main"
>>>> +
>>>> +/* These are accessed under mutex inside add_worker_pool().
>>>> + * They do not need to be atomic.
>>>> + */
>>>> +
>>>> +static atomic_bool initial_pool_setup = ATOMIC_VAR_INIT(false);
>>>> +static bool can_parallelize = false;
>>>> +
>>>> +/* This is set only in the process of exit and the set is
>>>> + * accompanied by a fence. It does not need to be atomic or be
>>>> + * accessed under a lock.
>>>> + */
>>>> +
>>>> +static bool workers_must_exit = false;
>>>> +
>>>> +static struct ovs_list worker_pools = OVS_LIST_INITIALIZER(&worker_pools);
>>>> +
>>>> +static struct ovs_mutex init_mutex = OVS_MUTEX_INITIALIZER;
>>>> +
>>>> +static int pool_size;
>>>> +
>>>> +static int sembase;
>>>> +
>>>> +static void worker_pool_hook(void *aux OVS_UNUSED);
>>>> +static void setup_worker_pools(bool force);
>>>> +static void merge_list_results(struct worker_pool *pool OVS_UNUSED,
>>>> +                               void *fin_result, void *result_frags,
>>>> +                               int index);
>>>> +static void merge_hash_results(struct worker_pool *pool OVS_UNUSED,
>>>> +                               void *fin_result, void *result_frags,
>>>> +                               int index);
>>>> +
>>>> +bool ovn_stop_parallel_processing(void)
>>>> +{
>>>> +    return workers_must_exit;
>>>> +}
>>>> +
>>>> +bool ovn_can_parallelize_hashes(bool force_parallel)
>>>> +{
>>>> +    bool test = false;
>>>> +
>>>> +    if (atomic_compare_exchange_strong(
>>>> +            &initial_pool_setup,
>>>> +            &test,
>>>> +            true)) {
>>>> +        ovs_mutex_lock(&init_mutex);
>>>> +        setup_worker_pools(force_parallel);
>>>> +        ovs_mutex_unlock(&init_mutex);
>>>> +    }
>>>> +    return can_parallelize;
>>>> +}
>>>> +
>>>> +struct worker_pool *ovn_add_worker_pool(void *(*start)(void *)){
>>>> +
>>>> +    struct worker_pool *new_pool = NULL;
>>>> +    struct worker_control *new_control;
>>>> +    bool test = false;
>>>> +    int i;
>>>> +    char sem_name[256];
>>>> +
>>>> +
>>>> +    /* Belt and braces - initialize the pool system just in case
>>>> +     * it is not yet initialized.
>>>> +     */
>>>> +
>>>> +    if (atomic_compare_exchange_strong(
>>>> +            &initial_pool_setup,
>>>> +            &test,
>>>> +            true)) {
>>>> +        ovs_mutex_lock(&init_mutex);
>>>> +        setup_worker_pools(false);
>>>> +        ovs_mutex_unlock(&init_mutex);
>>>> +    }
>>>> +
>>>> +    ovs_mutex_lock(&init_mutex);
>>>> +    if (can_parallelize) {
>>>> +        new_pool = xmalloc(sizeof(struct worker_pool));
>>>> +        new_pool->size = pool_size;
>>>> +        new_pool->controls = NULL;
>>>> +        sprintf(sem_name, MAIN_SEM_NAME, sembase, new_pool);
>>>> +        new_pool->done = sem_open(sem_name, O_CREAT, S_IRWXU, 0);
>>>> +        if (new_pool->done == SEM_FAILED) {
>>>> +            goto cleanup;
>>>> +        }
>>>> +
>>>> +        new_pool->controls =
>>>> +            xmalloc(sizeof(struct worker_control) * new_pool->size);
>>>> +
>>>> +        for (i = 0; i < new_pool->size; i++) {
>>>> +            new_control = &new_pool->controls[i];
>>>> +            new_control->id = i;
>>>> +            new_control->done = new_pool->done;
>>>> +            new_control->data = NULL;
>>>> +            ovs_mutex_init(&new_control->mutex);
>>>> +            new_control->finished = ATOMIC_VAR_INIT(false);
>>>> +            sprintf(sem_name, WORKER_SEM_NAME, sembase, new_pool, i);
>>>> +            new_control->fire = sem_open(sem_name, O_CREAT, S_IRWXU, 0);
>>>> +            if (new_control->fire == SEM_FAILED) {
>>>> +                goto cleanup;
>>>> +            }
>>>> +        }
>>>> +
>>>> +        for (i = 0; i < pool_size; i++) {
>>>> +            ovs_thread_create("worker pool helper", start, &new_pool->controls[i]);
>>>> +        }
>>>> +        ovs_list_push_back(&worker_pools, &new_pool->list_node);
>>>> +    }
>>>> +    ovs_mutex_unlock(&init_mutex);
>>>> +    return new_pool;
>>>> +cleanup:
>>>> +
>>>> +    /* Something went wrong when opening semaphores. In this case
>>>> +     * it is better to shut off parallel processing altogether
>>>> +     */
>>>> +
>>>> +    VLOG_INFO("Failed to initialize parallel processing, error %d", errno);
>>>> +    can_parallelize = false;
>>>> +    if (new_pool->controls) {
>>>> +        for (i = 0; i < new_pool->size; i++) {
>>>> +            if (new_pool->controls[i].fire != SEM_FAILED) {
>>>> +                sem_close(new_pool->controls[i].fire);
>>>> +                sprintf(sem_name, WORKER_SEM_NAME, sembase, new_pool, i);
>>>> +                sem_unlink(sem_name);
>>>> +                break; /* semaphores past this one are uninitialized */
>>>> +            }
>>>> +        }
>>>> +    }
>>>> +    if (new_pool->done != SEM_FAILED) {
>>>> +        sem_close(new_pool->done);
>>>> +        sprintf(sem_name, MAIN_SEM_NAME, sembase, new_pool);
>>>> +        sem_unlink(sem_name);
>>>> +    }
>>>> +    ovs_mutex_unlock(&init_mutex);
>>>> +    return NULL;
>>>> +}
>>>> +
>>>> +
>>>> +/* Initializes 'hmap' as an empty hash table with mask N. */
>>>> +void
>>>> +ovn_fast_hmap_init(struct hmap *hmap, ssize_t mask)
>>>> +{
>>>> +    size_t i;
>>>> +
>>>> +    hmap->buckets = xmalloc(sizeof (struct hmap_node *) * (mask + 1));
>>>> +    hmap->one = NULL;
>>>> +    hmap->mask = mask;
>>>> +    hmap->n = 0;
>>>> +    for (i = 0; i <= hmap->mask; i++) {
>>>> +        hmap->buckets[i] = NULL;
>>>> +    }
>>>> +}
>>>> +
>>>> +/* Initializes 'hmap' as an empty hash table of size X.
>>>> + * Intended for use in parallel processing so that all
>>>> + * fragments used to store results in a parallel job
>>>> + * are the same size.
>>>> + */
>>>> +void
>>>> +ovn_fast_hmap_size_for(struct hmap *hmap, int size)
>>>> +{
>>>> +    size_t mask;
>>>> +    mask = size / 2;
>>>> +    mask |= mask >> 1;
>>>> +    mask |= mask >> 2;
>>>> +    mask |= mask >> 4;
>>>> +    mask |= mask >> 8;
>>>> +    mask |= mask >> 16;
>>>> +#if SIZE_MAX > UINT32_MAX
>>>> +    mask |= mask >> 32;
>>>> +#endif
>>>> +
>>>> +    /* If we need to dynamically allocate buckets we might as well allocate at
>>>> +     * least 4 of them. */
>>>> +    mask |= (mask & 1) << 1;
>>>> +
>>>> +    fast_hmap_init(hmap, mask);
>>>> +}
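
As a worked example of the mask computation above (illustrative): for
size = 1000 the mask starts at 500 and the shifts smear it down to 511,
so every fragment sized for 1000 elements gets 512 buckets and an
identical mask:

    struct hmap frag;

    fast_hmap_size_for(&frag, 1000);    /* mask == 511 -> 512 buckets */
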
>>>> +
>>>> +/* Run a thread pool which uses a callback function to process results
>>>> + */
>>>> +
>>>> +void ovn_run_pool_callback(struct worker_pool *pool,
>>>> +                           void *fin_result, void *result_frags,
>>>> +                           void (*helper_func)(struct worker_pool *pool,
>>>> +                                               void *fin_result,
>>>> +                                               void *result_frags, int index))
>>>> +{
>>>> +    int index, completed;
>>>> +
>>>> +    /* Ensure that all worker threads see the same data as the
>>>> +     * main thread.
>>>> +     */
>>>> +
>>>> +    atomic_thread_fence(memory_order_acq_rel);
>>>> +
>>>> +    /* Start workers */
>>>> +
>>>> +    for (index = 0; index < pool->size; index++) {
>>>> +        sem_post(pool->controls[index].fire);
>>>> +    }
>>>> +
>>>> +    completed = 0;
>>>> +
>>>> +    do {
>>>> +        bool test;
>>>> +        /* Note - we do not loop on semaphore until it reaches
>>>> +         * zero, but on pool size/remaining workers.
>>>> +         * This is by design. If the inner loop can handle
>>>> +         * completion for more than one worker within an iteration
>>>> +         * it will do so to ensure no additional iterations and
>>>> +         * waits once all of them are done.
>>>> +         *
>>>> +         * This may result in us having an initial positive value
>>>> +         * of the semaphore when the pool is invoked the next time.
>>>> +         * This is harmless - the loop will spin up a couple of times
>>>> +         * doing nothing while the workers are processing their data
>>>> +         * slices.
>>>> +         */
>>>> +        wait_for_work_completion(pool);
>>>> +        for (index = 0; index < pool->size; index++) {
>>>> +            test = true;
>>>> +            /* If the worker has marked its data chunk as complete,
>>>> +             * invoke the helper function to combine the results of
>>>> +             * this worker into the main result.
>>>> +             *
>>>> +             * The worker must invoke an appropriate memory fence
>>>> +             * (most likely acq_rel) to ensure that the main thread
>>>> +             * sees all of the results produced by the worker.
>>>> +             */
>>>> +            if (atomic_compare_exchange_weak(
>>>> +                    &pool->controls[index].finished,
>>>> +                    &test,
>>>> +                    false)) {
>>>> +                if (helper_func) {
>>>> +                    (helper_func)(pool, fin_result, result_frags, index);
>>>> +                }
>>>> +                completed++;
>>>> +                pool->controls[index].data = NULL;
>>>> +            }
>>>> +        }
>>>> +    } while (completed < pool->size);
>>>> +}
>>>> +
>>>> +/* Run a thread pool - basic, does not do results processing.
>>>> + */
>>>> +
>>>> +void ovn_run_pool(struct worker_pool *pool)
>>>> +{
>>>> +    run_pool_callback(pool, NULL, NULL, NULL);
>>>> +}
>>>> +
>>>> +/* Brute force merge of a hashmap into another hashmap.
>>>> + * Intended for use in parallel processing. The destination
>>>> + * hashmap MUST be the same size as the one being merged.
>>>> + *
>>>> + * This can be achieved by pre-allocating them to the correct size
>>>> + * and using hmap_insert_fast() instead of hmap_insert().
>>>> + */
>>>> +
>>>> +void ovn_fast_hmap_merge(struct hmap *dest, struct hmap *inc)
>>>> +{
>>>> +    size_t i;
>>>> +
>>>> +    ovs_assert(inc->mask == dest->mask);
>>>> +
>>>> +    if (!inc->n) {
>>>> +        /* Request to merge an empty frag, nothing to do */
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    for (i = 0; i <= dest->mask; i++) {
>>>> +        struct hmap_node **dest_bucket = &dest->buckets[i];
>>>> +        struct hmap_node **inc_bucket = &inc->buckets[i];
>>>> +        if (*inc_bucket != NULL) {
>>>> +            struct hmap_node *last_node = *inc_bucket;
>>>> +            while (last_node->next != NULL) {
>>>> +                last_node = last_node->next;
>>>> +            }
>>>> +            last_node->next = *dest_bucket;
>>>> +            *dest_bucket = *inc_bucket;
>>>> +            *inc_bucket = NULL;
>>>> +        }
>>>> +    }
>>>> +    dest->n += inc->n;
>>>> +    inc->n = 0;
>>>> +}
>>>> +
>>>> +/* Run a thread pool which gathers results in an array
>>>> + * of hashes. Merge results.
>>>> + */
>>>> +
>>>> +
>>>> +void ovn_run_pool_hash(
>>>> +        struct worker_pool *pool,
>>>> +        struct hmap *result,
>>>> +        struct hmap *result_frags)
>>>> +{
>>>> +    run_pool_callback(pool, result, result_frags, merge_hash_results);
>>>> +}
>>>> +
>>>> +/* Run a thread pool which gathers results in an array of lists.
>>>> + * Merge results.
>>>> + */
>>>> +void ovn_run_pool_list(
>>>> +        struct worker_pool *pool,
>>>> +        struct ovs_list *result,
>>>> +        struct ovs_list *result_frags)
>>>> +{
>>>> +    run_pool_callback(pool, result, result_frags, merge_list_results);
>>>> +}
>>>> +
>>>> +void ovn_update_hashrow_locks(struct hmap *lflows, struct hashrow_locks *hrl)
>>>> +{
>>>> +    int i;
>>>> +    if (hrl->mask != lflows->mask) {
>>>> +        if (hrl->row_locks) {
>>>> +            free(hrl->row_locks);
>>>> +        }
>>>> +        hrl->row_locks = xcalloc(lflows->mask + 1, sizeof(struct ovs_mutex));
>>>> +        hrl->mask = lflows->mask;
>>>> +        for (i = 0; i <= lflows->mask; i++) {
>>>> +            ovs_mutex_init(&hrl->row_locks[i]);
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>> +static void worker_pool_hook(void *aux OVS_UNUSED) {
>>>> +    int i;
>>>> +    struct worker_pool *pool;
>>>> +    char sem_name[256];
>>>> +
>>>> +    workers_must_exit = true;
>>>> +
>>>> +    /* All workers must honour the must_exit flag and check for it regularly.
>>>> +     * We can make it atomic and check it via atomics in workers, but that
>>>> +     * is not really necessary as it is set just once - when the program
>>>> +     * terminates. So we use a fence which is invoked before exiting instead.
>>>> +     */
>>>> +    atomic_thread_fence(memory_order_acq_rel);
>>>> +
>>>> +    /* Wake up the workers after the must_exit flag has been set */
>>>> +
>>>> +    LIST_FOR_EACH (pool, list_node, &worker_pools) {
>>>> +        for (i = 0; i < pool->size ; i++) {
>>>> +            sem_post(pool->controls[i].fire);
>>>> +        }
>>>> +        for (i = 0; i < pool->size ; i++) {
>>>> +            sem_close(pool->controls[i].fire);
>>>> +            sprintf(sem_name, WORKER_SEM_NAME, sembase, pool, i);
>>>> +            sem_unlink(sem_name);
>>>> +        }
>>>> +        sem_close(pool->done);
>>>> +        sprintf(sem_name, MAIN_SEM_NAME, sembase, pool);
>>>> +        sem_unlink(sem_name);
>>>> +    }
>>>> +}
>>>> +
>>>> +static void setup_worker_pools(bool force) {
>>>> +    int cores, nodes;
>>>> +
>>>> +    nodes = ovs_numa_get_n_numas();
>>>> +    if (nodes == OVS_NUMA_UNSPEC || nodes <= 0) {
>>>> +        nodes = 1;
>>>> +    }
>>>> +    cores = ovs_numa_get_n_cores();
>>>> +
>>>> +    /* If there is no NUMA config, fall back to the core count
>>>> +     * reported by the ovs-threads routines. If there is NUMA
>>>> +     * config, use the cores of one node so that the OS does not
>>>> +     * start pushing threads to other nodes.
>>>> +     */
>>>> +    if (cores == OVS_CORE_UNSPEC || cores <= 0) {
>>>> +        /* If there is no NUMA we can try the ovs-threads routine.
>>>> +         * It falls back to sysconf and/or affinity mask.
>>>> +         */
>>>> +        cores = count_cpu_cores();
>>>> +        pool_size = cores;
>>>> +    } else {
>>>> +        pool_size = cores / nodes;
>>>> +    }
>>>> +    if ((pool_size < 4) && force) {
>>>> +        pool_size = 4;
>>>> +    }
>>>> +    can_parallelize = (pool_size >= 3);
>>>> +    fatal_signal_add_hook(worker_pool_hook, NULL, NULL, true);
>>>> +    sembase = random_uint32();
>>>> +}
>>>> +
>>>> +static void merge_list_results(struct worker_pool *pool OVS_UNUSED,
>>>> +                               void *fin_result, void *result_frags,
>>>> +                               int index)
>>>> +{
>>>> +    struct ovs_list *result = (struct ovs_list *)fin_result;
>>>> +    struct ovs_list *res_frags = (struct ovs_list *)result_frags;
>>>> +
>>>> +    if (!ovs_list_is_empty(&res_frags[index])) {
>>>> +        ovs_list_splice(result->next,
>>>> +                ovs_list_front(&res_frags[index]), &res_frags[index]);
>>>> +    }
>>>> +}
>>>> +
>>>> +static void merge_hash_results(struct worker_pool *pool OVS_UNUSED,
>>>> +                               void *fin_result, void *result_frags,
>>>> +                               int index)
>>>> +{
>>>> +    struct hmap *result = (struct hmap *)fin_result;
>>>> +    struct hmap *res_frags = (struct hmap *)result_frags;
>>>> +
>>>> +    fast_hmap_merge(result, &res_frags[index]);
>>>> +    hmap_destroy(&res_frags[index]);
>>>> +}
>>>> +
>>>> +#endif
>>>> diff --git a/lib/ovn-parallel-hmap.h b/lib/ovn-parallel-hmap.h
>>>> new file mode 100644
>>>> index 000000000..8db61eaba
>>>> --- /dev/null
>>>> +++ b/lib/ovn-parallel-hmap.h
>>>> @@ -0,0 +1,301 @@
>>>> +/*
>>>> + * Copyright (c) 2020 Red Hat, Inc.
>>>> + *
>>>> + * Licensed under the Apache License, Version 2.0 (the "License");
>>>> + * you may not use this file except in compliance with the License.
>>>> + * You may obtain a copy of the License at:
>>>> + *
>>>> + *     http://www.apache.org/licenses/LICENSE-2.0
>>>> + *
>>>> + * Unless required by applicable law or agreed to in writing, software
>>>> + * distributed under the License is distributed on an "AS IS" BASIS,
>>>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>>> + * See the License for the specific language governing permissions and
>>>> + * limitations under the License.
>>>> + */
>>>> +
>>>> +#ifndef OVN_PARALLEL_HMAP
>>>> +#define OVN_PARALLEL_HMAP 1
>>>> +
>>>> +/* If the parallel macros are defined by hmap.h or any other OVS
>>>> + * header, we skip over the OVN-specific definitions.
>>>> + */
>>>> +
>>>> +#ifdef  __cplusplus
>>>> +extern "C" {
>>>> +#endif
>>>> +
>>>> +#include <stdbool.h>
>>>> +#include <stdlib.h>
>>>> +#include <semaphore.h>
>>>> +#include <errno.h>
>>>> +#include "openvswitch/util.h"
>>>> +#include "openvswitch/hmap.h"
>>>> +#include "openvswitch/thread.h"
>>>> +#include "ovs-atomic.h"
>>>> +
>>>> +/* Process this include only if OVS does not supply parallel definitions
>>>> + */
>>>> +
>>>> +#ifdef OVS_HAS_PARALLEL_HMAP
>>>> +
>>>> +#include "parallel-hmap.h"
>>>> +
>>>> +#else
>>>> +
>>>> +
>>>> +#ifdef __clang__
>>>> +#pragma clang diagnostic push
>>>> +#pragma clang diagnostic ignored "-Wthread-safety"
>>>> +#endif
>>>> +
>>>> +
>>>> +/* A version of the HMAP_FOR_EACH macro intended for iterating as part
>>>> + * of parallel processing.
>>>> + * Each worker thread has a different ThreadID in the range of 0..POOL_SIZE - 1
>>>> + * and will iterate hash buckets ThreadID, ThreadID + step,
>>>> + * ThreadID + step * 2, etc. The actual macro accepts
>>>> + * ThreadID + step * i as the JOBID parameter.
>>>> + */
>>>> +
>>>> +#define HMAP_FOR_EACH_IN_PARALLEL(NODE, MEMBER, JOBID, HMAP) \
>>>> +   for (INIT_CONTAINER(NODE, hmap_first_in_bucket_num(HMAP, JOBID), MEMBER); \
>>>> +        (NODE != OBJECT_CONTAINING(NULL, NODE, MEMBER)) \
>>>> +       || ((NODE = NULL), false); \
>>>> +       ASSIGN_CONTAINER(NODE, hmap_next_in_bucket(&(NODE)->MEMBER), MEMBER))
>>>> +
>>>> +/* We do not have a SAFE version of the macro, because the hash size is not
>>>> + * atomic and hash removal operations would need to be wrapped with
>>>> + * locks. This would defeat most of the benefits from doing anything in
>>>> + * parallel.
>>>> + * If the code block inside FOR_EACH_IN_PARALLEL needs to remove elements,
>>>> + * each thread should store them in a temporary list result instead, merging
>>>> + * the lists into a combined result at the end. */
>>>> +
>>>> +/* Work "Handle" */
>>>> +
>>>> +struct worker_control {
>>>> +    int id; /* Used as a modulo when iterating over a hash. */
>>>> +    atomic_bool finished; /* Set to true after a chunk of work is complete. */
>>>> +    sem_t *fire; /* Work start semaphore - sem_post starts the worker. */
>>>> +    sem_t *done; /* Work completion semaphore - sem_post on completion. */
>>>> +    struct ovs_mutex mutex; /* Guards the data. */
>>>> +    void *data; /* Pointer to data to be processed. */
>>>> +    void *workload; /* back-pointer to the worker pool structure. */
>>>> +};
>>>> +
>>>> +struct worker_pool {
>>>> +    int size;   /* Number of threads in the pool. */
>>>> +    struct ovs_list list_node; /* List of pools - used in cleanup/exit. */
>>>> +    struct worker_control *controls; /* "Handles" in this pool. */
>>>> +    sem_t *done; /* Work completion semaphore. */
>>>> +};
>>>> +
>>>> +/* Add a worker pool for thread function start() which expects a pointer to
>>>> + * a worker_control structure as an argument. */
>>>> +
>>>> +struct worker_pool *ovn_add_worker_pool(void *(*start)(void *));
>>>> +
>>>> +/* Returns true if all processing threads should exit. */
>>>> +
>>>> +bool ovn_stop_parallel_processing(void);
>>>> +
>>>> +/* Build a hmap pre-sized for size elements */
>>>> +
>>>> +void ovn_fast_hmap_size_for(struct hmap *hmap, int size);
>>>> +
>>>> +/* Build a hmap with a mask equal to 'size' */
>>>> +
>>>> +void ovn_fast_hmap_init(struct hmap *hmap, ssize_t size);
>>>> +
>>>> +/* Brute-force merge one hmap into another.
>>>> + * Dest and inc have to have the same mask. The merge is performed
>>>> + * by extending the element list for bucket N in the dest hmap with the list
>>>> + * from bucket N in inc.
>>>> + */
>>>> +
>>>> +void ovn_fast_hmap_merge(struct hmap *dest, struct hmap *inc);
>>>> +
>>>> +/* Run a pool, without any default processing of results.
>>>> + */
>>>> +
>>>> +void ovn_run_pool(struct worker_pool *pool);
>>>> +
>>>> +/* Run a pool, merge results from hash frags into a final hash result.
>>>> + * The hash frags must be pre-sized to the same size.
>>>> + */
>>>> +
>>>> +void ovn_run_pool_hash(struct worker_pool *pool,
>>>> +                       struct hmap *result, struct hmap *result_frags);
>>>> +/* Run a pool, merge results from list frags into a final list result.
>>>> + */
>>>> +
>>>> +void ovn_run_pool_list(struct worker_pool *pool,
>>>> +                       struct ovs_list *result, struct ovs_list *result_frags);
>>>> +
>>>> +/* Run a pool, call a callback function to perform processing of results.
>>>> + */
>>>> +
>>>> +void ovn_run_pool_callback(struct worker_pool *pool, void *fin_result,
>>>> +                    void *result_frags,
>>>> +                    void (*helper_func)(struct worker_pool *pool,
>>>> +                        void *fin_result, void *result_frags, int index));
>>>> +
>>>> +
>>>> +/* Returns the first node in 'hmap' in bucket number 'num', or a
>>>> + * null pointer if that bucket is empty. */
>>>> +
>>>> +static inline struct hmap_node *
>>>> +hmap_first_in_bucket_num(const struct hmap *hmap, size_t num)
>>>> +{
>>>> +    return hmap->buckets[num];
>>>> +}
>>>> +
>>>> +static inline struct hmap_node *
>>>> +parallel_hmap_next__(const struct hmap *hmap, size_t start, size_t pool_size)
>>>> +{
>>>> +    size_t i;
>>>> +    for (i = start; i <= hmap->mask; i += pool_size) {
>>>> +        struct hmap_node *node = hmap->buckets[i];
>>>> +        if (node) {
>>>> +            return node;
>>>> +        }
>>>> +    }
>>>> +    return NULL;
>>>> +}
>>>> +
>>>> +/* Returns the first node in 'hmap', as expected by thread with job_id
>>>> + * for parallel processing in arbitrary order, or a null pointer if
>>>> + * the slice of 'hmap' for that job_id is empty. */
>>>> +static inline struct hmap_node *
>>>> +parallel_hmap_first(const struct hmap *hmap, size_t job_id, size_t pool_size)
>>>> +{
>>>> +    return parallel_hmap_next__(hmap, job_id, pool_size);
>>>> +}
>>>> +
>>>> +/* Returns the next node in the slice of 'hmap' following 'node',
>>>> + * in arbitrary order, or a null pointer if 'node' is the last node in
>>>> + * the 'hmap' slice. */
>>>> +static inline struct hmap_node *
>>>> +parallel_hmap_next(const struct hmap *hmap,
>>>> +                   const struct hmap_node *node, ssize_t pool_size)
>>>> +{
>>>> +    return (node->next
>>>> +            ? node->next
>>>> +            : parallel_hmap_next__(hmap,
>>>> +                (node->hash & hmap->mask) + pool_size, pool_size));
>>>> +}
>>>> +
>>>> +static inline void post_completed_work(struct worker_control *control)
>>>> +{
>>>> +    atomic_thread_fence(memory_order_acq_rel);
>>>> +    atomic_store_relaxed(&control->finished, true);
>>>> +    sem_post(control->done);
>>>> +}
>>>> +
>>>> +static inline void wait_for_work(struct worker_control *control)
>>>> +{
>>>> +    int ret;
>>>> +
>>>> +    do {
>>>> +        ret = sem_wait(control->fire);
>>>> +    } while ((ret == -1) && (errno == EINTR));
>>>> +    ovs_assert(ret == 0);
>>>> +}
>>>> +static inline void wait_for_work_completion(struct worker_pool *pool)
>>>> +{
>>>> +    int ret;
>>>> +
>>>> +    do {
>>>> +        ret = sem_wait(pool->done);
>>>> +    } while ((ret == -1) && (errno == EINTR));
>>>> +    ovs_assert(ret == 0);
>>>> +}
>>>> +
>>>> +
>>>> +/* Hash per-row locking support - to be used only in conjunction
>>>> + * with fast hash inserts. Normal hash inserts may resize the hash
>>>> + * rendering the locking invalid.
>>>> + */
>>>> +
>>>> +struct hashrow_locks {
>>>> +    ssize_t mask;
>>>> +    struct ovs_mutex *row_locks;
>>>> +};
>>>> +
>>>> +/* Update a hash row locks structure to match the current hash size */
>>>> +
>>>> +void ovn_update_hashrow_locks(struct hmap *lflows, struct hashrow_locks *hrl);
>>>> +
>>>> +/* Lock a hash row */
>>>> +
>>>> +static inline void lock_hash_row(struct hashrow_locks *hrl, uint32_t hash)
>>>> +{
>>>> +    ovs_mutex_lock(&hrl->row_locks[hash & hrl->mask]);
>>>> +}
>>>> +
>>>> +/* Unlock a hash row */
>>>> +
>>>> +static inline void unlock_hash_row(struct hashrow_locks *hrl, uint32_t hash)
>>>> +{
>>>> +    ovs_mutex_unlock(&hrl->row_locks[hash & hrl->mask]);
>>>> +}
>>>> +/* Init the row locks structure */
>>>> +
>>>> +static inline void init_hash_row_locks(struct hashrow_locks *hrl)
>>>> +{
>>>> +    hrl->mask = 0;
>>>> +    hrl->row_locks = NULL;
>>>> +}
>>>> +
>>>> +bool ovn_can_parallelize_hashes(bool force_parallel);
>>>> +
>>>> +/* Use the OVN library functions for stuff which OVS has not defined
>>>> + * If OVS has defined these, they will still compile using the OVN
>>>> + * local names, but will be dropped by the linker in favour of the OVS
>>>> + * supplied functions.
>>>> + */
>>>> +
>>>> +#define update_hashrow_locks(lflows, hrl) ovn_update_hashrow_locks(lflows, hrl)
>>>> +
>>>> +#define can_parallelize_hashes(force) ovn_can_parallelize_hashes(force)
>>>> +
>>>> +#define stop_parallel_processing() ovn_stop_parallel_processing()
>>>> +
>>>> +#define add_worker_pool(start) ovn_add_worker_pool(start)
>>>> +
>>>> +#define fast_hmap_size_for(hmap, size) ovn_fast_hmap_size_for(hmap, size)
>>>> +
>>>> +#define fast_hmap_init(hmap, size) ovn_fast_hmap_init(hmap, size)
>>>> +
>>>> +#define fast_hmap_merge(dest, inc) ovn_fast_hmap_merge(dest, inc)
>>>> +
>>>> +#define hmap_merge(dest, inc) ovn_hmap_merge(dest, inc)
>>>> +
>>>> +#define run_pool(pool) ovn_run_pool(pool)
>>>> +
>>>> +#define run_pool_hash(pool, result, result_frags) \
>>>> +    ovn_run_pool_hash(pool, result, result_frags)
>>>> +
>>>> +#define run_pool_list(pool, result, result_frags) \
>>>> +    ovn_run_pool_list(pool, result, result_frags)
>>>> +
>>>> +#define run_pool_callback(pool, fin_result, result_frags, helper_func) \
>>>> +    ovn_run_pool_callback(pool, fin_result, result_frags, helper_func)
>>>> +
>>>> +
>>>> +
>>>> +#ifdef __clang__
>>>> +#pragma clang diagnostic pop
>>>> +#endif
>>>> +
>>>> +#endif
>>>> +
>>>> +#ifdef  __cplusplus
>>>> +}
>>>> +#endif
>>>> +
>>>> +
>>>> +#endif /* OVN_PARALLEL_HMAP */
>>>> --
>>>> 2.20.1
>>>>
>>>> _______________________________________________
>>>> dev mailing list
>>>> dev@openvswitch.org
>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>
>> --
>> Anton R. Ivanov
>> Cambridgegreys Limited. Registered in England. Company Number 10273661
>> https://www.cambridgegreys.com/
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
Anton Ivanov March 29, 2021, 3:48 p.m. UTC | #10
I have it cleaned up and it is stable on the machine where I could reproduce your issue.

It was all in the exit sequence. That is why the testsuite was green - crashes were after all processing was complete.

I will clean it up, run it overnight for a final test and if it's all OK send a new version tomorrow.

A.

On 26/03/2021 08:07, Anton Ivanov wrote:
> On 26/03/2021 03:25, Numan Siddique wrote:
>> On Thu, Mar 25, 2021 at 3:01 PM Anton Ivanov
>> <anton.ivanov@cambridgegreys.com> wrote:
>>>
>>>
>>> On 24/03/2021 15:31, Numan Siddique wrote:
>>>> On Mon, Mar 1, 2021 at 6:35 PM <anton.ivanov@cambridgegreys.com> wrote:
>>>>> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>>>>>
>>>>> This adds a set of functions and macros intended to process
>>>>> hashes in parallel.
>>>>>
>>>>> The principles of operation are documented in the ovn-parallel-hmap.h
>>>>>
>>>>> If these one day go into the OVS tree, the OVS tree versions
>>>>> would be used in preference.
>>>>>
>>>>> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>>>> Hi Anton,
>>>>
>>>> I tested the first 2 patches of this series and it crashes again for me.
>>>>
>>>> This time I ran tests on a 4 core  machine - Intel(R) Xeon(R) CPU
>>>> E3-1220 v5 @ 3.00GHz
>>>>
>>>> The below trace is seen for both gcc and clang.
>>>>
>>>> ----
>>>> [Thread debugging using libthread_db enabled]
>>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>>> Core was generated by `ovn-northd -vjsonrpc
>>>> --ovnnb-db=unix:/mnt/mydisk/myhome/numan_alt/work/ovs_ovn/'.
>>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>>> #0  0x00007f27594ae212 in __new_sem_wait_slow.constprop.0 () from
>>>> /lib64/libpthread.so.0
>>>> [Current thread is 1 (Thread 0x7f2758c68640 (LWP 347378))]
>>>> Missing separate debuginfos, use: dnf debuginfo-install
>>>> glibc-2.32-3.fc33.x86_64 libcap-ng-0.8-1.fc33.x86_64
>>>> libevent-2.1.8-10.fc33.x86_64 openssl-libs-1.1.1i-1.fc33.x86_64
>>>> python3-libs-3.9.1-2.fc33.x86_64 unbound-libs-1.10.1-4.fc33.x86_64
>>>> zlib-1.2.11-23.fc33.x86_64
>>>> (gdb) bt
>>>> #0  0x00007f27594ae212 in __new_sem_wait_slow.constprop.0 () from
>>>> /lib64/libpthread.so.0
>>>> #1  0x0000000000422184 in wait_for_work (control=<optimized out>) at
>>>> ../lib/ovn-parallel-hmap.h:203
>>>> #2  build_lflows_thread (arg=0x2538420) at ../northd/ovn-northd.c:11855
>>>> #3  0x000000000049cd12 in ovsthread_wrapper (aux_=<optimized out>) at
>>>> ../lib/ovs-thread.c:383
>>>> #4  0x00007f27594a53f9 in start_thread () from /lib64/libpthread.so.0
>>>> #5  0x00007f2759142903 in clone () from /lib64/libc.so.6
>>>> -----
>>>>
>>>> I'm not sure why you're not able to reproduce this issue.
>>> I can't. I have run it for days in a loop.
>>>
>>> One possibility is that for whatever reason your machine has slower IPC speeds compared to linear execution speeds. Thread debugging? AMD vs Intel? No idea.
>>>
>>> There is a race on-exit in the current code which I have found by inspection and which I have never been able to trigger. On my machines the workers always exit in time before the main thread has finished, so I cannot trigger this.
>>>
>>> Can you try this incremental fix to see if it fixes the problem for you? If that works, I will incorporate it and reissue the patch. If not - I will continue digging.
>>>
>>> diff --git a/lib/ovn-parallel-hmap.c b/lib/ovn-parallel-hmap.c
>>> index e83ae23cb..3597f896f 100644
>>> --- a/lib/ovn-parallel-hmap.c
>>> +++ b/lib/ovn-parallel-hmap.c
>>> @@ -143,7 +143,8 @@ struct worker_pool *ovn_add_worker_pool(void *(*start)(void *)){
>>>            }
>>>
>>>            for (i = 0; i < pool_size; i++) {
>>> -            ovs_thread_create("worker pool helper", start, &new_pool->controls[i]);
>>> +            new_pool->controls[i].worker =
>>> +                ovs_thread_create("worker pool helper", start, &new_pool->controls[i]);
>>>            }
>>>            ovs_list_push_back(&worker_pools, &new_pool->list_node);
>>>        }
>>> @@ -386,6 +387,9 @@ static void worker_pool_hook(void *aux OVS_UNUSED) {
>>>            for (i = 0; i < pool->size ; i++) {
>>>                sem_post(pool->controls[i].fire);
>>>            }
>>> +        for (i = 0; i < pool->size ; i++) {
>>> +            pthread_join(pool->controls[i].worker, NULL);
>>> +        }
>>>            for (i = 0; i < pool->size ; i++) {
>>>                sem_close(pool->controls[i].fire);
>>>                sprintf(sem_name, WORKER_SEM_NAME, sembase, pool, i);
>>> diff --git a/lib/ovn-parallel-hmap.h b/lib/ovn-parallel-hmap.h
>>> index 8db61eaba..d62ca3da5 100644
>>> --- a/lib/ovn-parallel-hmap.h
>>> +++ b/lib/ovn-parallel-hmap.h
>>> @@ -82,6 +82,7 @@ struct worker_control {
>>>        struct ovs_mutex mutex; /* Guards the data. */
>>>        void *data; /* Pointer to data to be processed. */
>>>        void *workload; /* back-pointer to the worker pool structure. */
>>> +    pthread_t worker;
>>>    };
>>>
>>>    struct worker_pool {
>>>
>> I applied the above diff on top of patch 2 and did some tests. I see
>> a big improvement with this. On my "Intel(R) Xeon(R) CPU E3-1220 v5
>> @ 3.00GHz" server, I saw a crash only once when I ran the test suite
>> multiple times.
>>
>> On my work laptop (in which the tests used to hang earlier), all the
>> tests are passing now.
>> But I see a lot more consistent crashes here. For a single run of the
>> whole testsuite (with make check -j5)
>> I observed around 7 crashes.  Definitely an improvement when compared
>> to my previous runs with v14.
>>
>> Here are the backtrace details of the core dumps I observed -
>> https://gist.github.com/numansiddique/5cab90ec4a1ee6e1adbfd3cd90eccf5a
>>
>> Crash 1 and Crash 2 are frequent.  Let me know in case you want the core files.
>
> Not really. Traces are informative.
>
> I have no idea why I cannot reproduce these, but I can see where (roughly) the problem is.
>
> I can't see why (yet). The place where it crashes in 1 and 2 is the brute-force hash merge code which is trivial, runs on every iteration and has been tested quite thoroughly.
>
> I will look at it later today.
>
> Brgds,
>
>>
>> Thanks
>> Numan
>>
>>>> All the test cases passed for me. So maybe something's wrong when
>>>> ovn-northd exits.
>>>> IMHO, these crashes should be addressed before these patches can be considered.
>>>>
>>>> Thanks
>>>> Numan
>>>>
>>>>> ---
>>>>>    lib/automake.mk         |   2 +
>>>>>    lib/ovn-parallel-hmap.c | 455 ++++++++++++++++++++++++++++++++++++++++
>>>>>    lib/ovn-parallel-hmap.h | 301 ++++++++++++++++++++++++++
>>>>>    3 files changed, 758 insertions(+)
>>>>>    create mode 100644 lib/ovn-parallel-hmap.c
>>>>>    create mode 100644 lib/ovn-parallel-hmap.h
>>>>>
>>>>> diff --git a/lib/automake.mk b/lib/automake.mk
>>>>> index 250c7aefa..781be2109 100644
>>>>> --- a/lib/automake.mk
>>>>> +++ b/lib/automake.mk
>>>>> @@ -13,6 +13,8 @@ lib_libovn_la_SOURCES = \
>>>>>           lib/expr.c \
>>>>>           lib/extend-table.h \
>>>>>           lib/extend-table.c \
>>>>> +       lib/ovn-parallel-hmap.h \
>>>>> +       lib/ovn-parallel-hmap.c \
>>>>>           lib/ip-mcast-index.c \
>>>>>           lib/ip-mcast-index.h \
>>>>>           lib/mcast-group-index.c \
>>>>> diff --git a/lib/ovn-parallel-hmap.c b/lib/ovn-parallel-hmap.c
>>>>> new file mode 100644
>>>>> index 000000000..e83ae23cb
>>>>> --- /dev/null
>>>>> +++ b/lib/ovn-parallel-hmap.c
>>>>> @@ -0,0 +1,455 @@
>>>>> +/*
>>>>> + * Copyright (c) 2020 Red Hat, Inc.
>>>>> + * Copyright (c) 2008, 2009, 2010, 2012, 2013, 2015, 2019 Nicira, Inc.
>>>>> + *
>>>>> + * Licensed under the Apache License, Version 2.0 (the "License");
>>>>> + * you may not use this file except in compliance with the License.
>>>>> + * You may obtain a copy of the License at:
>>>>> + *
>>>>> + *     http://www.apache.org/licenses/LICENSE-2.0
>>>>> + *
>>>>> + * Unless required by applicable law or agreed to in writing, software
>>>>> + * distributed under the License is distributed on an "AS IS" BASIS,
>>>>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>>>> + * See the License for the specific language governing permissions and
>>>>> + * limitations under the License.
>>>>> + */
>>>>> +
>>>>> +#include <config.h>
>>>>> +#include <stdint.h>
>>>>> +#include <string.h>
>>>>> +#include <stdlib.h>
>>>>> +#include <fcntl.h>
>>>>> +#include <unistd.h>
>>>>> +#include <errno.h>
>>>>> +#include <semaphore.h>
>>>>> +#include "fatal-signal.h"
>>>>> +#include "util.h"
>>>>> +#include "openvswitch/vlog.h"
>>>>> +#include "openvswitch/hmap.h"
>>>>> +#include "openvswitch/thread.h"
>>>>> +#include "ovn-parallel-hmap.h"
>>>>> +#include "ovs-atomic.h"
>>>>> +#include "ovs-thread.h"
>>>>> +#include "ovs-numa.h"
>>>>> +#include "random.h"
>>>>> +
>>>>> +VLOG_DEFINE_THIS_MODULE(ovn_parallel_hmap);
>>>>> +
>>>>> +#ifndef OVS_HAS_PARALLEL_HMAP
>>>>> +
>>>>> +#define WORKER_SEM_NAME "%x-%p-%x"
>>>>> +#define MAIN_SEM_NAME "%x-%p-main"
>>>>> +
>>>>> +/* These are accessed under mutex inside add_worker_pool().
>>>>> + * They do not need to be atomic.
>>>>> + */
>>>>> +
>>>>> +static atomic_bool initial_pool_setup = ATOMIC_VAR_INIT(false);
>>>>> +static bool can_parallelize = false;
>>>>> +
>>>>> +/* This is set only in the process of exit and the set is
>>>>> + * accompanied by a fence. It does not need to be atomic or be
>>>>> + * accessed under a lock.
>>>>> + */
>>>>> +
>>>>> +static bool workers_must_exit = false;
>>>>> +
>>>>> +static struct ovs_list worker_pools = OVS_LIST_INITIALIZER(&worker_pools);
>>>>> +
>>>>> +static struct ovs_mutex init_mutex = OVS_MUTEX_INITIALIZER;
>>>>> +
>>>>> +static int pool_size;
>>>>> +
>>>>> +static int sembase;
>>>>> +
>>>>> +static void worker_pool_hook(void *aux OVS_UNUSED);
>>>>> +static void setup_worker_pools(bool force);
>>>>> +static void merge_list_results(struct worker_pool *pool OVS_UNUSED,
>>>>> +                               void *fin_result, void *result_frags,
>>>>> +                               int index);
>>>>> +static void merge_hash_results(struct worker_pool *pool OVS_UNUSED,
>>>>> +                               void *fin_result, void *result_frags,
>>>>> +                               int index);
>>>>> +
>>>>> +bool ovn_stop_parallel_processing(void)
>>>>> +{
>>>>> +    return workers_must_exit;
>>>>> +}
>>>>> +
>>>>> +bool ovn_can_parallelize_hashes(bool force_parallel)
>>>>> +{
>>>>> +    bool test = false;
>>>>> +
>>>>> +    if (atomic_compare_exchange_strong(
>>>>> +            &initial_pool_setup,
>>>>> +            &test,
>>>>> +            true)) {
>>>>> +        ovs_mutex_lock(&init_mutex);
>>>>> +        setup_worker_pools(force_parallel);
>>>>> +        ovs_mutex_unlock(&init_mutex);
>>>>> +    }
>>>>> +    return can_parallelize;
>>>>> +}
>>>>> +
>>>>> +struct worker_pool *ovn_add_worker_pool(void *(*start)(void *)){
>>>>> +
>>>>> +    struct worker_pool *new_pool = NULL;
>>>>> +    struct worker_control *new_control;
>>>>> +    bool test = false;
>>>>> +    int i;
>>>>> +    char sem_name[256];
>>>>> +
>>>>> +
>>>>> +    /* Belt and braces - initialize the pool system just in case
>>>>> +     * it is not yet initialized.
>>>>> +     */
>>>>> +
>>>>> +    if (atomic_compare_exchange_strong(
>>>>> +            &initial_pool_setup,
>>>>> +            &test,
>>>>> +            true)) {
>>>>> +        ovs_mutex_lock(&init_mutex);
>>>>> +        setup_worker_pools(false);
>>>>> +        ovs_mutex_unlock(&init_mutex);
>>>>> +    }
>>>>> +
>>>>> +    ovs_mutex_lock(&init_mutex);
>>>>> +    if (can_parallelize) {
>>>>> +        new_pool = xmalloc(sizeof(struct worker_pool));
>>>>> +        new_pool->size = pool_size;
>>>>> +        new_pool->controls = NULL;
>>>>> +        sprintf(sem_name, MAIN_SEM_NAME, sembase, new_pool);
>>>>> +        new_pool->done = sem_open(sem_name, O_CREAT, S_IRWXU, 0);
>>>>> +        if (new_pool->done == SEM_FAILED) {
>>>>> +            goto cleanup;
>>>>> +        }
>>>>> +
>>>>> +        new_pool->controls =
>>>>> +            xmalloc(sizeof(struct worker_control) * new_pool->size);
>>>>> +
>>>>> +        for (i = 0; i < new_pool->size; i++) {
>>>>> +            new_control = &new_pool->controls[i];
>>>>> +            new_control->id = i;
>>>>> +            new_control->done = new_pool->done;
>>>>> +            new_control->data = NULL;
>>>>> +            ovs_mutex_init(&new_control->mutex);
>>>>> +            new_control->finished = ATOMIC_VAR_INIT(false);
>>>>> +            sprintf(sem_name, WORKER_SEM_NAME, sembase, new_pool, i);
>>>>> +            new_control->fire = sem_open(sem_name, O_CREAT, S_IRWXU, 0);
>>>>> +            if (new_control->fire == SEM_FAILED) {
>>>>> +                goto cleanup;
>>>>> +            }
>>>>> +        }
>>>>> +
>>>>> +        for (i = 0; i < pool_size; i++) {
>>>>> +            ovs_thread_create("worker pool helper", start, &new_pool->controls[i]);
>>>>> +        }
>>>>> +        ovs_list_push_back(&worker_pools, &new_pool->list_node);
>>>>> +    }
>>>>> +    ovs_mutex_unlock(&init_mutex);
>>>>> +    return new_pool;
>>>>> +cleanup:
>>>>> +
>>>>> +    /* Something went wrong when opening semaphores. In this case
>>>>> +     * it is better to shut off parallel processing altogether.
>>>>> +     */
>>>>> +
>>>>> +    VLOG_INFO("Failed to initialize parallel processing, error %d", errno);
>>>>> +    can_parallelize = false;
>>>>> +    if (new_pool->controls) {
>>>>> +        for (i = 0; i < new_pool->size; i++) {
>>>>> +            if (new_pool->controls[i].fire != SEM_FAILED) {
>>>>> +                sem_close(new_pool->controls[i].fire);
>>>>> +                sprintf(sem_name, WORKER_SEM_NAME, sembase, new_pool, i);
>>>>> +                sem_unlink(sem_name);
>>>>> +                break; /* semaphores past this one are uninitialized */
>>>>> +            }
>>>>> +        }
>>>>> +    }
>>>>> +    if (new_pool->done != SEM_FAILED) {
>>>>> +        sem_close(new_pool->done);
>>>>> +        sprintf(sem_name, MAIN_SEM_NAME, sembase, new_pool);
>>>>> +        sem_unlink(sem_name);
>>>>> +    }
>>>>> +    ovs_mutex_unlock(&init_mutex);
>>>>> +    return NULL;
>>>>> +}
>>>>> +
>>>>> +
>>>>> +/* Initializes 'hmap' as an empty hash table with mask N. */
>>>>> +void
>>>>> +ovn_fast_hmap_init(struct hmap *hmap, ssize_t mask)
>>>>> +{
>>>>> +    size_t i;
>>>>> +
>>>>> +    hmap->buckets = xmalloc(sizeof (struct hmap_node *) * (mask + 1));
>>>>> +    hmap->one = NULL;
>>>>> +    hmap->mask = mask;
>>>>> +    hmap->n = 0;
>>>>> +    for (i = 0; i <= hmap->mask; i++) {
>>>>> +        hmap->buckets[i] = NULL;
>>>>> +    }
>>>>> +}
>>>>> +
>>>>> +/* Initializes 'hmap' as an empty hash table of size X.
>>>>> + * Intended for use in parallel processing so that all
>>>>> + * fragments used to store results in a parallel job
>>>>> + * are the same size.
>>>>> + */
>>>>> +void
>>>>> +ovn_fast_hmap_size_for(struct hmap *hmap, int size)
>>>>> +{
>>>>> +    size_t mask;
>>>>> +    mask = size / 2;
>>>>> +    mask |= mask >> 1;
>>>>> +    mask |= mask >> 2;
>>>>> +    mask |= mask >> 4;
>>>>> +    mask |= mask >> 8;
>>>>> +    mask |= mask >> 16;
>>>>> +#if SIZE_MAX > UINT32_MAX
>>>>> +    mask |= mask >> 32;
>>>>> +#endif
>>>>> +
>>>>> +    /* If we need to dynamically allocate buckets we might as well allocate at
>>>>> +     * least 4 of them. */
>>>>> +    mask |= (mask & 1) << 1;
>>>>> +
>>>>> +    fast_hmap_init(hmap, mask);
>>>>> +}
>>>>> +
>>>>> +/* Run a thread pool which uses a callback function to process results
>>>>> + */
>>>>> +
>>>>> +void ovn_run_pool_callback(struct worker_pool *pool,
>>>>> +                           void *fin_result, void *result_frags,
>>>>> +                           void (*helper_func)(struct worker_pool *pool,
>>>>> +                                               void *fin_result,
>>>>> +                                               void *result_frags, int index))
>>>>> +{
>>>>> +    int index, completed;
>>>>> +
>>>>> +    /* Ensure that all worker threads see the same data as the
>>>>> +     * main thread.
>>>>> +     */
>>>>> +
>>>>> +    atomic_thread_fence(memory_order_acq_rel);
>>>>> +
>>>>> +    /* Start workers */
>>>>> +
>>>>> +    for (index = 0; index < pool->size; index++) {
>>>>> +        sem_post(pool->controls[index].fire);
>>>>> +    }
>>>>> +
>>>>> +    completed = 0;
>>>>> +
>>>>> +    do {
>>>>> +        bool test;
>>>>> +        /* Note - we do not loop on semaphore until it reaches
>>>>> +         * zero, but on pool size/remaining workers.
>>>>> +         * This is by design. If the inner loop can handle
>>>>> +         * completion for more than one worker within an iteration
>>>>> +         * it will do so to ensure no additional iterations and
>>>>> +         * waits once all of them are done.
>>>>> +         *
>>>>> +         * This may result in us having an initial positive value
>>>>> +         * of the semaphore when the pool is invoked the next time.
>>>>> +         * This is harmless - the loop will spin up a couple of times
>>>>> +         * doing nothing while the workers are processing their data
>>>>> +         * slices.
>>>>> +         */
>>>>> +        wait_for_work_completion(pool);
>>>>> +        for (index = 0; index < pool->size; index++) {
>>>>> +            test = true;
>>>>> +            /* If the worker has marked its data chunk as complete,
>>>>> +             * invoke the helper function to combine the results of
>>>>> +             * this worker into the main result.
>>>>> +             *
>>>>> +             * The worker must invoke an appropriate memory fence
>>>>> +             * (most likely acq_rel) to ensure that the main thread
>>>>> +             * sees all of the results produced by the worker.
>>>>> +             */
>>>>> +            if (atomic_compare_exchange_weak(
>>>>> +                    &pool->controls[index].finished,
>>>>> +                    &test,
>>>>> +                    false)) {
>>>>> +                if (helper_func) {
>>>>> +                    (helper_func)(pool, fin_result, result_frags, index);
>>>>> +                }
>>>>> +                completed++;
>>>>> +                pool->controls[index].data = NULL;
>>>>> +            }
>>>>> +        }
>>>>> +    } while (completed < pool->size);
>>>>> +}
>>>>> +
>>>>> +/* Run a thread pool - basic, does not do results processing.
>>>>> + */
>>>>> +
>>>>> +void ovn_run_pool(struct worker_pool *pool)
>>>>> +{
>>>>> +    run_pool_callback(pool, NULL, NULL, NULL);
>>>>> +}
>>>>> +
>>>>> +/* Brute force merge of a hashmap into another hashmap.
>>>>> + * Intended for use in parallel processing. The destination
>>>>> + * hashmap MUST be the same size as the one being merged.
>>>>> + *
>>>>> + * This can be achieved by pre-allocating them to correct size
>>>>> + * and using hmap_insert_fast() instead of hmap_insert()
>>>>> + */
>>>>> +
>>>>> +void ovn_fast_hmap_merge(struct hmap *dest, struct hmap *inc)
>>>>> +{
>>>>> +    size_t i;
>>>>> +
>>>>> +    ovs_assert(inc->mask == dest->mask);
>>>>> +
>>>>> +    if (!inc->n) {
>>>>> +        /* Request to merge an empty frag, nothing to do */
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>> +    for (i = 0; i <= dest->mask; i++) {
>>>>> +        struct hmap_node **dest_bucket = &dest->buckets[i];
>>>>> +        struct hmap_node **inc_bucket = &inc->buckets[i];
>>>>> +        if (*inc_bucket != NULL) {
>>>>> +            struct hmap_node *last_node = *inc_bucket;
>>>>> +            while (last_node->next != NULL) {
>>>>> +                last_node = last_node->next;
>>>>> +            }
>>>>> +            last_node->next = *dest_bucket;
>>>>> +            *dest_bucket = *inc_bucket;
>>>>> +            *inc_bucket = NULL;
>>>>> +        }
>>>>> +    }
>>>>> +    dest->n += inc->n;
>>>>> +    inc->n = 0;
>>>>> +}
>>>>> +
>>>>> +/* Run a thread pool which gathers results in an array
>>>>> + * of hashes. Merge results.
>>>>> + */
>>>>> +
>>>>> +
>>>>> +void ovn_run_pool_hash(
>>>>> +        struct worker_pool *pool,
>>>>> +        struct hmap *result,
>>>>> +        struct hmap *result_frags)
>>>>> +{
>>>>> +    run_pool_callback(pool, result, result_frags, merge_hash_results);
>>>>> +}
>>>>> +
>>>>> +/* Run a thread pool which gathers results in an array of lists.
>>>>> + * Merge results.
>>>>> + */
>>>>> +void ovn_run_pool_list(
>>>>> +        struct worker_pool *pool,
>>>>> +        struct ovs_list *result,
>>>>> +        struct ovs_list *result_frags)
>>>>> +{
>>>>> +    run_pool_callback(pool, result, result_frags, merge_list_results);
>>>>> +}
>>>>> +
>>>>> +void ovn_update_hashrow_locks(struct hmap *lflows, struct hashrow_locks *hrl)
>>>>> +{
>>>>> +    int i;
>>>>> +    if (hrl->mask != lflows->mask) {
>>>>> +        if (hrl->row_locks) {
>>>>> +            free(hrl->row_locks);
>>>>> +        }
>>>>> +        hrl->row_locks = xcalloc(lflows->mask + 1, sizeof(struct ovs_mutex));
>>>>> +        hrl->mask = lflows->mask;
>>>>> +        for (i = 0; i <= lflows->mask; i++) {
>>>>> +            ovs_mutex_init(&hrl->row_locks[i]);
>>>>> +        }
>>>>> +    }
>>>>> +}
>>>>> +
>>>>> +static void worker_pool_hook(void *aux OVS_UNUSED) {
>>>>> +    int i;
>>>>> +    static struct worker_pool *pool;
>>>>> +    char sem_name[256];
>>>>> +
>>>>> +    workers_must_exit = true;
>>>>> +
>>>>> +    /* All workers must honour the must_exit flag and check for it regularly.
>>>>> +     * We can make it atomic and check it via atomics in workers, but that
>>>>> +     * is not really necessary as it is set just once - when the program
>>>>> +     * terminates. So we use a fence which is invoked before exiting instead.
>>>>> +     */
>>>>> +    atomic_thread_fence(memory_order_acq_rel);
>>>>> +
>>>>> +    /* Wake up the workers after the must_exit flag has been set */
>>>>> +
>>>>> +    LIST_FOR_EACH (pool, list_node, &worker_pools) {
>>>>> +        for (i = 0; i < pool->size ; i++) {
>>>>> +            sem_post(pool->controls[i].fire);
>>>>> +        }
>>>>> +        for (i = 0; i < pool->size ; i++) {
>>>>> +            sem_close(pool->controls[i].fire);
>>>>> +            sprintf(sem_name, WORKER_SEM_NAME, sembase, pool, i);
>>>>> +            sem_unlink(sem_name);
>>>>> +        }
>>>>> +        sem_close(pool->done);
>>>>> +        sprintf(sem_name, MAIN_SEM_NAME, sembase, pool);
>>>>> +        sem_unlink(sem_name);
>>>>> +    }
>>>>> +}
>>>>> +
>>>>> +static void setup_worker_pools(bool force) {
>>>>> +    int cores, nodes;
>>>>> +
>>>>> +    nodes = ovs_numa_get_n_numas();
>>>>> +    if (nodes == OVS_NUMA_UNSPEC || nodes <= 0) {
>>>>> +        nodes = 1;
>>>>> +    }
>>>>> +    cores = ovs_numa_get_n_cores();
>>>>> +
>>>>> +    /* If there is no NUMA config, use the overall core count.
>>>>> +     * If there is NUMA config, use only the cores of
>>>>> +     * one node so that the OS does not start pushing
>>>>> +     * threads to other nodes.
>>>>> +     */
>>>>> +    if (cores == OVS_CORE_UNSPEC || cores <= 0) {
>>>>> +        /* If there is no NUMA we can try the ovs-threads routine.
>>>>> +         * It falls back to sysconf and/or affinity mask.
>>>>> +         */
>>>>> +        cores = count_cpu_cores();
>>>>> +        pool_size = cores;
>>>>> +    } else {
>>>>> +        pool_size = cores / nodes;
>>>>> +    }
>>>>> +    if ((pool_size < 4) && force) {
>>>>> +        pool_size = 4;
>>>>> +    }
>>>>> +    can_parallelize = (pool_size >= 3);
>>>>> +    fatal_signal_add_hook(worker_pool_hook, NULL, NULL, true);
>>>>> +    sembase = random_uint32();
>>>>> +}
>>>>> +
>>>>> +static void merge_list_results(struct worker_pool *pool OVS_UNUSED,
>>>>> +                               void *fin_result, void *result_frags,
>>>>> +                               int index)
>>>>> +{
>>>>> +    struct ovs_list *result = (struct ovs_list *)fin_result;
>>>>> +    struct ovs_list *res_frags = (struct ovs_list *)result_frags;
>>>>> +
>>>>> +    if (!ovs_list_is_empty(&res_frags[index])) {
>>>>> +        ovs_list_splice(result->next,
>>>>> +                ovs_list_front(&res_frags[index]), &res_frags[index]);
>>>>> +    }
>>>>> +}
>>>>> +
>>>>> +static void merge_hash_results(struct worker_pool *pool OVS_UNUSED,
>>>>> +                               void *fin_result, void *result_frags,
>>>>> +                               int index)
>>>>> +{
>>>>> +    struct hmap *result = (struct hmap *)fin_result;
>>>>> +    struct hmap *res_frags = (struct hmap *)result_frags;
>>>>> +
>>>>> +    fast_hmap_merge(result, &res_frags[index]);
>>>>> +    hmap_destroy(&res_frags[index]);
>>>>> +}
>>>>> +
>>>>> +#endif
>>>>> diff --git a/lib/ovn-parallel-hmap.h b/lib/ovn-parallel-hmap.h
>>>>> new file mode 100644
>>>>> index 000000000..8db61eaba
>>>>> --- /dev/null
>>>>> +++ b/lib/ovn-parallel-hmap.h
>>>>> @@ -0,0 +1,301 @@
>>>>> +/*
>>>>> + * Copyright (c) 2020 Red Hat, Inc.
>>>>> + *
>>>>> + * Licensed under the Apache License, Version 2.0 (the "License");
>>>>> + * you may not use this file except in compliance with the License.
>>>>> + * You may obtain a copy of the License at:
>>>>> + *
>>>>> + *     http://www.apache.org/licenses/LICENSE-2.0
>>>>> + *
>>>>> + * Unless required by applicable law or agreed to in writing, software
>>>>> + * distributed under the License is distributed on an "AS IS" BASIS,
>>>>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>>>> + * See the License for the specific language governing permissions and
>>>>> + * limitations under the License.
>>>>> + */
>>>>> +
>>>>> +#ifndef OVN_PARALLEL_HMAP
>>>>> +#define OVN_PARALLEL_HMAP 1
>>>>> +
>>>>> +/* If the parallel macros are defined by hmap.h or any other OVS header,
>>>>> + * we skip over the OVN-specific definitions.
>>>>> + */
>>>>> +
>>>>> +#ifdef  __cplusplus
>>>>> +extern "C" {
>>>>> +#endif
>>>>> +
>>>>> +#include <stdbool.h>
>>>>> +#include <stdlib.h>
>>>>> +#include <semaphore.h>
>>>>> +#include <errno.h>
>>>>> +#include "openvswitch/util.h"
>>>>> +#include "openvswitch/hmap.h"
>>>>> +#include "openvswitch/thread.h"
>>>>> +#include "ovs-atomic.h"
>>>>> +
>>>>> +/* Process this include only if OVS does not supply parallel definitions
>>>>> + */
>>>>> +
>>>>> +#ifdef OVS_HAS_PARALLEL_HMAP
>>>>> +
>>>>> +#include "parallel-hmap.h"
>>>>> +
>>>>> +#else
>>>>> +
>>>>> +
>>>>> +#ifdef __clang__
>>>>> +#pragma clang diagnostic push
>>>>> +#pragma clang diagnostic ignored "-Wthread-safety"
>>>>> +#endif
>>>>> +
>>>>> +
>>>>> +/* A version of the HMAP_FOR_EACH macro intended for iterating as part
>>>>> + * of parallel processing.
>>>>> + * Each worker thread has a different ThreadID in the range 0..POOL_SIZE - 1
>>>>> + * and will iterate hash buckets ThreadID, ThreadID + step,
>>>>> + * ThreadID + step * 2, etc. The actual macro accepts
>>>>> + * ThreadID + step * i as the JOBID parameter.
>>>>> + */
>>>>> +
>>>>> +#define HMAP_FOR_EACH_IN_PARALLEL(NODE, MEMBER, JOBID, HMAP) \
>>>>> +    for (INIT_CONTAINER(NODE, hmap_first_in_bucket_num(HMAP, JOBID), MEMBER); \
>>>>> +         (NODE != OBJECT_CONTAINING(NULL, NODE, MEMBER)) \
>>>>> +         || ((NODE = NULL), false); \
>>>>> +         ASSIGN_CONTAINER(NODE, hmap_next_in_bucket(&(NODE)->MEMBER), MEMBER))
>>>>> +
>>>>> +/* We do not have a SAFE version of the macro, because the hash size is not
>>>>> + * atomic and hash removal operations would need to be wrapped with
>>>>> + * locks. This will defeat most of the benefits from doing anything in
>>>>> + * parallel.
>>>>> + * If the code block inside FOR_EACH_IN_PARALLEL needs to remove elements,
>>>>> + * each thread should store them in a temporary list result instead, merging
>>>>> + * the lists into a combined result at the end */
>>>>> +
>>>>> +/* Work "Handle" */
>>>>> +
>>>>> +struct worker_control {
>>>>> +    int id; /* Used as a modulo when iterating over a hash. */
>>>>> +    atomic_bool finished; /* Set to true after a chunk of work is complete. */
>>>>> +    sem_t *fire; /* Work start semaphore - sem_post starts the worker. */
>>>>> +    sem_t *done; /* Work completion semaphore - sem_post on completion. */
>>>>> +    struct ovs_mutex mutex; /* Guards the data. */
>>>>> +    void *data; /* Pointer to data to be processed. */
>>>>> +    void *workload; /* back-pointer to the worker pool structure. */
>>>>> +};
>>>>> +
>>>>> +struct worker_pool {
>>>>> +    int size;   /* Number of threads in the pool. */
>>>>> +    struct ovs_list list_node; /* List of pools - used in cleanup/exit. */
>>>>> +    struct worker_control *controls; /* "Handles" in this pool. */
>>>>> +    sem_t *done; /* Work completion semaphore. */
>>>>> +};
>>>>> +
>>>>> +/* Add a worker pool for thread function start() which expects a pointer to
>>>>> + * a worker_control structure as an argument. */
>>>>> +
>>>>> +struct worker_pool *ovn_add_worker_pool(void *(*start)(void *));
>>>>> +
>>>>> +/* Setting this to true will make all processing threads exit */
>>>>> +
>>>>> +bool ovn_stop_parallel_processing(void);
>>>>> +
>>>>> +/* Build a hmap pre-sized for size elements */
>>>>> +
>>>>> +void ovn_fast_hmap_size_for(struct hmap *hmap, int size);
>>>>> +
>>>>> +/* Build a hmap with a mask equal to size */
>>>>> +
>>>>> +void ovn_fast_hmap_init(struct hmap *hmap, ssize_t size);
>>>>> +
>>>>> +/* Brute-force merge a hmap into hmap.
>>>>> + * Dest and inc have to have the same mask. The merge is performed
>>>>> + * by extending the element list for bucket N in the dest hmap with the list
>>>>> + * from bucket N in inc.
>>>>> + */
>>>>> +
>>>>> +void ovn_fast_hmap_merge(struct hmap *dest, struct hmap *inc);
>>>>> +
>>>>> +/* Run a pool, without any default processing of results.
>>>>> + */
>>>>> +
>>>>> +void ovn_run_pool(struct worker_pool *pool);
>>>>> +
>>>>> +/* Run a pool, merge results from hash frags into a final hash result.
>>>>> + * The hash frags must be pre-sized to the same size.
>>>>> + */
>>>>> +
>>>>> +void ovn_run_pool_hash(struct worker_pool *pool,
>>>>> +                       struct hmap *result, struct hmap *result_frags);
>>>>> +/* Run a pool, merge results from list frags into a final list result.
>>>>> + */
>>>>> +
>>>>> +void ovn_run_pool_list(struct worker_pool *pool,
>>>>> +                       struct ovs_list *result, struct ovs_list *result_frags);
>>>>> +
>>>>> +/* Run a pool, call a callback function to perform processing of results.
>>>>> + */
>>>>> +
>>>>> +void ovn_run_pool_callback(struct worker_pool *pool, void *fin_result,
>>>>> +                    void *result_frags,
>>>>> +                    void (*helper_func)(struct worker_pool *pool,
>>>>> +                        void *fin_result, void *result_frags, int index));
>>>>> +
>>>>> +
>>>>> +/* Returns the first node in bucket 'num' of 'hmap',
>>>>> + * or a null pointer if that bucket is empty. */
>>>>> +
>>>>> +static inline struct hmap_node *
>>>>> +hmap_first_in_bucket_num(const struct hmap *hmap, size_t num)
>>>>> +{
>>>>> +    return hmap->buckets[num];
>>>>> +}
>>>>> +
>>>>> +static inline struct hmap_node *
>>>>> +parallel_hmap_next__(const struct hmap *hmap, size_t start, size_t pool_size)
>>>>> +{
>>>>> +    size_t i;
>>>>> +    for (i = start; i <= hmap->mask; i += pool_size) {
>>>>> +        struct hmap_node *node = hmap->buckets[i];
>>>>> +        if (node) {
>>>>> +            return node;
>>>>> +        }
>>>>> +    }
>>>>> +    return NULL;
>>>>> +}
>>>>> +
>>>>> +/* Returns the first node in 'hmap', as expected by thread with job_id
>>>>> + * for parallel processing in arbitrary order, or a null pointer if
>>>>> + * the slice of 'hmap' for that job_id is empty. */
>>>>> +static inline struct hmap_node *
>>>>> +parallel_hmap_first(const struct hmap *hmap, size_t job_id, size_t pool_size)
>>>>> +{
>>>>> +    return parallel_hmap_next__(hmap, job_id, pool_size);
>>>>> +}
>>>>> +
>>>>> +/* Returns the next node in the slice of 'hmap' following 'node',
>>>>> + * in arbitrary order, or a null pointer if 'node' is the last node in
>>>>> + * the 'hmap' slice.
>>>>> + *
>>>>> + */
>>>>> +static inline struct hmap_node *
>>>>> +parallel_hmap_next(const struct hmap *hmap,
>>>>> +                   const struct hmap_node *node, ssize_t pool_size)
>>>>> +{
>>>>> +    return (node->next
>>>>> +            ? node->next
>>>>> +            : parallel_hmap_next__(hmap,
>>>>> +                (node->hash & hmap->mask) + pool_size, pool_size));
>>>>> +}
>>>>> +
>>>>> +static inline void post_completed_work(struct worker_control *control)
>>>>> +{
>>>>> +    atomic_thread_fence(memory_order_acq_rel);
>>>>> +    atomic_store_relaxed(&control->finished, true);
>>>>> +    sem_post(control->done);
>>>>> +}
>>>>> +
>>>>> +static inline void wait_for_work(struct worker_control *control)
>>>>> +{
>>>>> +    int ret;
>>>>> +
>>>>> +    do {
>>>>> +        ret = sem_wait(control->fire);
>>>>> +    } while ((ret == -1) && (errno == EINTR));
>>>>> +    ovs_assert(ret == 0);
>>>>> +}
>>>>> +static inline void wait_for_work_completion(struct worker_pool *pool)
>>>>> +{
>>>>> +    int ret;
>>>>> +
>>>>> +    do {
>>>>> +        ret = sem_wait(pool->done);
>>>>> +    } while ((ret == -1) && (errno == EINTR));
>>>>> +    ovs_assert(ret == 0);
>>>>> +}
>>>>> +
>>>>> +
>>>>> +/* Hash per-row locking support - to be used only in conjunction
>>>>> + * with fast hash inserts. Normal hash inserts may resize the hash
>>>>> + * rendering the locking invalid.
>>>>> + */
>>>>> +
>>>>> +struct hashrow_locks {
>>>>> +    ssize_t mask;
>>>>> +    struct ovs_mutex *row_locks;
>>>>> +};
>>>>> +
>>>>> +/* Update a hash row locks structure to match the current hash size */
>>>>> +
>>>>> +void ovn_update_hashrow_locks(struct hmap *lflows, struct hashrow_locks *hrl);
>>>>> +
>>>>> +/* Lock a hash row */
>>>>> +
>>>>> +static inline void lock_hash_row(struct hashrow_locks *hrl, uint32_t hash)
>>>>> +{
>>>>> +    ovs_mutex_lock(&hrl->row_locks[hash % hrl->mask]);
>>>>> +}
>>>>> +
>>>>> +/* Unlock a hash row */
>>>>> +
>>>>> +static inline void unlock_hash_row(struct hashrow_locks *hrl, uint32_t hash)
>>>>> +{
>>>>> +    ovs_mutex_unlock(&hrl->row_locks[hash % hrl->mask]);
>>>>> +}
>>>>> +/* Init the row locks structure */
>>>>> +
>>>>> +static inline void init_hash_row_locks(struct hashrow_locks *hrl)
>>>>> +{
>>>>> +    hrl->mask = 0;
>>>>> +    hrl->row_locks = NULL;
>>>>> +}
>>>>> +
>>>>> +bool ovn_can_parallelize_hashes(bool force_parallel);
>>>>> +
>>>>> +/* Use the OVN library functions for stuff which OVS has not defined
>>>>> + * If OVS has defined these, they will still compile using the OVN
>>>>> + * local names, but will be dropped by the linker in favour of the OVS
>>>>> + * supplied functions.
>>>>> + */
>>>>> +
>>>>> +#define update_hashrow_locks(lflows, hrl) ovn_update_hashrow_locks(lflows, hrl)
>>>>> +
>>>>> +#define can_parallelize_hashes(force) ovn_can_parallelize_hashes(force)
>>>>> +
>>>>> +#define stop_parallel_processing() ovn_stop_parallel_processing()
>>>>> +
>>>>> +#define add_worker_pool(start) ovn_add_worker_pool(start)
>>>>> +
>>>>> +#define fast_hmap_size_for(hmap, size) ovn_fast_hmap_size_for(hmap, size)
>>>>> +
>>>>> +#define fast_hmap_init(hmap, size) ovn_fast_hmap_init(hmap, size)
>>>>> +
>>>>> +#define fast_hmap_merge(dest, inc) ovn_fast_hmap_merge(dest, inc)
>>>>> +
>>>>> +#define hmap_merge(dest, inc) ovn_hmap_merge(dest, inc)
>>>>> +
>>>>> +#define run_pool(pool) ovn_run_pool(pool)
>>>>> +
>>>>> +#define run_pool_hash(pool, result, result_frags) \
>>>>> +    ovn_run_pool_hash(pool, result, result_frags)
>>>>> +
>>>>> +#define run_pool_list(pool, result, result_frags) \
>>>>> +    ovn_run_pool_list(pool, result, result_frags)
>>>>> +
>>>>> +#define run_pool_callback(pool, fin_result, result_frags, helper_func) \
>>>>> +    ovn_run_pool_callback(pool, fin_result, result_frags, helper_func)
>>>>> +
>>>>> +
>>>>> +
>>>>> +#ifdef __clang__
>>>>> +#pragma clang diagnostic pop
>>>>> +#endif
>>>>> +
>>>>> +#endif
>>>>> +
>>>>> +#ifdef  __cplusplus
>>>>> +}
>>>>> +#endif
>>>>> +
>>>>> +
>>>>> +#endif /* OVN_PARALLEL_HMAP */
>>>>> -- 
>>>>> 2.20.1
>>>>>
>>>>> _______________________________________________
>>>>> dev mailing list
>>>>> dev@openvswitch.org
>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>
>>> -- 
>>> Anton R. Ivanov
>>> Cambridgegreys Limited. Registered in England. Company Number 10273661
>>> https://www.cambridgegreys.com/
>>> _______________________________________________
>>> dev mailing list
>>> dev@openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>
>
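For reference, a minimal standalone sketch of the teardown ordering in the diff quoted at the top of this reply: wake the workers, join them, and only then close the semaphores they block on. The names here (must_exit, teardown_pool, fire, workers, n) are illustrative, not the patch's actual identifiers.

#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>

static bool must_exit = false;   /* Workers poll this and exit when set. */

static void
teardown_pool(sem_t **fire, pthread_t *workers, int n)
{
    int i;

    must_exit = true;
    for (i = 0; i < n; i++) {
        sem_post(fire[i]);              /* Wake every worker... */
    }
    for (i = 0; i < n; i++) {
        pthread_join(workers[i], NULL); /* ...wait for each to exit... */
    }
    for (i = 0; i < n; i++) {
        sem_close(fire[i]);             /* ...and only then close the
                                         * semaphores they waited on. */
    }
}

Without the join step, a worker can still be blocked in sem_wait() on a semaphore that the exit hook has already closed and unlinked, which is consistent with the sem_wait backtraces discussed in this thread.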
diff mbox series

Patch

diff --git a/lib/automake.mk b/lib/automake.mk
index 250c7aefa..781be2109 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -13,6 +13,8 @@  lib_libovn_la_SOURCES = \
 	lib/expr.c \
 	lib/extend-table.h \
 	lib/extend-table.c \
+	lib/ovn-parallel-hmap.h \
+	lib/ovn-parallel-hmap.c \
 	lib/ip-mcast-index.c \
 	lib/ip-mcast-index.h \
 	lib/mcast-group-index.c \
diff --git a/lib/ovn-parallel-hmap.c b/lib/ovn-parallel-hmap.c
new file mode 100644
index 000000000..e83ae23cb
--- /dev/null
+++ b/lib/ovn-parallel-hmap.c
@@ -0,0 +1,455 @@ 
+/*
+ * Copyright (c) 2020 Red Hat, Inc.
+ * Copyright (c) 2008, 2009, 2010, 2012, 2013, 2015, 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+#include <stdint.h>
+#include <string.h>
+#include <stdlib.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <errno.h>
+#include <semaphore.h>
+#include "fatal-signal.h"
+#include "util.h"
+#include "openvswitch/vlog.h"
+#include "openvswitch/hmap.h"
+#include "openvswitch/thread.h"
+#include "ovn-parallel-hmap.h"
+#include "ovs-atomic.h"
+#include "ovs-thread.h"
+#include "ovs-numa.h"
+#include "random.h"
+
+VLOG_DEFINE_THIS_MODULE(ovn_parallel_hmap);
+
+#ifndef OVS_HAS_PARALLEL_HMAP
+
+#define WORKER_SEM_NAME "%x-%p-%x"
+#define MAIN_SEM_NAME "%x-%p-main"
+
+/* These are accessed under mutex inside add_worker_pool().
+ * They do not need to be atomic.
+ */
+
+static atomic_bool initial_pool_setup = ATOMIC_VAR_INIT(false);
+static bool can_parallelize = false;
+
+/* This is set only in the process of exit and the set is
+ * accompanied by a fence. It does not need to be atomic or be
+ * accessed under a lock.
+ */
+
+static bool workers_must_exit = false;
+
+static struct ovs_list worker_pools = OVS_LIST_INITIALIZER(&worker_pools);
+
+static struct ovs_mutex init_mutex = OVS_MUTEX_INITIALIZER;
+
+static int pool_size;
+
+static int sembase;
+
+static void worker_pool_hook(void *aux OVS_UNUSED);
+static void setup_worker_pools(bool force);
+static void merge_list_results(struct worker_pool *pool OVS_UNUSED,
+                               void *fin_result, void *result_frags,
+                               int index);
+static void merge_hash_results(struct worker_pool *pool OVS_UNUSED,
+                               void *fin_result, void *result_frags,
+                               int index);
+
+bool ovn_stop_parallel_processing(void)
+{
+    return workers_must_exit;
+}
+
+bool ovn_can_parallelize_hashes(bool force_parallel)
+{
+    bool test = false;
+
+    if (atomic_compare_exchange_strong(
+            &initial_pool_setup,
+            &test,
+            true)) {
+        ovs_mutex_lock(&init_mutex);
+        setup_worker_pools(force_parallel);
+        ovs_mutex_unlock(&init_mutex);
+    }
+    return can_parallelize;
+}
+
+struct worker_pool *ovn_add_worker_pool(void *(*start)(void *)){
+
+    struct worker_pool *new_pool = NULL;
+    struct worker_control *new_control;
+    bool test = false;
+    int i;
+    char sem_name[256];
+
+
+    /* Belt and braces - initialize the pool system just in case
+     * it is not yet initialized.
+     */
+
+    if (atomic_compare_exchange_strong(
+            &initial_pool_setup,
+            &test,
+            true)) {
+        ovs_mutex_lock(&init_mutex);
+        setup_worker_pools(false);
+        ovs_mutex_unlock(&init_mutex);
+    }
+
+    ovs_mutex_lock(&init_mutex);
+    if (can_parallelize) {
+        new_pool = xmalloc(sizeof(struct worker_pool));
+        new_pool->size = pool_size;
+        new_pool->controls = NULL;
+        sprintf(sem_name, MAIN_SEM_NAME, sembase, new_pool);
+        new_pool->done = sem_open(sem_name, O_CREAT, S_IRWXU, 0);
+        if (new_pool->done == SEM_FAILED) {
+            goto cleanup;
+        }
+
+        new_pool->controls =
+            xmalloc(sizeof(struct worker_control) * new_pool->size);
+
+        for (i = 0; i < new_pool->size; i++) {
+            new_control = &new_pool->controls[i];
+            new_control->id = i;
+            new_control->done = new_pool->done;
+            new_control->data = NULL;
+            ovs_mutex_init(&new_control->mutex);
+            new_control->finished = ATOMIC_VAR_INIT(false);
+            sprintf(sem_name, WORKER_SEM_NAME, sembase, new_pool, i);
+            new_control->fire = sem_open(sem_name, O_CREAT, S_IRWXU, 0);
+            if (new_control->fire == SEM_FAILED) {
+                goto cleanup;
+            }
+        }
+
+        for (i = 0; i < pool_size; i++) {
+            ovs_thread_create("worker pool helper", start, &new_pool->controls[i]);
+        }
+        ovs_list_push_back(&worker_pools, &new_pool->list_node);
+    }
+    ovs_mutex_unlock(&init_mutex);
+    return new_pool;
+cleanup:
+
+    /* Something went wrong when opening semaphores. In this case
+     * it is better to shut off parallel processing altogether.
+     */
+
+    VLOG_INFO("Failed to initialize parallel processing, error %d", errno);
+    can_parallelize = false;
+    if (new_pool->controls) {
+        for (i = 0; i < new_pool->size; i++) {
+            if (new_pool->controls[i].fire != SEM_FAILED) {
+                sem_close(new_pool->controls[i].fire);
+                sprintf(sem_name, WORKER_SEM_NAME, sembase, new_pool, i);
+                sem_unlink(sem_name);
+                break; /* semaphores past this one are uninitialized */
+            }
+        }
+    }
+    if (new_pool->done != SEM_FAILED) {
+        sem_close(new_pool->done);
+        sprintf(sem_name, MAIN_SEM_NAME, sembase, new_pool);
+        sem_unlink(sem_name);
+    }
+    ovs_mutex_unlock(&init_mutex);
+    return NULL;
+}
+
+
+/* Initializes 'hmap' as an empty hash table with mask N. */
+void
+ovn_fast_hmap_init(struct hmap *hmap, ssize_t mask)
+{
+    size_t i;
+
+    hmap->buckets = xmalloc(sizeof (struct hmap_node *) * (mask + 1));
+    hmap->one = NULL;
+    hmap->mask = mask;
+    hmap->n = 0;
+    for (i = 0; i <= hmap->mask; i++) {
+        hmap->buckets[i] = NULL;
+    }
+}
+
+/* Initializes 'hmap' as an empty hash table of size X.
+ * Intended for use in parallel processing so that all
+ * fragments used to store results in a parallel job
+ * are the same size.
+ */
+void
+ovn_fast_hmap_size_for(struct hmap *hmap, int size)
+{
+    size_t mask;
+    mask = size / 2;
+    mask |= mask >> 1;
+    mask |= mask >> 2;
+    mask |= mask >> 4;
+    mask |= mask >> 8;
+    mask |= mask >> 16;
+#if SIZE_MAX > UINT32_MAX
+    mask |= mask >> 32;
+#endif
+
+    /* If we need to dynamically allocate buckets we might as well allocate at
+     * least 4 of them. */
+    mask |= (mask & 1) << 1;
+
+    fast_hmap_init(hmap, mask);
+}
+
+/* Run a thread pool which uses a callback function to process results
+ */
+
+void ovn_run_pool_callback(struct worker_pool *pool,
+                           void *fin_result, void *result_frags,
+                           void (*helper_func)(struct worker_pool *pool,
+                                               void *fin_result,
+                                               void *result_frags, int index))
+{
+    int index, completed;
+
+    /* Ensure that all worker threads see the same data as the
+     * main thread.
+     */
+
+    atomic_thread_fence(memory_order_acq_rel);
+
+    /* Start workers */
+
+    for (index = 0; index < pool->size; index++) {
+        sem_post(pool->controls[index].fire);
+    }
+
+    completed = 0;
+
+    do {
+        bool test;
+        /* Note - we do not loop on semaphore until it reaches
+         * zero, but on pool size/remaining workers.
+         * This is by design. If the inner loop can handle
+         * completion for more than one worker within an iteration
+         * it will do so to ensure no additional iterations and
+         * waits once all of them are done.
+         *
+         * This may result in us having an initial positive value
+         * of the semaphore when the pool is invoked the next time.
+         * This is harmless - the loop will spin up a couple of times
+         * doing nothing while the workers are processing their data
+         * slices.
+         */
+        wait_for_work_completion(pool);
+        for (index = 0; index < pool->size; index++) {
+            test = true;
+            /* If the worker has marked its data chunk as complete,
+             * invoke the helper function to combine the results of
+             * this worker into the main result.
+             *
+             * The worker must invoke an appropriate memory fence
+             * (most likely acq_rel) to ensure that the main thread
+             * sees all of the results produced by the worker.
+             */
+            if (atomic_compare_exchange_weak(
+                    &pool->controls[index].finished,
+                    &test,
+                    false)) {
+                if (helper_func) {
+                    (helper_func)(pool, fin_result, result_frags, index);
+                }
+                completed++;
+                pool->controls[index].data = NULL;
+            }
+        }
+    } while (completed < pool->size);
+}
+
+/* Run a thread pool - basic, does not do results processing.
+ */
+
+void ovn_run_pool(struct worker_pool *pool)
+{
+    run_pool_callback(pool, NULL, NULL, NULL);
+}
+
+/* Brute force merge of a hashmap into another hashmap.
+ * Intended for use in parallel processing. The destination
+ * hashmap MUST be the same size as the one being merged.
+ *
+ * This can be achieved by pre-allocating them to correct size
+ * and using hmap_insert_fast() instead of hmap_insert()
+ */
+
+void ovn_fast_hmap_merge(struct hmap *dest, struct hmap *inc)
+{
+    size_t i;
+
+    ovs_assert(inc->mask == dest->mask);
+
+    if (!inc->n) {
+        /* Request to merge an empty frag, nothing to do */
+        return;
+    }
+
+    for (i = 0; i <= dest->mask; i++) {
+        struct hmap_node **dest_bucket = &dest->buckets[i];
+        struct hmap_node **inc_bucket = &inc->buckets[i];
+        if (*inc_bucket != NULL) {
+            struct hmap_node *last_node = *inc_bucket;
+            while (last_node->next != NULL) {
+                last_node = last_node->next;
+            }
+            last_node->next = *dest_bucket;
+            *dest_bucket = *inc_bucket;
+            *inc_bucket = NULL;
+        }
+    }
+    dest->n += inc->n;
+    inc->n = 0;
+}
+
+/* Run a thread pool which gathers results in an array
+ * of hashes. Merge results.
+ */
+
+
+void ovn_run_pool_hash(
+        struct worker_pool *pool,
+        struct hmap *result,
+        struct hmap *result_frags)
+{
+    run_pool_callback(pool, result, result_frags, merge_hash_results);
+}
+
+/* Run a thread pool which gathers results in an array of lists.
+ * Merge results.
+ */
+void ovn_run_pool_list(
+        struct worker_pool *pool,
+        struct ovs_list *result,
+        struct ovs_list *result_frags)
+{
+    run_pool_callback(pool, result, result_frags, merge_list_results);
+}
+
+void ovn_update_hashrow_locks(struct hmap *lflows, struct hashrow_locks *hrl)
+{
+    int i;
+    if (hrl->mask != lflows->mask) {
+        if (hrl->row_locks) {
+            free(hrl->row_locks);
+        }
+        hrl->row_locks = xcalloc(lflows->mask + 1, sizeof(struct ovs_mutex));
+        hrl->mask = lflows->mask;
+        for (i = 0; i <= lflows->mask; i++) {
+            ovs_mutex_init(&hrl->row_locks[i]);
+        }
+    }
+}
+
+static void worker_pool_hook(void *aux OVS_UNUSED) {
+    int i;
+    static struct worker_pool *pool;
+    char sem_name[256];
+
+    workers_must_exit = true;
+
+    /* All workers must honour the must_exit flag and check for it regularly.
+     * We can make it atomic and check it via atomics in workers, but that
+     * is not really necessary as it is set just once - when the program
+     * terminates. So we use a fence which is invoked before exiting instead.
+     */
+    atomic_thread_fence(memory_order_acq_rel);
+
+    /* Wake up the workers after the must_exit flag has been set */
+
+    LIST_FOR_EACH (pool, list_node, &worker_pools) {
+        for (i = 0; i < pool->size ; i++) {
+            sem_post(pool->controls[i].fire);
+        }
+        for (i = 0; i < pool->size ; i++) {
+            sem_close(pool->controls[i].fire);
+            sprintf(sem_name, WORKER_SEM_NAME, sembase, pool, i);
+            sem_unlink(sem_name);
+        }
+        sem_close(pool->done);
+        sprintf(sem_name, MAIN_SEM_NAME, sembase, pool);
+        sem_unlink(sem_name);
+    }
+}
+
+static void setup_worker_pools(bool force) {
+    int cores, nodes;
+
+    nodes = ovs_numa_get_n_numas();
+    if (nodes == OVS_NUMA_UNSPEC || nodes <= 0) {
+        nodes = 1;
+    }
+    cores = ovs_numa_get_n_cores();
+
+    /* If there is no NUMA config, use the overall core count.
+     * If there is NUMA config, use only the cores of
+     * one node so that the OS does not start pushing
+     * threads to other nodes.
+     */
+    if (cores == OVS_CORE_UNSPEC || cores <= 0) {
+        /* If there is no NUMA we can try the ovs-threads routine.
+         * It falls back to sysconf and/or affinity mask.
+         */
+        cores = count_cpu_cores();
+        pool_size = cores;
+    } else {
+        pool_size = cores / nodes;
+    }
+    if ((pool_size < 4) && force) {
+        pool_size = 4;
+    } 
+    can_parallelize = (pool_size >= 3);
+    fatal_signal_add_hook(worker_pool_hook, NULL, NULL, true);
+    sembase = random_uint32();
+}
+
+static void merge_list_results(struct worker_pool *pool OVS_UNUSED,
+                               void *fin_result, void *result_frags,
+                               int index)
+{
+    struct ovs_list *result = (struct ovs_list *)fin_result;
+    struct ovs_list *res_frags = (struct ovs_list *)result_frags;
+
+    if (!ovs_list_is_empty(&res_frags[index])) {
+        ovs_list_splice(result->next,
+                ovs_list_front(&res_frags[index]), &res_frags[index]);
+    }
+}
+
+static void merge_hash_results(struct worker_pool *pool OVS_UNUSED,
+                               void *fin_result, void *result_frags,
+                               int index)
+{
+    struct hmap *result = (struct hmap *)fin_result;
+    struct hmap *res_frags = (struct hmap *)result_frags;
+
+    fast_hmap_merge(result, &res_frags[index]);
+    hmap_destroy(&res_frags[index]);
+}
+
+#endif
diff --git a/lib/ovn-parallel-hmap.h b/lib/ovn-parallel-hmap.h
new file mode 100644
index 000000000..8db61eaba
--- /dev/null
+++ b/lib/ovn-parallel-hmap.h
@@ -0,0 +1,301 @@ 
+/*
+ * Copyright (c) 2020 Red Hat, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef OVN_PARALLEL_HMAP
+#define OVN_PARALLEL_HMAP 1
+
+/* If the parallel macros are defined by hmap.h or any other OVS header,
+ * we skip over the OVN-specific definitions.
+ */
+
+#ifdef  __cplusplus
+extern "C" {
+#endif
+
+#include <stdbool.h>
+#include <stdlib.h>
+#include <semaphore.h>
+#include <errno.h>
+#include "openvswitch/util.h"
+#include "openvswitch/hmap.h"
+#include "openvswitch/thread.h"
+#include "ovs-atomic.h"
+
+/* Process this include only if OVS does not supply parallel definitions
+ */
+
+#ifdef OVS_HAS_PARALLEL_HMAP
+
+#include "parallel-hmap.h"
+
+#else
+
+
+#ifdef __clang__
+#pragma clang diagnostic push
+#pragma clang diagnostic ignored "-Wthread-safety"
+#endif
+
+
+/* A version of the HMAP_FOR_EACH macro intended for iterating as part
+ * of parallel processing.
+ * Each worker thread has a different ThreadID in the range 0..POOL_SIZE - 1
+ * and will iterate hash buckets ThreadID, ThreadID + step,
+ * ThreadID + step * 2, etc. The actual macro accepts
+ * ThreadID + step * i as the JOBID parameter.
+ */
+
+#define HMAP_FOR_EACH_IN_PARALLEL(NODE, MEMBER, JOBID, HMAP) \
+    for (INIT_CONTAINER(NODE, hmap_first_in_bucket_num(HMAP, JOBID), MEMBER); \
+         (NODE != OBJECT_CONTAINING(NULL, NODE, MEMBER)) \
+         || ((NODE = NULL), false); \
+         ASSIGN_CONTAINER(NODE, hmap_next_in_bucket(&(NODE)->MEMBER), MEMBER))
+
+/* We do not have a SAFE version of the macro, because the hash size is not
+ * atomic and hash removal operations would need to be wrapped with
+ * locks. This will defeat most of the benefits from doing anything in
+ * parallel.
+ * If the code block inside FOR_EACH_IN_PARALLEL needs to remove elements,
+ * each thread should store them in a temporary list result instead, merging
+ * the lists into a combined result at the end */
+
+/* Work "Handle" */
+
+struct worker_control {
+    int id; /* Used as a modulo when iterating over a hash. */
+    atomic_bool finished; /* Set to true after a chunk of work is complete. */
+    sem_t *fire; /* Work start semaphore - sem_post starts the worker. */
+    sem_t *done; /* Work completion semaphore - sem_post on completion. */
+    struct ovs_mutex mutex; /* Guards the data. */
+    void *data; /* Pointer to data to be processed. */
+    void *workload; /* back-pointer to the worker pool structure. */
+};
+
+struct worker_pool {
+    int size;   /* Number of threads in the pool. */
+    struct ovs_list list_node; /* List of pools - used in cleanup/exit. */
+    struct worker_control *controls; /* "Handles" in this pool. */
+    sem_t *done; /* Work completion semaphore. */
+};
+
+/* Add a worker pool for thread function start() which expects a pointer to
+ * a worker_control structure as an argument. */
+
+struct worker_pool *ovn_add_worker_pool(void *(*start)(void *));
+
+/* Setting this to true will make all processing threads exit */
+
+bool ovn_stop_parallel_processing(void);
+
+/* Build a hmap pre-sized for size elements */
+
+void ovn_fast_hmap_size_for(struct hmap *hmap, int size);
+
+/* Build a hmap with a mask equal to size */
+
+void ovn_fast_hmap_init(struct hmap *hmap, ssize_t size);
+
+/* Brute-force merge a hmap into hmap.
+ * Dest and inc have to have the same mask. The merge is performed
+ * by extending the element list for bucket N in the dest hmap with the list
+ * from bucket N in inc.
+ */
+
+void ovn_fast_hmap_merge(struct hmap *dest, struct hmap *inc);
+
+/* Run a pool, without any default processing of results.
+ */
+
+void ovn_run_pool(struct worker_pool *pool);
+
+/* Run a pool, merge results from hash frags into a final hash result.
+ * The hash frags must be pre-sized to the same size.
+ */
+
+void ovn_run_pool_hash(struct worker_pool *pool,
+                       struct hmap *result, struct hmap *result_frags);
+/* Run a pool, merge results from list frags into a final list result.
+ */
+
+void ovn_run_pool_list(struct worker_pool *pool,
+                       struct ovs_list *result, struct ovs_list *result_frags);
+
+/* Run a pool, call a callback function to perform processing of results.
+ */
+
+void ovn_run_pool_callback(struct worker_pool *pool, void *fin_result,
+                    void *result_frags,
+                    void (*helper_func)(struct worker_pool *pool,
+                        void *fin_result, void *result_frags, int index));
+
+
+/* Returns the first node in bucket 'num' of 'hmap',
+ * or a null pointer if that bucket is empty. */
+
+static inline struct hmap_node *
+hmap_first_in_bucket_num(const struct hmap *hmap, size_t num)
+{
+    return hmap->buckets[num];
+}
+
+static inline struct hmap_node *
+parallel_hmap_next__(const struct hmap *hmap, size_t start, size_t pool_size)
+{
+    size_t i;
+    for (i = start; i <= hmap->mask; i += pool_size) {
+        struct hmap_node *node = hmap->buckets[i];
+        if (node) {
+            return node;
+        }
+    }
+    return NULL;
+}
+
+/* Returns the first node in 'hmap', as expected by thread with job_id
+ * for parallel processing in arbitrary order, or a null pointer if
+ * the slice of 'hmap' for that job_id is empty. */
+static inline struct hmap_node *
+parallel_hmap_first(const struct hmap *hmap, size_t job_id, size_t pool_size)
+{
+    return parallel_hmap_next__(hmap, job_id, pool_size);
+}
+
+/* Returns the next node in the slice of 'hmap' following 'node',
+ * in arbitrary order, or a null pointer if 'node' is the last node in
+ * the 'hmap' slice.
+ *
+ */
+static inline struct hmap_node *
+parallel_hmap_next(const struct hmap *hmap,
+                   const struct hmap_node *node, ssize_t pool_size)
+{
+    return (node->next
+            ? node->next
+            : parallel_hmap_next__(hmap,
+                (node->hash & hmap->mask) + pool_size, pool_size));
+}
+
+static inline void post_completed_work(struct worker_control *control)
+{
+    atomic_thread_fence(memory_order_acq_rel);
+    atomic_store_relaxed(&control->finished, true);
+    sem_post(control->done);
+}
+
+static inline void wait_for_work(struct worker_control *control)
+{
+    int ret;
+
+    do {
+        ret = sem_wait(control->fire);
+    } while ((ret == -1) && (errno == EINTR));
+    ovs_assert(ret == 0);
+}
+static inline void wait_for_work_completion(struct worker_pool *pool)
+{
+    int ret;
+
+    do {
+        ret = sem_wait(pool->done);
+    } while ((ret == -1) && (errno == EINTR));
+    ovs_assert(ret == 0);
+}
+
+
+/* Hash per-row locking support - to be used only in conjunction
+ * with fast hash inserts. Normal hash inserts may resize the hash
+ * rendering the locking invalid.
+ */
+
+struct hashrow_locks {
+    ssize_t mask;
+    struct ovs_mutex *row_locks;
+};
+
+/* Update a hash row locks structure to match the current hash size */
+
+void ovn_update_hashrow_locks(struct hmap *lflows, struct hashrow_locks *hrl);
+
+/* Lock a hash row */
+
+static inline void lock_hash_row(struct hashrow_locks *hrl, uint32_t hash)
+{
+    ovs_mutex_lock(&hrl->row_locks[hash % hrl->mask]);
+}
+
+/* Unlock a hash row */
+
+static inline void unlock_hash_row(struct hashrow_locks *hrl, uint32_t hash)
+{
+    ovs_mutex_unlock(&hrl->row_locks[hash % hrl->mask]);
+}
+/* Init the row locks structure */
+
+static inline void init_hash_row_locks(struct hashrow_locks *hrl)
+{
+    hrl->mask = 0;
+    hrl->row_locks = NULL;
+}
+
+bool ovn_can_parallelize_hashes(bool force_parallel);
+
+/* Use the OVN library functions for stuff which OVS has not defined
+ * If OVS has defined these, they will still compile using the OVN
+ * local names, but will be dropped by the linker in favour of the OVS
+ * supplied functions.
+ */
+
+#define update_hashrow_locks(lflows, hrl) ovn_update_hashrow_locks(lflows, hrl)
+
+#define can_parallelize_hashes(force) ovn_can_parallelize_hashes(force)
+
+#define stop_parallel_processing() ovn_stop_parallel_processing()
+
+#define add_worker_pool(start) ovn_add_worker_pool(start)
+
+#define fast_hmap_size_for(hmap, size) ovn_fast_hmap_size_for(hmap, size)
+
+#define fast_hmap_init(hmap, size) ovn_fast_hmap_init(hmap, size)
+
+#define fast_hmap_merge(dest, inc) ovn_fast_hmap_merge(dest, inc)
+
+#define hmap_merge(dest, inc) ovn_hmap_merge(dest, inc)
+
+#define run_pool(pool) ovn_run_pool(pool)
+
+#define run_pool_hash(pool, result, result_frags) \
+    ovn_run_pool_hash(pool, result, result_frags)
+
+#define run_pool_list(pool, result, result_frags) \
+    ovn_run_pool_list(pool, result, result_frags)
+
+#define run_pool_callback(pool, fin_result, result_frags, helper_func) \
+    ovn_run_pool_callback(pool, fin_result, result_frags, helper_func)
+
+
+
+#ifdef __clang__
+#pragma clang diagnostic pop
+#endif
+
+#endif
+
+#ifdef  __cplusplus
+}
+#endif
+
+
+#endif /* OVN_PARALLEL_HMAP */
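To show how the pieces above are meant to fit together, here is a hedged sketch of a worker thread and its dispatch built only on the API declared in ovn-parallel-hmap.h. The payload (struct my_job, my_worker, run_my_job, and the per-bucket work) is hypothetical and not part of the patch; the real consumer is added by patch 2 of this series (ovn-northd's build_lflows_thread()).

/* Hypothetical worker: wait on the 'fire' semaphore, process the
 * bucket slice id, id + pool_size, id + 2 * pool_size, ... of a
 * shared input hash, then report completion. */
struct my_job {
    struct hmap *input;     /* Shared, read-only input hash. */
    struct hmap *out_frag;  /* This thread's result fragment. */
    int pool_size;          /* Number of workers in the pool. */
};

static void *
my_worker(void *arg)
{
    struct worker_control *control = arg;

    while (!stop_parallel_processing()) {
        wait_for_work(control);
        if (stop_parallel_processing()) {
            return NULL;
        }
        struct my_job *job = control->data;
        if (job) {
            size_t bucket;
            for (bucket = control->id; bucket <= job->input->mask;
                 bucket += job->pool_size) {
                /* Per-bucket work goes here, e.g. walk the bucket with
                 * HMAP_FOR_EACH_IN_PARALLEL (JOBID == bucket) and insert
                 * results into job->out_frag with hmap_insert_fast(). */
            }
        }
        post_completed_work(control);
    }
    return NULL;
}

/* Hypothetical dispatch: the result and every fragment must be
 * pre-sized identically, because ovn_fast_hmap_merge() asserts that
 * the masks are equal. */
static void
run_my_job(struct hmap *input, struct hmap *result)
{
    struct worker_pool *pool = add_worker_pool(my_worker);
    int i;

    if (!pool) {
        /* Pool creation failed or parallelism is off: run serially. */
        return;
    }

    struct my_job *jobs = xmalloc(sizeof *jobs * pool->size);
    struct hmap *frags = xmalloc(sizeof *frags * pool->size);

    fast_hmap_size_for(result, hmap_count(input));
    for (i = 0; i < pool->size; i++) {
        fast_hmap_size_for(&frags[i], hmap_count(input));
        jobs[i].input = input;
        jobs[i].out_frag = &frags[i];
        jobs[i].pool_size = pool->size;
        pool->controls[i].data = &jobs[i];
    }

    /* Fires all workers, then merges (and destroys) every fragment
     * into 'result'. */
    run_pool_hash(pool, result, frags);

    free(frags);
    free(jobs);
}

Because fast_hmap_size_for() derives the mask from the element count alone (for 1000 elements it computes mask 511, i.e. 512 buckets), sizing the result and all fragments with the same count guarantees the equal-mask precondition that the brute-force merge relies on.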