From patchwork Thu Jan 7 14:19:20 2021
X-Patchwork-Submitter: Anton Ivanov
X-Patchwork-Id: 1423324
From: anton.ivanov@cambridgegreys.com
To: ovs-dev@openvswitch.org
Cc: Anton Ivanov
Date: Thu, 7 Jan 2021 14:19:20 +0000
Message-Id: <20210107141921.1577-2-anton.ivanov@cambridgegreys.com>
In-Reply-To: <20210107141921.1577-1-anton.ivanov@cambridgegreys.com>
Subject: [ovs-dev] [OVN Patch v9 2/3] ovn-northd: Introduce parallel lflow build

From: Anton Ivanov

Datapaths, ports, IGMP groups and load balancers can now be iterated
over in parallel to speed up lflow generation. Without datapath groups,
this cuts the time needed to generate the logical flows by a factor of
4 or more on a 6-core/12-thread CPU: from 0.8-1 microseconds per flow
down to 0.2-0.3 microseconds per flow on average.
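The worker threads in this patch split each hash map by bucket: thread `control->id` visits buckets `id`, `id + pool->size`, `id + 2 * pool->size`, and so on. The sketch below illustrates that striding outside OVN and is not the patch's code. It assumes a plain chained hash table whose bucket index is `hash & mask` (as in the OVS `hmap`) and uses POSIX threads in place of the patch's worker pool. `struct table`, `table_insert()`, `worker()` and `parallel_sum()` are hypothetical names, and summing values stands in for building flows. Each thread accumulates into its own fragment, which the caller merges after joining, loosely mirroring how the non-datapath-group path merges per-thread hash fragments.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical chained hash table: a node lives in bucket hash & mask,
 * where mask + 1 (the bucket count) is a power of two. */
struct node {
    struct node *next;
    uint32_t hash;
    long value;
};

struct table {
    struct node **buckets;
    size_t mask;
};

static void
table_insert(struct table *t, struct node *n, uint32_t hash, long value)
{
    size_t b = hash & t->mask;

    n->hash = hash;
    n->value = value;
    n->next = t->buckets[b];
    t->buckets[b] = n;
}

struct worker_arg {
    const struct table *t;
    size_t id;          /* Index of this worker, 0 .. n_workers - 1. */
    size_t n_workers;
    long partial;       /* Per-worker result fragment. */
};

static void *
worker(void *arg_)
{
    struct worker_arg *arg = arg_;

    /* Visit buckets id, id + n_workers, id + 2 * n_workers, ...  Each
     * bucket is visited by exactly one worker, the table is only read,
     * and each worker writes only its own 'partial', so no lock is used. */
    for (size_t b = arg->id; b <= arg->t->mask; b += arg->n_workers) {
        for (const struct node *n = arg->t->buckets[b]; n; n = n->next) {
            arg->partial += n->value;
        }
    }
    return NULL;
}

/* Sums all values in 't' with 'n_workers' threads, then merges the
 * per-worker fragments on the calling thread.  Returns -1 if allocation
 * or thread creation fails. */
static long
parallel_sum(const struct table *t, size_t n_workers)
{
    assert(n_workers > 0);

    pthread_t *tids = malloc(n_workers * sizeof *tids);
    struct worker_arg *args = malloc(n_workers * sizeof *args);
    size_t started = 0;
    long total = 0;

    if (!tids || !args) {
        free(tids);
        free(args);
        return -1;
    }
    for (size_t i = 0; i < n_workers; i++) {
        args[i] = (struct worker_arg) {
            .t = t, .id = i, .n_workers = n_workers, .partial = 0,
        };
        if (pthread_create(&tids[i], NULL, worker, &args[i])) {
            break;
        }
        started++;
    }
    for (size_t i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += args[i].partial;   /* Merge step. */
    }
    free(tids);
    free(args);
    return started == n_workers ? total : -1;
}
```

With more workers than buckets (for example 12 workers over 8 buckets), the surplus workers start past `mask` and do no work; the patch's `bnum <= mask` loops behave the same way.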
With datapath groups enabled, the time to compute lflows on the same
hardware is roughly halved: from an average of 2.4 microseconds per flow
to 1.2 microseconds per flow.

Tested on an 8-node, 400-pod K8s simulation resulting in more than 6K
flows.

Signed-off-by: Anton Ivanov
---
 lib/fasthmap.c      |   3 +
 northd/ovn-northd.c | 310 ++++++++++++++++++++++++++++++++++++++------
 2 files changed, 271 insertions(+), 42 deletions(-)

diff --git a/lib/fasthmap.c b/lib/fasthmap.c
index 3096c90d3..e70ac4553 100644
--- a/lib/fasthmap.c
+++ b/lib/fasthmap.c
@@ -33,6 +33,7 @@

 VLOG_DEFINE_THIS_MODULE(fasthmap);

+#ifndef OVS_HAS_PARALLEL_HMAP
 static bool worker_pool_setup = false;
 static bool workers_must_exit = false;

@@ -279,3 +280,5 @@ void ovn_run_pool_hash(
 {
     ovn_run_pool_callback(pool, result, result_frags, merge_hash_results);
 }
+
+#endif
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 15d859dbd..8a35ac65d 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -37,6 +37,7 @@
 #include "lib/ovn-sb-idl.h"
 #include "lib/ovn-util.h"
 #include "lib/lb.h"
+#include "lib/fasthmap.h"
 #include "ovn/actions.h"
 #include "ovn/logical-fields.h"
 #include "packets.h"
@@ -4173,6 +4174,8 @@ ovn_lflow_init(struct ovn_lflow *lflow, struct ovn_datapath *od,
  * logical datapath only by creating a datapah group. */
 static bool use_logical_dp_groups = false;

+static struct ovs_mutex *slice_locks = NULL;
+
 /* Adds a row with the specified contents to the Logical_Flow table. */
 static void
 ovn_lflow_add_at(struct hmap *lflow_map, struct ovn_datapath *od,
@@ -4195,16 +4198,25 @@ ovn_lflow_add_at(struct hmap *lflow_map, struct ovn_datapath *od,
     hash = ovn_lflow_hash(lflow);

     if (shared && use_logical_dp_groups) {
+        if (slice_locks) {
+            ovs_mutex_lock(&slice_locks[hash % lflow_map->mask]);
+        }
         old_lflow = ovn_lflow_find_by_lflow(lflow_map, lflow, hash);
         if (old_lflow) {
             ovn_lflow_destroy(NULL, lflow);
             hmapx_add(&old_lflow->od_group, od);
+            if (slice_locks) {
+                ovs_mutex_unlock(&slice_locks[hash % lflow_map->mask]);
+            }
             return;
         }
     }

     hmapx_add(&lflow->od_group, od);
-    hmap_insert(lflow_map, &lflow->hmap_node, hash);
+    hmap_insert_fast(lflow_map, &lflow->hmap_node, hash);
+    if (shared && use_logical_dp_groups && slice_locks) {
+        ovs_mutex_unlock(&slice_locks[hash % lflow_map->mask]);
+    }
 }

 /* Adds a row with the specified contents to the Logical_Flow table. */
@@ -7346,6 +7358,8 @@ build_lswitch_ip_mcast_igmp_mld(struct ovn_igmp_group *igmp_group,
     }
 }

+static struct ovs_mutex mcgroups_lock = OVS_MUTEX_INITIALIZER;
+
 /* Ingress table 19: Destination lookup, unicast handling (priority 50), */
 static void
 build_lswitch_ip_unicast_lookup(struct ovn_port *op,
@@ -7384,7 +7398,9 @@ build_lswitch_ip_unicast_lookup(struct ovn_port *op,
                                     &op->nbsp->header_);
         } else if (!strcmp(op->nbsp->addresses[i], "unknown")) {
             if (lsp_is_enabled(op->nbsp)) {
+                ovs_mutex_lock(&mcgroups_lock);
                 ovn_multicast_add(mcgroups, &mc_unknown, op);
+                ovs_mutex_unlock(&mcgroups_lock);
                 op->od->has_unknown = true;
             }
         } else if (is_dynamic_lsp_address(op->nbsp->addresses[i])) {
@@ -11396,6 +11412,127 @@ build_lswitch_and_lrouter_iterate_by_op(struct ovn_port *op,
                                         &lsi->match, &lsi->actions);
 }

+struct lflows_thread_pool {
+    struct worker_pool *pool;
+};
+
+static void *build_lflows_thread(void *arg) {
+    struct worker_control *control = (struct worker_control *) arg;
+    struct lflows_thread_pool *workload;
+    struct lswitch_flow_build_info *lsi;
+
+    struct ovn_datapath *od;
+    struct ovn_port *op;
+    struct ovn_northd_lb *lb;
+    struct ovn_igmp_group *igmp_group;
+    int bnum;
+
+    while (!cease_fire()) {
+        sem_wait(&control->fire);
+        workload = (struct lflows_thread_pool *) control->workload;
+        lsi = (struct lswitch_flow_build_info *) control->data;
+        if (lsi && workload) {
+            /* Iterate over bucket ThreadID, ThreadID+size, ... */
+            for (bnum = control->id;
+                    bnum <= lsi->datapaths->mask;
+                    bnum += workload->pool->size)
+            {
+                HMAP_FOR_EACH_IN_PARALLEL (
+                        od, key_node, bnum, lsi->datapaths) {
+                    if (cease_fire()) {
+                        return NULL;
+                    }
+                    build_lswitch_and_lrouter_iterate_by_od(od, lsi);
+                }
+            }
+            for (bnum = control->id;
+                    bnum <= lsi->ports->mask;
+                    bnum += workload->pool->size)
+            {
+                HMAP_FOR_EACH_IN_PARALLEL (
+                        op, key_node, bnum, lsi->ports) {
+                    if (cease_fire()) {
+                        return NULL;
+                    }
+                    build_lswitch_and_lrouter_iterate_by_op(op, lsi);
+                }
+            }
+            for (bnum = control->id;
+                    bnum <= lsi->lbs->mask;
+                    bnum += workload->pool->size)
+            {
+                HMAP_FOR_EACH_IN_PARALLEL (
+                        lb, hmap_node, bnum, lsi->lbs) {
+                    if (cease_fire()) {
+                        return NULL;
+                    }
+                    build_lswitch_arp_nd_service_monitor(lb, lsi->lflows,
+                                                         &lsi->match,
+                                                         &lsi->actions);
+                }
+            }
+            for (bnum = control->id;
+                    bnum <= lsi->igmp_groups->mask;
+                    bnum += workload->pool->size)
+            {
+                HMAP_FOR_EACH_IN_PARALLEL (
+                        igmp_group, hmap_node, bnum, lsi->igmp_groups) {
+                    if (cease_fire()) {
+                        return NULL;
+                    }
+                    build_lswitch_ip_mcast_igmp_mld(igmp_group, lsi->lflows,
+                                                    &lsi->match,
+                                                    &lsi->actions);
+                }
+            }
+            atomic_store_relaxed(&control->finished, true);
+            atomic_thread_fence(memory_order_release);
+        }
+        sem_post(control->done);
+    }
+    return NULL;
+}
+
+static bool pool_init_done = false;
+static struct lflows_thread_pool *build_lflows_pool = NULL;
+
+static void init_lflows_thread_pool(void)
+{
+    int index;
+
+    if (!pool_init_done) {
+        struct worker_pool *pool = add_worker_pool(build_lflows_thread);
+        pool_init_done = true;
+        if (pool) {
+            build_lflows_pool =
+                xmalloc(sizeof(struct lflows_thread_pool));
+            build_lflows_pool->pool = pool;
+            for (index = 0; index < build_lflows_pool->pool->size; index++) {
+                build_lflows_pool->pool->controls[index].workload =
+                    build_lflows_pool;
+            }
+        }
+    }
+}
+
+/* TODO: replace hard cutoffs by configurable via commands. These are
+ * temporary defines to determine single-thread to multi-thread processing
+ * cutoff.
+ * Setting to 1 forces "all parallel" lflow build.
+ */
+
+static void
+noop_callback(struct worker_pool *pool OVS_UNUSED,
+              void *fin_result OVS_UNUSED,
+              void *result_frags OVS_UNUSED,
+              int index OVS_UNUSED)
+{
+    /* Do nothing */
+}
+
+
+static bool use_parallel_build = true;
+
 static void
 build_lswitch_and_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
                                 struct hmap *port_groups, struct hmap *lflows,
@@ -11403,52 +11540,109 @@ build_lswitch_and_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
                                 struct hmap *igmp_groups,
                                 struct shash *meter_groups,
                                 struct hmap *lbs)
 {
-    struct ovn_datapath *od;
-    struct ovn_port *op;
-    struct ovn_northd_lb *lb;
-    struct ovn_igmp_group *igmp_group;
     char *svc_check_match = xasprintf("eth.dst == %s", svc_monitor_mac);

-    struct lswitch_flow_build_info lsi = {
-        .datapaths = datapaths,
-        .ports = ports,
-        .port_groups = port_groups,
-        .lflows = lflows,
-        .mcgroups = mcgroups,
-        .igmp_groups = igmp_groups,
-        .meter_groups = meter_groups,
-        .lbs = lbs,
-        .svc_check_match = svc_check_match,
-        .match = DS_EMPTY_INITIALIZER,
-        .actions = DS_EMPTY_INITIALIZER,
-    };
+    if (use_parallel_build) {
+        init_lflows_thread_pool();
+        struct hmap *lflow_segs;
+        struct lswitch_flow_build_info *lsiv;
+        int index;
+
+        lsiv = xmalloc(
+            sizeof(struct lswitch_flow_build_info) *
+            build_lflows_pool->pool->size);
+        if (use_logical_dp_groups) {
+            lflow_segs = NULL;
+        } else {
+            lflow_segs = xmalloc(
+                sizeof(struct hmap) * build_lflows_pool->pool->size);
+        }

-    /* Combined build - all lflow generation from lswitch and lrouter
-     * will move here and will be reogranized by iterator type.
-     */
-    HMAP_FOR_EACH (od, key_node, datapaths) {
-        build_lswitch_and_lrouter_iterate_by_od(od, &lsi);
-    }
-    HMAP_FOR_EACH (op, key_node, ports) {
-        build_lswitch_and_lrouter_iterate_by_op(op, &lsi);
-    }
-    HMAP_FOR_EACH (lb, hmap_node, lbs) {
-        build_lswitch_arp_nd_service_monitor(lb, lsi.lflows,
-                                             &lsi.actions,
-                                             &lsi.match);
-    }
-    HMAP_FOR_EACH (igmp_group, hmap_node, igmp_groups) {
-        build_lswitch_ip_mcast_igmp_mld(igmp_group,
-                                        lsi.lflows,
-                                        &lsi.actions,
-                                        &lsi.match);
-    }
-    free(svc_check_match);
+        /* Set up "work chunks" for each thread to work on. */
+
+        for (index = 0; index < build_lflows_pool->pool->size; index++) {
+            if (use_logical_dp_groups) {
+                /* if dp_groups are in use we lock a shared lflows hash
+                 * on a per-bucket level instead of merging hash frags */
+                lsiv[index].lflows = lflows;
+            } else {
+                fast_hmap_init(&lflow_segs[index], lflows->mask);
+                lsiv[index].lflows = &lflow_segs[index];
+            }
+
+            lsiv[index].datapaths = datapaths;
+            lsiv[index].ports = ports;
+            lsiv[index].port_groups = port_groups;
+            lsiv[index].mcgroups = mcgroups;
+            lsiv[index].igmp_groups = igmp_groups;
+            lsiv[index].meter_groups = meter_groups;
+            lsiv[index].lbs = lbs;
+            lsiv[index].svc_check_match = svc_check_match;
+            ds_init(&lsiv[index].match);
+            ds_init(&lsiv[index].actions);
+
+            build_lflows_pool->pool->controls[index].data = &lsiv[index];
+        }
+
+        /* Run thread pool. */
+        if (use_logical_dp_groups) {
+            run_pool_callback(build_lflows_pool->pool, NULL, NULL, noop_callback);
+        } else {
+            run_pool_hash(build_lflows_pool->pool, lflows, lflow_segs);
+        }

-    ds_destroy(&lsi.match);
-    ds_destroy(&lsi.actions);
+        for (index = 0; index < build_lflows_pool->pool->size; index++) {
+            ds_destroy(&lsiv[index].match);
+            ds_destroy(&lsiv[index].actions);
+        }
+        free(lflow_segs);
+        free(lsiv);
+    } else {
+        struct ovn_datapath *od;
+        struct ovn_port *op;
+        struct ovn_northd_lb *lb;
+        struct ovn_igmp_group *igmp_group;
+        struct lswitch_flow_build_info lsi = {
+            .datapaths = datapaths,
+            .ports = ports,
+            .port_groups = port_groups,
+            .lflows = lflows,
+            .mcgroups = mcgroups,
+            .igmp_groups = igmp_groups,
+            .meter_groups = meter_groups,
+            .lbs = lbs,
+            .svc_check_match = svc_check_match,
+            .match = DS_EMPTY_INITIALIZER,
+            .actions = DS_EMPTY_INITIALIZER,
+        };
+
+        /* Combined build - all lflow generation from lswitch and lrouter
+         * will move here and will be reorganized by iterator type.
+         */
+        HMAP_FOR_EACH (od, key_node, datapaths) {
+            build_lswitch_and_lrouter_iterate_by_od(od, &lsi);
+        }
+        HMAP_FOR_EACH (op, key_node, ports) {
+            build_lswitch_and_lrouter_iterate_by_op(op, &lsi);
+        }
+        HMAP_FOR_EACH (lb, hmap_node, lbs) {
+            build_lswitch_arp_nd_service_monitor(lb, lsi.lflows,
+                                                 &lsi.actions,
+                                                 &lsi.match);
+        }
+        HMAP_FOR_EACH (igmp_group, hmap_node, igmp_groups) {
+            build_lswitch_ip_mcast_igmp_mld(igmp_group,
+                                            lsi.lflows,
+                                            &lsi.actions,
+                                            &lsi.match);
+        }
+
+        ds_destroy(&lsi.match);
+        ds_destroy(&lsi.actions);
+    }
+    free(svc_check_match);

     build_lswitch_flows(datapaths, lflows);
 }
@@ -11519,6 +11713,25 @@ ovn_sb_set_lflow_logical_dp_group(
     sbrec_logical_flow_set_logical_dp_group(sbflow, dpg->dp_group);
 }

+static ssize_t max_seen_lflow_size = 128;
+
+static ssize_t recent_lflow_map_mask = 0;
+
+static void update_lock_array(struct hmap *lflows)
+{
+    int i;
+    if (recent_lflow_map_mask != lflows->mask) {
+        if (slice_locks) {
+            free(slice_locks);
+        }
+        slice_locks = calloc(sizeof(struct ovs_mutex), lflows->mask + 1);
+        recent_lflow_map_mask = lflows->mask;
+        for (i = 0; i <= lflows->mask; i++) {
+            ovs_mutex_init(&slice_locks[i]);
+        }
+    }
+}
+
 /* Updates the Logical_Flow and Multicast_Group tables in the OVN_SB database,
  * constructing their contents based on the OVN_NB database. */
 static void
@@ -11528,12 +11741,25 @@ build_lflows(struct northd_context *ctx, struct hmap *datapaths,
              struct shash *meter_groups,
              struct hmap *lbs)
 {
-    struct hmap lflows = HMAP_INITIALIZER(&lflows);
+    struct hmap lflows;
+    fast_hmap_size_for(&lflows, max_seen_lflow_size);
+    if (use_parallel_build) {
+        update_lock_array(&lflows);
+    } else {
+        if (slice_locks) {
+            free(slice_locks);
+        }
+        slice_locks = NULL;
+    }

     build_lswitch_and_lrouter_flows(datapaths, ports,
                                     port_groups, &lflows, mcgroups,
                                     igmp_groups, meter_groups, lbs);

+    if (hmap_count(&lflows) > max_seen_lflow_size) {
+        max_seen_lflow_size = hmap_count(&lflows);
+    }
+
     /* Collecting all unique datapath groups. */
     struct hmap dp_groups = HMAP_INITIALIZER(&dp_groups);
     struct hmapx single_dp_lflows = HMAPX_INITIALIZER(&single_dp_lflows);