[ovs-dev] northd: Use bitmaps instead of exclusive hash map for datapath groups.

Message ID 20220913113329.7576-1-i.maximets@ovn.org
State Accepted
Series [ovs-dev] northd: Use bitmaps instead of exclusive hash map for datapath groups.

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes success github build: passed

Commit Message

Ilya Maximets Sept. 13, 2022, 11:33 a.m. UTC
Every time a new logical flow is created, ovn-northd creates an
exclusive hash map (hmapx) to collect all the datapaths this logical
flow is applicable to.  When a new datapath is added to an existing
logical flow, the hmapx performs a lookup by hash and then adds a new
entry by allocating a piece of memory and putting it into the hash
map.  Once all the logical flows are created and all the datapath
groups are ready, northd collects the unique groups by comparing them
to each other, destroying duplicates and replacing them with a pointer
to the one true group.  After that, northd analyzes all the existing
flows in the Sb DB and compares their groups to the ones each flow
should have.  At the end, all the remaining exclusive hash maps
containing datapath groups are destroyed.
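
A minimal sketch of the pre-patch pattern, using the OVS 'hmapx'
helpers that this patch removes below (the function and its arguments
are illustrative, not the actual northd code):

    #include "hmapx.h"

    /* Illustrative only: one hmapx of 'struct ovn_datapath *' per
     * logical flow, as in the pre-patch code. */
    static void
    old_style_group_example(struct hmapx *od_group,
                            struct ovn_datapath *od,
                            const struct hmapx *other_group)
    {
        /* Adding a datapath: a hash lookup plus a separately
         * allocated hmapx_node. */
        hmapx_add(od_group, od);

        /* Comparing two groups: every element of one map is looked
         * up in the other. */
        if (hmapx_equals(od_group, other_group)) {
            /* ... duplicate group: it would be destroyed and
             * replaced by a pointer to the shared one ... */
        }
    }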

The process is fairly inefficient.  The number of unique datapath
groups is typically much smaller than the number of logical flows.
So, in high-scale environments we may end up generating hundreds
of thousands of equal groups just to discard them later.
Every element in a group is a separately allocated piece of memory
added to a hash map.  This imposes a significant cost of allocating
them and freeing them later.  Comparing two hash maps for equality
involves looking up every element of one hash map in the other,
which is not a very fast operation either.  Hash maps are also
inherently cache-unfriendly, so every access during iteration
likely results in a cache miss.

As a result, all these datapath group manipulations end up taking
most of the processing time in northd.

Let's enumerate all the logical datapaths, assigning each a unique
index from zero to the total number of datapaths, and create an
array of datapaths for quick access by index.  With that in place,
instead of collecting datapath pointers in hash maps, we can use
bitmaps and set the bits that correspond to the datapaths relevant
to a particular group.  Later we can reconstruct the group by
iterating over the bitmap and taking the required datapaths from
the array.
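
A minimal sketch of the new scheme, built on the OVS bitmap helpers
that the patch below switches to ('n_datapaths', 'datapaths_array'
and 'od->index' are introduced by this patch; the surrounding
function and the 'other_bitmap' argument are illustrative only):

    #include "bitmap.h"

    static void
    bitmap_group_example(const struct ovn_datapath *od,
                         const unsigned long *other_bitmap)
    {
        unsigned long *dpg_bitmap = bitmap_allocate(n_datapaths);

        /* Adding a datapath to the group is just setting one bit. */
        bitmap_set1(dpg_bitmap, od->index);

        /* Comparing two groups is a flat memory comparison. */
        if (bitmap_equal(dpg_bitmap, other_bitmap, n_datapaths)) {
            /* ... same group ... */
        }

        /* Reconstructing the group: walk the set bits and take the
         * datapaths from the array by index. */
        size_t index;
        BITMAP_FOR_EACH_1 (index, n_datapaths, dpg_bitmap) {
            const struct sbrec_datapath_binding *sb =
                datapaths_array[index]->sb;
            /* ... use 'sb' ... */
        }

        bitmap_free(dpg_bitmap);
    }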

Advantages:

 - Bitmaps are relatively small.  Only 125 bytes are needed to
   accommodate a setup with 1000 datapaths (a quick sanity check of
   this number follows the list below).  At the same time, an hmapx
   with just one element ends up with 2 separate memory allocations:
   at least 4 buckets plus a hmapx_node.  In total that can be less
   memory than a bitmap holding only one datapath, but only by a
   factor of 2-3x, and the comparison becomes irrelevant as soon as
   there is at least one big datapath group.

 - Bitmaps are trivial to compare.  A simple memcmp does the trick,
   and memcmp will have one of the most optimized implementations
   for the actual hardware in use.

 - Flipping a bit or checking whether a certain bit is set takes
   constant time.  Asymptotically the same as hash maps, but the
   constant is much lower.

 - Cache-friendly.  Depending on the system and the scale, the whole
   bitmap can fit into one or two cache lines.  All the data is
   sequential in memory.

 - Reduced number of memory allocations/deallocations.  A bitmap is
   allocated and freed only once, as a single piece.  hmapx allocates
   memory for every datapath.

Disadvantage:

 - Counting the number of set bits in a bitmap is a bit heavier than
   just calling hmapx_count(), but we're not doing that very
   frequently, so it doesn't seem to impact performance in any
   meaningful way.
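
A quick sanity check of the size numbers above (plain arithmetic, not
part of the patch): 1000 datapaths need 1000 bits, i.e. 1000 / 8 =
125 bytes of bit storage; the actual allocation is rounded up to
whole 'unsigned long's, so slightly more in practice.  The
single-element hmapx, with its bucket array and hmapx_node, is
smaller in absolute terms, but only by the 2-3x mentioned above.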

Testing with ovn-heater shows up to a 3x performance improvement for
the 500-node density-heavy scenario.  In this case, northd poll
intervals went down from 10 to 3 seconds, driving the maximum test
latency from 21 down to 6.5 seconds, and the overall test time went
down from 2400 to 720 seconds.  Memory consumption of the northd
process was also reduced from 3.9 GB to 1.6 GB RSS.

One design consideration was whether to make the 'datapaths_array'
and 'n_datapaths' variables global.  Logically, from the perspective
of the I-P engine, they should be part of the northd-generated
'data'.  That's not a problem for the array itself, since it is only
used in a couple of places.  However, 'n_datapaths' is used
practically everywhere, including ovn_lflow_add(), so we would have
to pass it around as an argument in most of the functions in northd.
Taking that into account, I decided to make them both global to
avoid touching half of the northd code.
We also have a couple of other global variables already (mainly for
parallel processing) that should technically be part of the I-P
data, but it is way simpler to keep them global.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
---
 northd/northd.c | 158 ++++++++++++++++++++++++++----------------------
 northd/northd.h |   3 +
 2 files changed, 89 insertions(+), 72 deletions(-)

Comments

Numan Siddique Sept. 16, 2022, 3:59 p.m. UTC | #1
On Tue, Sep 13, 2022 at 7:34 AM Ilya Maximets <i.maximets@ovn.org> wrote:
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

Thanks Ilya for these huge improvements.

The patch LGTM.

Acked-by: Numan Siddique <numans@ovn.org>

Numan

> ---
>  northd/northd.c | 158 ++++++++++++++++++++++++++----------------------
>  northd/northd.h |   3 +
>  2 files changed, 89 insertions(+), 72 deletions(-)
>
> diff --git a/northd/northd.c b/northd/northd.c
> index 346c0c7c8..eab62b8fc 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -1358,6 +1358,10 @@ ovn_datapath_assign_requested_tnl_id(struct northd_input *input_data,
>      }
>  }
>
> +/* Array of all datapaths, with 'od->index' being their index in the array. */
> +static struct ovn_datapath **datapaths_array = NULL;
> +static size_t n_datapaths = 0; /* Size of the 'datapaths_array'. */
> +
>  /* Updates the southbound Datapath_Binding table so that it contains the
>   * logical switches and routers specified by the northbound database.
>   *
> @@ -1422,6 +1426,19 @@ build_datapaths(struct northd_input *input_data,
>          sbrec_datapath_binding_delete(od->sb);
>          ovn_datapath_destroy(datapaths, od);
>      }
> +
> +    /* Assign unique sequential indexes to all datapaths.  These are not
> +     * visible outside of the northd loop, so, unlike the tunnel keys, it
> +     * doesn't matter if they are different on every iteration. */
> +    size_t index = 0;
> +
> +    n_datapaths = hmap_count(datapaths);
> +    datapaths_array = xrealloc(datapaths_array,
> +                               n_datapaths * sizeof *datapaths_array);
> +    HMAP_FOR_EACH (od, key_node, datapaths) {
> +        od->index = index;
> +        datapaths_array[index++] = od;
> +    }
>  }
>
>  /* Structure representing logical router port
> @@ -4134,19 +4151,19 @@ build_lb_port_related_data(struct hmap *datapaths, struct hmap *ports,
>
>
>  struct ovn_dp_group {
> -    struct hmapx map;
> +    unsigned long *bitmap;
>      struct sbrec_logical_dp_group *dp_group;
>      struct hmap_node node;
>  };
>
>  static struct ovn_dp_group *
>  ovn_dp_group_find(const struct hmap *dp_groups,
> -                  const struct hmapx *od, uint32_t hash)
> +                  const unsigned long *dpg_bitmap, uint32_t hash)
>  {
>      struct ovn_dp_group *dpg;
>
>      HMAP_FOR_EACH_WITH_HASH (dpg, node, hash, dp_groups) {
> -        if (hmapx_equals(&dpg->map, od)) {
> +        if (bitmap_equal(dpg->bitmap, dpg_bitmap, n_datapaths)) {
>              return dpg;
>          }
>      }
> @@ -4155,16 +4172,15 @@ ovn_dp_group_find(const struct hmap *dp_groups,
>
>  static struct sbrec_logical_dp_group *
>  ovn_sb_insert_logical_dp_group(struct ovsdb_idl_txn *ovnsb_txn,
> -                               const struct hmapx *od)
> +                               const unsigned long *dpg_bitmap)
>  {
>      struct sbrec_logical_dp_group *dp_group;
>      const struct sbrec_datapath_binding **sb;
> -    const struct hmapx_node *node;
> -    int n = 0;
> +    size_t n = 0, index;
>
> -    sb = xmalloc(hmapx_count(od) * sizeof *sb);
> -    HMAPX_FOR_EACH (node, od) {
> -        sb[n++] = ((struct ovn_datapath *) node->data)->sb;
> +    sb = xmalloc(bitmap_count1(dpg_bitmap, n_datapaths) * sizeof *sb);
> +    BITMAP_FOR_EACH_1 (index, n_datapaths, dpg_bitmap) {
> +        sb[n++] = datapaths_array[index]->sb;
>      }
>      dp_group = sbrec_logical_dp_group_insert(ovnsb_txn);
>      sbrec_logical_dp_group_set_datapaths(
> @@ -4223,9 +4239,9 @@ sync_lbs(struct northd_input *input_data, struct ovsdb_idl_txn *ovnsb_txn,
>          }
>
>          struct ovn_dp_group *dpg = xzalloc(sizeof *dpg);
> -        size_t i;
> +        size_t i, n = 0;
>
> -        hmapx_init(&dpg->map);
> +        dpg->bitmap = bitmap_allocate(n_datapaths);
>          for (i = 0; i < dp_group->n_datapaths; i++) {
>              struct ovn_datapath *datapath_od;
>
> @@ -4234,18 +4250,19 @@ sync_lbs(struct northd_input *input_data, struct ovsdb_idl_txn *ovnsb_txn,
>              if (!datapath_od || ovn_datapath_is_stale(datapath_od)) {
>                  break;
>              }
> -            hmapx_add(&dpg->map, datapath_od);
> +            bitmap_set1(dpg->bitmap, datapath_od->index);
> +            n++;
>          }
>          if (i == dp_group->n_datapaths) {
> -            uint32_t hash = hash_int(hmapx_count(&dpg->map), 0);
> +            uint32_t hash = hash_int(n, 0);
>
> -            if (!ovn_dp_group_find(&dp_groups, &dpg->map, hash)) {
> +            if (!ovn_dp_group_find(&dp_groups, dpg->bitmap, hash)) {
>                  dpg->dp_group = dp_group;
>                  hmap_insert(&dp_groups, &dpg->node, hash);
>                  continue;
>              }
>          }
> -        hmapx_destroy(&dpg->map);
> +        bitmap_free(dpg->bitmap);
>          free(dpg);
>      }
>      hmapx_destroy(&existing_lbs);
> @@ -4278,23 +4295,25 @@ sync_lbs(struct northd_input *input_data, struct ovsdb_idl_txn *ovnsb_txn,
>          }
>
>          /* Find datapath group for this load balancer. */
> -        struct hmapx lb_dps = HMAPX_INITIALIZER(&lb_dps);
> +        unsigned long *lb_dps_bitmap;
>          struct ovn_dp_group *dpg;
>          uint32_t hash;
>
> +        lb_dps_bitmap = bitmap_allocate(n_datapaths);
>          for (size_t i = 0; i < lb->n_nb_ls; i++) {
> -            hmapx_add(&lb_dps, lb->nb_ls[i]);
> +            bitmap_set1(lb_dps_bitmap, lb->nb_ls[i]->index);
>          }
>
> -        hash = hash_int(hmapx_count(&lb_dps), 0);
> -        dpg = ovn_dp_group_find(&dp_groups, &lb_dps, hash);
> +        hash = hash_int(bitmap_count1(lb_dps_bitmap, n_datapaths), 0);
> +        dpg = ovn_dp_group_find(&dp_groups, lb_dps_bitmap, hash);
>          if (!dpg) {
>              dpg = xzalloc(sizeof *dpg);
> -            dpg->dp_group = ovn_sb_insert_logical_dp_group(ovnsb_txn, &lb_dps);
> -            hmapx_clone(&dpg->map, &lb_dps);
> +            dpg->dp_group = ovn_sb_insert_logical_dp_group(ovnsb_txn,
> +                                                           lb_dps_bitmap);
> +            dpg->bitmap = bitmap_clone(lb_dps_bitmap, n_datapaths);
>              hmap_insert(&dp_groups, &dpg->node, hash);
>          }
> -        hmapx_destroy(&lb_dps);
> +        bitmap_free(lb_dps_bitmap);
>
>          /* Update columns. */
>          sbrec_load_balancer_set_name(lb->slb, lb->nlb->name);
> @@ -4309,7 +4328,7 @@ sync_lbs(struct northd_input *input_data, struct ovsdb_idl_txn *ovnsb_txn,
>
>      struct ovn_dp_group *dpg;
>      HMAP_FOR_EACH_POP (dpg, node, &dp_groups) {
> -        hmapx_destroy(&dpg->map);
> +        bitmap_free(dpg->bitmap);
>          free(dpg);
>      }
>      hmap_destroy(&dp_groups);
> @@ -4863,8 +4882,8 @@ struct ovn_lflow {
>      struct hmap_node hmap_node;
>
>      struct ovn_datapath *od;     /* 'logical_datapath' in SB schema.  */
> -    struct ovs_mutex odg_lock;   /* Lock guarding access to od_group */
> -    struct hmapx od_group;       /* Hash map of 'struct ovn_datapath *'. */
> +    struct ovs_mutex dpg_lock;   /* Lock guarding access to 'dpg_bitmap' */
> +    unsigned long *dpg_bitmap;   /* Bitmap of all datapaths by their 'index'.*/
>      enum ovn_stage stage;
>      uint16_t priority;
>      char *match;
> @@ -4919,7 +4938,7 @@ ovn_lflow_init(struct ovn_lflow *lflow, struct ovn_datapath *od,
>                 char *match, char *actions, char *io_port, char *ctrl_meter,
>                 char *stage_hint, const char *where)
>  {
> -    hmapx_init(&lflow->od_group);
> +    lflow->dpg_bitmap = bitmap_allocate(n_datapaths);
>      lflow->od = od;
>      lflow->stage = stage;
>      lflow->priority = priority;
> @@ -4931,7 +4950,7 @@ ovn_lflow_init(struct ovn_lflow *lflow, struct ovn_datapath *od,
>      lflow->dpg = NULL;
>      lflow->where = where;
>      if (parallelization_state != STATE_NULL) {
> -        ovs_mutex_init(&lflow->odg_lock);
> +        ovs_mutex_init(&lflow->dpg_lock);
>      }
>  }
>
> @@ -4945,11 +4964,11 @@ ovn_dp_group_add_with_reference(struct ovn_lflow *lflow_ref,
>      }
>
>      if (parallelization_state == STATE_USE_PARALLELIZATION) {
> -        ovs_mutex_lock(&lflow_ref->odg_lock);
> -        hmapx_add(&lflow_ref->od_group, od);
> -        ovs_mutex_unlock(&lflow_ref->odg_lock);
> +        ovs_mutex_lock(&lflow_ref->dpg_lock);
> +        bitmap_set1(lflow_ref->dpg_bitmap, od->index);
> +        ovs_mutex_unlock(&lflow_ref->dpg_lock);
>      } else {
> -        hmapx_add(&lflow_ref->od_group, od);
> +        bitmap_set1(lflow_ref->dpg_bitmap, od->index);
>      }
>
>      return true;
> @@ -5035,7 +5054,7 @@ do_ovn_lflow_add(struct hmap *lflow_map, struct ovn_datapath *od,
>                     io_port ? xstrdup(io_port) : NULL,
>                     nullable_xstrdup(ctrl_meter),
>                     ovn_lflow_hint(stage_hint), where);
> -    hmapx_add(&lflow->od_group, od);
> +    bitmap_set1(lflow->dpg_bitmap, od->index);
>      if (parallelization_state != STATE_USE_PARALLELIZATION) {
>          hmap_insert(lflow_map, &lflow->hmap_node, hash);
>      } else {
> @@ -5166,12 +5185,12 @@ ovn_lflow_destroy(struct hmap *lflows, struct ovn_lflow *lflow)
>  {
>      if (lflow) {
>          if (parallelization_state != STATE_NULL) {
> -            ovs_mutex_destroy(&lflow->odg_lock);
> +            ovs_mutex_destroy(&lflow->dpg_lock);
>          }
>          if (lflows) {
>              hmap_remove(lflows, &lflow->hmap_node);
>          }
> -        hmapx_destroy(&lflow->od_group);
> +        bitmap_free(lflow->dpg_bitmap);
>          free(lflow->match);
>          free(lflow->actions);
>          free(lflow->io_port);
> @@ -14172,23 +14191,25 @@ ovn_sb_set_lflow_logical_dp_group(
>      struct ovsdb_idl_txn *ovnsb_txn,
>      struct hmap *dp_groups,
>      const struct sbrec_logical_flow *sbflow,
> -    const struct hmapx *od_group)
> +    const unsigned long *dpg_bitmap)
>  {
>      struct ovn_dp_group *dpg;
> +    size_t n_ods;
>
> -    if (!hmapx_count(od_group)) {
> +    n_ods = bitmap_count1(dpg_bitmap, n_datapaths);
> +
> +    if (!n_ods) {
>          sbrec_logical_flow_set_logical_dp_group(sbflow, NULL);
>          return;
>      }
>
> -    ovs_assert(hmapx_count(od_group) != 1);
> +    ovs_assert(n_ods != 1);
>
> -    dpg = ovn_dp_group_find(dp_groups, od_group,
> -                            hash_int(hmapx_count(od_group), 0));
> +    dpg = ovn_dp_group_find(dp_groups, dpg_bitmap, hash_int(n_ods, 0));
>      ovs_assert(dpg != NULL);
>
>      if (!dpg->dp_group) {
> -        dpg->dp_group = ovn_sb_insert_logical_dp_group(ovnsb_txn, &dpg->map);
> +        dpg->dp_group = ovn_sb_insert_logical_dp_group(ovnsb_txn, dpg->bitmap);
>      }
>      sbrec_logical_flow_set_logical_dp_group(sbflow, dpg->dp_group);
>  }
> @@ -14277,21 +14298,22 @@ void build_lflows(struct lflow_input *input_data,
>      fast_hmap_size_for(&single_dp_lflows, max_seen_lflow_size);
>
>      struct ovn_lflow *lflow;
> -    struct hmapx_node *node;
>      HMAP_FOR_EACH_SAFE (lflow, hmap_node, &lflows) {
> -        uint32_t hash;
>          struct ovn_dp_group *dpg;
> +        uint32_t hash, n_ods;
>
> -        ovs_assert(hmapx_count(&lflow->od_group));
> +        n_ods = bitmap_count1(lflow->dpg_bitmap, n_datapaths);
>
> -        if (hmapx_count(&lflow->od_group) == 1) {
> +        ovs_assert(n_ods);
> +
> +        if (n_ods == 1) {
>              /* There is only one datapath, so it should be moved out of the
>               * group to a single 'od'. */
> -            HMAPX_FOR_EACH (node, &lflow->od_group) {
> -                lflow->od = node->data;
> -                break;
> -            }
> -            hmapx_clear(&lflow->od_group);
> +            size_t index = bitmap_scan(lflow->dpg_bitmap, true, 0,
> +                                       n_datapaths);
> +
> +            bitmap_set0(lflow->dpg_bitmap, index);
> +            lflow->od = datapaths_array[index];
>
>              /* Logical flow should be re-hashed to allow lookups. */
>              hash = hmap_node_hash(&lflow->hmap_node);
> @@ -14304,11 +14326,11 @@ void build_lflows(struct lflow_input *input_data,
>              continue;
>          }
>
> -        hash = hash_int(hmapx_count(&lflow->od_group), 0);
> -        dpg = ovn_dp_group_find(&dp_groups, &lflow->od_group, hash);
> +        hash = hash_int(n_ods, 0);
> +        dpg = ovn_dp_group_find(&dp_groups, lflow->dpg_bitmap, hash);
>          if (!dpg) {
>              dpg = xzalloc(sizeof *dpg);
> -            hmapx_clone(&dpg->map, &lflow->od_group);
> +            dpg->bitmap = bitmap_clone(lflow->dpg_bitmap, n_datapaths);
>              hmap_insert(&dp_groups, &dpg->node, hash);
>          }
>          lflow->dpg = dpg;
> @@ -14409,36 +14431,28 @@ void build_lflows(struct lflow_input *input_data,
>              } else {
>                  /* There is a datapath group and we need to perform
>                   * a full comparison. */
> -                struct ovn_datapath **od;
> -                int n_datapaths = 0;
> +                unsigned long *dpg_bitmap;
> +                struct ovn_datapath *od;
>
> -                od = xmalloc(dp_group->n_datapaths * sizeof *od);
> +                dpg_bitmap = bitmap_allocate(n_datapaths);
>                  /* Check all logical datapaths from the group. */
>                  for (i = 0; i < dp_group->n_datapaths; i++) {
> -                    od[n_datapaths] = ovn_datapath_from_sbrec(
> +                    od = ovn_datapath_from_sbrec(
>                              input_data->datapaths, dp_group->datapaths[i]);
> -                    if (!od[n_datapaths]
> -                        || ovn_datapath_is_stale(od[n_datapaths])) {
> +                    if (!od || ovn_datapath_is_stale(od)) {
>                          continue;
>                      }
> -                    n_datapaths++;
> +                    bitmap_set1(dpg_bitmap, od->index);
>                  }
>
> -                if (n_datapaths != hmapx_count(&lflow->od_group)) {
> -                    update_dp_group = true;
> -                }
> -                /* Have to compare datapath groups in full. */
> -                for (i = 0; !update_dp_group && i < n_datapaths; i++) {
> -                    if (od[i] && !hmapx_contains(&lflow->od_group, od[i])) {
> -                        update_dp_group = true;
> -                    }
> -                }
> -                free(od);
> +                update_dp_group = !bitmap_equal(dpg_bitmap, lflow->dpg_bitmap,
> +                                                n_datapaths);
> +                bitmap_free(dpg_bitmap);
>              }
>
>              if (update_dp_group) {
>                  ovn_sb_set_lflow_logical_dp_group(ovnsb_txn, &dp_groups,
> -                                                  sbflow, &lflow->od_group);
> +                                                  sbflow, lflow->dpg_bitmap);
>              } else if (lflow->dpg && !lflow->dpg->dp_group) {
>                  /* Setting relation between unique datapath group and
>                   * Sb DB datapath goup. */
> @@ -14462,7 +14476,7 @@ void build_lflows(struct lflow_input *input_data,
>              sbrec_logical_flow_set_logical_datapath(sbflow, lflow->od->sb);
>          }
>          ovn_sb_set_lflow_logical_dp_group(ovnsb_txn, &dp_groups,
> -                                          sbflow, &lflow->od_group);
> +                                          sbflow, lflow->dpg_bitmap);
>          sbrec_logical_flow_set_pipeline(sbflow, pipeline);
>          sbrec_logical_flow_set_table_id(sbflow, table);
>          sbrec_logical_flow_set_priority(sbflow, lflow->priority);
> @@ -14503,7 +14517,7 @@ void build_lflows(struct lflow_input *input_data,
>
>      struct ovn_dp_group *dpg;
>      HMAP_FOR_EACH_POP (dpg, node, &dp_groups) {
> -        hmapx_destroy(&dpg->map);
> +        bitmap_free(dpg->bitmap);
>          free(dpg);
>      }
>      hmap_destroy(&dp_groups);
> diff --git a/northd/northd.h b/northd/northd.h
> index 8d299864f..fb903102b 100644
> --- a/northd/northd.h
> +++ b/northd/northd.h
> @@ -180,6 +180,9 @@ struct ovn_datapath {
>      struct hmap_node key_node;  /* Index on 'key'. */
>      struct uuid key;            /* (nbs/nbr)->header_.uuid. */
>
> +    size_t index;   /* A unique index across all datapaths.
> +                     * Datapath indexes are sequential and start from zero. */
> +
>      const struct nbrec_logical_switch *nbs;  /* May be NULL. */
>      const struct nbrec_logical_router *nbr;  /* May be NULL. */
>      const struct sbrec_datapath_binding *sb; /* May be NULL. */
> --
> 2.34.3
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Numan Siddique Sept. 16, 2022, 4:10 p.m. UTC | #2
On Fri, Sep 16, 2022 at 11:59 AM Numan Siddique <numans@ovn.org> wrote:
>
> On Tue, Sep 13, 2022 at 7:34 AM Ilya Maximets <i.maximets@ovn.org> wrote:
> > Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>
> Thanks Ilya for these huge improvements.
>
> The patch LGTM.
>
> Acked-by: Numan Siddique <numans@ovn.org>

@Mark Michelson @Ilya Maximets   If you want to take a look at this
patch before the 22.09 release that would be great.

Numan

>
> Numan
>
Mark Michelson Sept. 16, 2022, 4:50 p.m. UTC | #3
On 9/16/22 12:10, Numan Siddique wrote:
> On Fri, Sep 16, 2022 at 11:59 AM Numan Siddique <numans@ovn.org> wrote:
>>
>> On Tue, Sep 13, 2022 at 7:34 AM Ilya Maximets <i.maximets@ovn.org> wrote:
>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>
>> Thanks Ilya for these huge improvements.
>>
>> The patch LGTM.
>>
>> Acked-by: Numan Siddique <numans@ovn.org>
> 
> @Mark Michelson @Ilya Maximets   If you want to take a look at this
> patch before the 22.09 release that would be great.
> 
> Numan
> 

I merged this to main and 22.09. Just in time!

>>> +        dpg = ovn_dp_group_find(&dp_groups, lb_dps_bitmap, hash);
>>>           if (!dpg) {
>>>               dpg = xzalloc(sizeof *dpg);
>>> -            dpg->dp_group = ovn_sb_insert_logical_dp_group(ovnsb_txn, &lb_dps);
>>> -            hmapx_clone(&dpg->map, &lb_dps);
>>> +            dpg->dp_group = ovn_sb_insert_logical_dp_group(ovnsb_txn,
>>> +                                                           lb_dps_bitmap);
>>> +            dpg->bitmap = bitmap_clone(lb_dps_bitmap, n_datapaths);
>>>               hmap_insert(&dp_groups, &dpg->node, hash);
>>>           }
>>> -        hmapx_destroy(&lb_dps);
>>> +        bitmap_free(lb_dps_bitmap);
>>>
>>>           /* Update columns. */
>>>           sbrec_load_balancer_set_name(lb->slb, lb->nlb->name);
>>> @@ -4309,7 +4328,7 @@ sync_lbs(struct northd_input *input_data, struct ovsdb_idl_txn *ovnsb_txn,
>>>
>>>       struct ovn_dp_group *dpg;
>>>       HMAP_FOR_EACH_POP (dpg, node, &dp_groups) {
>>> -        hmapx_destroy(&dpg->map);
>>> +        bitmap_free(dpg->bitmap);
>>>           free(dpg);
>>>       }
>>>       hmap_destroy(&dp_groups);
>>> @@ -4863,8 +4882,8 @@ struct ovn_lflow {
>>>       struct hmap_node hmap_node;
>>>
>>>       struct ovn_datapath *od;     /* 'logical_datapath' in SB schema.  */
>>> -    struct ovs_mutex odg_lock;   /* Lock guarding access to od_group */
>>> -    struct hmapx od_group;       /* Hash map of 'struct ovn_datapath *'. */
>>> +    struct ovs_mutex dpg_lock;   /* Lock guarding access to 'dpg_bitmap' */
>>> +    unsigned long *dpg_bitmap;   /* Bitmap of all datapaths by their 'index'.*/
>>>       enum ovn_stage stage;
>>>       uint16_t priority;
>>>       char *match;
>>> @@ -4919,7 +4938,7 @@ ovn_lflow_init(struct ovn_lflow *lflow, struct ovn_datapath *od,
>>>                  char *match, char *actions, char *io_port, char *ctrl_meter,
>>>                  char *stage_hint, const char *where)
>>>   {
>>> -    hmapx_init(&lflow->od_group);
>>> +    lflow->dpg_bitmap = bitmap_allocate(n_datapaths);
>>>       lflow->od = od;
>>>       lflow->stage = stage;
>>>       lflow->priority = priority;
>>> @@ -4931,7 +4950,7 @@ ovn_lflow_init(struct ovn_lflow *lflow, struct ovn_datapath *od,
>>>       lflow->dpg = NULL;
>>>       lflow->where = where;
>>>       if (parallelization_state != STATE_NULL) {
>>> -        ovs_mutex_init(&lflow->odg_lock);
>>> +        ovs_mutex_init(&lflow->dpg_lock);
>>>       }
>>>   }
>>>
>>> @@ -4945,11 +4964,11 @@ ovn_dp_group_add_with_reference(struct ovn_lflow *lflow_ref,
>>>       }
>>>
>>>       if (parallelization_state == STATE_USE_PARALLELIZATION) {
>>> -        ovs_mutex_lock(&lflow_ref->odg_lock);
>>> -        hmapx_add(&lflow_ref->od_group, od);
>>> -        ovs_mutex_unlock(&lflow_ref->odg_lock);
>>> +        ovs_mutex_lock(&lflow_ref->dpg_lock);
>>> +        bitmap_set1(lflow_ref->dpg_bitmap, od->index);
>>> +        ovs_mutex_unlock(&lflow_ref->dpg_lock);
>>>       } else {
>>> -        hmapx_add(&lflow_ref->od_group, od);
>>> +        bitmap_set1(lflow_ref->dpg_bitmap, od->index);
>>>       }
>>>
>>>       return true;
>>> @@ -5035,7 +5054,7 @@ do_ovn_lflow_add(struct hmap *lflow_map, struct ovn_datapath *od,
>>>                      io_port ? xstrdup(io_port) : NULL,
>>>                      nullable_xstrdup(ctrl_meter),
>>>                      ovn_lflow_hint(stage_hint), where);
>>> -    hmapx_add(&lflow->od_group, od);
>>> +    bitmap_set1(lflow->dpg_bitmap, od->index);
>>>       if (parallelization_state != STATE_USE_PARALLELIZATION) {
>>>           hmap_insert(lflow_map, &lflow->hmap_node, hash);
>>>       } else {
>>> @@ -5166,12 +5185,12 @@ ovn_lflow_destroy(struct hmap *lflows, struct ovn_lflow *lflow)
>>>   {
>>>       if (lflow) {
>>>           if (parallelization_state != STATE_NULL) {
>>> -            ovs_mutex_destroy(&lflow->odg_lock);
>>> +            ovs_mutex_destroy(&lflow->dpg_lock);
>>>           }
>>>           if (lflows) {
>>>               hmap_remove(lflows, &lflow->hmap_node);
>>>           }
>>> -        hmapx_destroy(&lflow->od_group);
>>> +        bitmap_free(lflow->dpg_bitmap);
>>>           free(lflow->match);
>>>           free(lflow->actions);
>>>           free(lflow->io_port);
>>> @@ -14172,23 +14191,25 @@ ovn_sb_set_lflow_logical_dp_group(
>>>       struct ovsdb_idl_txn *ovnsb_txn,
>>>       struct hmap *dp_groups,
>>>       const struct sbrec_logical_flow *sbflow,
>>> -    const struct hmapx *od_group)
>>> +    const unsigned long *dpg_bitmap)
>>>   {
>>>       struct ovn_dp_group *dpg;
>>> +    size_t n_ods;
>>>
>>> -    if (!hmapx_count(od_group)) {
>>> +    n_ods = bitmap_count1(dpg_bitmap, n_datapaths);
>>> +
>>> +    if (!n_ods) {
>>>           sbrec_logical_flow_set_logical_dp_group(sbflow, NULL);
>>>           return;
>>>       }
>>>
>>> -    ovs_assert(hmapx_count(od_group) != 1);
>>> +    ovs_assert(n_ods != 1);
>>>
>>> -    dpg = ovn_dp_group_find(dp_groups, od_group,
>>> -                            hash_int(hmapx_count(od_group), 0));
>>> +    dpg = ovn_dp_group_find(dp_groups, dpg_bitmap, hash_int(n_ods, 0));
>>>       ovs_assert(dpg != NULL);
>>>
>>>       if (!dpg->dp_group) {
>>> -        dpg->dp_group = ovn_sb_insert_logical_dp_group(ovnsb_txn, &dpg->map);
>>> +        dpg->dp_group = ovn_sb_insert_logical_dp_group(ovnsb_txn, dpg->bitmap);
>>>       }
>>>       sbrec_logical_flow_set_logical_dp_group(sbflow, dpg->dp_group);
>>>   }
>>> @@ -14277,21 +14298,22 @@ void build_lflows(struct lflow_input *input_data,
>>>       fast_hmap_size_for(&single_dp_lflows, max_seen_lflow_size);
>>>
>>>       struct ovn_lflow *lflow;
>>> -    struct hmapx_node *node;
>>>       HMAP_FOR_EACH_SAFE (lflow, hmap_node, &lflows) {
>>> -        uint32_t hash;
>>>           struct ovn_dp_group *dpg;
>>> +        uint32_t hash, n_ods;
>>>
>>> -        ovs_assert(hmapx_count(&lflow->od_group));
>>> +        n_ods = bitmap_count1(lflow->dpg_bitmap, n_datapaths);
>>>
>>> -        if (hmapx_count(&lflow->od_group) == 1) {
>>> +        ovs_assert(n_ods);
>>> +
>>> +        if (n_ods == 1) {
>>>               /* There is only one datapath, so it should be moved out of the
>>>                * group to a single 'od'. */
>>> -            HMAPX_FOR_EACH (node, &lflow->od_group) {
>>> -                lflow->od = node->data;
>>> -                break;
>>> -            }
>>> -            hmapx_clear(&lflow->od_group);
>>> +            size_t index = bitmap_scan(lflow->dpg_bitmap, true, 0,
>>> +                                       n_datapaths);
>>> +
>>> +            bitmap_set0(lflow->dpg_bitmap, index);
>>> +            lflow->od = datapaths_array[index];
>>>
>>>               /* Logical flow should be re-hashed to allow lookups. */
>>>               hash = hmap_node_hash(&lflow->hmap_node);
>>> @@ -14304,11 +14326,11 @@ void build_lflows(struct lflow_input *input_data,
>>>               continue;
>>>           }
>>>
>>> -        hash = hash_int(hmapx_count(&lflow->od_group), 0);
>>> -        dpg = ovn_dp_group_find(&dp_groups, &lflow->od_group, hash);
>>> +        hash = hash_int(n_ods, 0);
>>> +        dpg = ovn_dp_group_find(&dp_groups, lflow->dpg_bitmap, hash);
>>>           if (!dpg) {
>>>               dpg = xzalloc(sizeof *dpg);
>>> -            hmapx_clone(&dpg->map, &lflow->od_group);
>>> +            dpg->bitmap = bitmap_clone(lflow->dpg_bitmap, n_datapaths);
>>>               hmap_insert(&dp_groups, &dpg->node, hash);
>>>           }
>>>           lflow->dpg = dpg;
>>> @@ -14409,36 +14431,28 @@ void build_lflows(struct lflow_input *input_data,
>>>               } else {
>>>                   /* There is a datapath group and we need to perform
>>>                    * a full comparison. */
>>> -                struct ovn_datapath **od;
>>> -                int n_datapaths = 0;
>>> +                unsigned long *dpg_bitmap;
>>> +                struct ovn_datapath *od;
>>>
>>> -                od = xmalloc(dp_group->n_datapaths * sizeof *od);
>>> +                dpg_bitmap = bitmap_allocate(n_datapaths);
>>>                   /* Check all logical datapaths from the group. */
>>>                   for (i = 0; i < dp_group->n_datapaths; i++) {
>>> -                    od[n_datapaths] = ovn_datapath_from_sbrec(
>>> +                    od = ovn_datapath_from_sbrec(
>>>                               input_data->datapaths, dp_group->datapaths[i]);
>>> -                    if (!od[n_datapaths]
>>> -                        || ovn_datapath_is_stale(od[n_datapaths])) {
>>> +                    if (!od || ovn_datapath_is_stale(od)) {
>>>                           continue;
>>>                       }
>>> -                    n_datapaths++;
>>> +                    bitmap_set1(dpg_bitmap, od->index);
>>>                   }
>>>
>>> -                if (n_datapaths != hmapx_count(&lflow->od_group)) {
>>> -                    update_dp_group = true;
>>> -                }
>>> -                /* Have to compare datapath groups in full. */
>>> -                for (i = 0; !update_dp_group && i < n_datapaths; i++) {
>>> -                    if (od[i] && !hmapx_contains(&lflow->od_group, od[i])) {
>>> -                        update_dp_group = true;
>>> -                    }
>>> -                }
>>> -                free(od);
>>> +                update_dp_group = !bitmap_equal(dpg_bitmap, lflow->dpg_bitmap,
>>> +                                                n_datapaths);
>>> +                bitmap_free(dpg_bitmap);
>>>               }
>>>
>>>               if (update_dp_group) {
>>>                   ovn_sb_set_lflow_logical_dp_group(ovnsb_txn, &dp_groups,
>>> -                                                  sbflow, &lflow->od_group);
>>> +                                                  sbflow, lflow->dpg_bitmap);
>>>               } else if (lflow->dpg && !lflow->dpg->dp_group) {
>>>                   /* Setting relation between unique datapath group and
>>>                    * Sb DB datapath goup. */
>>> @@ -14462,7 +14476,7 @@ void build_lflows(struct lflow_input *input_data,
>>>               sbrec_logical_flow_set_logical_datapath(sbflow, lflow->od->sb);
>>>           }
>>>           ovn_sb_set_lflow_logical_dp_group(ovnsb_txn, &dp_groups,
>>> -                                          sbflow, &lflow->od_group);
>>> +                                          sbflow, lflow->dpg_bitmap);
>>>           sbrec_logical_flow_set_pipeline(sbflow, pipeline);
>>>           sbrec_logical_flow_set_table_id(sbflow, table);
>>>           sbrec_logical_flow_set_priority(sbflow, lflow->priority);
>>> @@ -14503,7 +14517,7 @@ void build_lflows(struct lflow_input *input_data,
>>>
>>>       struct ovn_dp_group *dpg;
>>>       HMAP_FOR_EACH_POP (dpg, node, &dp_groups) {
>>> -        hmapx_destroy(&dpg->map);
>>> +        bitmap_free(dpg->bitmap);
>>>           free(dpg);
>>>       }
>>>       hmap_destroy(&dp_groups);
>>> diff --git a/northd/northd.h b/northd/northd.h
>>> index 8d299864f..fb903102b 100644
>>> --- a/northd/northd.h
>>> +++ b/northd/northd.h
>>> @@ -180,6 +180,9 @@ struct ovn_datapath {
>>>       struct hmap_node key_node;  /* Index on 'key'. */
>>>       struct uuid key;            /* (nbs/nbr)->header_.uuid. */
>>>
>>> +    size_t index;   /* A unique index across all datapaths.
>>> +                     * Datapath indexes are sequential and start from zero. */
>>> +
>>>       const struct nbrec_logical_switch *nbs;  /* May be NULL. */
>>>       const struct nbrec_logical_router *nbr;  /* May be NULL. */
>>>       const struct sbrec_datapath_binding *sb; /* May be NULL. */
>>> --
>>> 2.34.3
>>>
>
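
A side note on the deduplication pattern visible in ovn_dp_group_find()
in the patch above: candidate groups live in a hash map keyed by a hash
of the number of datapaths they contain, and the bucket walk tells equal
groups apart with a full bitmap comparison.  The toy program below
reproduces that shape under simplifying assumptions (a fixed universe of
8 datapaths, a plain chained bucket array instead of the OVS 'hmap', and
invented toy_* names); it is a sketch of the idea, not the northd
implementation.

    /* Toy illustration of "hash by set-bit count, then compare the
     * whole bitmap"; not OVN code. */
    #include <stdio.h>
    #include <stdlib.h>

    #define N_BUCKETS 16

    struct toy_group {
        unsigned char bits;        /* One byte covers 8 "datapaths". */
        struct toy_group *next;    /* Bucket chain. */
    };

    static struct toy_group *buckets[N_BUCKETS];

    static unsigned int
    popcount8(unsigned char b)
    {
        unsigned int n = 0;
        for (; b; b &= b - 1) {
            n++;
        }
        return n;
    }

    /* Return the existing group equal to 'bits', or intern a new one. */
    static struct toy_group *
    toy_group_intern(unsigned char bits)
    {
        unsigned int hash = popcount8(bits) % N_BUCKETS;  /* Cheap hash. */
        struct toy_group *g;

        for (g = buckets[hash]; g; g = g->next) {
            if (g->bits == bits) {        /* Full membership comparison. */
                return g;
            }
        }
        g = calloc(1, sizeof *g);
        g->bits = bits;
        g->next = buckets[hash];
        buckets[hash] = g;
        return g;
    }

    int
    main(void)
    {
        struct toy_group *a = toy_group_intern(0x05); /* datapaths 0, 2 */
        struct toy_group *b = toy_group_intern(0x05); /* same members   */
        struct toy_group *c = toy_group_intern(0x03); /* datapaths 0, 1 */

        /* 'a' and 'b' share a bucket with 'c' (same popcount), but only
         * 'a' and 'b' are deduplicated into one group. */
        printf("a == b: %s, a == c: %s\n",
               a == b ? "yes" : "no", a == c ? "yes" : "no");

        for (int i = 0; i < N_BUCKETS; i++) {
            while (buckets[i]) {
                struct toy_group *next = buckets[i]->next;
                free(buckets[i]);
                buckets[i] = next;
            }
        }
        return 0;
    }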