diff mbox

[ovs-dev,v3,16/16] ovn-controller: Monitor only necessary southbound rows.

Message ID 20161218081834.18541-17-blp@ovn.org
State Accepted
Headers show

Commit Message

Ben Pfaff Dec. 18, 2016, 8:18 a.m. UTC
Until now, ovn-controller has replicated all of the southbound database
(through the IDL).  This is inefficient, especially in a large OVN setup
where many logical networks are not present on an individual hypervisor.
This commit improves on the situation somewhat, by making ovn-controller
replicate (almost) only the port bindings, logical flows, and multicast
groups that are actually relevant to the particular hypervisor on which
ovn-controller is running.  This is easily possible by replicating the
patch ports from the Port_Binding table and using these relationships to
determine connections between datapaths.

This patch is strongly influenced by earlier work from the CCed developers.
I am grateful for their assistance.

CC: Darrell Ball <dlu998@gmail.com>
CC: Liran Schour <LIRANS@il.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
---
 ovn/controller/binding.c        |  2 ++
 ovn/controller/ovn-controller.c | 52 +++++++++++++++++++++++++++++++++++++++++
 tests/ovn.at                    |  4 ++--
 3 files changed, 56 insertions(+), 2 deletions(-)

Comments

Liran Schour Dec. 19, 2016, 8:51 a.m. UTC | #1
Ben Pfaff <blp@ovn.org> wrote on 18/12/2016 10:18:34 AM:

> Until now, ovn-controller has replicated all of the southbound database
> (through the IDL).  This is inefficient, especially in a large OVN setup
> where many logical networks are not present on an individual hypervisor.
> This commit improves on the situation somewhat, by making ovn-controller
> replicate (almost) only the port bindings, logical flows, and multicast
> groups that are actually relevant to the particular hypervisor on which
> ovn-controller is running.  This is easily possible by replicating the
> patch ports from the Port_Binding table and using these relationships to
> determine connections between datapaths.
> 
> This patch is strongly influenced by earlier work from the CCed developers.
> I am grateful for their assistance.
> 
> CC: Darrell Ball <dlu998@gmail.com>
> CC: Liran Schour <LIRANS@il.ibm.com>
> Signed-off-by: Ben Pfaff <blp@ovn.org>
> ---

I did some performance evaluation of this patch, compared it to the 
original patch series, and found that both show a very similar 
performance benefit.

Test scenario:
Using ovs-sim to simulate a data center with 50 hosts: 167 logical networks, 
each shared across 6 hosts (12% overlap).
Each host has 20 ports from 20 different logical networks.

Results:

Current master branch:
----------------------
Host 1 # flows 1504
Host 2 # flows 1626
Host 3 # flows 1443
...
Logical flows = 8016
Elapsed time 208,180ms
total of 1002 (in 208ms per port)
Ports created and bound in 8,146,466,629,254 cycles
0% south 71,391,573,236
1% north 91,125,817,279
98% clients 7,983,949,238,739

This patch:
-----------
Host 1 # flows 1358
Host 2 # flows 1482
Host 3 # flows 1296
...
Logical flows = 8016
Elapsed time 208,163ms
total of 1002 (in 208ms per port)
Ports created and bound in 1,391,851,557,883 cycles
2% south 33,923,722,225
3% north 46,647,673,929
94% clients 1,311,280,161,729

Original patch:
---------------
Host 1 # flows 1358
Host 2 # flows 1482
Host 3 # flows 1296
...
Logical flows = 8016
Elapsed time 208,173ms
total of 1002 (in 208ms per port)
Ports created and bound in 1,431,070,248,059 cycles
2% south 33,509,227,946
3% north 46,578,991,190
94% clients 1,350,982,028,923

Results summary:
----------------
In this specific scenario, this patch reduces the computation overhead of 
ovn-controller by about 80% and the overhead of the SB ovsdb-server by 
about 50%.
(These results are very similar to the original patch.)

Acked-by: Liran Schour <lirans@il.ibm.com>
Mickey Spiegel Dec. 19, 2016, 7:39 p.m. UTC | #2
On Sun, Dec 18, 2016 at 12:18 AM, Ben Pfaff <blp@ovn.org> wrote:

> Until now, ovn-controller has replicated all of the southbound database
> (through the IDL).  This is inefficient, especially in a large OVN setup
> where many logical networks are not present on an individual hypervisor.
> This commit improves on the situation somewhat, by making ovn-controller
> replicate (almost) only the port bindings, logical flows, and multicast
> groups that are actually relevant to the particular hypervisor on which
> ovn-controller is running.  This is easily possible by replicating the
> patch ports from the Port_Binding table and using these relationships to
> determine connections between datapaths.
>

This approach is certainly a lot simpler than datapaths of interest.

Thinking about this direct approach using patch ports versus the
related_datapaths concept from datapaths of interest:
- The number of related_datapaths and the number of patch ports
  and l3gateway ports should be the same. The complexity of
  running the algorithm should be similar. The direct approach has
  some advantage in not running the recursion over l3gateway ports.
- One difference in ovn-controller is that it receives a full port_binding
  for each versus just a related_datapath. Probably not that significant.
- ovn-controller has a few SBREC_PORT_BINDING_FOR_EACH
  loops, in lport.c, binding.c, patch.c, and physical.c. These do not
  seem to involve much processing for ports that are not local.
Given that none of this seems significant, and that Liran's
performance results are good, it looks like the direct approach is
the way to go.


>
> This patch is strongly influenced by earlier work from the CCed developers.
> I am grateful for their assistance.
>
> CC: Darrell Ball <dlu998@gmail.com>
> CC: Liran Schour <LIRANS@il.ibm.com>
> Signed-off-by: Ben Pfaff <blp@ovn.org>
>

Acked-by: Mickey Spiegel <mickeys.dev@gmail.com>

Three comments below.


> ---
>  ovn/controller/binding.c        |  2 ++
>  ovn/controller/ovn-controller.c | 52 ++++++++++++++++++++++++++++++
> +++++++++++
>  tests/ovn.at                    |  4 ++--
>  3 files changed, 56 insertions(+), 2 deletions(-)
>
> diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
> index fbdcf67..9543bb2 100644
> --- a/ovn/controller/binding.c
> +++ b/ovn/controller/binding.c
> @@ -393,6 +393,7 @@ consider_local_datapath(struct controller_ctx *ctx,
>              add_local_datapath(ldatapaths, lports, binding_rec->datapath,
>                                 true, local_datapaths);
>          }
> +        sset_add(local_lports, binding_rec->logical_port);
>

I don't understand why this is necessary. There is already a clause for
port_bindings of type "l3gateway" in update_sb_monitors below.

I ran ovn automated tests including kernel tests without this line.
They all passed.

>      } else if (chassis_rec && binding_rec->chassis == chassis_rec) {
>          if (ctx->ovnsb_idl_txn) {
>              VLOG_INFO("Releasing lport %s from this chassis.",
> @@ -402,6 +403,7 @@ consider_local_datapath(struct controller_ctx *ctx,
>              }
>
>              sbrec_port_binding_set_chassis(binding_rec, NULL);
> +
>              sset_find_and_delete(local_lports,
> binding_rec->logical_port);
>          }
>      } else if (!binding_rec->chassis
> diff --git a/ovn/controller/ovn-controller.c
> b/ovn/controller/ovn-controller.c
> index 7a30c2a..37b7756 100644
> --- a/ovn/controller/ovn-controller.c
> +++ b/ovn/controller/ovn-controller.c
> @@ -118,6 +118,55 @@ get_bridge(struct ovsdb_idl *ovs_idl, const char
> *br_name)
>      return NULL;
>  }
>
> +static void
> +update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
> +                   const struct sset *local_ifaces,
> +                   struct hmap *local_datapaths)
> +{
> +    /* Monitor Port_Bindings rows for local interfaces and local
> datapaths.
> +     *
> +     * Monitor Logical_Flow, MAC_Binding, and Multicast_Group tables for
> +     * local datapaths.
> +     *
> +     * We always monitor patch ports because they allow us to see the
> linkages
> +     * between related logical datapaths.  That way, when we know that we
> have
> +     * a VIF on a particular logical switch, we immediately know to
> monitor all
> +     * the connected logical routers and logical switches. */
> +    struct ovsdb_idl_condition pb = OVSDB_IDL_CONDITION_INIT;
> +    struct ovsdb_idl_condition lf = OVSDB_IDL_CONDITION_INIT;
> +    struct ovsdb_idl_condition mb = OVSDB_IDL_CONDITION_INIT;
> +    struct ovsdb_idl_condition mg = OVSDB_IDL_CONDITION_INIT;
> +    sbrec_port_binding_add_clause_type(&pb, OVSDB_F_EQ, "patch");
> +    sbrec_port_binding_add_clause_type(&pb, OVSDB_F_EQ, "l2gateway");
> +    sbrec_port_binding_add_clause_type(&pb, OVSDB_F_EQ, "l3gateway");
> +    sbrec_port_binding_add_clause_type(&pb, OVSDB_F_EQ, "localnet");
>

Why do you need a clause for "localnet"?
IMO it should be good enough to receive "localnet" port_bindings based
on the port_binding datapath clause.

I ran ovn automated tests including kernel tests without this line.
They all passed.

> +    if (local_ifaces) {
> +        const char *name;
> +        SSET_FOR_EACH (name, local_ifaces) {
> +            sbrec_port_binding_add_clause_logical_port(&pb, OVSDB_F_EQ,
> name);
> +        }
> +    }
> +    if (local_datapaths) {
> +        const struct local_datapath *ld;
> +        HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
> +            const struct uuid *uuid = &ld->datapath->header_.uuid;
> +            sbrec_port_binding_add_clause_datapath(&pb, OVSDB_F_EQ,
> uuid);
> +            sbrec_logical_flow_add_clause_logical_datapath(&lf,
> OVSDB_F_EQ,
> +                                                           uuid);
> +            sbrec_mac_binding_add_clause_datapath(&mb, OVSDB_F_EQ, uuid);
> +            sbrec_multicast_group_add_clause_datapath(&mg, OVSDB_F_EQ,
> uuid);
> +        }
> +    }
> +    sbrec_port_binding_set_condition(ovnsb_idl, &pb);
> +    sbrec_logical_flow_set_condition(ovnsb_idl, &lf);
> +    sbrec_mac_binding_set_condition(ovnsb_idl, &mb);
> +    sbrec_multicast_group_set_condition(ovnsb_idl, &mg);
> +    ovsdb_idl_condition_destroy(&pb);
> +    ovsdb_idl_condition_destroy(&lf);
> +    ovsdb_idl_condition_destroy(&mb);
> +    ovsdb_idl_condition_destroy(&mg);
> +}
> +
>  static const struct ovsrec_bridge *
>  create_br_int(struct controller_ctx *ctx,
>                const struct ovsrec_open_vswitch *cfg,
> @@ -451,6 +500,7 @@ main(int argc, char *argv[])
>      struct ovsdb_idl_loop ovnsb_idl_loop = OVSDB_IDL_LOOP_INITIALIZER(
>          ovsdb_idl_create(ovnsb_remote, &sbrec_idl_class, true, true));
>      ovsdb_idl_omit_alert(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);
> +    update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL);
>
>      /* Track the southbound idl. */
>      ovsdb_idl_get_initial_snapshot(ovnsb_idl_loop.idl);
> @@ -545,6 +595,8 @@ main(int argc, char *argv[])
>                      }
>                  }
>              }
> +
> +            update_sb_monitors(ctx.ovnsb_idl, &local_lports,
> &local_datapaths);
>          }
>
>          mcgroup_index_destroy(&mcgroups);
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 628d3c8..3779741 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -5403,9 +5403,9 @@ AT_CHECK([ovs-vsctl add-port br-int localvif1 -- set
> Interface localvif1 externa
>  # On hv1, check that there are no flows outputting bcast to tunnel
>  OVS_WAIT_UNTIL([test `ovs-ofctl dump-flows br-int table=32 | ofctl_strip
> | grep output | wc -l` -eq 0])
>
> -# On hv2, check that there is 1 flow outputting bcast to tunnel to hv1.
> +# On hv2, check that no flow outputs bcast to tunnel to hv1.
>  as hv2
> -OVS_WAIT_UNTIL([test `ovs-ofctl dump-flows br-int table=32 | ofctl_strip
> | grep output | wc -l` -eq 1])
> +OVS_WAIT_UNTIL([test `ovs-ofctl dump-flows br-int table=32 | ofctl_strip
> | grep output | wc -l` -eq 0])
>
>  # Now bind vif2 on hv2.
>  AT_CHECK([ovs-vsctl add-port br-int localvif2 -- set Interface localvif2
> external_ids:iface-id=localvif2])
>

If you take my suggestion for patch 10 to filter to only local_datapaths in
consider_mc_group in physical.c, then these changes to ovn.at would
move to patch 10.

Mickey

--
> 2.10.2
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Ben Pfaff Dec. 20, 2016, 1:58 a.m. UTC | #3
On Mon, Dec 19, 2016 at 10:51:36AM +0200, Liran Schour wrote:
> Ben Pfaff <blp@ovn.org> wrote on 18/12/2016 10:18:34 AM:
> 
> > Until now, ovn-controller has replicated all of the southbound database
> > (through the IDL).  This is inefficient, especially in a large OVN setup
> > where many logical networks are not present on an individual hypervisor.
> > This commit improves on the situation somewhat, by making ovn-controller
> > replicate (almost) only the port bindings, logical flows, and multicast
> > groups that are actually relevant to the particular hypervisor on which
> > ovn-controller is running.  This is easily possible by replicating the
> > patch ports from the Port_Binding table and using these relationships to
> > determine connections between datapaths.
> > 
> > This patch is strongly influenced by earlier work from the CCed developers.
> > I am grateful for their assistance.
> > 
> > CC: Darrell Ball <dlu998@gmail.com>
> > CC: Liran Schour <LIRANS@il.ibm.com>
> > Signed-off-by: Ben Pfaff <blp@ovn.org>
> > ---
> 
> I did some performance evaluation of this patch, compared it to the 
> original patch series, and found that both show a very similar 
> performance benefit.

Thank you very much for the review and especially for the performance
testing.

Patch

diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
index fbdcf67..9543bb2 100644
--- a/ovn/controller/binding.c
+++ b/ovn/controller/binding.c
@@ -393,6 +393,7 @@  consider_local_datapath(struct controller_ctx *ctx,
             add_local_datapath(ldatapaths, lports, binding_rec->datapath,
                                true, local_datapaths);
         }
+        sset_add(local_lports, binding_rec->logical_port);
     } else if (chassis_rec && binding_rec->chassis == chassis_rec) {
         if (ctx->ovnsb_idl_txn) {
             VLOG_INFO("Releasing lport %s from this chassis.",
@@ -402,6 +403,7 @@  consider_local_datapath(struct controller_ctx *ctx,
             }
 
             sbrec_port_binding_set_chassis(binding_rec, NULL);
+
             sset_find_and_delete(local_lports, binding_rec->logical_port);
         }
     } else if (!binding_rec->chassis
diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
index 7a30c2a..37b7756 100644
--- a/ovn/controller/ovn-controller.c
+++ b/ovn/controller/ovn-controller.c
@@ -118,6 +118,55 @@  get_bridge(struct ovsdb_idl *ovs_idl, const char *br_name)
     return NULL;
 }
 
+static void
+update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
+                   const struct sset *local_ifaces,
+                   struct hmap *local_datapaths)
+{
+    /* Monitor Port_Bindings rows for local interfaces and local datapaths.
+     *
+     * Monitor Logical_Flow, MAC_Binding, and Multicast_Group tables for
+     * local datapaths.
+     *
+     * We always monitor patch ports because they allow us to see the linkages
+     * between related logical datapaths.  That way, when we know that we have
+     * a VIF on a particular logical switch, we immediately know to monitor all
+     * the connected logical routers and logical switches. */
+    struct ovsdb_idl_condition pb = OVSDB_IDL_CONDITION_INIT;
+    struct ovsdb_idl_condition lf = OVSDB_IDL_CONDITION_INIT;
+    struct ovsdb_idl_condition mb = OVSDB_IDL_CONDITION_INIT;
+    struct ovsdb_idl_condition mg = OVSDB_IDL_CONDITION_INIT;
+    sbrec_port_binding_add_clause_type(&pb, OVSDB_F_EQ, "patch");
+    sbrec_port_binding_add_clause_type(&pb, OVSDB_F_EQ, "l2gateway");
+    sbrec_port_binding_add_clause_type(&pb, OVSDB_F_EQ, "l3gateway");
+    sbrec_port_binding_add_clause_type(&pb, OVSDB_F_EQ, "localnet");
+    if (local_ifaces) {
+        const char *name;
+        SSET_FOR_EACH (name, local_ifaces) {
+            sbrec_port_binding_add_clause_logical_port(&pb, OVSDB_F_EQ, name);
+        }
+    }
+    if (local_datapaths) {
+        const struct local_datapath *ld;
+        HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
+            const struct uuid *uuid = &ld->datapath->header_.uuid;
+            sbrec_port_binding_add_clause_datapath(&pb, OVSDB_F_EQ, uuid);
+            sbrec_logical_flow_add_clause_logical_datapath(&lf, OVSDB_F_EQ,
+                                                           uuid);
+            sbrec_mac_binding_add_clause_datapath(&mb, OVSDB_F_EQ, uuid);
+            sbrec_multicast_group_add_clause_datapath(&mg, OVSDB_F_EQ, uuid);
+        }
+    }
+    sbrec_port_binding_set_condition(ovnsb_idl, &pb);
+    sbrec_logical_flow_set_condition(ovnsb_idl, &lf);
+    sbrec_mac_binding_set_condition(ovnsb_idl, &mb);
+    sbrec_multicast_group_set_condition(ovnsb_idl, &mg);
+    ovsdb_idl_condition_destroy(&pb);
+    ovsdb_idl_condition_destroy(&lf);
+    ovsdb_idl_condition_destroy(&mb);
+    ovsdb_idl_condition_destroy(&mg);
+}
+
 static const struct ovsrec_bridge *
 create_br_int(struct controller_ctx *ctx,
               const struct ovsrec_open_vswitch *cfg,
@@ -451,6 +500,7 @@  main(int argc, char *argv[])
     struct ovsdb_idl_loop ovnsb_idl_loop = OVSDB_IDL_LOOP_INITIALIZER(
         ovsdb_idl_create(ovnsb_remote, &sbrec_idl_class, true, true));
     ovsdb_idl_omit_alert(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);
+    update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL);
 
     /* Track the southbound idl. */
     ovsdb_idl_get_initial_snapshot(ovnsb_idl_loop.idl);
@@ -545,6 +595,8 @@  main(int argc, char *argv[])
                     }
                 }
             }
+
+            update_sb_monitors(ctx.ovnsb_idl, &local_lports, &local_datapaths);
         }
 
         mcgroup_index_destroy(&mcgroups);
diff --git a/tests/ovn.at b/tests/ovn.at
index 628d3c8..3779741 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -5403,9 +5403,9 @@  AT_CHECK([ovs-vsctl add-port br-int localvif1 -- set Interface localvif1 externa
 # On hv1, check that there are no flows outputting bcast to tunnel
 OVS_WAIT_UNTIL([test `ovs-ofctl dump-flows br-int table=32 | ofctl_strip | grep output | wc -l` -eq 0])
 
-# On hv2, check that there is 1 flow outputting bcast to tunnel to hv1.
+# On hv2, check that no flow outputs bcast to tunnel to hv1.
 as hv2
-OVS_WAIT_UNTIL([test `ovs-ofctl dump-flows br-int table=32 | ofctl_strip | grep output | wc -l` -eq 1])
+OVS_WAIT_UNTIL([test `ovs-ofctl dump-flows br-int table=32 | ofctl_strip | grep output | wc -l` -eq 0])
 
 # Now bind vif2 on hv2.
 AT_CHECK([ovs-vsctl add-port br-int localvif2 -- set Interface localvif2 external_ids:iface-id=localvif2])