[ovs-dev,v6,00/16] northd: I-P for load balancer and lb groups

Message ID 20230818085606.1030792-1-numans@ovn.org

Message

Numan Siddique Aug. 18, 2023, 8:56 a.m. UTC
From: Numan Siddique <numans@ovn.org>

This patch series adds support for handling load balancer and
load balancer group changes incrementally in the "northd" engine
node and the "lflow" engine node.  Changes to the load balancer and
load balancer group columns of logical switches and logical routers
are also handled incrementally, provided other columns do not change.
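
As a rough illustration of the fallback rule described above, the
sketch below shows a change handler that processes an update
incrementally only when every changed column is one it supports, and
otherwise asks for a full recompute.  This is not OVN's C code: every
function, variable and dictionary key here is hypothetical, and the
column names are used only as sample values.

```python
# Hypothetical sketch of an incremental-processing (I-P) change handler.
# None of these names come from OVN; they only illustrate the rule that
# load balancer column changes are handled incrementally "provided other
# columns do not change".

# Columns whose changes the handler knows how to process incrementally
# (sample values, assumed for this sketch).
INCREMENTAL_COLUMNS = frozenset({"load_balancer", "load_balancer_group"})


def handle_datapath_change(changed_columns, state):
    """Return True if the change was handled incrementally.

    Returning False tells the (hypothetical) engine to fall back to a
    full recompute of the node.
    """
    changed = set(changed_columns)
    if not changed <= INCREMENTAL_COLUMNS:
        # Some other column changed too: give up on I-P for this update.
        return False
    # Only record the change; dependent nodes would consume this later.
    state["tracked_changes"].append(sorted(changed))
    return True


def run_node(changed_columns, state):
    """Run one engine iteration for a single tracked row change."""
    if handle_datapath_change(changed_columns, state):
        return "incremental"
    # Fallback path: drop tracked data and count a full recompute.
    state["tracked_changes"].clear()
    state["recomputes"] += 1
    return "recompute"
```

For example, with state = {"tracked_changes": [], "recomputes": 0},
run_node(["load_balancer"], state) takes the incremental path, while
run_node(["load_balancer", "ports"], state) falls back to a recompute.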

V4 of this series did not include LB I-P handling in the lflow engine
node.  V5 added 6 more patches to handle the LB changes in the lflow
engine node.  V6 adds 2 more patches for router NAT I-P handling.

This patch series can be divided into 3 parts:

Part 1.  Patches 1 to 8  (LB I-P only in northd)
Part 2.  Patches 9 to 14 (LB I-P in lflow engine too)
Part 3.  Patches 15 and 16  (LR NAT I-P handling in both northd and
lflow)

All of these patches are submitted as one series.  They can also be
found here - https://github.com/numansiddique/ovn/tree/northd_ip_lb_ip_v6

If there are any conflicts, reviewers are requested to use this branch
for reviewing.

Below are the results of scale testing done with ovn-heater with these
patches applied.  The test ran the scenario
ocp-500-density-heavy.yml [1].

With these patches applied (with load balancer I-P handling in both
northd and lflow engine nodes) the results (Result 1) are:

-------------------------------------------------------------------------------------------------------------------------------------------------------
                        Min (s)         Median (s)      90%ile (s)      99%ile (s)      Max (s)         Mean (s)        Total (s)       Count   Failed
-------------------------------------------------------------------------------------------------------------------------------------------------------
Iteration Total         0.135651        1.130527        1.179357        1.201410        2.180203        0.674606        84.325717       125     0
Namespace.add_ports     0.005218        0.005678        0.006457        0.018936        0.020812        0.006182        0.772796        125     0
WorkerNode.bind_port    0.033631        0.043287        0.051171        0.058223        0.062819        0.043839        10.959757       250     0
WorkerNode.ping_port    0.005460        0.006791        1.041434        1.064807        1.069957        0.274352        68.587878       250     0
-------------------------------------------------------------------------------------------------------------------------------------------------------


With only the first 8 patches applied (with load balancer I-P handling
only in northd engine node) the results (Result 2) are:


-------------------------------------------------------------------------------------------------------------------------------------------------------
                        Min (s)         Median (s)      90%ile (s)      99%ile (s)      Max (s)         Mean (s)        Total (s)       Count   Failed
-------------------------------------------------------------------------------------------------------------------------------------------------------
Iteration Total         0.132929        2.157103        3.314847        3.331561        4.378626        1.581889        197.736147      125     0
Namespace.add_ports     0.005217        0.005760        0.006565        0.013348        0.021014        0.006106        0.763214        125     0
WorkerNode.bind_port    0.035205        0.045458        0.052278        0.059804        0.063941        0.045652        11.413122       250     0
WorkerNode.ping_port    0.005075        0.006814        3.088548        3.192577        4.242026        0.726453        181.613284      250     0
-------------------------------------------------------------------------------------------------------------------------------------------------------


The results with the present main (Result 3) are:

-------------------------------------------------------------------------------------------------------------------------------------------------------
                        Min (s)         Median (s)      90%ile (s)      99%ile (s)      Max (s)         Mean (s)        Total (s)       Count   Failed
-------------------------------------------------------------------------------------------------------------------------------------------------------
Iteration Total         4.377260        6.486962        7.502040        8.322587        8.334701        6.559002        819.875306      125     0
Namespace.add_ports     0.005112        0.005484        0.005953        0.009153        0.011452        0.005662        0.707752        125     0
WorkerNode.bind_port    0.035360        0.042732        0.049152        0.053698        0.056635        0.043215        10.803700       250     0
WorkerNode.ping_port    0.005338        1.599904        7.229649        7.798039        8.206537        3.209860        802.464911      250     0
-------------------------------------------------------------------------------------------------------------------------------------------------------

A few observations:

 - The total time taken to complete the density heavy tests (excluding
   the base cluster bringup) has come down significantly, from 819
   seconds on main (Result 3) to 197 seconds with LB I-P handling only
   in the northd engine node (Result 2), and further down to around 84
   seconds with all the patches applied (Result 1).
 - The 99%ile iteration time with these patches is 3.3 seconds in
   Result 2 and 1.2 seconds in Result 1 (1.06 seconds for
   WorkerNode.ping_port), compared to 8.3 seconds for main (Result 3).
 - The 90%ile iteration time with these patches is 3.3 seconds in
   Result 2 and 1.18 seconds in Result 1 (1.04 seconds for
   WorkerNode.ping_port), compared to 7.5 seconds for main (Result 3).
 - CPU utilization of northd during the test with these patches
   is between 100% and 300%, which is almost the same as main.
   The main difference is that the test duration is shorter with these
   patches, and hence the overall CPU usage is lower.
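
To put the totals above in relative terms, the short sketch below
computes the speedup and the percentage of time saved against main.
The three numbers are copied from the "Iteration Total" rows of the
tables; the helper names are made up for this example.

```python
# Totals (seconds) copied from the "Iteration Total" rows of the tables.
TOTAL_SECONDS = {
    "Result 1 (all patches)": 84.325717,
    "Result 2 (patches 1-8)": 197.736147,
    "Result 3 (main)": 819.875306,
}


def speedup(baseline, improved):
    """Ratio of baseline total time to improved total time."""
    return baseline / improved


def reduction_percent(baseline, improved):
    """Percentage of the baseline total time that was saved."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    main_total = TOTAL_SECONDS["Result 3 (main)"]
    for name, total in TOTAL_SECONDS.items():
        print(f"{name}: {speedup(main_total, total):.1f}x vs main, "
              f"{reduction_percent(main_total, total):.0f}% less time")
```

By this measure Result 2 is roughly 4.1 times faster than main and
Result 1 roughly 9.7 times faster.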

[1] - https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml


v5 -> v6
-------
  * Rebased.  Added 2 more patches (p15 and p16) for LR NAT I-P handling.

v4 -> v5
-------
  * Added 6 new patches to the series which handle the LB changes
    in the lflow engine node.

v3 -> v4
-------
  * Covered more test scenarios.
  * Found a few issues and fixed them.  v3 was not handling the scenario of
    a vip getting added or removed from a load balancer.

v2 -> v3
--------
  * v2 was very inefficient in handling the load balancer group changes
    and in associating the load balancers of the lb group to the
    datapaths. This was the main reason for the regression in the full
    recompute time taken.
    v3 addressed these by more efficiently handling the lb group changes
    incrementally.


Numan Siddique (16):
  northd I-P: Sync SB load balancers in a separate engine node.
  northd: Add a new engine node - lb_data.
  northd: Add initial I-P for load balancer and load balancer groups
  northd: Refactor the 'northd' node code which handles logical switch
    changes.
  northd: Handle load balancer changes for a logical switch.
  northd: Handle load balancer group changes for a logical switch.
  northd: Sync SB Port bindings NAT column in a separate engine node.
  northd: Handle load balancer/group changes for a logical router.
  northd: Use objdep mgr for lport to lflow references.
  northd: Fix LSP incremental processing if dhcp options are set.
  northd: Use objdep mgr for datapath/lb to lflow references.
  Reference lb related lflows for lports in a separate objdep mgr type.
  northd: Refactor the northd change tracking.
  northd: Handle load balancer change in lflow engine.
  northd: Move router ports SB PB options sync to sync_to_sb_pb node.
  northd: Handle NAT changes for a logical router incrementally.

 lib/lb.c                 |  320 +-
 lib/lb.h                 |  105 +-
 lib/objdep.h             |    6 +
 lib/ovn-util.c           |   11 +-
 northd/automake.mk       |    2 +
 northd/en-lb-data.c      |  800 +++++
 northd/en-lb-data.h      |  109 +
 northd/en-lflow.c        |   33 +-
 northd/en-northd.c       |  144 +-
 northd/en-northd.h       |    5 +
 northd/en-sync-sb.c      |   76 +
 northd/en-sync-sb.h      |   10 +
 northd/inc-proc-northd.c |   39 +-
 northd/northd.c          | 6969 +++++++++++++++++++++++++-------------
 northd/northd.h          |  149 +-
 northd/ovn-northd.c      |    4 +
 tests/ovn-northd.at      |  869 +++++
 17 files changed, 7145 insertions(+), 2506 deletions(-)
 create mode 100644 northd/en-lb-data.c
 create mode 100644 northd/en-lb-data.h

Comments

Mark Michelson Aug. 31, 2023, 6:46 p.m. UTC | #1
I had a look at patches 1-8. I have no additional notes beyond what Han 
and Ales have already mentioned.

For patches 1-8:

Acked-by: Mark Michelson <mmichels@redhat.com>

Numan Siddique Sept. 11, 2023, 2:32 p.m. UTC | #2
On Thu, Aug 31, 2023 at 2:47 PM Mark Michelson <mmichels@redhat.com> wrote:
>
> I had a look at patches 1-8. I have no additional notes beyond what Han
> and Ales have already mentioned.
>
> For patches 1-8:
>
> Acked-by: Mark Michelson <mmichels@redhat.com>

Thanks Mark, Han and Ales for the reviews.

I applied the first 4 patches of this series to main and backported them
to branch-23.09.   I'll submit v7 soon with the patches 5-8 addressing
all the review comments.

Thanks
Numan


