
[ovs-dev,v6,00/13] northd lflow incremental processing

Message ID 20240130212028.1482153-1-numans@ovn.org
Series: northd lflow incremental processing

Message

Numan Siddique Jan. 30, 2024, 9:20 p.m. UTC
From: Numan Siddique <numans@ovn.org>

This patch series adds incremental processing (I-P) to the lflow
engine node so that it can handle changes from the northd node and
other engine nodes.  The changes handled are mainly those related to
load balancers and NAT.

This patch series is also available at https://github.com/numansiddique/ovn/tree/northd_lbnatacl_lflow/v5

Prior to this patch series, most changes to the northd engine node
resulted in a full recomputation of logical flows.  This series aims
to improve the performance of ovn-northd by adding I-P support.  To
do so, some of the northd engine node data (from struct ovn_datapath)
is split out and moved into new engine nodes, mainly for load
balancers, NAT and ACLs.

Below are the results of scale tests run with ovn-heater, with and
without these patches applied.  The tests ran the
ocp-500-density-heavy.yml scenario [1].

With all the lflow I-P patches applied, the results are:

-------------------------------------------------------------------------------------------------------------------------------------------------------
                        Min (s)         Median (s)      90%ile (s)      99%ile (s)      Max (s)         Mean (s)        Total (s)       Count   Failed
-------------------------------------------------------------------------------------------------------------------------------------------------------
Iteration Total         0.136883        1.129016        1.192001        1.204167        1.212728        0.665017        83.127099       125     0
Namespace.add_ports     0.005216        0.005736        0.007034        0.015486        0.018978        0.006211        0.776373        125     0
WorkerNode.bind_port    0.035030        0.046082        0.052469        0.058293        0.060311        0.045973        11.493259       250     0
WorkerNode.ping_port    0.005057        0.006727        1.047692        1.069253        1.071336        0.266896        66.724094       250     0
-------------------------------------------------------------------------------------------------------------------------------------------------------

The results with the current main branch are:

-------------------------------------------------------------------------------------------------------------------------------------------------------
                        Min (s)         Median (s)      90%ile (s)      99%ile (s)      Max (s)         Mean (s)        Total (s)       Count   Failed
-------------------------------------------------------------------------------------------------------------------------------------------------------
Iteration Total         0.135491        2.223805        3.311270        3.339078        3.345346        1.729172        216.146495      125     0
Namespace.add_ports     0.005380        0.005744        0.006819        0.018773        0.020800        0.006292        0.786532        125     0
WorkerNode.bind_port    0.034179        0.046055        0.053488        0.058801        0.071043        0.046117        11.529311       250     0
WorkerNode.ping_port    0.004956        0.006952        3.086952        3.191743        3.192807        0.791544        197.886026      250     0
-------------------------------------------------------------------------------------------------------------------------------------------------------

Please see [2] for a high-level description of the changes in this
patch series.


[1] - https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml
[2] - https://mail.openvswitch.org/pipermail/ovs-dev/2023-December/410053.html

v5 -> v6
------
   * Applied the first 3 patches of v5 after addressing all the review
     comments (with the Acks).

   * Rebased to latest main and resolved the conflicts.

   * Addressed almost all of the review comments received for v5 from
     Han and Dumitru.
        - Added detailed documentation on 'struct lflow_ref' and life
          cycle of 'struct lflow_ref_node'.
        - Added documentation on the thread safety limitations when
          using 'struct lflow_ref'.

v4 -> v5
-------
   * Rebased to latest main and resolved the conflicts.

   * Addressed the review comments from Han in patch 15 (and in p8).
     Removed the assertion for a missing SB datapath group; the code now
     returns false instead, so that the lflow engine recomputes.  Added
     test cases covering this scenario for both lflows (p8) and SB load
     balancers (p15).
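
The fallback above follows the general contract of the I-P engine: a
change handler either applies an input change incrementally and
returns true, or returns false so that the engine falls back to a full
recompute of the node.  The sketch below only illustrates that
contract.  All types and functions in it (struct sb_dp_group_cache,
lflow_handle_lb_change(), lflow_process_change()) are hypothetical
and are not the real ovn-northd code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the SB datapath groups known to northd. */
struct sb_dp_group_cache {
    const char *groups[8];
    size_t n;
};

/* Hypothetical lflow engine node state, counting how changes were handled. */
struct lflow_node {
    int n_recomputes;   /* Full recomputes performed. */
    int n_incremental;  /* Changes handled incrementally. */
};

static bool
sb_dp_group_exists(const struct sb_dp_group_cache *cache, const char *name)
{
    assert(cache->n <= sizeof cache->groups / sizeof cache->groups[0]);
    for (size_t i = 0; i < cache->n; i++) {
        if (!strcmp(cache->groups[i], name)) {
            return true;
        }
    }
    return false;
}

/* Hypothetical change handler: instead of asserting when the SB datapath
 * group is missing, report that the change was not handled. */
static bool
lflow_handle_lb_change(struct lflow_node *node,
                       const struct sb_dp_group_cache *cache,
                       const char *dp_group)
{
    if (!sb_dp_group_exists(cache, dp_group)) {
        return false;
    }
    node->n_incremental++;   /* Stand-in for updating only affected lflows. */
    return true;
}

/* Engine side: fall back to a full recompute when the handler fails. */
static void
lflow_process_change(struct lflow_node *node,
                     const struct sb_dp_group_cache *cache,
                     const char *dp_group)
{
    if (!lflow_handle_lb_change(node, cache, dp_group)) {
        node->n_recomputes++;   /* Stand-in for recomputing all lflows. */
    }
}
```

Returning false keeps northd correct at the cost of one slower
iteration, whereas an assertion would abort the process.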

v3 -> v4
-------
   * Addressed most of the review comments from Dumitru and Han.

   * Found a couple of bugs in v3 patch 9
     ("northd: Refactor lflow management into a separate module.")
     and fixed them in v4.
     In brief: if a logical flow L(M, A) was referenced by two
     lflow_refs belonging to the same datapath, the lflow was deleted
     as soon as one of those lflow_refs was cleared due to a change,
     even though the other one still referenced it.  This is now fixed
     by maintaining, in 'struct ovn_lflow', a reference count for each
     datapath that uses the lflow.

   * Moved v3 patch 14 ("northd:  Add I-P for NB_Global and SB_Global.")
     to patch 16 in v4.  Review comments on this patch suggested not
     adding full I-P for NB_Global and SB_Global, so it is now the last
     patch in the series.  This way it can be discussed further without
     blocking the other patches, in case we decide to drop it.
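
The per-datapath reference count from the v3 -> v4 notes can be
illustrated as follows: each lflow_ref that adds L(M, A) on a datapath
increments that datapath's count, and the datapath is only removed
from the lflow when its count drops to zero.  This is a simplified
sketch; the struct, the dp_refcnt field, N_DATAPATHS and the helper
names are hypothetical, and the real 'struct ovn_lflow' in
lflow-mgr.c differs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N_DATAPATHS 4   /* Hypothetical fixed number of datapaths. */

/* Simplified logical flow L(M, A): one match/actions pair that may be
 * used on several datapaths. */
struct ovn_lflow_sketch {
    const char *match;
    const char *actions;
    /* Number of lflow_refs using this flow on each datapath, indexed by
     * the datapath's index. */
    unsigned int dp_refcnt[N_DATAPATHS];
};

/* Called when an lflow_ref adds the flow for datapath 'dp_index'. */
static void
lflow_ref_add(struct ovn_lflow_sketch *lflow, size_t dp_index)
{
    assert(dp_index < N_DATAPATHS);
    lflow->dp_refcnt[dp_index]++;
}

/* Called when an lflow_ref is cleared.  Returns true only when no other
 * lflow_ref still uses the flow on 'dp_index'. */
static bool
lflow_ref_unlink(struct ovn_lflow_sketch *lflow, size_t dp_index)
{
    assert(dp_index < N_DATAPATHS && lflow->dp_refcnt[dp_index] > 0);
    return --lflow->dp_refcnt[dp_index] == 0;
}

/* True if no datapath uses the flow any more, so it can be deleted. */
static bool
lflow_is_unused(const struct ovn_lflow_sketch *lflow)
{
    for (size_t i = 0; i < N_DATAPATHS; i++) {
        if (lflow->dp_refcnt[i]) {
            return false;
        }
    }
    return true;
}
```

With two lflow_refs on the same datapath, clearing the first one only
brings the count from 2 to 1, so the flow survives until the second
lflow_ref is cleared as well.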


v2 -> v3
-------
   * Addressed some of the review comments from Han and Dumitru.  A few
     review comments are still pending and need to be addressed or
     discussed.

   * Renamed the engine node from "lr_lbnat_data" to "lr_stateful"
     (v3 patch 5).

   * Renamed the engine node from "ls_lbacls" to "ls_stateful" (v3 patch 8).

   * Removed v2 patch 2 ("northd: Track ovn_datapaths in northd engine
     track data.") from the series.  This patch is now part of v3 patch 7
     ("northd: Add a new node 'ls_stateful'.").

   * Squashed v2 patch 8 ("northd: Don't commit dhcp response flows in
     the conntrack.") into v3 patch 7 ("northd: Add a new node
     'ls_stateful'.").


v1 -> v2
--------
   * Now also maintaining array indexes for the ls_lbacls, lr_nat and
     lr_lb_nat_data tables (similar to ovn_datapaths->array) to make
     lookups efficient.  The same ovn_datapath->index is reused.

   * Made some significant changes to 'struct lflow_ref' in lflow-mgr.c.
     In v2, objdep_mgr is no longer used to maintain the resource-to-lflow
     references.  Instead, the 'struct lflow' pointer is stored directly,
     so an additional hmap of lflows is no longer needed.
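
Reusing ovn_datapath->index for the per-datapath tables, as in the
v1 -> v2 notes, turns a lookup into a direct array access.  The sketch
below shows the idea with hypothetical, heavily simplified structures
(struct ovn_datapath_sketch, struct lr_nat_record_sketch, struct
lr_nat_table_sketch); it is not the code in en-lr-nat.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified datapath: 'index' is its position in ovn_datapaths->array. */
struct ovn_datapath_sketch {
    size_t index;
    const char *name;
};

/* Per-router NAT data kept by a separate engine node. */
struct lr_nat_record_sketch {
    const struct ovn_datapath_sketch *od;
    size_t n_nat_entries;
};

/* Table whose 'array' is indexed by the datapath's index. */
struct lr_nat_table_sketch {
    struct lr_nat_record_sketch **array;
    size_t n;
};

/* Allocates one (initially empty) slot per datapath. */
static int
lr_nat_table_init(struct lr_nat_table_sketch *table, size_t n_datapaths)
{
    table->array = calloc(n_datapaths, sizeof *table->array);
    table->n = table->array ? n_datapaths : 0;
    return table->array ? 0 : -1;
}

/* Stores 'rec' in the slot of its datapath. */
static void
lr_nat_table_set(struct lr_nat_table_sketch *table,
                 struct lr_nat_record_sketch *rec)
{
    assert(rec->od->index < table->n);
    table->array[rec->od->index] = rec;
}

/* O(1) lookup by the datapath's index; NULL if out of range or unset. */
static struct lr_nat_record_sketch *
lr_nat_table_find_by_index(const struct lr_nat_table_sketch *table,
                           size_t od_index)
{
    return od_index < table->n ? table->array[od_index] : NULL;
}

static void
lr_nat_table_destroy(struct lr_nat_table_sketch *table)
{
    free(table->array);
    table->array = NULL;
    table->n = 0;
}
```

Because every table shares the same index space as ovn_datapaths->array,
a single index obtained from an ovn_datapath can be used to reach its
entry in any of them.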


Numan Siddique (13):
  northd: Add a new engine 'lr_nat' to manage lr NAT data.
  northd: Add a new engine 'lr_stateful' to manage lr's stateful data.
  northd:  Generate router's stateful flows using lr_stateful data.
  northd: Add a new node 'ls_stateful'.
  northd: Refactor lflow management into a separate module.
  northd: Use lflow_ref when adding all logical flows.
  northd:  Move ovn_lb_datapaths from lib to northd module.
  northd: Handle lb changes in lflow engine.
  northd: Add lr_stateful handler for lflow engine node.
  northd: Add ls_stateful handler for lflow engine node.
  northd: Add a noop handler for northd SB mac binding.
  northd: Add northd change handler for sync_to_sb_lb node.
  northd:  Add I-P for NB_Global and SB_Global.

 controller/automake.mk    |    2 +
 controller/lb.c           |  146 +
 controller/lb.h           |   55 +
 controller/lflow.c        |    1 +
 lib/lb.c                  |  771 +----
 lib/lb.h                  |  199 +-
 lib/ovn-util.c            |   26 +-
 lib/ovn-util.h            |    5 +-
 lib/stopwatch-names.h     |    5 +
 northd/aging.c            |   21 +-
 northd/automake.mk        |   14 +-
 northd/en-global-config.c |  576 ++++
 northd/en-global-config.h |   65 +
 northd/en-lb-data.c       |    1 +
 northd/en-lflow.c         |  109 +-
 northd/en-lflow.h         |    8 +
 northd/en-lr-nat.c        |  397 +++
 northd/en-lr-nat.h        |  135 +
 northd/en-lr-stateful.c   |  702 +++++
 northd/en-lr-stateful.h   |  153 +
 northd/en-ls-stateful.c   |  440 +++
 northd/en-ls-stateful.h   |  113 +
 northd/en-northd.c        |   58 +-
 northd/en-northd.h        |    2 +-
 northd/en-port-group.h    |    3 +
 northd/en-sync-sb.c       |  565 +++-
 northd/inc-proc-northd.c  |   74 +-
 northd/lb.c               |  654 +++++
 northd/lb.h               |  217 ++
 northd/lflow-mgr.c        | 1409 +++++++++
 northd/lflow-mgr.h        |  189 ++
 northd/northd.c           | 5840 ++++++++++++++++---------------------
 northd/northd.h           |  475 ++-
 northd/ovn-northd.c       |    9 +
 tests/ovn-northd.at       |  887 +++++-
 35 files changed, 9681 insertions(+), 4645 deletions(-)
 create mode 100644 controller/lb.c
 create mode 100644 controller/lb.h
 create mode 100644 northd/en-global-config.c
 create mode 100644 northd/en-global-config.h
 create mode 100644 northd/en-lr-nat.c
 create mode 100644 northd/en-lr-nat.h
 create mode 100644 northd/en-lr-stateful.c
 create mode 100644 northd/en-lr-stateful.h
 create mode 100644 northd/en-ls-stateful.c
 create mode 100644 northd/en-ls-stateful.h
 create mode 100644 northd/lb.c
 create mode 100644 northd/lb.h
 create mode 100644 northd/lflow-mgr.c
 create mode 100644 northd/lflow-mgr.h

Comments

Dumitru Ceara Feb. 2, 2024, 12:21 p.m. UTC | #1
On 1/30/24 22:20, numans@ovn.org wrote:
> From: Numan Siddique <numans@ovn.org>
> 

Hi Numan,

[...]

>  35 files changed, 9681 insertions(+), 4645 deletions(-)

I had another look at this series and acked the remaining patches.  I
just had some minor comments that can be easily fixed when applying the
patches to the main branch.

Thanks for all the work on this!  It was a very large change but it
improves northd performance significantly.  I just hope we don't
introduce too many bugs.  Hopefully the time we have until release will
allow us to further test this change on the 24.03 branch.

Regards,
Dumitru