mbox series

[ovs-dev,v1,0/4] northd memory and CPU increase fix due to lflow-mgr.

Message ID 20240208214904.12696-1-numans@ovn.org
Headers show
Series northd memory and CPU increase fix due to lflow-mgr. | expand

Message

Numan Siddique Feb. 8, 2024, 9:49 p.m. UTC
From: Numan Siddique <numans@ovn.org>

This patch series fixes the memory and CPU usage increase seen
in ovn-northd after the lflow I-P patches were merged.

The first 2 patches in the series addresses duplicate flows
added by northd into the lflow table for the same datapath.

The 3rd patch fixes a bug in lflow-mgr and the 4th patch
actually addresses the northd memory and CPU increase issue.

We considered 2 approaches to solve this.

Approach 1 (which this series adopts) solves by maintaining dp
refcnt for an lflow only if required.

Approach 2 (which can be found [1]) solves this by resorting to
a full recompute when an lflow is added more than once for a datapath.

Below are the test results with ovn-heater for both the approaches.

Cluster density 500 node test
-----------------------------

                       | Avg. Poll Intervals | Total test time |  northd RSS
--------------------------------+-----------------+-------------------------
Before lflow I-P       |     1.5 seconds     |  1005 seconds   |    2.5 GB
lflow i-p merged       |      6  seconds     |  2246 seconds   |    8.5 GB
Approach 1             |     2.1 seconds     |  1142 seconds   |    2.67 GB
Approach 2             |     1.8 seconds     |  1046 seconds   |    2.41 GB
-----------------------------------------------------------------------------

Node density heavy 500 node test
--------------------------------

                       | Avg. Poll Intervals | Total test time |  northd RSS
--------------------------------+-----------------+-----------------------
Before lflow I-P       |     1.3 seconds     |  192 seconds    |  1.49 GB   
lflow I-P merged       |     4.5 seconds     |  87 seconds     |  7.3 GB
Approach 1             |     2.4 seconds     |  83 seconds     |  2.2 GB
Approach 2             |     1.36 seconds    |  193 seconds    |  2.2 GB
-------------------------------------------------------------------------

Both has advantages and disadvantages

(As outlined by Ilya below about pros and cons)
Approach 1
---------
Pros:
  * Doesn't fall back to recompute more often than current main.
  * Fairly simple.
  * Can be optimized by getting rid of duplicated lflows - we'll allocate less refcounts.
Cons:
  * Higher CPU and memory usage in ovn-heater tests due to actual refcount and
    hash map allocations.

Approach 2:
---------
Pros:
   * Lower memory usage due to no refcounting.
   * Lower CPU usage in cases where we do not fall into recompute.
   * End code is simpler.
   * Can be optimized by getting rid of duplicated lflows - we'll not fall back to recompute that often.

Cons:
   * Falling into recompute more frequently - Higher CPU usage in some cases.
     (whenever users create the same LBs for different protos)
   * Concerning log message in legitimate configurations.


We chose Approach 1 based on the above test results.

[1] - https://github.com/numansiddique/ovn/commits/dp_refcnt_fix_v1


Ilya Maximets (1):
  northd: lflow-mgr: Allocate DP reference counters on a second use.

Numan Siddique (3):
  northd: Don't add lr_out_delivery default drop flow for each lrp.
  northd:  Don't add ARP request responder flows for NAT multiple times.
  northd: Fix lflow ref node's reference counting.

 northd/en-lr-nat.c |  6 ++++++
 northd/en-lr-nat.h |  2 ++
 northd/lflow-mgr.c | 52 ++++++++++++++++++++++++++--------------------
 northd/northd.c    | 43 +++++++++++++++++++++++++++++---------
 northd/northd.h    |  1 +
 5 files changed, 71 insertions(+), 33 deletions(-)

Comments

Numan Siddique Feb. 8, 2024, 11:28 p.m. UTC | #1
On Thu, Feb 8, 2024, 4:49 PM <numans@ovn.org> wrote:

> From: Numan Siddique <numans@ovn.org>
>
> This patch series fixes the memory and CPU usage increase seen
> in ovn-northd after the lflow I-P patches were merged.
>
> The first 2 patches in the series addresses duplicate flows
> added by northd into the lflow table for the same datapath.
>
> The 3rd patch fixes a bug in lflow-mgr and the 4th patch
> actually addresses the northd memory and CPU increase issue.
>
> We considered 2 approaches to solve this.
>
> Approach 1 (which this series adopts) solves by maintaining dp
> refcnt for an lflow only if required.
>
> Approach 2 (which can be found [1]) solves this by resorting to
> a full recompute when an lflow is added more than once for a datapath.
>
> Below are the test results with ovn-heater for both the approaches.
>
> Cluster density 500 node test
> -----------------------------
>
>                        | Avg. Poll Intervals | Total test time |  northd
> RSS
>
> --------------------------------+-----------------+-------------------------
> Before lflow I-P       |     1.5 seconds     |  1005 seconds   |    2.5 GB
> lflow i-p merged       |      6  seconds     |  2246 seconds   |    8.5 GB
> Approach 1             |     2.1 seconds     |  1142 seconds   |    2.67 GB
> Approach 2             |     1.8 seconds     |  1046 seconds   |    2.41 GB
>
> -----------------------------------------------------------------------------
>
> Node density heavy 500 node test
> --------------------------------
>
>                        | Avg. Poll Intervals | Total test time |  northd
> RSS
> --------------------------------+-----------------+-----------------------
> Before lflow I-P       |     1.3 seconds     |  192 seconds    |  1.49 GB
>
> lflow I-P merged       |     4.5 seconds     |  87 seconds     |  7.3 GB
> Approach 1             |     2.4 seconds     |  83 seconds     |  2.2 GB
>


A clarification here : with Approach 1 even though the 2.4 seconds poll
interval seems high, but it is only with 3 poll interval data.  i.e.
ovn-northd didn't recompute for the most part of  test duration.


Thanks
Numan

Approach 2             |     1.36 seconds    |  193 seconds    |  2.2 GB
> -------------------------------------------------------------------------
>
> Both has advantages and disadvantages
>
> (As outlined by Ilya below about pros and cons)
> Approach 1
> ---------
> Pros:
>   * Doesn't fall back to recompute more often than current main.
>   * Fairly simple.
>   * Can be optimized by getting rid of duplicated lflows - we'll allocate
> less refcounts.
> Cons:
>   * Higher CPU and memory usage in ovn-heater tests due to actual refcount
> and
>     hash map allocations.
>
> Approach 2:
> ---------
> Pros:
>    * Lower memory usage due to no refcounting.
>    * Lower CPU usage in cases where we do not fall into recompute.
>    * End code is simpler.
>    * Can be optimized by getting rid of duplicated lflows - we'll not fall
> back to recompute that often.
>
> Cons:
>    * Falling into recompute more frequently - Higher CPU usage in some
> cases.
>      (whenever users create the same LBs for different protos)
>    * Concerning log message in legitimate configurations.
>
>
> We chose Approach 1 based on the above test results.
>
> [1] - https://github.com/numansiddique/ovn/commits/dp_refcnt_fix_v1
>
>
> Ilya Maximets (1):
>   northd: lflow-mgr: Allocate DP reference counters on a second use.
>
> Numan Siddique (3):
>   northd: Don't add lr_out_delivery default drop flow for each lrp.
>   northd:  Don't add ARP request responder flows for NAT multiple times.
>   northd: Fix lflow ref node's reference counting.
>
>  northd/en-lr-nat.c |  6 ++++++
>  northd/en-lr-nat.h |  2 ++
>  northd/lflow-mgr.c | 52 ++++++++++++++++++++++++++--------------------
>  northd/northd.c    | 43 +++++++++++++++++++++++++++++---------
>  northd/northd.h    |  1 +
>  5 files changed, 71 insertions(+), 33 deletions(-)
>
> --
> 2.43.0
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
Dumitru Ceara Feb. 14, 2024, 2:51 p.m. UTC | #2
On 2/9/24 00:28, Numan Siddique wrote:
> On Thu, Feb 8, 2024, 4:49 PM <numans@ovn.org> wrote:
> 
>> From: Numan Siddique <numans@ovn.org>
>>
>> This patch series fixes the memory and CPU usage increase seen
>> in ovn-northd after the lflow I-P patches were merged.
>>
>> The first 2 patches in the series addresses duplicate flows
>> added by northd into the lflow table for the same datapath.
>>
>> The 3rd patch fixes a bug in lflow-mgr and the 4th patch
>> actually addresses the northd memory and CPU increase issue.
>>
>> We considered 2 approaches to solve this.
>>
>> Approach 1 (which this series adopts) solves by maintaining dp
>> refcnt for an lflow only if required.
>>
>> Approach 2 (which can be found [1]) solves this by resorting to
>> a full recompute when an lflow is added more than once for a datapath.
>>
>> Below are the test results with ovn-heater for both the approaches.
>>
>> Cluster density 500 node test
>> -----------------------------
>>
>>                        | Avg. Poll Intervals | Total test time |  northd
>> RSS
>>
>> --------------------------------+-----------------+-------------------------
>> Before lflow I-P       |     1.5 seconds     |  1005 seconds   |    2.5 GB
>> lflow i-p merged       |      6  seconds     |  2246 seconds   |    8.5 GB
>> Approach 1             |     2.1 seconds     |  1142 seconds   |    2.67 GB
>> Approach 2             |     1.8 seconds     |  1046 seconds   |    2.41 GB
>>
>> -----------------------------------------------------------------------------
>>
>> Node density heavy 500 node test
>> --------------------------------
>>
>>                        | Avg. Poll Intervals | Total test time |  northd
>> RSS
>> --------------------------------+-----------------+-----------------------
>> Before lflow I-P       |     1.3 seconds     |  192 seconds    |  1.49 GB
>>
>> lflow I-P merged       |     4.5 seconds     |  87 seconds     |  7.3 GB
>> Approach 1             |     2.4 seconds     |  83 seconds     |  2.2 GB
>>
> 
> 
> A clarification here : with Approach 1 even though the 2.4 seconds poll
> interval seems high, but it is only with 3 poll interval data.  i.e.
> ovn-northd didn't recompute for the most part of  test duration.
> 
> 
> Thanks
> Numan
> 
> Approach 2             |     1.36 seconds    |  193 seconds    |  2.2 GB
>> -------------------------------------------------------------------------
>>
>> Both has advantages and disadvantages
>>
>> (As outlined by Ilya below about pros and cons)
>> Approach 1
>> ---------
>> Pros:
>>   * Doesn't fall back to recompute more often than current main.
>>   * Fairly simple.
>>   * Can be optimized by getting rid of duplicated lflows - we'll allocate
>> less refcounts.
>> Cons:
>>   * Higher CPU and memory usage in ovn-heater tests due to actual refcount
>> and
>>     hash map allocations.
>>
>> Approach 2:
>> ---------
>> Pros:
>>    * Lower memory usage due to no refcounting.
>>    * Lower CPU usage in cases where we do not fall into recompute.
>>    * End code is simpler.
>>    * Can be optimized by getting rid of duplicated lflows - we'll not fall
>> back to recompute that often.
>>
>> Cons:
>>    * Falling into recompute more frequently - Higher CPU usage in some
>> cases.
>>      (whenever users create the same LBs for different protos)
>>    * Concerning log message in legitimate configurations.
>>
>>
>> We chose Approach 1 based on the above test results.
>>
>> [1] - https://github.com/numansiddique/ovn/commits/dp_refcnt_fix_v1
>>

To add to the ovn-heater results, we also ran some OpenShift scale tests
with this patch set applied.  The tests were run on a 120 node cluster
deployed with the OVN-Kubernetes "interconnect" architecture (northd and
DBs running on each node).  The results are listed below.  CPU and memory
usage are aggregated across the whole cluster:

+----------------+-----------------+-------------------------+-------------------------+------------------+------------------+------------------+------------------+
|    Version     |    Test Case    | POD-ready latency (avg) | Pod-Ready latency (max) | northd CPU (avg) | northd CPU (max) | northd RSS (avg) | northd RSS (max) |
+----------------+-----------------+-------------------------+-------------------------+------------------+------------------+------------------+------------------+
| branch-23.09   | cluster-density | 5.72s                   | 13s                     | 4620%            | 6363%            | 18.3GB           | 23.6GB           |
| branch-24.09   | cluster-density | 5.74s                   | 13s                     | 3246%            | 5921%            | 18.8GB           | 24.3GB           |
| with patch set | cluster-density | 5.75s                   | 13s                     | 3190%            | 5772%            | 18.6GB           | 24.2GB           |
|                |                 |                         |                         |                  |                  |                  |                  |
| branch-23.09   | node-density    | 1.98s                   | 4s                      | 5054%            | 12893%           | 10.8GB           | 27.3GB           |
| branch-24.09   | node-density    | 1.92s                   | 4s                      | 1348%            | 5655%            | 10.8GB           | 28.1GB           |
| with patch set | node-density    | 1.92s                   | 4s                      | 1433%            | 5698%            | 10.6GB           | 14.1GB           |
+----------------+-----------------+-------------------------+-------------------------+------------------+------------------+------------------+------------------+

This shows that in the OpenShift use case, the patch set performs at
least as good as the (buggy) branch-24.09 version which includes
a623606052ea ("northd: Refactor lflow management into a separate module.").

I think it's fine to merge this series (once the few small comments Han
and I had are addressed).

Regards,
Dumitru
Numan Siddique Feb. 14, 2024, 5:55 p.m. UTC | #3
On Wed, Feb 14, 2024 at 9:51 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 2/9/24 00:28, Numan Siddique wrote:
> > On Thu, Feb 8, 2024, 4:49 PM <numans@ovn.org> wrote:
> >
> >> From: Numan Siddique <numans@ovn.org>
> >>
> >> This patch series fixes the memory and CPU usage increase seen
> >> in ovn-northd after the lflow I-P patches were merged.
> >>
> >> The first 2 patches in the series addresses duplicate flows
> >> added by northd into the lflow table for the same datapath.
> >>
> >> The 3rd patch fixes a bug in lflow-mgr and the 4th patch
> >> actually addresses the northd memory and CPU increase issue.
> >>
> >> We considered 2 approaches to solve this.
> >>
> >> Approach 1 (which this series adopts) solves by maintaining dp
> >> refcnt for an lflow only if required.
> >>
> >> Approach 2 (which can be found [1]) solves this by resorting to
> >> a full recompute when an lflow is added more than once for a datapath.
> >>
> >> Below are the test results with ovn-heater for both the approaches.
> >>
> >> Cluster density 500 node test
> >> -----------------------------
> >>
> >>                        | Avg. Poll Intervals | Total test time |  northd
> >> RSS
> >>
> >> --------------------------------+-----------------+-------------------------
> >> Before lflow I-P       |     1.5 seconds     |  1005 seconds   |    2.5 GB
> >> lflow i-p merged       |      6  seconds     |  2246 seconds   |    8.5 GB
> >> Approach 1             |     2.1 seconds     |  1142 seconds   |    2.67 GB
> >> Approach 2             |     1.8 seconds     |  1046 seconds   |    2.41 GB
> >>
> >> -----------------------------------------------------------------------------
> >>
> >> Node density heavy 500 node test
> >> --------------------------------
> >>
> >>                        | Avg. Poll Intervals | Total test time |  northd
> >> RSS
> >> --------------------------------+-----------------+-----------------------
> >> Before lflow I-P       |     1.3 seconds     |  192 seconds    |  1.49 GB
> >>
> >> lflow I-P merged       |     4.5 seconds     |  87 seconds     |  7.3 GB
> >> Approach 1             |     2.4 seconds     |  83 seconds     |  2.2 GB
> >>
> >
> >
> > A clarification here : with Approach 1 even though the 2.4 seconds poll
> > interval seems high, but it is only with 3 poll interval data.  i.e.
> > ovn-northd didn't recompute for the most part of  test duration.
> >
> >
> > Thanks
> > Numan
> >
> > Approach 2             |     1.36 seconds    |  193 seconds    |  2.2 GB
> >> -------------------------------------------------------------------------
> >>
> >> Both has advantages and disadvantages
> >>
> >> (As outlined by Ilya below about pros and cons)
> >> Approach 1
> >> ---------
> >> Pros:
> >>   * Doesn't fall back to recompute more often than current main.
> >>   * Fairly simple.
> >>   * Can be optimized by getting rid of duplicated lflows - we'll allocate
> >> less refcounts.
> >> Cons:
> >>   * Higher CPU and memory usage in ovn-heater tests due to actual refcount
> >> and
> >>     hash map allocations.
> >>
> >> Approach 2:
> >> ---------
> >> Pros:
> >>    * Lower memory usage due to no refcounting.
> >>    * Lower CPU usage in cases where we do not fall into recompute.
> >>    * End code is simpler.
> >>    * Can be optimized by getting rid of duplicated lflows - we'll not fall
> >> back to recompute that often.
> >>
> >> Cons:
> >>    * Falling into recompute more frequently - Higher CPU usage in some
> >> cases.
> >>      (whenever users create the same LBs for different protos)
> >>    * Concerning log message in legitimate configurations.
> >>
> >>
> >> We chose Approach 1 based on the above test results.
> >>
> >> [1] - https://github.com/numansiddique/ovn/commits/dp_refcnt_fix_v1
> >>
>
> To add to the ovn-heater results, we also ran some OpenShift scale tests
> with this patch set applied.  The tests were run on a 120 node cluster
> deployed with the OVN-Kubernetes "interconnect" architecture (northd and
> DBs running on each node).  The results are listed below.  CPU and memory
> usage are aggregated across the whole cluster:
>
> +----------------+-----------------+-------------------------+-------------------------+------------------+------------------+------------------+------------------+
> |    Version     |    Test Case    | POD-ready latency (avg) | Pod-Ready latency (max) | northd CPU (avg) | northd CPU (max) | northd RSS (avg) | northd RSS (max) |
> +----------------+-----------------+-------------------------+-------------------------+------------------+------------------+------------------+------------------+
> | branch-23.09   | cluster-density | 5.72s                   | 13s                     | 4620%            | 6363%            | 18.3GB           | 23.6GB           |
> | branch-24.09   | cluster-density | 5.74s                   | 13s                     | 3246%            | 5921%            | 18.8GB           | 24.3GB           |
> | with patch set | cluster-density | 5.75s                   | 13s                     | 3190%            | 5772%            | 18.6GB           | 24.2GB           |
> |                |                 |                         |                         |                  |                  |                  |                  |
> | branch-23.09   | node-density    | 1.98s                   | 4s                      | 5054%            | 12893%           | 10.8GB           | 27.3GB           |
> | branch-24.09   | node-density    | 1.92s                   | 4s                      | 1348%            | 5655%            | 10.8GB           | 28.1GB           |
> | with patch set | node-density    | 1.92s                   | 4s                      | 1433%            | 5698%            | 10.6GB           | 14.1GB           |
> +----------------+-----------------+-------------------------+-------------------------+------------------+------------------+------------------+------------------+
>
> This shows that in the OpenShift use case, the patch set performs at
> least as good as the (buggy) branch-24.09 version which includes
> a623606052ea ("northd: Refactor lflow management into a separate module.").
>
> I think it's fine to merge this series (once the few small comments Han
> and I had are addressed).
>

Thanks Han, Ilya and Dumitru (and also for the scale tests).  I
addressed the review comments and applied the patches to main and
branch-24.03.

Numan

> Regards,
> Dumitru
>
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev