Message ID: 20240208214904.12696-1-numans@ovn.org
Series: northd memory and CPU increase fix due to lflow-mgr.
On Thu, Feb 8, 2024, 4:49 PM <numans@ovn.org> wrote:
> From: Numan Siddique <numans@ovn.org>
>
> This patch series fixes the memory and CPU usage increase seen
> in ovn-northd after the lflow I-P patches were merged.
[...]
> Node density heavy 500 node test
> --------------------------------
>
>                  | Avg. Poll Intervals | Total test time | northd RSS
> -----------------+---------------------+-----------------+-----------
> Before lflow I-P | 1.3 seconds         | 192 seconds     | 1.49 GB
> lflow I-P merged | 4.5 seconds         | 87 seconds      | 7.3 GB
> Approach 1       | 2.4 seconds         | 83 seconds      | 2.2 GB

A clarification here: with Approach 1, even though the 2.4 seconds
average poll interval seems high, it is based on only 3 poll interval
data points, i.e. ovn-northd didn't recompute for most of the test
duration.

Thanks
Numan

> Approach 2       | 1.36 seconds        | 193 seconds     | 2.2 GB
[...]

_______________________________________________
dev mailing list
dev@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On 2/9/24 00:28, Numan Siddique wrote:
> On Thu, Feb 8, 2024, 4:49 PM <numans@ovn.org> wrote:
>> From: Numan Siddique <numans@ovn.org>
>>
>> This patch series fixes the memory and CPU usage increase seen
>> in ovn-northd after the lflow I-P patches were merged.
[...]
> A clarification here: with Approach 1, even though the 2.4 seconds
> average poll interval seems high, it is based on only 3 poll interval
> data points, i.e. ovn-northd didn't recompute for most of the test
> duration.

To add to the ovn-heater results, we also ran some OpenShift scale tests
with this patch set applied.  The tests were run on a 120 node cluster
deployed with the OVN-Kubernetes "interconnect" architecture (northd and
DBs running on each node).  The results are listed below.  CPU and memory
usage are aggregated across the whole cluster:

+----------------+-----------------+-------------------------+-------------------------+------------------+------------------+------------------+------------------+
| Version        | Test Case       | Pod-Ready latency (avg) | Pod-Ready latency (max) | northd CPU (avg) | northd CPU (max) | northd RSS (avg) | northd RSS (max) |
+----------------+-----------------+-------------------------+-------------------------+------------------+------------------+------------------+------------------+
| branch-23.09   | cluster-density | 5.72s                   | 13s                     | 4620%            | 6363%            | 18.3GB           | 23.6GB           |
| branch-24.09   | cluster-density | 5.74s                   | 13s                     | 3246%            | 5921%            | 18.8GB           | 24.3GB           |
| with patch set | cluster-density | 5.75s                   | 13s                     | 3190%            | 5772%            | 18.6GB           | 24.2GB           |
|                |                 |                         |                         |                  |                  |                  |                  |
| branch-23.09   | node-density    | 1.98s                   | 4s                      | 5054%            | 12893%           | 10.8GB           | 27.3GB           |
| branch-24.09   | node-density    | 1.92s                   | 4s                      | 1348%            | 5655%            | 10.8GB           | 28.1GB           |
| with patch set | node-density    | 1.92s                   | 4s                      | 1433%            | 5698%            | 10.6GB           | 14.1GB           |
+----------------+-----------------+-------------------------+-------------------------+------------------+------------------+------------------+------------------+

This shows that in the OpenShift use case, the patch set performs at
least as well as the (buggy) branch-24.09 version which includes
a623606052ea ("northd: Refactor lflow management into a separate module.").

I think it's fine to merge this series (once the few small comments Han
and I had are addressed).

Regards,
Dumitru
On Wed, Feb 14, 2024 at 9:51 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 2/9/24 00:28, Numan Siddique wrote:
> > On Thu, Feb 8, 2024, 4:49 PM <numans@ovn.org> wrote:
> >> This patch series fixes the memory and CPU usage increase seen
> >> in ovn-northd after the lflow I-P patches were merged.
[...]
> To add to the ovn-heater results, we also ran some OpenShift scale tests
> with this patch set applied.  The tests were run on a 120 node cluster
> deployed with the OVN-Kubernetes "interconnect" architecture (northd and
> DBs running on each node).
[...]
> I think it's fine to merge this series (once the few small comments Han
> and I had are addressed).

Thanks Han, Ilya and Dumitru (and also for the scale tests).

I addressed the review comments and applied the patches to main and
branch-24.03.

Numan

> Regards,
> Dumitru
From: Numan Siddique <numans@ovn.org>

This patch series fixes the memory and CPU usage increase seen
in ovn-northd after the lflow I-P patches were merged.

The first 2 patches in the series address duplicate flows
added by northd into the lflow table for the same datapath.

The 3rd patch fixes a bug in lflow-mgr and the 4th patch
addresses the northd memory and CPU increase issue itself.

We considered 2 approaches to solve this.

Approach 1 (which this series adopts) solves it by maintaining a dp
refcnt for an lflow only if required.

Approach 2 (which can be found at [1]) solves it by resorting to
a full recompute when an lflow is added more than once for a datapath.

Below are the test results with ovn-heater for both approaches.

Cluster density 500 node test
-----------------------------

                 | Avg. Poll Intervals | Total test time | northd RSS
-----------------+---------------------+-----------------+-----------
Before lflow I-P | 1.5 seconds         | 1005 seconds    | 2.5 GB
lflow I-P merged | 6 seconds           | 2246 seconds    | 8.5 GB
Approach 1       | 2.1 seconds         | 1142 seconds    | 2.67 GB
Approach 2       | 1.8 seconds         | 1046 seconds    | 2.41 GB

Node density heavy 500 node test
--------------------------------

                 | Avg. Poll Intervals | Total test time | northd RSS
-----------------+---------------------+-----------------+-----------
Before lflow I-P | 1.3 seconds         | 192 seconds     | 1.49 GB
lflow I-P merged | 4.5 seconds         | 87 seconds      | 7.3 GB
Approach 1       | 2.4 seconds         | 83 seconds      | 2.2 GB
Approach 2       | 1.36 seconds        | 193 seconds     | 2.2 GB

Both have advantages and disadvantages (pros and cons as outlined
by Ilya below):

Approach 1
----------
Pros:
* Doesn't fall back to recompute more often than current main.
* Fairly simple.
* Can be optimized by getting rid of duplicated lflows - we'll allocate
  fewer refcounts.
Cons:
* Higher CPU and memory usage in ovn-heater tests due to actual refcount
  and hash map allocations.

Approach 2
----------
Pros:
* Lower memory usage due to no refcounting.
* Lower CPU usage in cases where we do not fall into recompute.
* End code is simpler.
* Can be optimized by getting rid of duplicated lflows - we'll not fall
  back to recompute that often.
Cons:
* Falls into recompute more frequently - higher CPU usage in some cases
  (whenever users create the same LBs for different protos).
* Concerning log message in legitimate configurations.

We chose Approach 1 based on the above test results.

[1] - https://github.com/numansiddique/ovn/commits/dp_refcnt_fix_v1

Ilya Maximets (1):
  northd: lflow-mgr: Allocate DP reference counters on a second use.

Numan Siddique (3):
  northd: Don't add lr_out_delivery default drop flow for each lrp.
  northd: Don't add ARP request responder flows for NAT multiple times.
  northd: Fix lflow ref node's reference counting.

 northd/en-lr-nat.c |  6 ++++++
 northd/en-lr-nat.h |  2 ++
 northd/lflow-mgr.c | 52 ++++++++++++++++++++++++++--------------------
 northd/northd.c    | 43 +++++++++++++++++++++++++++++---------
 northd/northd.h    |  1 +
 5 files changed, 71 insertions(+), 33 deletions(-)
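The mechanism behind Approach 1 (the patch titled "northd: lflow-mgr:
Allocate DP reference counters on a second use.") can be sketched roughly
as follows.  This is a simplified illustration, not the actual lflow-mgr
code: `struct dp_ref`, `dp_ref_add()` and `dp_ref_del()` are hypothetical
names, and a single per-datapath slot stands in for OVN's real
bookkeeping.  The idea is that the common case, an lflow referenced
exactly once for a datapath, never allocates a counter:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified model of "allocate a refcount on second use":
 * the first reference is tracked with a plain flag, and a heap-allocated
 * counter only exists once the same datapath references the lflow a
 * second time. */
struct dp_ref {
    bool used;            /* First reference taken?  No allocation. */
    unsigned int *refcnt; /* Allocated lazily on the second add. */
};

void dp_ref_add(struct dp_ref *r)
{
    if (!r->used) {
        r->used = true;        /* Common case: just flip the flag. */
        return;
    }
    if (!r->refcnt) {
        r->refcnt = malloc(sizeof *r->refcnt);
        assert(r->refcnt);
        *r->refcnt = 1;        /* Account for the first, implicit ref. */
    }
    ++*r->refcnt;
}

/* Returns true when the last reference is dropped, i.e. the lflow can
 * be removed for this datapath. */
bool dp_ref_del(struct dp_ref *r)
{
    if (r->refcnt) {
        if (--*r->refcnt > 0) {
            return false;      /* Still referenced; keep the lflow. */
        }
        free(r->refcnt);
        r->refcnt = NULL;
    }
    r->used = false;
    return true;
}
```

Under a scheme like this, duplicated lflows still work (they simply
promote the slot to a real counter), while a northd with the duplicates
removed pays no refcount allocations at all, which matches the
optimization opportunity the cover letter notes for both approaches.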