From: Numan Siddique <numans@ovn.org>
To: dev@openvswitch.org
Date: Thu, 8 Feb 2024 16:49:04 -0500
Subject: [ovs-dev] [PATCH ovn v1 0/4] northd memory and CPU increase fix due to lflow-mgr.

This patch series fixes the memory and CPU usage increase seen in
ovn-northd after the lflow I-P patches were merged.

The first two patches in the series address duplicate flows that northd
adds to the lflow table for the same datapath.
The third patch fixes a bug in lflow-mgr, and the fourth patch addresses
the northd memory and CPU increase itself.

We considered two approaches to solve this.

Approach 1 (adopted by this series) maintains a datapath reference count
for an lflow only when it is required, i.e. when the same lflow is added
more than once for a datapath.

Approach 2 (available at [1]) instead falls back to a full recompute
whenever an lflow is added more than once for a datapath.

Below are the ovn-heater test results for both approaches.

Cluster density 500 node test
-----------------------------

                 | Avg. poll interval | Total test time | northd RSS
-----------------+--------------------+-----------------+-----------
Before lflow I-P | 1.5 seconds        | 1005 seconds    | 2.5 GB
lflow I-P merged | 6 seconds          | 2246 seconds    | 8.5 GB
Approach 1       | 2.1 seconds        | 1142 seconds    | 2.67 GB
Approach 2       | 1.8 seconds        | 1046 seconds    | 2.41 GB

Node density heavy 500 node test
--------------------------------

                 | Avg. poll interval | Total test time | northd RSS
-----------------+--------------------+-----------------+-----------
Before lflow I-P | 1.3 seconds        | 192 seconds     | 1.49 GB
lflow I-P merged | 4.5 seconds        | 87 seconds      | 7.3 GB
Approach 1       | 2.4 seconds        | 83 seconds      | 2.2 GB
Approach 2       | 1.36 seconds       | 193 seconds     | 2.2 GB

Both approaches have advantages and disadvantages (pros and cons as
outlined by Ilya):

Approach 1
----------
Pros:
 * Doesn't fall back to recompute more often than current main.
 * Fairly simple.
 * Can be optimized by getting rid of duplicated lflows, so we'll
   allocate fewer refcounts.

Cons:
 * Higher CPU and memory usage in ovn-heater tests due to the actual
   refcount and hash map allocations.

Approach 2
----------
Pros:
 * Lower memory usage due to no refcounting.
 * Lower CPU usage in cases where we do not fall into recompute.
 * End code is simpler.
 * Can be optimized by getting rid of duplicated lflows, so we won't
   fall back to recompute as often.

Cons:
 * Falls back to recompute more frequently, leading to higher CPU usage
   in some cases (e.g. whenever users create the same LBs for different
   protocols).
 * Logs a concerning message for legitimate configurations.

We chose Approach 1 based on the above test results.

[1] - https://github.com/numansiddique/ovn/commits/dp_refcnt_fix_v1

Ilya Maximets (1):
  northd: lflow-mgr: Allocate DP reference counters on a second use.

Numan Siddique (3):
  northd: Don't add lr_out_delivery default drop flow for each lrp.
  northd: Don't add ARP request responder flows for NAT multiple times.
  northd: Fix lflow ref node's reference counting.

 northd/en-lr-nat.c |  6 ++++++
 northd/en-lr-nat.h |  2 ++
 northd/lflow-mgr.c | 52 ++++++++++++++++++++++++++--------------------
 northd/northd.c    | 43 +++++++++++++++++++++++++++++---------
 northd/northd.h    |  1 +
 5 files changed, 71 insertions(+), 33 deletions(-)