| Message ID | 20260311060533.52598-1-amorenoz@redhat.com |
|---|---|
| From | Adrian Moreno <amorenoz@redhat.com> |
| To | dev@openvswitch.org |
| Date | Wed, 11 Mar 2026 07:05:18 +0100 |
| Subject | [ovs-dev] [PATCH v2 0/8] netdev-linux: Use event-driven netlink notifications. |
| Series | netdev-linux: Use event-driven netlink notifications. |

This series refactors netdev-linux to make more use of event-driven netlink notifications instead of polling for device state updates, significantly improving performance under RTNL lock contention.

## Background

The RTNL mutex is used to serialize rtnetlink requests in the Linux kernel. Its widespread use in many network configuration paths makes it a problem the kernel community is well aware of. When there is a lot of network configuration activity, such as when many interfaces are being created or deleted (especially interfaces that interact with HW resources such as SR-IOV VFs), contention on the RTNL mutex can make rtnetlink requests quite slow.

The impact of RTNL contention on OVS's main thread can be high: a single iteration of the main loop can take several seconds (even minutes!) to complete, delaying other periodic tasks such as OVSDB updates, OpenFlow flow programming, etc.

After analyzing which requests OVS was sending, it was observed that most of them came from netdev-linux state checking mechanisms. While some state is cached (such as the MTU or MAC address), the netdev's flags are not, and they are checked very often.

On the other hand, Netlink provides a reliable notification mechanism via multicast groups that allows userspace to receive asynchronous updates when device state changes, and OVS already has some infrastructure for that purpose.

## Approach

The series aims to change two aspects of netdev-linux operations:

- A: Cache the netdev's flags.
- B: Reuse the existing rtnetlink event infrastructure in netdev-linux's main loop run to avoid races.

To accomplish A (commit 2), some refactoring is done first (commit 1). To accomplish B (commit 5), the existing rtnetlink notifier infrastructure is first refactored and enhanced (commits 3-4). Finally, there is some extra consolidation and cleanup (commits 6-8).
The resulting infrastructure is organized as follows. Each consumer is listed with the notifier it relies on:

- `bridge.c`: calls `bridge_reconfigure()` if ifaces changed, as signaled by `if-notifier`.
- `if-notifier`: fed by `rtnetlink_notifier`.
- `route_table`:
  - for link change detection: fed by `rtnetlink_notifier`;
  - for route change detection: its own `nln` notifier (family: `NETLINK_ROUTE`, mcast: `RTNLGRP_IPV{4,6}_{ROUTE,RULE}`, all_ns: false).
- `netdev_linux`:
  - to update the netdev's internal (cached) state: fed by `rtnetlink_notifier`;
  - for address change detection: its own `nln` notifier (family: `NETLINK_ROUTE`, mcast: `RTNLGRP_IPV4_IFADDR`, `RTNLGRP_IPV6_{IFADDR,IFINFO}`, all_ns: true).
- `rtnetlink_notifier.{c,h}`: shared link notifier (family: `NETLINK_ROUTE`, mcast: `RTNLGRP_LINK`, all_ns: true).
- `nln` (`netlink_notifier.{h,c}`): the common layer underneath `rtnetlink_notifier` and the route and address notifiers above.

## Testing and results

In order to test this series, I have written a small script
that churns (deletes and recreates) some OVS ports (veths) the way an SDN controller would. I varied the number of interfaces to churn from 10 to 100. To simulate RTNL mutex contention, I used delay-kfunc [1] to introduce latency into `rtnl_lock`. The following table shows the time it takes to complete the test:

| N ifaces | RTNL delay (μs) | Main (s) | Series (s) | Delta (%) |
|---:|---:|---:|---:|---:|
| 10 | 0 | 0.275(0.008) | 0.234(0.014) | -14.9% |
| 10 | 50 | 0.269(0.009) | 0.249(0.012) | -7.3% |
| 10 | 100 | 0.278(0.011) | 0.266(0.007) | -4.6% |
| 10 | 500 | 0.423(0.039) | 0.395(0.046) | -6.7% |
| 10 | 1000 | 0.695(0.060) | 0.586(0.045) | -15.6% |
| 10 | 5000 | 1.855(0.099) | 1.818(0.041) | -2.0% |
| 10 | 10000 | 4.361(0.074) | 3.106(0.111) | -28.8% |
| 20 | 0 | 0.485(0.014) | 0.424(0.019) | -12.6% |
| 20 | 50 | 0.478(0.018) | 0.472(0.015) | -1.3% |
| 20 | 100 | 0.504(0.018) | 0.493(0.020) | -2.3% |
| 20 | 500 | 0.716(0.022) | 0.678(0.031) | -5.3% |
| 20 | 1000 | 0.994(0.026) | 0.926(0.083) | -6.9% |
| 20 | 5000 | 3.313(0.133) | 2.851(0.039) | -13.9% |
| 20 | 10000 | 6.803(0.093) | 4.875(0.117) | -28.3% |
| 30 | 0 | 0.716(0.024) | 0.645(0.033) | -10.0% |
| 30 | 50 | 0.723(0.018) | 0.692(0.019) | -4.2% |
| 30 | 100 | 0.744(0.024) | 0.745(0.031) | +0.1% |
| 30 | 500 | 0.981(0.031) | 0.997(0.034) | +1.6% |
| 30 | 1000 | 1.328(0.046) | 1.222(0.040) | -8.0% |
| 30 | 5000 | 4.838(0.059) | 3.865(0.079) | -20.1% |
| 30 | 10000 | 9.146(0.110) | 6.653(0.110) | -27.3% |
| 40 | 0 | 0.974(0.042) | 0.864(0.065) | -11.3% |
| 40 | 50 | 0.963(0.032) | 0.961(0.044) | -0.2% |
| 40 | 100 | 0.997(0.040) | 1.004(0.043) | +0.7% |
| 40 | 500 | 1.397(0.105) | 1.359(0.035) | -2.7% |
| 40 | 1000 | 1.990(0.107) | 1.805(0.096) | -9.3% |
| 40 | 5000 | 7.240(1.751) | 4.967(0.587) | -31.4% |
| 40 | 10000 | 11.657(0.131) | 8.289(0.308) | -28.9% |
| 50 | 0 | 1.340(0.111) | 1.253(0.167) | -6.5% |
| 50 | 50 | 1.410(0.196) | 1.274(0.059) | -9.7% |
| 50 | 100 | 1.411(0.108) | 1.329(0.111) | -5.8% |
| 50 | 500 | 1.788(0.060) | 1.779(0.079) | -0.5% |
| 50 | 1000 | 2.656(0.220) | 2.446(0.097) | -7.9% |
| 50 | 5000 | 11.532(0.132) | 8.216(0.094) | -28.8% |
| 50 | 10000 | 22.685(1.157) | 14.098(0.186) | -37.8% |
| 60 | 0 | 1.760(0.249) | 1.738(0.333) | -1.3% |
| 60 | 50 | 1.945(0.283) | 1.851(0.305) | -4.8% |
| 60 | 100 | 1.777(0.340) | 1.613(0.116) | -9.2% |
| 60 | 500 | 2.525(0.184) | 2.330(0.125) | -7.7% |
| 60 | 1000 | 3.497(0.327) | 3.247(0.174) | -7.2% |
| 60 | 5000 | 14.390(0.172) | 10.093(0.138) | -29.9% |
| 60 | 10000 | 27.980(0.545) | 17.383(0.211) | -37.9% |
| 80 | 0 | 3.977(0.767) | 3.632(0.651) | -8.7% |
| 80 | 50 | 3.550(0.667) | 3.294(0.645) | -7.2% |
| 80 | 100 | 3.854(0.679) | 3.182(0.763) | -17.4% |
| 80 | 500 | 4.571(0.685) | 3.998(0.619) | -12.5% |
| 80 | 1000 | 6.445(0.490) | 4.955(0.281) | -23.1% |
| 80 | 5000 | 27.107(0.331) | 17.348(0.197) | -36.0% |
| 80 | 10000 | 54.738(0.971) | 31.525(1.116) | -42.4% |
| 100 | 0 | 8.509(2.392) | 7.452(0.138) | -12.4% |
| 100 | 50 | 7.730(0.552) | 7.278(1.877) | -5.8% |
| 100 | 100 | 8.084(2.648) | 7.342(1.124) | -9.2% |
| 100 | 500 | 7.543(0.551) | 6.851(0.997) | -9.2% |
| 100 | 1000 | 10.784(0.782) | 7.990(0.651) | -25.9% |
| 100 | 5000 | 36.393(0.626) | 25.800(0.363) | -29.1% |
| 100 | 10000 | 72.916(2.488) | 45.648(1.929) | -37.4% |

Notes about the above results:

- Values are shown as "{mean}({std})".
- "Main" is the baseline and "Series" has this series applied. Delta is (Series - Main) / Main.
- I did not perform any kind of tuning or CPU isolation on the test server.
- delay-kfunc does not always introduce exactly the same delay, so it adds another source of variance.
- Beyond 200 interfaces, limitations of the test script itself make the results rather unreliable.

All in all, a fairly consistent improvement is observed. It grows with the number of interfaces being churned and with the amount of external RTNL pressure added.

## Future work

This is part of a larger effort to improve robustness against RTNL contention. I plan to work on more optimizations in future series.

[1] https://github.com/xdp-project/bpf-examples/tree/main/delay-kfunc

Adrian Moreno (8):

- netdev_linux: Refactor netdev flag update.
- netdev-linux: Cache netdev flags.
- netlink-notifier: Drain socket on overflow.
- netlink-notifier: Include nsid in callbacks.
- netdev-linux: Use rtnetlink to update state.
- netdev-linux: Consolidate RTM_GETLINK parsing.
- linux-netdev: Check status when reading stats.
- netdev-linux: Consolidate netlink updates.

| File | Changes |
|---|---|
| lib/if-notifier.c | 3 +- |
| lib/netdev-afxdp.c | 2 +- |
| lib/netdev-linux-private.h | 5 +- |
| lib/netdev-linux.c | 416 +++++++++++++++------------------ |
| lib/netdev-linux.h | 1 + |
| lib/netlink-notifier.c | 15 +- |
| lib/netlink-notifier.h | 9 +- |
| lib/netnsid.h | 1 + |
| lib/route-table.c | 12 +- |
| lib/route-table.h | 2 +- |
| lib/rtnetlink.c | 40 +++- |
| lib/rtnetlink.h | 56 ++++- |
| lib/tnl-ports.c | 2 +- |
| tests/system-interface.at | 2 + |
| tests/system-tap.at | 5 +- |
| tests/system-traffic.at | 3 +- |
| tests/test-lib-route-table.c | 9 +- |
| tests/test-netlink-conntrack.c | 9 +- |

18 files changed, 338 insertions(+), 254 deletions(-)
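Commits 3 and 4 touch the shared netlink notifier layer that `if-notifier`, `route_table` and `netdev_linux` build on. The sketch below shows that fan-out pattern under simplifying assumptions. It is not OVS's nln or rtnetlink API, and every type and function name in it is hypothetical.

In this sketch, one receiver delivers each parsed event, tagged with the id of the namespace it came from, to every subscriber. After a receive-buffer overflow (`ENOBUFS`) has been drained, the receiver reports a NULL change, so each subscriber knows events were lost and resynchronizes. The "NULL means resync" convention is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parsed link event.  A NULL pointer means "events were lost". */
struct link_change {
    int ifindex;
    int nsid;               /* Id of the network namespace the event came from. */
    unsigned int flags;     /* IFF_* flags reported in the event. */
};

typedef void link_change_cb(const struct link_change *change, void *aux);

#define MAX_SUBSCRIBERS 8

/* One shared receiver, many subscribers. */
struct notifier {
    struct {
        link_change_cb *cb;
        void *aux;
    } subs[MAX_SUBSCRIBERS];
    size_t n_subs;
};

/* Registers 'cb'.  Returns 0 on success, -1 if the table is full. */
static int
notifier_subscribe(struct notifier *n, link_change_cb *cb, void *aux)
{
    assert(cb != NULL);
    if (n->n_subs == MAX_SUBSCRIBERS) {
        return -1;
    }
    n->subs[n->n_subs].cb = cb;
    n->subs[n->n_subs].aux = aux;
    n->n_subs++;
    return 0;
}

/* Delivers 'change' (possibly NULL) to every subscriber, in order. */
static void
notifier_report(const struct notifier *n, const struct link_change *change)
{
    for (size_t i = 0; i < n->n_subs; i++) {
        n->subs[i].cb(change, n->subs[i].aux);
    }
}

/* Meant to be called after the caller has drained the socket following
 * ENOBUFS: individual events were dropped, so every subscriber must resync. */
static void
notifier_overflow(const struct notifier *n)
{
    notifier_report(n, NULL);
}

/* Example subscriber: counts events and records pending resyncs. */
struct consumer_state {
    unsigned int n_events;
    int last_nsid;
    bool needs_resync;
};

static void
consumer_cb(const struct link_change *change, void *aux)
{
    struct consumer_state *st = aux;

    if (!change) {
        st->needs_resync = true;    /* Lost events: re-read the full state. */
        return;
    }
    st->n_events++;
    st->last_nsid = change->nsid;
}
```

Reporting NULL rather than trying to replay lost events keeps each subscriber simple. It only needs a "full resync" path in addition to its per-event path.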