From patchwork Thu Jan 3 12:36:51 2019
X-Patchwork-Submitter: Nitin Katiyar
X-Patchwork-Id: 1020276
X-Patchwork-Delegate: ian.stokes@intel.com
From: Nitin Katiyar <nitin.katiyar@ericsson.com>
To: "ovs-dev@openvswitch.org"
Date: Thu, 3 Jan 2019 12:36:51 +0000
Message-ID: <1546547894-16821-1-git-send-email-nitin.katiyar@ericsson.com>
Subject: [ovs-dev] [PATCH v2] Adding support for PMD auto load balancing
Cc: Nitin Katiyar

Port rx queues that have not been statically assigned to PMDs are
currently assigned based on periodically sampled load measurements. The
assignment is performed at specific instances: port addition, port
deletion, upon reassignment request via CLI, etc. Because the traffic
pattern can change over time, this may cause uneven load among the PMDs,
resulting in lower overall throughput.

This patch enables support for automatic load balancing of PMDs based on
the measured load of their RX queues. Each PMD measures the processing
load for each of its associated queues every 10 seconds. If the
aggregated PMD load reaches 95% for 6 consecutive intervals, the PMD
considers itself to be overloaded.

If any PMD is overloaded, a dry run of the PMD assignment algorithm is
performed by the OVS main thread. The dry run does NOT change the
existing queue-to-PMD assignments. If the resulting mapping from the dry
run indicates an improved load distribution, the actual reassignment is
performed.

Automatic rebalancing is disabled by default and has to be enabled via a
configuration option. The interval (in minutes) between two consecutive
rebalancings can also be configured via the CLI; the default is 1 min.

The following example commands can be used to set the auto-lb
parameters:

ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
ovs-vsctl set open_vswitch .
other_config:pmd-auto-lb-rebalance-intvl="5" Co-authored-by: Rohith Basavaraja Co-authored-by: Venkatesan Pradeep Signed-off-by: Rohith Basavaraja Signed-off-by: Venkatesan Pradeep Signed-off-by: Nitin Katiyar --- lib/dpif-netdev.c | 403 +++++++++++++++++++++++++++++++++++++++++++++++++-- vswitchd/vswitch.xml | 30 ++++ 2 files changed, 424 insertions(+), 9 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 1564db9..8db106f 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -80,6 +80,12 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev); +/* Auto Load Balancing Defaults */ +#define ACCEPT_IMPROVE_DEFAULT (25) +#define PMD_LOAD_THRE_DEFAULT (95) +#define PMD_REBALANCE_POLL_INTERVAL 1 /* 1 Min */ +#define MIN_TO_MSEC 60000 + #define FLOW_DUMP_MAX_BATCH 50 /* Use per thread recirc_depth to prevent recirculation loop. */ #define MAX_RECIRC_DEPTH 6 @@ -288,6 +294,13 @@ struct dp_meter { struct dp_meter_band bands[]; }; +struct pmd_auto_lb { + bool auto_lb_conf; /* enable-disable auto load balancing */ + bool is_enabled; /* auto_lb current status */ + uint64_t rebalance_intvl; + uint64_t rebalance_poll_timer; +}; + /* Datapath based on the network device interface from netdev.h. * * @@ -368,6 +381,7 @@ struct dp_netdev { uint64_t last_tnl_conf_seq; struct conntrack conntrack; + struct pmd_auto_lb pmd_alb; }; static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id) @@ -682,6 +696,7 @@ struct dp_netdev_pmd_thread { struct ovs_mutex port_mutex; /* Mutex for 'poll_list' and 'tx_ports'. */ /* List of rx queues to poll. */ struct hmap poll_list OVS_GUARDED; + /* Map of 'tx_port's used for transmission. Written by the main thread, * read by the pmd thread. */ struct hmap tx_ports OVS_GUARDED; @@ -702,6 +717,11 @@ struct dp_netdev_pmd_thread { /* Keep track of detailed PMD performance statistics. */ struct pmd_perf_stats perf_stats; + /* Some stats from previous iteration used by automatic pmd + load balance logic. 
*/ + uint64_t prev_stats[PMD_N_STATS]; + atomic_count pmd_overloaded; + /* Set to true if the pmd thread needs to be reloaded. */ bool need_reload; }; @@ -792,9 +812,11 @@ dp_netdev_rxq_get_cycles(struct dp_netdev_rxq *rx, enum rxq_cycles_counter_type type); static void dp_netdev_rxq_set_intrvl_cycles(struct dp_netdev_rxq *rx, - unsigned long long cycles); + unsigned long long cycles, + unsigned idx); static uint64_t -dp_netdev_rxq_get_intrvl_cycles(struct dp_netdev_rxq *rx, unsigned idx); +dp_netdev_rxq_get_intrvl_cycles(struct dp_netdev_rxq *rx, + unsigned idx); static void dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd, bool purge); @@ -3734,6 +3756,51 @@ dpif_netdev_operate(struct dpif *dpif, struct dpif_op **ops, size_t n_ops, } } +/* Enable/Disable PMD auto load balancing */ +static void +set_pmd_auto_lb(struct dp_netdev *dp) +{ + unsigned int cnt = 0; + struct dp_netdev_pmd_thread *pmd; + struct pmd_auto_lb * pmd_alb = &dp->pmd_alb; + + bool enable = false; + bool pmd_rxq_assign_cyc = dp->pmd_rxq_assign_cyc; + + /* Ensure that there is at least 2 non-isolated PMDs and + * one of the PMD is polling more than one rxq + */ + CMAP_FOR_EACH (pmd, node, &dp->poll_threads) { + if (pmd->core_id == NON_PMD_CORE_ID || pmd->isolated) { + continue; + } + + cnt++; + if (hmap_count(&pmd->poll_list) > 1) { + if (enable && (cnt > 1)) { + break; + } else { + enable = true; + } + } + } + + /* Enable auto LB if it is configured and cycle based assignment is true */ + enable = enable && pmd_rxq_assign_cyc && pmd_alb->auto_lb_conf; + + if (pmd_alb->is_enabled != enable) { + pmd_alb->is_enabled = enable; + if (pmd_alb->is_enabled) { + VLOG_INFO("PMD auto lb is enabled, rebalance intvl:%lu(msec)\n", + pmd_alb->rebalance_intvl); + } else { + pmd_alb->rebalance_poll_timer = 0; + VLOG_INFO("PMD auto lb is disabled\n"); + } + } + +} + /* Applies datapath configuration from the database. Some of the changes are * actually applied in dpif_netdev_run(). 
*/ static int @@ -3748,6 +3815,7 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config) DEFAULT_EM_FLOW_INSERT_INV_PROB); uint32_t insert_min, cur_min; uint32_t tx_flush_interval, cur_tx_flush_interval; + uint64_t rebalance_intvl; tx_flush_interval = smap_get_int(other_config, "tx-flush-interval", DEFAULT_TX_FLUSH_INTERVAL); @@ -3819,6 +3887,23 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config) pmd_rxq_assign); dp_netdev_request_reconfigure(dp); } + + struct pmd_auto_lb * pmd_alb = &dp->pmd_alb; + pmd_alb->auto_lb_conf = smap_get_bool(other_config, "pmd-auto-lb", + false); + + rebalance_intvl = smap_get_int(other_config, "pmd-auto-lb-rebalance-intvl", + PMD_REBALANCE_POLL_INTERVAL); + + /* Input is in min, convert it to msec */ + rebalance_intvl = + rebalance_intvl ? rebalance_intvl * MIN_TO_MSEC : MIN_TO_MSEC; + + if (pmd_alb->rebalance_intvl != rebalance_intvl) { + pmd_alb->rebalance_intvl = rebalance_intvl; + } + + set_pmd_auto_lb(dp); return 0; } @@ -3974,9 +4059,9 @@ dp_netdev_rxq_get_cycles(struct dp_netdev_rxq *rx, static void dp_netdev_rxq_set_intrvl_cycles(struct dp_netdev_rxq *rx, - unsigned long long cycles) + unsigned long long cycles, + unsigned idx) { - unsigned int idx = rx->intrvl_idx++ % PMD_RXQ_INTERVAL_MAX; atomic_store_relaxed(&rx->cycles_intrvl[idx], cycles); } @@ -4762,6 +4847,9 @@ reconfigure_datapath(struct dp_netdev *dp) /* Reload affected pmd threads. 
*/ reload_affected_pmds(dp); + + /* Check if PMD Auto LB is to be enabled */ + set_pmd_auto_lb(dp); } /* Returns true if one of the netdevs in 'dp' requires a reconfiguration */ @@ -4780,6 +4868,228 @@ ports_require_restart(const struct dp_netdev *dp) return false; } +/* Function for calculating variance */ +static uint64_t +variance(uint64_t a[], int n) +{ + /* Compute mean (average of elements) */ + uint64_t sum = 0; + uint64_t mean = 0; + uint64_t sqDiff = 0; + + if (!n) { + return 0; + } + + for (int i = 0; i < n; i++) { + sum += a[i]; + } + + if (sum) { + mean = sum / n; + + /* Compute sum squared differences with mean. */ + for (int i = 0; i < n; i++) { + sqDiff += (a[i] - mean)*(a[i] - mean); + } + } + return (sqDiff ? (sqDiff / n) : 0); +} + +static uint64_t +get_dry_run_variance(struct dp_netdev *dp, uint32_t *core_list, uint32_t num) +{ + struct dp_netdev_port *port; + struct dp_netdev_pmd_thread *pmd; + struct dp_netdev_rxq ** rxqs = NULL; + struct rr_numa *numa = NULL; + struct rr_numa_list rr; + int n_rxqs = 0; + uint64_t ret = 0; + uint64_t *pmd_usage; + + pmd_usage = xcalloc(num, sizeof(uint64_t)); + + HMAP_FOR_EACH (port, node, &dp->ports) { + if (!netdev_is_pmd(port->netdev)) { + continue; + } + + for (int qid = 0; qid < port->n_rxq; qid++) { + struct dp_netdev_rxq *q = &port->rxqs[qid]; + uint64_t cycle_hist = 0; + + if (q->pmd->isolated) { + continue; + } + + if (n_rxqs == 0) { + rxqs = xmalloc(sizeof *rxqs); + } else { + rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1)); + } + + /* Sum the queue intervals and store the cycle history. */ + for (unsigned i = 0; i < PMD_RXQ_INTERVAL_MAX; i++) { + cycle_hist += dp_netdev_rxq_get_intrvl_cycles(q, i); + } + /* Do we need to add intrvl_cycles in history?? + * but then we should clear interval cycles also */ + dp_netdev_rxq_set_cycles(q, RXQ_CYCLES_PROC_HIST, + cycle_hist); + /* Store the queue. 
*/ + rxqs[n_rxqs++] = q; + } + } + if (n_rxqs > 1) { + /* Sort the queues in order of the processing cycles + * they consumed during their last pmd interval. */ + qsort(rxqs, n_rxqs, sizeof *rxqs, compare_rxq_cycles); + } + rr_numa_list_populate(dp, &rr); + + for (int i = 0; i < n_rxqs; i++) { + int numa_id = netdev_get_numa_id(rxqs[i]->port->netdev); + numa = rr_numa_list_lookup(&rr, numa_id); + if (!numa) { + /* Don't consider queues across NUMA ???*/ + continue; + } + + pmd = rr_numa_get_pmd(numa, true); + VLOG_DBG("PMD AUTO_LB:Core %d on numa node %d assigned port \'%s\' " + "rx queue %d " + "(measured processing cycles %"PRIu64").", + pmd->core_id, numa_id, + netdev_rxq_get_name(rxqs[i]->rx), + netdev_rxq_get_queue_id(rxqs[i]->rx), + dp_netdev_rxq_get_cycles(rxqs[i], RXQ_CYCLES_PROC_HIST)); + + for (int id = 0; id < num; id++) { + if (pmd->core_id == core_list[id]) { + /* Add the processing cycles of rxq to pmd polling it */ + uint64_t proc_cycles = 0; + for (unsigned idx = 0; idx < PMD_RXQ_INTERVAL_MAX; idx++) { + proc_cycles += dp_netdev_rxq_get_intrvl_cycles(rxqs[i], + idx); + } + pmd_usage[id] += proc_cycles; + } + } + } + + CMAP_FOR_EACH (pmd, node, &dp->poll_threads) { + uint64_t total_cycles = 0; + + if ((pmd->core_id == NON_PMD_CORE_ID) || pmd->isolated) { + continue; + } + + /* Get the total pmd cycles for an interval. */ + atomic_read_relaxed(&pmd->intrvl_cycles, &total_cycles); + /* Estimate the cycles to cover all intervals. 
*/ + total_cycles *= PMD_RXQ_INTERVAL_MAX; + for (int id = 0; id < num; id++) { + if (pmd->core_id == core_list[id]) { + if (pmd_usage[id]) { + pmd_usage[id] = (pmd_usage[id] * 100) / total_cycles; + } + VLOG_DBG("Core_id:%d, usage:%"PRIu64"\n", + pmd->core_id, pmd_usage[id]); + } + } + } + ret = variance(pmd_usage, num); + + rr_numa_list_destroy(&rr); + free(rxqs); + free(pmd_usage); + return ret; +} + +static bool +pmd_rebalance_dry_run(struct dp_netdev *dp) +{ + struct dp_netdev_pmd_thread *pmd; + uint64_t *curr_pmd_usage; + + uint64_t curr_variance; + uint64_t new_variance; + uint64_t improvement = 0; + uint32_t num_pmds; + uint32_t *pmd_corelist; + struct rxq_poll *poll, *poll_next; + + num_pmds = cmap_count(&dp->poll_threads); + + if (num_pmds > 1) { + curr_pmd_usage = xcalloc(num_pmds, sizeof(uint64_t)); + pmd_corelist = xcalloc(num_pmds, sizeof(uint32_t)); + } else { + return false; + } + + num_pmds = 0; + CMAP_FOR_EACH (pmd, node, &dp->poll_threads) { + uint64_t total_cycles = 0; + uint64_t total_proc = 0; + + if ((pmd->core_id == NON_PMD_CORE_ID) || pmd->isolated) { + continue; + } + + /* Get the total pmd cycles for an interval. */ + atomic_read_relaxed(&pmd->intrvl_cycles, &total_cycles); + /* Estimate the cycles to cover all intervals. 
*/ + total_cycles *= PMD_RXQ_INTERVAL_MAX; + + HMAP_FOR_EACH_SAFE (poll, poll_next, node, &pmd->poll_list) { + uint64_t proc_cycles = 0; + for (unsigned i = 0; i < PMD_RXQ_INTERVAL_MAX; i++) { + proc_cycles += dp_netdev_rxq_get_intrvl_cycles(poll->rxq, i); + } + total_proc += proc_cycles; + } + if (total_proc) { + curr_pmd_usage[num_pmds] = (total_proc * 100) / total_cycles; + } + + VLOG_DBG("PMD_AUTO_LB_MON curr_pmd_usage(%d):%"PRIu64"", + pmd->core_id, curr_pmd_usage[num_pmds]); + + if (atomic_count_get(&pmd->pmd_overloaded)) { + atomic_count_set(&pmd->pmd_overloaded, 0); + } + + pmd_corelist[num_pmds] = pmd->core_id; + num_pmds++; + } + + curr_variance = variance(curr_pmd_usage, num_pmds); + + new_variance = get_dry_run_variance(dp, pmd_corelist, num_pmds); + VLOG_DBG("PMD_AUTO_LB_MON new variance: %"PRIu64"," + " curr_variance: %"PRIu64"", + new_variance, curr_variance); + + if (new_variance && (new_variance < curr_variance)) { + improvement = + ((curr_variance - new_variance) * 100) / curr_variance; + + VLOG_DBG("PMD_AUTO_LB_MON improvement %"PRIu64"", improvement); + } + + free(curr_pmd_usage); + free(pmd_corelist); + + if (improvement >= ACCEPT_IMPROVE_DEFAULT) { + return true; + } + + return false; +} + + /* Return true if needs to revalidate datapath flows. 
*/ static bool dpif_netdev_run(struct dpif *dpif) @@ -4789,6 +5099,9 @@ dpif_netdev_run(struct dpif *dpif) struct dp_netdev_pmd_thread *non_pmd; uint64_t new_tnl_seq; bool need_to_flush = true; + bool pmd_rebalance = false; + long long int now = time_msec(); + struct dp_netdev_pmd_thread *pmd; ovs_mutex_lock(&dp->port_mutex); non_pmd = dp_netdev_get_pmd(dp, NON_PMD_CORE_ID); @@ -4821,6 +5134,38 @@ dpif_netdev_run(struct dpif *dpif) dp_netdev_pmd_unref(non_pmd); } + struct pmd_auto_lb * pmd_alb = &dp->pmd_alb; + if (pmd_alb->is_enabled) { + if (!pmd_alb->rebalance_poll_timer) { + pmd_alb->rebalance_poll_timer = now; + } else if ((pmd_alb->rebalance_poll_timer + + pmd_alb->rebalance_intvl) < now) { + pmd_alb->rebalance_poll_timer = now; + CMAP_FOR_EACH (pmd, node, &dp->poll_threads) { + if (atomic_count_get(&pmd->pmd_overloaded) >= + PMD_RXQ_INTERVAL_MAX) { + pmd_rebalance = true; + break; + } + } + VLOG_DBG("PMD_AUTO_LB_MON periodic check:pmd rebalance:%d", + pmd_rebalance); + + if (pmd_rebalance && + !dp_netdev_is_reconf_required(dp) && + !ports_require_restart(dp) && + pmd_rebalance_dry_run(dp)) { + + ovs_mutex_unlock(&dp->port_mutex); + ovs_mutex_lock(&dp_netdev_mutex); + VLOG_DBG("PMD_AUTO_LB_MON Invoking PMD RECONFIGURE"); + dp_netdev_request_reconfigure(dp); + ovs_mutex_unlock(&dp_netdev_mutex); + ovs_mutex_lock(&dp->port_mutex); + } + } + } + if (dp_netdev_is_reconf_required(dp) || ports_require_restart(dp)) { reconfigure_datapath(dp); } @@ -4979,6 +5324,8 @@ pmd_thread_main(void *f_) reload: pmd_alloc_static_tx_qid(pmd); + atomic_count_init(&pmd->pmd_overloaded, 0); + /* List port/core affinity */ for (i = 0; i < poll_cnt; i++) { VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n", @@ -4986,6 +5333,10 @@ reload: netdev_rxq_get_queue_id(poll_list[i].rxq->rx)); /* Reset the rxq current cycles counter. 
*/ dp_netdev_rxq_set_cycles(poll_list[i].rxq, RXQ_CYCLES_PROC_CURR, 0); + + for (unsigned j = 0; j < PMD_RXQ_INTERVAL_MAX; j++) { + dp_netdev_rxq_set_intrvl_cycles(poll_list[i].rxq, 0, j); + } } if (!poll_cnt) { @@ -7188,17 +7539,51 @@ dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd, struct polled_queue *poll_list, int poll_cnt) { struct dpcls *cls; + uint64_t tot_idle = 0, tot_proc = 0; + unsigned int idx; + unsigned int pmd_load = 0; if (pmd->ctx.now > pmd->rxq_next_cycle_store) { uint64_t curr_tsc; + struct pmd_auto_lb * pmd_alb = &pmd->dp->pmd_alb; + if (pmd_alb->is_enabled && !pmd->isolated) { + tot_idle = pmd->perf_stats.counters.n[PMD_CYCLES_ITER_IDLE] - + pmd->prev_stats[PMD_CYCLES_ITER_IDLE]; + tot_proc = pmd->perf_stats.counters.n[PMD_CYCLES_ITER_BUSY] - + pmd->prev_stats[PMD_CYCLES_ITER_BUSY]; + + if (tot_proc) { + pmd_load = ((tot_proc * 100) / (tot_idle + tot_proc)); + } + + if (pmd_load >= PMD_LOAD_THRE_DEFAULT) { + atomic_count_inc(&pmd->pmd_overloaded); + + VLOG_DBG("PMD_AUTO_LB_MON PMD OVERLOAD DETECT iter %d", + atomic_count_get(&pmd->pmd_overloaded)); + } else { + atomic_count_set(&pmd->pmd_overloaded, 0); + } + } + + pmd->prev_stats[PMD_CYCLES_ITER_IDLE] = + pmd->perf_stats.counters.n[PMD_CYCLES_ITER_IDLE]; + pmd->prev_stats[PMD_CYCLES_ITER_BUSY] = + pmd->perf_stats.counters.n[PMD_CYCLES_ITER_BUSY]; + /* Get the cycles that were used to process each queue and store. 
*/ for (unsigned i = 0; i < poll_cnt; i++) { - uint64_t rxq_cyc_curr = dp_netdev_rxq_get_cycles(poll_list[i].rxq, - RXQ_CYCLES_PROC_CURR); - dp_netdev_rxq_set_intrvl_cycles(poll_list[i].rxq, rxq_cyc_curr); - dp_netdev_rxq_set_cycles(poll_list[i].rxq, RXQ_CYCLES_PROC_CURR, - 0); + uint64_t rxq_cyc_curr; + struct dp_netdev_rxq *rxq; + + rxq = poll_list[i].rxq; + idx = rxq->intrvl_idx++ % PMD_RXQ_INTERVAL_MAX; + + rxq_cyc_curr = dp_netdev_rxq_get_cycles(rxq, RXQ_CYCLES_PROC_CURR); + dp_netdev_rxq_set_intrvl_cycles(rxq, rxq_cyc_curr, idx); + dp_netdev_rxq_set_cycles(rxq, RXQ_CYCLES_PROC_CURR, 0); } + curr_tsc = cycles_counter_update(&pmd->perf_stats); if (pmd->intrvl_tsc_prev) { /* There is a prev timestamp, store a new intrvl cycle count. */ diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index 2160910..ff3589c 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -574,6 +574,36 @@ be set to 'skip_sw'.

+
+      <column name="other_config" key="pmd-auto-lb">
+        <p>
+          Configures PMD Auto Load Balancing, which allows automatic
+          reassignment of RX queues to PMDs if any of the PMDs is overloaded
+          (i.e. processing cycles > 95%).
+        </p>
+        <p>
+          It uses the current scheme of cycle-based assignment of RX queues
+          that are not statically pinned to PMDs.
+        </p>
+        <p>
+          Set this value to true to enable this option.
+        </p>
+        <p>
+          The default value is false.
+        </p>
+        <p>
+          This only comes into effect if cycle-based assignment is enabled,
+          more than one non-isolated PMD is present, and at least one of them
+          polls more than one queue.
+        </p>
+      </column>
+
+      <column name="other_config" key="pmd-auto-lb-rebalance-intvl">
+        <p>
+          The minimum time (in minutes) between two consecutive PMD Auto Load
+          Balancing iterations.
+        </p>
+      </column>