From patchwork Tue Jan 10 13:47:14 2023
X-Patchwork-Submitter: Kevin Traynor
X-Patchwork-Id: 1724085
From: Kevin Traynor
To: dev@openvswitch.org
Date: Tue, 10 Jan 2023 13:47:14 +0000
Message-Id: <20230110134715.99051-2-ktraynor@redhat.com>
In-Reply-To: <20230110134715.99051-1-ktraynor@redhat.com>
References: <20230110134715.99051-1-ktraynor@redhat.com>
Cc: i.maximets@ovn.org, david.marchand@redhat.com
Subject: [ovs-dev] [PATCH v4 1/2] util: Add non quiesce xnanosleep.
xnanosleep() forces the thread into a quiesce state in anticipation that
it will be sleeping for a considerable time and that the thread may
need to quiesce before the sleep is finished.

In some cases a very short sleep may be requested, and in that case the
overhead of going into a quiesce state may be unnecessary.

To allow for those cases, add an xnanosleep_no_quiesce() variant.
Suggested-by: Ilya Maximets
Reviewed-by: David Marchand
Signed-off-by: Kevin Traynor
---
lib/util.c | 21 +++++++++++++++++----
lib/util.h | 1 +
2 files changed, 18 insertions(+), 4 deletions(-)
diff --git a/lib/util.c b/lib/util.c
index 1195c7982..7576eb06e 100644
--- a/lib/util.c
+++ b/lib/util.c
@@ -2372,9 +2372,7 @@ xsleep(unsigned int seconds)
}
-/* High resolution sleep. */
-void
-xnanosleep(uint64_t nanoseconds)
+static void
+xnanosleep__(uint64_t nanoseconds)
{
- ovsrcu_quiesce_start();
#ifndef _WIN32
int retval;
@@ -2404,7 +2402,22 @@ xnanosleep(uint64_t nanoseconds)
}
#endif
+}
+
+/* High resolution sleep with thread quiesce. */
+void
+xnanosleep(uint64_t nanoseconds)
+{
+ ovsrcu_quiesce_start();
+ xnanosleep__(nanoseconds);
ovsrcu_quiesce_end();
}
+/* High resolution sleep without thread quiesce. */
+void
+xnanosleep_no_quiesce(uint64_t nanoseconds)
+{
+ xnanosleep__(nanoseconds);
+}
+
/* Determine whether standard output is a tty or not. This is useful to decide
* whether to use color output or not when --color option for utilities is set
diff --git a/lib/util.h b/lib/util.h
index 9ff84b3dc..f35f33021 100644
--- a/lib/util.h
+++ b/lib/util.h
@@ -594,4 +594,5 @@ ovs_u128_is_superset(ovs_u128 super, ovs_u128 sub)
void xsleep(unsigned int seconds);
void xnanosleep(uint64_t nanoseconds);
+void xnanosleep_no_quiesce(uint64_t nanoseconds);
bool is_stdout_a_tty(void);
From patchwork Tue Jan 10 13:47:15 2023
X-Patchwork-Submitter: Kevin Traynor
X-Patchwork-Id: 1724087
From: Kevin Traynor
To: dev@openvswitch.org
Date: Tue, 10 Jan 2023 13:47:15 +0000
Message-Id: <20230110134715.99051-3-ktraynor@redhat.com>
In-Reply-To: <20230110134715.99051-1-ktraynor@redhat.com>
References: <20230110134715.99051-1-ktraynor@redhat.com>
Cc: i.maximets@ovn.org, david.marchand@redhat.com
Subject: [ovs-dev] [PATCH v4 2/2] dpif-netdev: Add PMD load based sleeping.
Sleep for an incremental amount of time if none of the Rx queues
assigned to a PMD have at least half a batch of packets (i.e. 16 pkts)
on a polling iteration of the PMD.

Upon detecting the threshold of >= 16 pkts on an Rxq, reset the
sleep time to zero (i.e. no sleep).

Sleep time will be increased on each iteration where the low load
conditions remain, up to the max sleep time set by the user, e.g.:

ovs-vsctl set Open_vSwitch . other_config:pmd-maxsleep=500

The default pmd-maxsleep value is 0, which means that no sleeps
will occur and the default behaviour is unchanged from previous releases.
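The incremental back-off described above can be sketched as a pure function of the current sleep request, the configured maximum, and the largest burst seen this iteration. This is an illustrative model (the names SLEEP_THRESH, SLEEP_INC_US and next_sleep_time are ours), not the exact OVS code, which applies the reset inside the Rx polling loop:

```c
#include <assert.h>
#include <stdint.h>

#define SLEEP_THRESH 16   /* Half a 32-packet batch. */
#define SLEEP_INC_US 10   /* Increment per low-load iteration. */

/* Return the sleep time (us) to request on the next iteration:
 * reset to 0 when load is detected, otherwise ramp up towards the
 * configured maximum.  A max of 0 disables sleeping entirely. */
static uint64_t
next_sleep_time(uint64_t cur_sleep, uint64_t max_sleep, int burst_pkts)
{
    if (!max_sleep) {
        return 0;                      /* Feature disabled. */
    }
    if (burst_pkts >= SLEEP_THRESH) {
        return 0;                      /* Load detected: stop sleeping. */
    }
    cur_sleep += SLEEP_INC_US;         /* Low load: back off a bit more. */
    return cur_sleep > max_sleep ? max_sleep : cur_sleep;
}
```

For example, with pmd-maxsleep=500 and sustained low load, requests grow 10, 20, 30, ... up to 500 us, then snap back to 0 on the first 16-packet burst.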
Also add new stats to pmd-perf-show to get visibility of operation,
e.g.:
...
- sleep iterations: 153994 ( 76.8 % of iterations)
Sleep time (us): 9159399 ( 46 us/iteration avg.)
...
Signed-off-by: Kevin Traynor
Reviewed-by: Robin Jarry
Reviewed-by: David Marchand
---
Documentation/topics/dpdk/pmd.rst | 54 +++++++++++++++++++++++++
NEWS | 3 ++
lib/dpif-netdev-perf.c | 24 +++++++++---
lib/dpif-netdev-perf.h | 5 ++-
lib/dpif-netdev.c | 65 +++++++++++++++++++++++++++++--
tests/pmd.at | 46 ++++++++++++++++++++++
vswitchd/vswitch.xml | 26 +++++++++++++
7 files changed, 213 insertions(+), 10 deletions(-)
diff --git a/Documentation/topics/dpdk/pmd.rst b/Documentation/topics/dpdk/pmd.rst
index 9006fd40f..14eba0bcf 100644
--- a/Documentation/topics/dpdk/pmd.rst
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -325,4 +325,58 @@ reassignment due to PMD Auto Load Balance. For example, this could be set
(in min) such that a reassignment is triggered at most every few hours.
+PMD load based sleeping (Experimental)
+--------------------------------------
+
+PMD threads constantly poll Rx queues which are assigned to them. In order to
+reduce the CPU cycles they use, they can sleep for small periods of time
+when there is no load or very-low load on all the Rx queues they poll.
+
+This can be enabled by setting the max requested sleep time (in microseconds)
+for a PMD thread::
+
+    $ ovs-vsctl set open_vswitch . other_config:pmd-maxsleep=500
+
+Non-zero values will be rounded up to the nearest 10 microseconds to avoid
+requesting very small sleep times.
+
+With a non-zero max value a PMD may request to sleep for an incrementing
+amount of time, up to the maximum time. If at any point at least half a
+batch of packets (i.e. 16) is received from an Rx queue that the PMD is
+polling, the requested sleep time will be reset to 0. At that point no
+sleeps will occur until the no/low load conditions return.
+
+Sleeping in a PMD thread will mean there is a period of time when the PMD
+thread will not process packets. Sleep times requested are not guaranteed
+and can differ significantly depending on system configuration. The actual
+time not processing packets will be determined by the sleep and processor
+wake-up times and should be tested with each system configuration.
+
+Sleep time statistics for 10 secs can be seen with::
+
+    $ ovs-appctl dpif-netdev/pmd-stats-clear \
+        && sleep 10 && ovs-appctl dpif-netdev/pmd-perf-show
+
+Example output, showing that during the last 10 seconds, 76.8% of iterations
+had a sleep of some length. The total amount of sleep time was 9.15 seconds
+and the average sleep time per iteration was 46 microseconds::
+
+   - sleep iterations:       153994  ( 76.8 % of iterations)
+     Sleep time (us):       9159399  (  46 us/iteration avg.)
+
+Any potential power saving from PMD load based sleeping is dependent on the
+system configuration (e.g. enabling processor C-states) and workloads.
+
+.. note::
+
+ If there is a sudden spike of packets while the PMD thread is sleeping and
+ the processor is in a low-power state it may result in some lost packets or
+ extra latency before the PMD thread returns to processing packets at full
+ rate.
+
+.. note::
+
+   By default the Linux kernel groups timer expirations, and this can add an
+   overhead of up to 50 microseconds to a requested timer expiration.
+
.. _ovs-vswitchd(8):
http://openvswitch.org/support/dist-docs/ovs-vswitchd.8.html
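The averages in the example output above can be cross-checked. The total iteration count is not printed directly; roughly 200512 is recovered from 153994 / 0.768 (an assumed figure for illustration). Spreading 9159399 us of sleep over those iterations gives about 46 us per iteration:

```c
#include <assert.h>
#include <stdint.h>

/* Percentage of all iterations that included a sleep. */
static double
sleep_pct(uint64_t sleep_iter, uint64_t tot_iter)
{
    return 100.0 * sleep_iter / tot_iter;
}

/* Average sleep time per iteration, in microseconds. */
static double
avg_sleep_us(double sleep_us, uint64_t tot_iter)
{
    return sleep_us / tot_iter;
}
```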
diff --git a/NEWS b/NEWS
index 2f6ededfe..9b43a2351 100644
--- a/NEWS
+++ b/NEWS
@@ -31,4 +31,7 @@ Post-v3.0.0
* Add '-secs' argument to appctl 'dpif-netdev/pmd-rxq-show' to show
the pmd usage of an Rx queue over a configurable time period.
+ * Add new experimental PMD load based sleeping feature. PMD threads can
+ request to sleep up to a user configured 'pmd-maxsleep' value under no
+ and low load conditions.
diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
index a2a7d8f0b..b910629f7 100644
--- a/lib/dpif-netdev-perf.c
+++ b/lib/dpif-netdev-perf.c
@@ -231,4 +231,6 @@ pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s,
uint64_t idle_iter = s->pkts.bin[0];
uint64_t busy_iter = tot_iter >= idle_iter ? tot_iter - idle_iter : 0;
+ uint64_t sleep_iter = stats[PMD_SLEEP_ITER];
+ uint64_t tot_sleep_cycles = stats[PMD_CYCLES_SLEEP];
ds_put_format(str,
@@ -236,11 +238,17 @@ pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s,
" - Used TSC cycles: %12"PRIu64" (%5.1f %% of total cycles)\n"
" - idle iterations: %12"PRIu64" (%5.1f %% of used cycles)\n"
- " - busy iterations: %12"PRIu64" (%5.1f %% of used cycles)\n",
- tot_iter, tot_cycles * us_per_cycle / tot_iter,
+ " - busy iterations: %12"PRIu64" (%5.1f %% of used cycles)\n"
+ " - sleep iterations: %12"PRIu64" (%5.1f %% of iterations)\n"
+ " Sleep time (us): %12.0f (%3.0f us/iteration avg.)\n",
+ tot_iter,
+ (tot_cycles + tot_sleep_cycles) * us_per_cycle / tot_iter,
tot_cycles, 100.0 * (tot_cycles / duration) / tsc_hz,
idle_iter,
100.0 * stats[PMD_CYCLES_ITER_IDLE] / tot_cycles,
busy_iter,
- 100.0 * stats[PMD_CYCLES_ITER_BUSY] / tot_cycles);
+ 100.0 * stats[PMD_CYCLES_ITER_BUSY] / tot_cycles,
+ sleep_iter, tot_iter ? 100.0 * sleep_iter / tot_iter : 0,
+ tot_sleep_cycles * us_per_cycle,
+ tot_iter ? (tot_sleep_cycles * us_per_cycle) / tot_iter : 0);
if (rx_packets > 0) {
ds_put_format(str,
@@ -519,5 +527,6 @@ OVS_REQUIRES(s->stats_mutex)
void
pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
- int tx_packets, bool full_metrics)
+ int tx_packets, uint64_t sleep_cycles,
+ bool full_metrics)
{
uint64_t now_tsc = cycles_counter_update(s);
@@ -526,5 +535,5 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
char *reason = NULL;
- cycles = now_tsc - s->start_tsc;
+ cycles = now_tsc - s->start_tsc - sleep_cycles;
s->current.timestamp = s->iteration_cnt;
s->current.cycles = cycles;
@@ -540,4 +549,9 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
histogram_add_sample(&s->pkts, rx_packets);
+ if (sleep_cycles) {
+ pmd_perf_update_counter(s, PMD_SLEEP_ITER, 1);
+ pmd_perf_update_counter(s, PMD_CYCLES_SLEEP, sleep_cycles);
+ }
+
if (!full_metrics) {
return;
diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
index 9673dddd8..84beced15 100644
--- a/lib/dpif-netdev-perf.h
+++ b/lib/dpif-netdev-perf.h
@@ -81,4 +81,6 @@ enum pmd_stat_type {
PMD_CYCLES_ITER_BUSY, /* Cycles spent in busy iterations. */
PMD_CYCLES_UPCALL, /* Cycles spent processing upcalls. */
+ PMD_SLEEP_ITER, /* Iterations where a sleep has taken place. */
+ PMD_CYCLES_SLEEP, /* Total cycles slept to save power. */
PMD_N_STATS
};
@@ -409,5 +411,6 @@ pmd_perf_start_iteration(struct pmd_perf_stats *s);
void
pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
- int tx_packets, bool full_metrics);
+ int tx_packets, uint64_t sleep_cycles,
+ bool full_metrics);
/* Formatting the output of commands. */
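The new PMD_CYCLES_SLEEP counter records sleep time in TSC cycles; pmd-perf-show converts it to microseconds using the TSC frequency, as the `us_per_cycle` factor in the dpif-netdev-perf.c hunk above does. A hedged sketch of that conversion (the function name is ours):

```c
#include <assert.h>
#include <stdint.h>

/* Convert a TSC cycle count into microseconds, given the TSC
 * frequency in Hz, mirroring us_per_cycle = 1e6 / tsc_hz. */
static double
sleep_cycles_to_us(uint64_t sleep_cycles, double tsc_hz)
{
    double us_per_cycle = 1000000.0 / tsc_hz;
    return sleep_cycles * us_per_cycle;
}
```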
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 7127068fe..a47d54c6f 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -172,4 +172,9 @@ static struct odp_support dp_netdev_support = {
#define PMD_RCU_QUIESCE_INTERVAL 10000LL
+/* Number of pkts Rx on an interface that will stop pmd thread sleeping. */
+#define PMD_SLEEP_THRESH (NETDEV_MAX_BURST / 2)
+/* Time in us by which a pmd thread's sleep time is incremented. */
+#define PMD_SLEEP_INC_US 10
+
struct dpcls {
struct cmap_node node; /* Within dp_netdev_pmd_thread.classifiers */
@@ -280,4 +285,6 @@ struct dp_netdev {
/* Enable collection of PMD performance metrics. */
atomic_bool pmd_perf_metrics;
+ /* Max load based sleep request. */
+ atomic_uint64_t pmd_max_sleep;
/* Enable the SMC cache from ovsdb config */
atomic_bool smc_enable_db;
@@ -4822,6 +4829,8 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
uint8_t cur_rebalance_load;
uint32_t rebalance_load, rebalance_improve;
+ uint64_t pmd_max_sleep, cur_pmd_max_sleep;
bool log_autolb = false;
enum sched_assignment_type pmd_rxq_assign_type;
+ static bool first_set_config = true;
tx_flush_interval = smap_get_int(other_config, "tx-flush-interval",
@@ -4970,4 +4979,17 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
set_pmd_auto_lb(dp, autolb_state, log_autolb);
+
+ pmd_max_sleep = smap_get_ullong(other_config, "pmd-maxsleep", 0);
+ pmd_max_sleep = ROUND_UP(pmd_max_sleep, 10);
+ pmd_max_sleep = MIN(PMD_RCU_QUIESCE_INTERVAL, pmd_max_sleep);
+ atomic_read_relaxed(&dp->pmd_max_sleep, &cur_pmd_max_sleep);
+ if (first_set_config || pmd_max_sleep != cur_pmd_max_sleep) {
+ atomic_store_relaxed(&dp->pmd_max_sleep, pmd_max_sleep);
+ VLOG_INFO("PMD max sleep request is %"PRIu64" usecs.", pmd_max_sleep);
+ VLOG_INFO("PMD load based sleeps are %s.",
+ pmd_max_sleep ? "enabled" : "disabled" );
+ }
+
+ first_set_config = false;
return 0;
}
@@ -6930,4 +6952,5 @@ pmd_thread_main(void *f_)
int i;
int process_packets = 0;
+ uint64_t sleep_time = 0;
poll_list = NULL;
@@ -6990,8 +7013,11 @@ reload:
for (;;) {
uint64_t rx_packets = 0, tx_packets = 0;
+ uint64_t time_slept = 0;
+ uint64_t max_sleep;
pmd_perf_start_iteration(s);
atomic_read_relaxed(&pmd->dp->smc_enable_db, &pmd->ctx.smc_enable_db);
+ atomic_read_relaxed(&pmd->dp->pmd_max_sleep, &max_sleep);
for (i = 0; i < poll_cnt; i++) {
@@ -7012,4 +7038,7 @@ reload:
poll_list[i].port_no);
rx_packets += process_packets;
+ if (process_packets >= PMD_SLEEP_THRESH) {
+ sleep_time = 0;
+ }
}
@@ -7019,5 +7048,28 @@ reload:
* There was no time updates on current iteration. */
pmd_thread_ctx_time_update(pmd);
- tx_packets = dp_netdev_pmd_flush_output_packets(pmd, false);
+ tx_packets = dp_netdev_pmd_flush_output_packets(pmd,
+ max_sleep && sleep_time
+ ? true : false);
+ }
+
+ if (max_sleep) {
+ /* Check if a sleep should happen on this iteration. */
+ if (sleep_time) {
+ struct cycle_timer sleep_timer;
+
+ cycle_timer_start(&pmd->perf_stats, &sleep_timer);
+ xnanosleep_no_quiesce(sleep_time * 1000);
+ time_slept = cycle_timer_stop(&pmd->perf_stats, &sleep_timer);
+ pmd_thread_ctx_time_update(pmd);
+ }
+ if (sleep_time < max_sleep) {
+ /* Increase sleep time for next iteration. */
+ sleep_time += PMD_SLEEP_INC_US;
+ } else {
+ sleep_time = max_sleep;
+ }
+ } else {
+ /* Reset sleep time as max sleep policy may have been changed. */
+ sleep_time = 0;
}
@@ -7059,5 +7111,5 @@ reload:
}
- pmd_perf_end_iteration(s, rx_packets, tx_packets,
+ pmd_perf_end_iteration(s, rx_packets, tx_packets, time_slept,
pmd_perf_metrics_enabled(pmd));
}
@@ -9910,5 +9962,5 @@ dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
{
struct dpcls *cls;
- uint64_t tot_idle = 0, tot_proc = 0;
+ uint64_t tot_idle = 0, tot_proc = 0, tot_sleep = 0;
unsigned int pmd_load = 0;
@@ -9927,8 +9979,11 @@ dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
tot_proc = pmd->perf_stats.counters.n[PMD_CYCLES_ITER_BUSY] -
pmd->prev_stats[PMD_CYCLES_ITER_BUSY];
+ tot_sleep = pmd->perf_stats.counters.n[PMD_CYCLES_SLEEP] -
+ pmd->prev_stats[PMD_CYCLES_SLEEP];
if (pmd_alb->is_enabled && !pmd->isolated) {
if (tot_proc) {
- pmd_load = ((tot_proc * 100) / (tot_idle + tot_proc));
+ pmd_load = ((tot_proc * 100) /
+ (tot_idle + tot_proc + tot_sleep));
}
@@ -9947,4 +10002,6 @@ dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
pmd->prev_stats[PMD_CYCLES_ITER_BUSY] =
pmd->perf_stats.counters.n[PMD_CYCLES_ITER_BUSY];
+ pmd->prev_stats[PMD_CYCLES_SLEEP] =
+ pmd->perf_stats.counters.n[PMD_CYCLES_SLEEP];
/* Get the cycles that were used to process each queue and store. */
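The auto-load-balance change in the hunk above now counts sleep cycles as "not busy" time, so a PMD that sleeps heavily no longer looks loaded. A minimal model of that calculation (function name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* PMD load as a percentage: busy cycles over all accounted cycles,
 * with sleep cycles included in the denominator as idle-like time. */
static unsigned int
pmd_load_pct(uint64_t tot_idle, uint64_t tot_proc, uint64_t tot_sleep)
{
    if (!tot_proc) {
        return 0;
    }
    return (tot_proc * 100) / (tot_idle + tot_proc + tot_sleep);
}
```

For example, 600 busy, 300 idle and 100 sleep cycles now yield 60% load; without the sleep term the same interval would have reported 66%.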
diff --git a/tests/pmd.at b/tests/pmd.at
index ed90f88c4..e0f58f7a6 100644
--- a/tests/pmd.at
+++ b/tests/pmd.at
@@ -1255,2 +1255,48 @@ ovs-appctl: ovs-vswitchd: server returned an error
OVS_VSWITCHD_STOP
AT_CLEANUP
+
+dnl Check default state
+AT_SETUP([PMD - pmd sleep])
+OVS_VSWITCHD_START
+
+dnl Check default
+OVS_WAIT_UNTIL([tail ovs-vswitchd.log | grep "PMD max sleep request is 0 usecs."])
+OVS_WAIT_UNTIL([tail ovs-vswitchd.log | grep "PMD load based sleeps are disabled."])
+
+dnl Check low value max sleep
+get_log_next_line_num
+AT_CHECK([ovs-vsctl set open_vswitch . other_config:pmd-maxsleep="1"])
+OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD max sleep request is 10 usecs."])
+OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD load based sleeps are enabled."])
+
+dnl Check high value max sleep
+get_log_next_line_num
+AT_CHECK([ovs-vsctl set open_vswitch . other_config:pmd-maxsleep="10000"])
+OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD max sleep request is 10000 usecs."])
+OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD load based sleeps are enabled."])
+
+dnl Check setting max sleep to zero
+get_log_next_line_num
+AT_CHECK([ovs-vsctl set open_vswitch . other_config:pmd-maxsleep="0"])
+OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD max sleep request is 0 usecs."])
+OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD load based sleeps are disabled."])
+
+dnl Check above high value max sleep
+get_log_next_line_num
+AT_CHECK([ovs-vsctl set open_vswitch . other_config:pmd-maxsleep="10001"])
+OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD max sleep request is 10000 usecs."])
+OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD load based sleeps are enabled."])
+
+dnl Check rounding
+get_log_next_line_num
+AT_CHECK([ovs-vsctl set open_vswitch . other_config:pmd-maxsleep="490"])
+OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD max sleep request is 490 usecs."])
+OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD load based sleeps are enabled."])
+dnl Check rounding
+get_log_next_line_num
+AT_CHECK([ovs-vsctl set open_vswitch . other_config:pmd-maxsleep="491"])
+OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD max sleep request is 500 usecs."])
+OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD load based sleeps are enabled."])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
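The rounding and clamping behaviour exercised by the tests above (1 becomes 10, 491 becomes 500, 10001 becomes 10000) can be sketched as follows. This mirrors the ROUND_UP/MIN sequence in dpif_netdev_set_config(); the function name normalize_max_sleep is ours:

```c
#include <assert.h>
#include <stdint.h>

#define SLEEP_MAX_US 10000   /* PMD_RCU_QUIESCE_INTERVAL ceiling. */

/* Normalize a requested pmd-maxsleep value: round up to the nearest
 * 10 us, then clamp to the 10 ms ceiling.  Zero stays zero (disabled). */
static uint64_t
normalize_max_sleep(uint64_t requested_us)
{
    uint64_t rounded = (requested_us + 9) / 10 * 10;  /* ROUND_UP(x, 10). */
    return rounded > SLEEP_MAX_US ? SLEEP_MAX_US : rounded;
}
```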
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index f9bdb2d92..8c4acfb18 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -789,4 +789,30 @@
+      <column name="other_config" key="pmd-maxsleep"
+              type='{"type": "integer", "minInteger": 0, "maxInteger": 10000}'>
+        <p>
+          Specifies the maximum sleep time that will be requested in
+          microseconds per iteration for a PMD thread which has received zero
+          or a small amount of packets from the Rx queues it is polling.
+        </p>
+        <p>
+          The actual sleep time requested is based on the load
+          of the Rx queues that the PMD polls and may be less than
+          the maximum value.
+        </p>
+        <p>
+          The default value is <code>0 microseconds</code>, which means
+          that the PMD will not sleep regardless of the load from the
+          Rx queues that it polls.
+        </p>
+        <p>
+          To avoid requesting very small sleeps (e.g. less than 10 us) the
+          value will be rounded up to the nearest 10 us.
+        </p>
+        <p>
+          The maximum value is <code>10000 microseconds</code>.
+        </p>
+      </column>
+