From patchwork Fri Sep 16 23:10:51 2016
X-Patchwork-Submitter: Jarno Rajahalme <jarno@ovn.org>
X-Patchwork-Id: 671188
X-Patchwork-Delegate: blp@nicira.com
From: Jarno Rajahalme <jarno@ovn.org>
To: dev@openvswitch.org
Date: Fri, 16 Sep 2016 16:10:51 -0700
Message-Id: <1474067451-78603-4-git-send-email-jarno@ovn.org>
X-Mailer: git-send-email 2.1.4
In-Reply-To: <1474067451-78603-1-git-send-email-jarno@ovn.org>
References: <1474067451-78603-1-git-send-email-jarno@ovn.org>
Subject: [ovs-dev] [PATCH 3/3] seq: Add support for initial delay before waking up.

The execution time of 'ovs-ofctl add-flows' with a large number of flows
can be more than halved if revalidators are not running after each flow
mod separately.
This was first suspected when it was found that 'ovs-ofctl --bundle
add-flows' is about 10 times faster than the same command without the
'--bundle' option in a scenario where a large set of flows is being added
and there are no datapath flows at all.  One of the differences caused by
the '--bundle' option is that the revalidators are woken up only once, at
the end of the whole set of flow table changes, rather than after each
flow table change individually.

This patch limits revalidation to run at most 200 times a second by
enforcing a minimum 5 ms delay before a flow table change wakeup, after
each complete revalidation round.  This is implemented by adding a new
seq_wait_delay() function that takes a delay parameter, which, when
non-zero, causes the wakeup to be postponed by the given number of
milliseconds from the original seq_wait_delay() call time.  If nothing
happens in, say, 6 milliseconds, and a new flow table change is then
signaled, the revalidator threads wake up immediately without any further
delay.  Delay values smaller than 5 ms were found to increase the
'ovs-ofctl add-flows' execution time noticeably.

Since the revalidators no longer run after each flow mod, the overall OVS
CPU utilization during the 'ovs-ofctl add-flows' run is reduced by roughly
one core on a four-core machine.  In testing, the 'ovs-ofctl add-flows'
execution time is not significantly improved beyond this even if the
revalidators are not notified of the flow table changes at all.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
---
 lib/seq.c                     | 50 ++++++++++++++++++++++++++++++-------------
 lib/seq.h                     |  9 ++++++--
 ofproto/ofproto-dpif-upcall.c |  2 +-
 tests/ofproto-dpif.at         |  3 +++
 4 files changed, 46 insertions(+), 18 deletions(-)

diff --git a/lib/seq.c b/lib/seq.c
index 6e2f596..2b64dd3 100644
--- a/lib/seq.c
+++ b/lib/seq.c
@@ -22,9 +22,10 @@
 
 #include "coverage.h"
 #include "hash.h"
-#include "openvswitch/hmap.h"
 #include "latch.h"
+#include "openvswitch/hmap.h"
 #include "openvswitch/list.h"
+#include "timeval.h"
 #include "ovs-thread.h"
 #include "poll-loop.h"
 
@@ -45,6 +46,8 @@ struct seq_waiter {
     struct seq_thread *thread OVS_GUARDED; /* Thread preparing to wait. */
     struct ovs_list list_node OVS_GUARDED; /* In 'thread->waiters'. */
 
+    long long int delay_until_time OVS_GUARDED;
+
     uint64_t value OVS_GUARDED; /* seq->value we're waiting to change. */
 };
 
@@ -66,7 +69,8 @@ static struct seq_thread *seq_thread_get(void) OVS_REQUIRES(seq_mutex);
 static void seq_thread_exit(void *thread_) OVS_EXCLUDED(seq_mutex);
 static void seq_thread_woke(struct seq_thread *) OVS_REQUIRES(seq_mutex);
 static void seq_waiter_destroy(struct seq_waiter *) OVS_REQUIRES(seq_mutex);
-static void seq_wake_waiters(struct seq *) OVS_REQUIRES(seq_mutex);
+static void seq_wake_waiters(struct seq *, long long int now)
+    OVS_REQUIRES(seq_mutex);
 
 /* Creates and returns a new 'seq' object. */
 struct seq * OVS_EXCLUDED(seq_mutex)
@@ -94,7 +98,7 @@ seq_destroy(struct seq *seq)
     OVS_EXCLUDED(seq_mutex)
 {
     ovs_mutex_lock(&seq_mutex);
-    seq_wake_waiters(seq);
+    seq_wake_waiters(seq, LLONG_MAX);
     hmap_destroy(&seq->waiters);
     free(seq);
     ovs_mutex_unlock(&seq_mutex);
@@ -123,13 +127,13 @@ seq_unlock(void)
 
 /* Increments 'seq''s sequence number, waking up any threads that are waiting
  * on 'seq'. */
 void
-seq_change_protected(struct seq *seq)
+seq_change_protected_now(struct seq *seq, long long int now)
     OVS_REQUIRES(seq_mutex)
 {
     COVERAGE_INC(seq_change);
 
     seq->value = seq_next++;
-    seq_wake_waiters(seq);
+    seq_wake_waiters(seq, now);
 }
 
@@ -138,8 +142,12 @@ void
 seq_change(struct seq *seq)
     OVS_EXCLUDED(seq_mutex)
 {
+    /* time initialization uses a seq, so we must take 'now' before
+     * locking 'seq_mutex'. */
+    long long int now = time_msec();
+
     ovs_mutex_lock(&seq_mutex);
-    seq_change_protected(seq);
+    seq_change_protected_now(seq, now);
     ovs_mutex_unlock(&seq_mutex);
 }
@@ -173,8 +181,10 @@ seq_read(const struct seq *seq)
     return value;
 }
 
+/* Find or create a waiter for the current thread. */
 static void
-seq_wait__(struct seq *seq, uint64_t value, const char *where)
+seq_wait__(struct seq *seq, uint64_t value, long long int delay_until_time,
+           const char *where)
     OVS_REQUIRES(seq_mutex)
 {
     unsigned int id = ovsthread_id_self();
@@ -185,8 +195,9 @@ seq_wait__(struct seq *seq, uint64_t value, const char *where)
     if (waiter->ovsthread_id == id) {
         if (waiter->value != value) {
             /* The current value is different from the value we've already
-             * waited for, */
-            poll_immediate_wake_at(where);
+             * waited for.  Wakes immediately if 'waiter->delay_until_time'
+             * is in the past. */
+            poll_timer_wait_until_at(waiter->delay_until_time, where);
         } else {
             /* Already waiting on 'value', nothing more to do. */
         }
@@ -201,6 +212,7 @@ seq_wait__(struct seq *seq, uint64_t value, const char *where)
     waiter->value = value;
     waiter->thread = seq_thread_get();
     ovs_list_push_back(&waiter->thread->waiters, &waiter->list_node);
+    waiter->delay_until_time = delay_until_time;
 
     if (!waiter->thread->waiting) {
         latch_wait_at(&waiter->thread->latch, where);
@@ -220,15 +232,21 @@
  * automatically provide the caller's source file and line number for
  * 'where'.) */
 void
-seq_wait_at(const struct seq *seq_, uint64_t value, const char *where)
+seq_wait_delay_at(const struct seq *seq_, uint64_t value,
+                  unsigned int delay_time, const char *where)
     OVS_EXCLUDED(seq_mutex)
 {
     struct seq *seq = CONST_CAST(struct seq *, seq_);
+    /* time initialization uses a seq, so we must call time_msec() before
+     * locking 'seq_mutex'. */
+    long long int delay_until_time = time_msec() + delay_time;
 
     ovs_mutex_lock(&seq_mutex);
-    if (value == seq->value) {
-        seq_wait__(seq, value, where);
+    if (value == seq->value || delay_time) {
+        seq_wait__(seq, value, delay_until_time, where);
     } else {
+        /* The current value is different from the value we want to wait for
+         * and we do not want to wait. */
         poll_immediate_wake_at(where);
     }
     ovs_mutex_unlock(&seq_mutex);
@@ -316,13 +334,15 @@ seq_waiter_destroy(struct seq_waiter *waiter)
 }
 
 static void
-seq_wake_waiters(struct seq *seq)
+seq_wake_waiters(struct seq *seq, long long int now)
     OVS_REQUIRES(seq_mutex)
 {
     struct seq_waiter *waiter, *next_waiter;
 
     HMAP_FOR_EACH_SAFE (waiter, next_waiter, hmap_node, &seq->waiters) {
-        latch_set(&waiter->thread->latch);
-        seq_waiter_destroy(waiter);
+        if (now >= waiter->delay_until_time) {
+            latch_set(&waiter->thread->latch);
+            seq_waiter_destroy(waiter);
+        }
     }
 }
diff --git a/lib/seq.h b/lib/seq.h
index 221ab9a..a459921 100644
--- a/lib/seq.h
+++ b/lib/seq.h
@@ -121,7 +121,8 @@
 struct seq *seq_create(void);
 void seq_destroy(struct seq *);
 void seq_change(struct seq *);
-void seq_change_protected(struct seq *);
+void seq_change_protected_now(struct seq *, long long int now);
+#define seq_change_protected(seq) seq_change_protected_now(seq, LLONG_MAX)
 void seq_lock(void);
 int seq_try_lock(void);
 void seq_unlock(void);
@@ -130,7 +131,11 @@ void seq_unlock(void);
 
 uint64_t seq_read(const struct seq *);
 uint64_t seq_read_protected(const struct seq *);
-void seq_wait_at(const struct seq *, uint64_t value, const char *where);
+void seq_wait_delay_at(const struct seq *, uint64_t value, unsigned int delay,
+                       const char *where);
+#define seq_wait_delay(seq, value, delay) \
+    seq_wait_delay_at(seq, value, delay, OVS_SOURCE_LOCATOR)
+#define seq_wait_at(seq, value, where) seq_wait_delay_at(seq, value, 0, where)
 #define seq_wait(seq, value) seq_wait_at(seq, value, OVS_SOURCE_LOCATOR)
 
 /* For poll_block() internal use. */
diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index eecea53..2de8e70 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -935,7 +935,7 @@ udpif_revalidator(void *arg)
         }
 
         poll_timer_wait_until(start_time + MIN(ofproto_max_idle, 500));
-        seq_wait(udpif->reval_seq, last_reval_seq);
+        seq_wait_delay(udpif->reval_seq, last_reval_seq, 5);
         latch_wait(&udpif->exit_latch);
         latch_wait(&udpif->pause_latch);
         poll_block();
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index de57efd..2105718 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -4609,6 +4609,9 @@ m4_define([CHECK_CONTINUATION], [dnl
     m4_if([$3], [0], [],
       [AT_CHECK([echo "$actions1" | sed 's/pause/controller(pause)/g' | ovs-ofctl -O OpenFlow13 add-flows br1 -])])
 
+    # Wait for the revalidators to catch up.
+    ovs-appctl revalidator/wait
+
     # Run a packet through the switch.
     AT_CHECK([ovs-appctl netdev-dummy/receive p1 "$flow"], [0], [stdout])