[ovs-dev,3/3] seq: Add support for initial delay before waking up.
diff mbox

Message ID 1474067451-78603-4-git-send-email-jarno@ovn.org
State Superseded
Delegated to: Ben Pfaff
Headers show

Commit Message

Jarno Rajahalme Sept. 16, 2016, 11:10 p.m. UTC
The execution time of 'ovs-ofctl add-flows' with a large number of
flows can be more than halved if revalidators are not running after
each flow mod separately.  This was first suspected when it was found
that 'ovs-ofctl --bundle add-flows' is about 10 times faster than the
same command without the '--bundle' option in a scenario where there
is a large set of flows being added and no datapath flows at all.  One
of the differences caused by the '--bundle' option is that the
revalidators are woken up only once, at the end of the whole set of
flow table changes, rather than after each flow table change
individually.

This patch limits the revalidation to run at most 200 times a second
by enforcing a minimum of 5ms delay for flow table change wakeup after
each complete revalidation round.  This is implemented by adding a new
seq_wait_delay() function, that takes a delay parameter, which, when
non-zero, causes the wakeup to be postponed by the given number of
milliseconds from the original seq_wait_delay() call time.  If nothing
happens in, say 6 milliseconds, and then a new flow table change is
signaled, the revalidator threads wake up immediately without any
further delay.  Values smaller than 5 were found to increase the
'ovs-ofctl add-flows' execution time noticeably.

Since the revalidators are not running after each flow mod, the
overall OVS CPU utilization during the 'ovs-ofctl add-flows' run time
is reduced roughly by one core on a four core machine.

In testing the 'ovs-ofctl add-flows' execution time is not
significantly improved from this even if the revalidators are not
notified about the flow table changes at all.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
---
 lib/seq.c                     | 50 ++++++++++++++++++++++++++++++-------------
 lib/seq.h                     |  9 ++++++--
 ofproto/ofproto-dpif-upcall.c |  2 +-
 tests/ofproto-dpif.at         |  3 +++
 4 files changed, 46 insertions(+), 18 deletions(-)

Comments

Ben Pfaff Sept. 20, 2016, 5:32 a.m. UTC | #1
On Fri, Sep 16, 2016 at 04:10:51PM -0700, Jarno Rajahalme wrote:
> The execution time of 'ovs-ofctl add-flows' with a large number of
> flows can be more than halved if revalidators are not running after
> each flow mod separately.  This was first suspected when it was found
> that 'ovs-ofctl --bundle add-flows' is about 10 times faster than the
> same command without the '--bundle' option in a scenario where there
> is a large set of flows being added and no datapath flows at all.  One
> of the differences caused by the '--bundle' option is that the
> revalidators are woken up only once, at the end of the whole set of
> flow table changes, rather than after each flow table change
> individually.
> 
> This patch limits the revalidation to run at most 200 times a second
> by enforcing a minimum of 5ms delay for flow table change wakeup after
> each complete revalidation round.  This is implemented by adding a new
> seq_wait_delay() function, that takes a delay parameter, which, when
> non-zero, causes the wakeup to be postponed by the given number of
> milliseconds from the original seq_wait_delay() call time.  If nothing
> happens in, say 6 milliseconds, and then a new flow table change is
> signaled, the revalidator threads wake up immediately without any
> further delay.  Values smaller than 5 were found to increase the
> 'ovs-ofctl add-flows' execution time noticeably.
> 
> Since the revalidators are not running after each flow mod, the
> overall OVS CPU utilization during the 'ovs-ofctl add-flows' run time
> is reduced roughly by one core on a four core machine.
> 
> In testing the 'ovs-ofctl add-flows' execution time is not
> significantly improved from this even if the revalidators are not
> notified about the flow table changes at all.
> 
> Signed-off-by: Jarno Rajahalme <jarno@ovn.org>

I need some help understanding how this works.  Before this commit, a
thread that wanted to wake up if a seq changed would call seq_wait().
If the seq changed, it got awakened immediately because seq_change()
called seq_wake_waiters() which set a latch that woke the thread.  After
this commit, that still works fine with delay==0.  However suppose that
delay==5 instead.  As I see it, the same chain of events will occur
except that seq_wake_waiters() might do nothing because the time hasn't
come yet.  Suppose that nothing ever calls seq_change() again for that
seq.  What wakes up the thread after 5 ms?

Once I understand that, here's a possible nit to look into.  A couple of
places talk about how time_msec() has to be called without holding
seq_mutex because time_msec() calls time_init(), which creates a seq,
which takes seq_mutex.  Maybe this is good enough.  But it also appears
to be easy to eliminate the dependency of seq_create() on seq_mutex.
Here is why it takes seq_mutex:

    ovs_mutex_lock(&seq_mutex);
    seq->value = seq_next++;
    hmap_init(&seq->waiters);
    ovs_mutex_unlock(&seq_mutex);

The first statement is in seq_mutex because seq_next requires it.  This
could be fixed by make seq_next atomic and then using atomic_add()
on it.  The second statement is in seq_mutex only to silence Clang; an
OVS_NO_THREAD_SAFETY_ANALYSIS annotation would also silence it.

Thanks,

Ben.
Jarno Rajahalme Sept. 20, 2016, 6:44 p.m. UTC | #2
Thanks for the review. I sent a v2 with an implementation that just does the right thing (I hope) in the revalidator thread itself.

  Jarno

> On Sep 19, 2016, at 10:32 PM, Ben Pfaff <blp@ovn.org> wrote:
> 
> On Fri, Sep 16, 2016 at 04:10:51PM -0700, Jarno Rajahalme wrote:
>> The execution time of 'ovs-ofctl add-flows' with a large number of
>> flows can be more than halved if revalidators are not running after
>> each flow mod separately.  This was first suspected when it was found
>> that 'ovs-ofctl --bundle add-flows' is about 10 times faster than the
>> same command without the '--bundle' option in a scenario where there
>> is a large set of flows being added and no datapath flows at all.  One
>> of the differences caused by the '--bundle' option is that the
>> revalidators are woken up only once, at the end of the whole set of
>> flow table changes, rather than after each flow table change
>> individually.
>> 
>> This patch limits the revalidation to run at most 200 times a second
>> by enforcing a minimum of 5ms delay for flow table change wakeup after
>> each complete revalidation round.  This is implemented by adding a new
>> seq_wait_delay() function, that takes a delay parameter, which, when
>> non-zero, causes the wakeup to be postponed by the given number of
>> milliseconds from the original seq_wait_delay() call time.  If nothing
>> happens in, say 6 milliseconds, and then a new flow table change is
>> signaled, the revalidator threads wake up immediately without any
>> further delay.  Values smaller than 5 were found to increase the
>> 'ovs-ofctl add-flows' execution time noticeably.
>> 
>> Since the revalidators are not running after each flow mod, the
>> overall OVS CPU utilization during the 'ovs-ofctl add-flows' run time
>> is reduced roughly by one core on a four core machine.
>> 
>> In testing the 'ovs-ofctl add-flows' execution time is not
>> significantly improved from this even if the revalidators are not
>> notified about the flow table changes at all.
>> 
>> Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
> 
> I need some help understanding how this works.  Before this commit, a
> thread that wanted to wake up if a seq changed would call seq_wait().
> If the seq changed, it got awakened immediately because seq_change()
> called seq_wake_waiters() which set a latch that woke the thread.  After
> this commit, that still works fine with delay==0.  However suppose that
> delay==5 instead.  As I see it, the same chain of events will occur
> except that seq_wake_waiters() might do nothing because the time hasn't
> come yet.  Suppose that nothing ever calls seq_change() again for that
> seq.  What wakes up the thread after 5 ms?
> 
> Once I understand that, here's a possible nit to look into.  A couple of
> places talk about how time_msec() has to be called without holding
> seq_mutex because time_msec() calls time_init(), which creates a seq,
> which takes seq_mutex.  Maybe this is good enough.  But it also appears
> to be easy to eliminate the dependency of seq_create() on seq_mutex.
> Here is why it takes seq_mutex:
> 
>    ovs_mutex_lock(&seq_mutex);
>    seq->value = seq_next++;
>    hmap_init(&seq->waiters);
>    ovs_mutex_unlock(&seq_mutex);
> 
> The first statement is in seq_mutex because seq_next requires it.  This
> could be fixed by make seq_next atomic and then using atomic_add()
> on it.  The second statement is in seq_mutex only to silence Clang; an
> OVS_NO_THREAD_SAFETY_ANALYSIS annotation would also silence it.
> 
> Thanks,
> 
> Ben.

Patch
diff mbox

diff --git a/lib/seq.c b/lib/seq.c
index 6e2f596..2b64dd3 100644
--- a/lib/seq.c
+++ b/lib/seq.c
@@ -22,9 +22,10 @@ 
 
 #include "coverage.h"
 #include "hash.h"
-#include "openvswitch/hmap.h"
 #include "latch.h"
+#include "openvswitch/hmap.h"
 #include "openvswitch/list.h"
+#include "timeval.h"
 #include "ovs-thread.h"
 #include "poll-loop.h"
 
@@ -45,6 +46,8 @@  struct seq_waiter {
     struct seq_thread *thread OVS_GUARDED;  /* Thread preparing to wait. */
     struct ovs_list list_node OVS_GUARDED;  /* In 'thread->waiters'. */
 
+    long long int delay_until_time OVS_GUARDED;
+
     uint64_t value OVS_GUARDED; /* seq->value we're waiting to change. */
 };
 
@@ -66,7 +69,8 @@  static struct seq_thread *seq_thread_get(void) OVS_REQUIRES(seq_mutex);
 static void seq_thread_exit(void *thread_) OVS_EXCLUDED(seq_mutex);
 static void seq_thread_woke(struct seq_thread *) OVS_REQUIRES(seq_mutex);
 static void seq_waiter_destroy(struct seq_waiter *) OVS_REQUIRES(seq_mutex);
-static void seq_wake_waiters(struct seq *) OVS_REQUIRES(seq_mutex);
+static void seq_wake_waiters(struct seq *, long long int now)
+    OVS_REQUIRES(seq_mutex);
 
 /* Creates and returns a new 'seq' object. */
 struct seq * OVS_EXCLUDED(seq_mutex)
@@ -94,7 +98,7 @@  seq_destroy(struct seq *seq)
      OVS_EXCLUDED(seq_mutex)
 {
     ovs_mutex_lock(&seq_mutex);
-    seq_wake_waiters(seq);
+    seq_wake_waiters(seq, LLONG_MAX);
     hmap_destroy(&seq->waiters);
     free(seq);
     ovs_mutex_unlock(&seq_mutex);
@@ -123,13 +127,13 @@  seq_unlock(void)
 /* Increments 'seq''s sequence number, waking up any threads that are waiting
  * on 'seq'. */
 void
-seq_change_protected(struct seq *seq)
+seq_change_protected_now(struct seq *seq, long long int now)
     OVS_REQUIRES(seq_mutex)
 {
     COVERAGE_INC(seq_change);
 
     seq->value = seq_next++;
-    seq_wake_waiters(seq);
+    seq_wake_waiters(seq, now);
 }
 
 /* Increments 'seq''s sequence number, waking up any threads that are waiting
@@ -138,8 +142,12 @@  void
 seq_change(struct seq *seq)
     OVS_EXCLUDED(seq_mutex)
 {
+    /* time initialization uses a seq, so we must take 'now' before
+     * locking 'seq_mutex'. */
+    long long int now = time_msec();
+
     ovs_mutex_lock(&seq_mutex);
-    seq_change_protected(seq);
+    seq_change_protected_now(seq, now);
     ovs_mutex_unlock(&seq_mutex);
 }
 
@@ -173,8 +181,10 @@  seq_read(const struct seq *seq)
     return value;
 }
 
+/* Find or create a waiter for the current thread. */
 static void
-seq_wait__(struct seq *seq, uint64_t value, const char *where)
+seq_wait__(struct seq *seq, uint64_t value, long long int delay_until_time,
+           const char *where)
     OVS_REQUIRES(seq_mutex)
 {
     unsigned int id = ovsthread_id_self();
@@ -185,8 +195,9 @@  seq_wait__(struct seq *seq, uint64_t value, const char *where)
         if (waiter->ovsthread_id == id) {
             if (waiter->value != value) {
                 /* The current value is different from the value we've already
-                 * waited for, */
-                poll_immediate_wake_at(where);
+                 * waited for.  Wakes immediately if 'waiter->delay_until_time'
+                 * is in the past. */
+                poll_timer_wait_until_at(waiter->delay_until_time, where);
             } else {
                 /* Already waiting on 'value', nothing more to do. */
             }
@@ -201,6 +212,7 @@  seq_wait__(struct seq *seq, uint64_t value, const char *where)
     waiter->value = value;
     waiter->thread = seq_thread_get();
     ovs_list_push_back(&waiter->thread->waiters, &waiter->list_node);
+    waiter->delay_until_time = delay_until_time;
 
     if (!waiter->thread->waiting) {
         latch_wait_at(&waiter->thread->latch, where);
@@ -220,15 +232,21 @@  seq_wait__(struct seq *seq, uint64_t value, const char *where)
  * automatically provide the caller's source file and line number for
  * 'where'.) */
 void
-seq_wait_at(const struct seq *seq_, uint64_t value, const char *where)
+seq_wait_delay_at(const struct seq *seq_, uint64_t value,
+                  unsigned int delay_time, const char *where)
     OVS_EXCLUDED(seq_mutex)
 {
     struct seq *seq = CONST_CAST(struct seq *, seq_);
+    /* time initialization uses a seq, so we must call time_msec() before
+     * locking 'seq_mutex'. */
+    long long int delay_until_time = time_msec() + delay_time;
 
     ovs_mutex_lock(&seq_mutex);
-    if (value == seq->value) {
-        seq_wait__(seq, value, where);
+    if (value == seq->value || delay_time) {
+        seq_wait__(seq, value, delay_until_time, where);
     } else {
+        /* The current value is different from the value we want to wait for
+         * and we do not want to wait. */
         poll_immediate_wake_at(where);
     }
     ovs_mutex_unlock(&seq_mutex);
@@ -316,13 +334,15 @@  seq_waiter_destroy(struct seq_waiter *waiter)
 }
 
 static void
-seq_wake_waiters(struct seq *seq)
+seq_wake_waiters(struct seq *seq, long long int now)
     OVS_REQUIRES(seq_mutex)
 {
     struct seq_waiter *waiter, *next_waiter;
 
     HMAP_FOR_EACH_SAFE (waiter, next_waiter, hmap_node, &seq->waiters) {
-        latch_set(&waiter->thread->latch);
-        seq_waiter_destroy(waiter);
+        if (now >= waiter->delay_until_time) {
+            latch_set(&waiter->thread->latch);
+            seq_waiter_destroy(waiter);
+        }
     }
 }
diff --git a/lib/seq.h b/lib/seq.h
index 221ab9a..a459921 100644
--- a/lib/seq.h
+++ b/lib/seq.h
@@ -121,7 +121,8 @@ 
 struct seq *seq_create(void);
 void seq_destroy(struct seq *);
 void seq_change(struct seq *);
-void seq_change_protected(struct seq *);
+void seq_change_protected_now(struct seq *, long long int now);
+#define seq_change_protected(seq) seq_change_protected_now(seq, LLONG_MAX)
 void seq_lock(void);
 int seq_try_lock(void);
 void seq_unlock(void);
@@ -130,7 +131,11 @@  void seq_unlock(void);
 uint64_t seq_read(const struct seq *);
 uint64_t seq_read_protected(const struct seq *);
 
-void seq_wait_at(const struct seq *, uint64_t value, const char *where);
+void seq_wait_delay_at(const struct seq *, uint64_t value, unsigned int delay,
+                       const char *where);
+#define seq_wait_delay(seq, value, delay)                       \
+  seq_wait_delay_at(seq, value, delay, OVS_SOURCE_LOCATOR)
+#define seq_wait_at(seq, value, where) seq_wait_delay_at(seq, value, 0, where)
 #define seq_wait(seq, value) seq_wait_at(seq, value, OVS_SOURCE_LOCATOR)
 
 /* For poll_block() internal use. */
diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index eecea53..2de8e70 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -935,7 +935,7 @@  udpif_revalidator(void *arg)
             }
 
             poll_timer_wait_until(start_time + MIN(ofproto_max_idle, 500));
-            seq_wait(udpif->reval_seq, last_reval_seq);
+            seq_wait_delay(udpif->reval_seq, last_reval_seq, 5);
             latch_wait(&udpif->exit_latch);
             latch_wait(&udpif->pause_latch);
             poll_block();
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index de57efd..2105718 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -4609,6 +4609,9 @@  m4_define([CHECK_CONTINUATION], [dnl
         m4_if([$3], [0], [],
             [AT_CHECK([echo "$actions1" | sed 's/pause/controller(pause)/g' | ovs-ofctl -O OpenFlow13 add-flows br1 -])])
 
+        # Wait for the revalidators to catch up.
+        ovs-appctl revalidator/wait
+
         # Run a packet through the switch.
         AT_CHECK([ovs-appctl netdev-dummy/receive p1 "$flow"], [0], [stdout])