diff mbox series

[ovs-dev] Optimize the poll loop for poll_immediate_wake()

Message ID 20210709171906.10755-1-anton.ivanov@cambridgegreys.com
State Changes Requested
Headers show
Series [ovs-dev] Optimize the poll loop for poll_immediate_wake() | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot success github build: passed

Commit Message

Anton Ivanov July 9, 2021, 5:19 p.m. UTC
From: Anton Ivanov <anton.ivanov@cambridgegreys.com>

If we are not obtaining any useful information out of the poll(),
such as is a fd busy or not, we do not need to do a poll() if
an immediate_wake() has been requested.

This cuts out all the pollfd hash additions, forming the poll
arguments and the actual poll() after a call to
poll_immediate_wake()

Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
---
 lib/poll-loop.c | 67 ++++++++++++++++++++++++++++++++-----------------
 1 file changed, 44 insertions(+), 23 deletions(-)

Comments

Ben Pfaff July 9, 2021, 6:45 p.m. UTC | #1
On Fri, Jul 09, 2021 at 06:19:06PM +0100, anton.ivanov@cambridgegreys.com wrote:
> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> 
> If we are not obtaining any useful information out of the poll(),
> such as is a fd busy or not, we do not need to do a poll() if
> an immediate_wake() has been requested.
> 
> This cuts out all the pollfd hash additions, forming the poll
> arguments and the actual poll() after a call to
> poll_immediate_wake()
> 
> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>

Thanks for the patch.

I think that this will have some undesirable side effects because it
avoids calling time_poll() if the wakeup should happen immediately, and
time_poll() does some thing that we always want to happen between one
main loop and another.  In particular the following calls from
time_poll() are important:

    coverage_clear();
    coverage_run();
    if (*last_wakeup && !thread_is_pmd()) {
        log_poll_interval(*last_wakeup);
    }

...

            if (!time_left) {
                ovsrcu_quiesce();
            } else {
                ovsrcu_quiesce_start();
            }

...

        if (deadline <= time_msec()) {
#ifndef _WIN32
            fatal_signal_handler(SIGALRM);
#else
            VLOG_ERR("wake up from WaitForMultipleObjects after deadline");
            fatal_signal_handler(SIGTERM);
#endif
            if (retval < 0) {
                retval = 0;
            }
            break;
        }

...

    *last_wakeup = time_msec();
    refresh_rusage();

Instead of this change, I'd suggest something more like the following,
along with the changes you made to suppress the file descriptors if the
timeout is already zero:

diff --git a/lib/timeval.c b/lib/timeval.c
index 193c7bab1781..f080a9999742 100644
--- a/lib/timeval.c
+++ b/lib/timeval.c
@@ -323,7 +323,7 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE *handles OVS_UNUSED,
         }
 
 #ifndef _WIN32
-        retval = poll(pollfds, n_pollfds, time_left);
+        retval = time_left ? poll(pollfds, n_pollfds, time_left) : 0;
         if (retval < 0) {
             retval = -errno;
         }
Anton Ivanov July 9, 2021, 7:07 p.m. UTC | #2
Good point,

I forgot about the rcu.

I will redo the patch as per your suggestions and resubmit it next week.

A.

On 09/07/2021 19:45, Ben Pfaff wrote:
> On Fri, Jul 09, 2021 at 06:19:06PM +0100, anton.ivanov@cambridgegreys.com wrote:
>> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>>
>> If we are not obtaining any useful information out of the poll(),
>> such as is a fd busy or not, we do not need to do a poll() if
>> an immediate_wake() has been requested.
>>
>> This cuts out all the pollfd hash additions, forming the poll
>> arguments and the actual poll() after a call to
>> poll_immediate_wake()
>>
>> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> Thanks for the patch.
>
> I think that this will have some undesirable side effects because it
> avoids calling time_poll() if the wakeup should happen immediately, and
> time_poll() does some thing that we always want to happen between one
> main loop and another.  In particular the following calls from
> time_poll() are important:
>
>      coverage_clear();
>      coverage_run();
>      if (*last_wakeup && !thread_is_pmd()) {
>          log_poll_interval(*last_wakeup);
>      }
>
> ...
>
>              if (!time_left) {
>                  ovsrcu_quiesce();
>              } else {
>                  ovsrcu_quiesce_start();
>              }
>
> ...
>
>          if (deadline <= time_msec()) {
> #ifndef _WIN32
>              fatal_signal_handler(SIGALRM);
> #else
>              VLOG_ERR("wake up from WaitForMultipleObjects after deadline");
>              fatal_signal_handler(SIGTERM);
> #endif
>              if (retval < 0) {
>                  retval = 0;
>              }
>              break;
>          }
>
> ...
>
>      *last_wakeup = time_msec();
>      refresh_rusage();
>
> Instead of this change, I'd suggest something more like the following,
> along with the changes you made to suppress the file descriptors if the
> timeout is already zero:
>
> diff --git a/lib/timeval.c b/lib/timeval.c
> index 193c7bab1781..f080a9999742 100644
> --- a/lib/timeval.c
> +++ b/lib/timeval.c
> @@ -323,7 +323,7 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE *handles OVS_UNUSED,
>           }
>   
>   #ifndef _WIN32
> -        retval = poll(pollfds, n_pollfds, time_left);
> +        retval = time_left ? poll(pollfds, n_pollfds, time_left) : 0;
>           if (retval < 0) {
>               retval = -errno;
>           }
>
>
Anton Ivanov July 13, 2021, 8:19 a.m. UTC | #3
I ran the revised patch series (v2) which addresses your comments on the ovn-heater benchmark.

With 9 fake nodes, 50 fake pods per node I get ~ 2% reproducible improvement. IMHO it should be even more noticeable at high scale.

Best regards,

A

On 09/07/2021 19:45, Ben Pfaff wrote:
> On Fri, Jul 09, 2021 at 06:19:06PM +0100, anton.ivanov@cambridgegreys.com wrote:
>> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>>
>> If we are not obtaining any useful information out of the poll(),
>> such as is a fd busy or not, we do not need to do a poll() if
>> an immediate_wake() has been requested.
>>
>> This cuts out all the pollfd hash additions, forming the poll
>> arguments and the actual poll() after a call to
>> poll_immediate_wake()
>>
>> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> Thanks for the patch.
>
> I think that this will have some undesirable side effects because it
> avoids calling time_poll() if the wakeup should happen immediately, and
> time_poll() does some thing that we always want to happen between one
> main loop and another.  In particular the following calls from
> time_poll() are important:
>
>      coverage_clear();
>      coverage_run();
>      if (*last_wakeup && !thread_is_pmd()) {
>          log_poll_interval(*last_wakeup);
>      }
>
> ...
>
>              if (!time_left) {
>                  ovsrcu_quiesce();
>              } else {
>                  ovsrcu_quiesce_start();
>              }
>
> ...
>
>          if (deadline <= time_msec()) {
> #ifndef _WIN32
>              fatal_signal_handler(SIGALRM);
> #else
>              VLOG_ERR("wake up from WaitForMultipleObjects after deadline");
>              fatal_signal_handler(SIGTERM);
> #endif
>              if (retval < 0) {
>                  retval = 0;
>              }
>              break;
>          }
>
> ...
>
>      *last_wakeup = time_msec();
>      refresh_rusage();
>
> Instead of this change, I'd suggest something more like the following,
> along with the changes you made to suppress the file descriptors if the
> timeout is already zero:
>
> diff --git a/lib/timeval.c b/lib/timeval.c
> index 193c7bab1781..f080a9999742 100644
> --- a/lib/timeval.c
> +++ b/lib/timeval.c
> @@ -323,7 +323,7 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE *handles OVS_UNUSED,
>           }
>   
>   #ifndef _WIN32
> -        retval = poll(pollfds, n_pollfds, time_left);
> +        retval = time_left ? poll(pollfds, n_pollfds, time_left) : 0;
>           if (retval < 0) {
>               retval = -errno;
>           }
>
>
Ben Pfaff July 16, 2021, midnight UTC | #4
I'm impressed by the improvement.  I didn't really expect any.

On Tue, Jul 13, 2021 at 09:19:34AM +0100, Anton Ivanov wrote:
> I ran the revised patch series (v2) which addresses your comments on the ovn-heater benchmark.
> 
> With 9 fake nodes, 50 fake pods per node I get ~ 2% reproducible improvement. IMHO it should be even more noticeable at high scale.
> 
> Best regards,
> 
> A
> 
> On 09/07/2021 19:45, Ben Pfaff wrote:
> > On Fri, Jul 09, 2021 at 06:19:06PM +0100, anton.ivanov@cambridgegreys.com wrote:
> > > From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> > > 
> > > If we are not obtaining any useful information out of the poll(),
> > > such as is a fd busy or not, we do not need to do a poll() if
> > > an immediate_wake() has been requested.
> > > 
> > > This cuts out all the pollfd hash additions, forming the poll
> > > arguments and the actual poll() after a call to
> > > poll_immediate_wake()
> > > 
> > > Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> > Thanks for the patch.
> > 
> > I think that this will have some undesirable side effects because it
> > avoids calling time_poll() if the wakeup should happen immediately, and
> > time_poll() does some thing that we always want to happen between one
> > main loop and another.  In particular the following calls from
> > time_poll() are important:
> > 
> >      coverage_clear();
> >      coverage_run();
> >      if (*last_wakeup && !thread_is_pmd()) {
> >          log_poll_interval(*last_wakeup);
> >      }
> > 
> > ...
> > 
> >              if (!time_left) {
> >                  ovsrcu_quiesce();
> >              } else {
> >                  ovsrcu_quiesce_start();
> >              }
> > 
> > ...
> > 
> >          if (deadline <= time_msec()) {
> > #ifndef _WIN32
> >              fatal_signal_handler(SIGALRM);
> > #else
> >              VLOG_ERR("wake up from WaitForMultipleObjects after deadline");
> >              fatal_signal_handler(SIGTERM);
> > #endif
> >              if (retval < 0) {
> >                  retval = 0;
> >              }
> >              break;
> >          }
> > 
> > ...
> > 
> >      *last_wakeup = time_msec();
> >      refresh_rusage();
> > 
> > Instead of this change, I'd suggest something more like the following,
> > along with the changes you made to suppress the file descriptors if the
> > timeout is already zero:
> > 
> > diff --git a/lib/timeval.c b/lib/timeval.c
> > index 193c7bab1781..f080a9999742 100644
> > --- a/lib/timeval.c
> > +++ b/lib/timeval.c
> > @@ -323,7 +323,7 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE *handles OVS_UNUSED,
> >           }
> >   #ifndef _WIN32
> > -        retval = poll(pollfds, n_pollfds, time_left);
> > +        retval = time_left ? poll(pollfds, n_pollfds, time_left) : 0;
> >           if (retval < 0) {
> >               retval = -errno;
> >           }
> > 
> > 
> -- 
> Anton R. Ivanov
> Cambridgegreys Limited. Registered in England. Company Number 10273661
> https://www.cambridgegreys.com/
>
diff mbox series

Patch

diff --git a/lib/poll-loop.c b/lib/poll-loop.c
index 4e751ff2c..8e418a8ce 100644
--- a/lib/poll-loop.c
+++ b/lib/poll-loop.c
@@ -53,6 +53,7 @@  struct poll_loop {
      * wake up immediately, or LLONG_MAX to wait forever. */
     long long int timeout_when; /* In msecs as returned by time_msec(). */
     const char *timeout_where;  /* Where 'timeout_when' was set. */
+    bool immediate_wake;
 };
 
 static struct poll_loop *poll_loop(void);
@@ -107,6 +108,13 @@  poll_create_node(int fd, HANDLE wevent, short int events, const char *where)
 
     COVERAGE_INC(poll_create_node);
 
+    if (loop->immediate_wake) {
+        /* We have been asked to bail out of this poll loop.
+         * There is no point to engage in yack shaving a poll hmap.
+         */
+        return;
+    }
+
     /* Both 'fd' and 'wevent' cannot be set. */
     ovs_assert(!fd != !wevent);
 
@@ -181,8 +189,15 @@  poll_wevent_wait_at(HANDLE wevent, const char *where)
 void
 poll_timer_wait_at(long long int msec, const char *where)
 {
-    long long int now = time_msec();
+    long long int now;
     long long int when;
+    struct poll_loop *loop = poll_loop();
+
+    if (loop->immediate_wake) {
+        return;
+    }
+
+    now = time_msec();
 
     if (msec <= 0) {
         /* Wake up immediately. */
@@ -229,7 +244,9 @@  poll_timer_wait_until_at(long long int when, const char *where)
 void
 poll_immediate_wake_at(const char *where)
 {
+    struct poll_loop *loop = poll_loop();
     poll_timer_wait_at(0, where);
+    loop->immediate_wake = true;
 }
 
 /* Logs, if appropriate, that the poll loop was awakened by an event
@@ -320,10 +337,10 @@  poll_block(void)
 {
     struct poll_loop *loop = poll_loop();
     struct poll_node *node;
-    struct pollfd *pollfds;
+    struct pollfd *pollfds = NULL;
     HANDLE *wevents = NULL;
     int elapsed;
-    int retval;
+    int retval = 0;
     int i;
 
     /* Register fatal signal events before actually doing any real work for
@@ -335,34 +352,36 @@  poll_block(void)
     }
 
     timewarp_run();
-    pollfds = xmalloc(hmap_count(&loop->poll_nodes) * sizeof *pollfds);
+    if (!loop->immediate_wake) {
+        pollfds = xmalloc(hmap_count(&loop->poll_nodes) * sizeof *pollfds);
 
 #ifdef _WIN32
-    wevents = xmalloc(hmap_count(&loop->poll_nodes) * sizeof *wevents);
+        wevents = xmalloc(hmap_count(&loop->poll_nodes) * sizeof *wevents);
 #endif
 
-    /* Populate with all the fds and events. */
-    i = 0;
-    HMAP_FOR_EACH (node, hmap_node, &loop->poll_nodes) {
-        pollfds[i] = node->pollfd;
+        /* Populate with all the fds and events. */
+        i = 0;
+        HMAP_FOR_EACH (node, hmap_node, &loop->poll_nodes) {
+            pollfds[i] = node->pollfd;
 #ifdef _WIN32
-        wevents[i] = node->wevent;
-        if (node->pollfd.fd && node->wevent) {
-            short int wsa_events = 0;
-            if (node->pollfd.events & POLLIN) {
-                wsa_events |= FD_READ | FD_ACCEPT | FD_CLOSE;
+            wevents[i] = node->wevent;
+            if (node->pollfd.fd && node->wevent) {
+                short int wsa_events = 0;
+                if (node->pollfd.events & POLLIN) {
+                    wsa_events |= FD_READ | FD_ACCEPT | FD_CLOSE;
+                }
+                if (node->pollfd.events & POLLOUT) {
+                    wsa_events |= FD_WRITE | FD_CONNECT | FD_CLOSE;
+                }
+                WSAEventSelect(node->pollfd.fd, node->wevent, wsa_events);
             }
-            if (node->pollfd.events & POLLOUT) {
-                wsa_events |= FD_WRITE | FD_CONNECT | FD_CLOSE;
-            }
-            WSAEventSelect(node->pollfd.fd, node->wevent, wsa_events);
-        }
 #endif
-        i++;
-    }
+            i++;
+        }
 
-    retval = time_poll(pollfds, hmap_count(&loop->poll_nodes), wevents,
-                       loop->timeout_when, &elapsed);
+        retval = time_poll(pollfds, hmap_count(&loop->poll_nodes), wevents,
+                           loop->timeout_when, &elapsed);
+    }
     if (retval < 0) {
         static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
         VLOG_ERR_RL(&rl, "poll: %s", ovs_strerror(-retval));
@@ -381,6 +400,7 @@  poll_block(void)
     free_poll_nodes(loop);
     loop->timeout_when = LLONG_MAX;
     loop->timeout_where = NULL;
+    loop->immediate_wake = false;
     free(pollfds);
     free(wevents);
 
@@ -417,6 +437,7 @@  poll_loop(void)
         loop = xzalloc(sizeof *loop);
         loop->timeout_when = LLONG_MAX;
         hmap_init(&loop->poll_nodes);
+        loop->immediate_wake = false;
         xpthread_setspecific(key, loop);
     }
     return loop;