Message ID | 20210709171906.10755-1-anton.ivanov@cambridgegreys.com |
---|---|
State | Changes Requested |
Headers | show |
Series | [ovs-dev] Optimize the poll loop for poll_immediate_wake() | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot | success | github build: passed |
On Fri, Jul 09, 2021 at 06:19:06PM +0100, anton.ivanov@cambridgegreys.com wrote: > From: Anton Ivanov <anton.ivanov@cambridgegreys.com> > > If we are not obtaining any useful information out of the poll(), > such as is a fd busy or not, we do not need to do a poll() if > an immediate_wake() has been requested. > > This cuts out all the pollfd hash additions, forming the poll > arguments and the actual poll() after a call to > poll_immediate_wake() > > Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Thanks for the patch. I think that this will have some undesirable side effects because it avoids calling time_poll() if the wakeup should happen immediately, and time_poll() does some thing that we always want to happen between one main loop and another. In particular the following calls from time_poll() are important: coverage_clear(); coverage_run(); if (*last_wakeup && !thread_is_pmd()) { log_poll_interval(*last_wakeup); } ... if (!time_left) { ovsrcu_quiesce(); } else { ovsrcu_quiesce_start(); } ... if (deadline <= time_msec()) { #ifndef _WIN32 fatal_signal_handler(SIGALRM); #else VLOG_ERR("wake up from WaitForMultipleObjects after deadline"); fatal_signal_handler(SIGTERM); #endif if (retval < 0) { retval = 0; } break; } ... *last_wakeup = time_msec(); refresh_rusage(); Instead of this change, I'd suggest something more like the following, along with the changes you made to suppress the file descriptors if the timeout is already zero: diff --git a/lib/timeval.c b/lib/timeval.c index 193c7bab1781..f080a9999742 100644 --- a/lib/timeval.c +++ b/lib/timeval.c @@ -323,7 +323,7 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE *handles OVS_UNUSED, } #ifndef _WIN32 - retval = poll(pollfds, n_pollfds, time_left); + retval = time_left ? poll(pollfds, n_pollfds, time_left) : 0; if (retval < 0) { retval = -errno; }
Good point, I forgot about the rcu. I will redo the patch as per your suggestions and resubmit it next week. A. On 09/07/2021 19:45, Ben Pfaff wrote: > On Fri, Jul 09, 2021 at 06:19:06PM +0100, anton.ivanov@cambridgegreys.com wrote: >> From: Anton Ivanov <anton.ivanov@cambridgegreys.com> >> >> If we are not obtaining any useful information out of the poll(), >> such as is a fd busy or not, we do not need to do a poll() if >> an immediate_wake() has been requested. >> >> This cuts out all the pollfd hash additions, forming the poll >> arguments and the actual poll() after a call to >> poll_immediate_wake() >> >> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> > Thanks for the patch. > > I think that this will have some undesirable side effects because it > avoids calling time_poll() if the wakeup should happen immediately, and > time_poll() does some thing that we always want to happen between one > main loop and another. In particular the following calls from > time_poll() are important: > > coverage_clear(); > coverage_run(); > if (*last_wakeup && !thread_is_pmd()) { > log_poll_interval(*last_wakeup); > } > > ... > > if (!time_left) { > ovsrcu_quiesce(); > } else { > ovsrcu_quiesce_start(); > } > > ... > > if (deadline <= time_msec()) { > #ifndef _WIN32 > fatal_signal_handler(SIGALRM); > #else > VLOG_ERR("wake up from WaitForMultipleObjects after deadline"); > fatal_signal_handler(SIGTERM); > #endif > if (retval < 0) { > retval = 0; > } > break; > } > > ... > > *last_wakeup = time_msec(); > refresh_rusage(); > > Instead of this change, I'd suggest something more like the following, > along with the changes you made to suppress the file descriptors if the > timeout is already zero: > > diff --git a/lib/timeval.c b/lib/timeval.c > index 193c7bab1781..f080a9999742 100644 > --- a/lib/timeval.c > +++ b/lib/timeval.c > @@ -323,7 +323,7 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE *handles OVS_UNUSED, > } > > #ifndef _WIN32 > - retval = poll(pollfds, n_pollfds, time_left); > + retval = time_left ? poll(pollfds, n_pollfds, time_left) : 0; > if (retval < 0) { > retval = -errno; > } > >
I ran the revised patch series (v2) which addresses your comments on the ovn-heater benchmark. With 9 fake nodes, 50 fake pods per node I get ~ 2% reproducible improvement. IMHO it should be even more noticeable at high scale. Best regards, A On 09/07/2021 19:45, Ben Pfaff wrote: > On Fri, Jul 09, 2021 at 06:19:06PM +0100, anton.ivanov@cambridgegreys.com wrote: >> From: Anton Ivanov <anton.ivanov@cambridgegreys.com> >> >> If we are not obtaining any useful information out of the poll(), >> such as is a fd busy or not, we do not need to do a poll() if >> an immediate_wake() has been requested. >> >> This cuts out all the pollfd hash additions, forming the poll >> arguments and the actual poll() after a call to >> poll_immediate_wake() >> >> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> > Thanks for the patch. > > I think that this will have some undesirable side effects because it > avoids calling time_poll() if the wakeup should happen immediately, and > time_poll() does some thing that we always want to happen between one > main loop and another. In particular the following calls from > time_poll() are important: > > coverage_clear(); > coverage_run(); > if (*last_wakeup && !thread_is_pmd()) { > log_poll_interval(*last_wakeup); > } > > ... > > if (!time_left) { > ovsrcu_quiesce(); > } else { > ovsrcu_quiesce_start(); > } > > ... > > if (deadline <= time_msec()) { > #ifndef _WIN32 > fatal_signal_handler(SIGALRM); > #else > VLOG_ERR("wake up from WaitForMultipleObjects after deadline"); > fatal_signal_handler(SIGTERM); > #endif > if (retval < 0) { > retval = 0; > } > break; > } > > ... > > *last_wakeup = time_msec(); > refresh_rusage(); > > Instead of this change, I'd suggest something more like the following, > along with the changes you made to suppress the file descriptors if the > timeout is already zero: > > diff --git a/lib/timeval.c b/lib/timeval.c > index 193c7bab1781..f080a9999742 100644 > --- a/lib/timeval.c > +++ b/lib/timeval.c > @@ -323,7 +323,7 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE *handles OVS_UNUSED, > } > > #ifndef _WIN32 > - retval = poll(pollfds, n_pollfds, time_left); > + retval = time_left ? poll(pollfds, n_pollfds, time_left) : 0; > if (retval < 0) { > retval = -errno; > } > >
I'm impressed by the improvement. I didn't really expect any. On Tue, Jul 13, 2021 at 09:19:34AM +0100, Anton Ivanov wrote: > I ran the revised patch series (v2) which addresses your comments on the ovn-heater benchmark. > > With 9 fake nodes, 50 fake pods per node I get ~ 2% reproducible improvement. IMHO it should be even more noticeable at high scale. > > Best regards, > > A > > On 09/07/2021 19:45, Ben Pfaff wrote: > > On Fri, Jul 09, 2021 at 06:19:06PM +0100, anton.ivanov@cambridgegreys.com wrote: > > > From: Anton Ivanov <anton.ivanov@cambridgegreys.com> > > > > > > If we are not obtaining any useful information out of the poll(), > > > such as is a fd busy or not, we do not need to do a poll() if > > > an immediate_wake() has been requested. > > > > > > This cuts out all the pollfd hash additions, forming the poll > > > arguments and the actual poll() after a call to > > > poll_immediate_wake() > > > > > > Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> > > Thanks for the patch. > > > > I think that this will have some undesirable side effects because it > > avoids calling time_poll() if the wakeup should happen immediately, and > > time_poll() does some thing that we always want to happen between one > > main loop and another. In particular the following calls from > > time_poll() are important: > > > > coverage_clear(); > > coverage_run(); > > if (*last_wakeup && !thread_is_pmd()) { > > log_poll_interval(*last_wakeup); > > } > > > > ... > > > > if (!time_left) { > > ovsrcu_quiesce(); > > } else { > > ovsrcu_quiesce_start(); > > } > > > > ... > > > > if (deadline <= time_msec()) { > > #ifndef _WIN32 > > fatal_signal_handler(SIGALRM); > > #else > > VLOG_ERR("wake up from WaitForMultipleObjects after deadline"); > > fatal_signal_handler(SIGTERM); > > #endif > > if (retval < 0) { > > retval = 0; > > } > > break; > > } > > > > ... > > > > *last_wakeup = time_msec(); > > refresh_rusage(); > > > > Instead of this change, I'd suggest something more like the following, > > along with the changes you made to suppress the file descriptors if the > > timeout is already zero: > > > > diff --git a/lib/timeval.c b/lib/timeval.c > > index 193c7bab1781..f080a9999742 100644 > > --- a/lib/timeval.c > > +++ b/lib/timeval.c > > @@ -323,7 +323,7 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE *handles OVS_UNUSED, > > } > > #ifndef _WIN32 > > - retval = poll(pollfds, n_pollfds, time_left); > > + retval = time_left ? poll(pollfds, n_pollfds, time_left) : 0; > > if (retval < 0) { > > retval = -errno; > > } > > > > > -- > Anton R. Ivanov > Cambridgegreys Limited. Registered in England. Company Number 10273661 > https://www.cambridgegreys.com/ >
diff --git a/lib/poll-loop.c b/lib/poll-loop.c index 4e751ff2c..8e418a8ce 100644 --- a/lib/poll-loop.c +++ b/lib/poll-loop.c @@ -53,6 +53,7 @@ struct poll_loop { * wake up immediately, or LLONG_MAX to wait forever. */ long long int timeout_when; /* In msecs as returned by time_msec(). */ const char *timeout_where; /* Where 'timeout_when' was set. */ + bool immediate_wake; }; static struct poll_loop *poll_loop(void); @@ -107,6 +108,13 @@ poll_create_node(int fd, HANDLE wevent, short int events, const char *where) COVERAGE_INC(poll_create_node); + if (loop->immediate_wake) { + /* We have been asked to bail out of this poll loop. + * There is no point to engage in yack shaving a poll hmap. + */ + return; + } + /* Both 'fd' and 'wevent' cannot be set. */ ovs_assert(!fd != !wevent); @@ -181,8 +189,15 @@ poll_wevent_wait_at(HANDLE wevent, const char *where) void poll_timer_wait_at(long long int msec, const char *where) { - long long int now = time_msec(); + long long int now; long long int when; + struct poll_loop *loop = poll_loop(); + + if (loop->immediate_wake) { + return; + } + + now = time_msec(); if (msec <= 0) { /* Wake up immediately. */ @@ -229,7 +244,9 @@ poll_timer_wait_until_at(long long int when, const char *where) void poll_immediate_wake_at(const char *where) { + struct poll_loop *loop = poll_loop(); poll_timer_wait_at(0, where); + loop->immediate_wake = true; } /* Logs, if appropriate, that the poll loop was awakened by an event @@ -320,10 +337,10 @@ poll_block(void) { struct poll_loop *loop = poll_loop(); struct poll_node *node; - struct pollfd *pollfds; + struct pollfd *pollfds = NULL; HANDLE *wevents = NULL; int elapsed; - int retval; + int retval = 0; int i; /* Register fatal signal events before actually doing any real work for @@ -335,34 +352,36 @@ poll_block(void) } timewarp_run(); - pollfds = xmalloc(hmap_count(&loop->poll_nodes) * sizeof *pollfds); + if (!loop->immediate_wake) { + pollfds = xmalloc(hmap_count(&loop->poll_nodes) * sizeof *pollfds); #ifdef _WIN32 - wevents = xmalloc(hmap_count(&loop->poll_nodes) * sizeof *wevents); + wevents = xmalloc(hmap_count(&loop->poll_nodes) * sizeof *wevents); #endif - /* Populate with all the fds and events. */ - i = 0; - HMAP_FOR_EACH (node, hmap_node, &loop->poll_nodes) { - pollfds[i] = node->pollfd; + /* Populate with all the fds and events. */ + i = 0; + HMAP_FOR_EACH (node, hmap_node, &loop->poll_nodes) { + pollfds[i] = node->pollfd; #ifdef _WIN32 - wevents[i] = node->wevent; - if (node->pollfd.fd && node->wevent) { - short int wsa_events = 0; - if (node->pollfd.events & POLLIN) { - wsa_events |= FD_READ | FD_ACCEPT | FD_CLOSE; + wevents[i] = node->wevent; + if (node->pollfd.fd && node->wevent) { + short int wsa_events = 0; + if (node->pollfd.events & POLLIN) { + wsa_events |= FD_READ | FD_ACCEPT | FD_CLOSE; + } + if (node->pollfd.events & POLLOUT) { + wsa_events |= FD_WRITE | FD_CONNECT | FD_CLOSE; + } + WSAEventSelect(node->pollfd.fd, node->wevent, wsa_events); } - if (node->pollfd.events & POLLOUT) { - wsa_events |= FD_WRITE | FD_CONNECT | FD_CLOSE; - } - WSAEventSelect(node->pollfd.fd, node->wevent, wsa_events); - } #endif - i++; - } + i++; + } - retval = time_poll(pollfds, hmap_count(&loop->poll_nodes), wevents, - loop->timeout_when, &elapsed); + retval = time_poll(pollfds, hmap_count(&loop->poll_nodes), wevents, + loop->timeout_when, &elapsed); + } if (retval < 0) { static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); VLOG_ERR_RL(&rl, "poll: %s", ovs_strerror(-retval)); @@ -381,6 +400,7 @@ poll_block(void) free_poll_nodes(loop); loop->timeout_when = LLONG_MAX; loop->timeout_where = NULL; + loop->immediate_wake = false; free(pollfds); free(wevents); @@ -417,6 +437,7 @@ poll_loop(void) loop = xzalloc(sizeof *loop); loop->timeout_when = LLONG_MAX; hmap_init(&loop->poll_nodes); + loop->immediate_wake = false; xpthread_setspecific(key, loop); } return loop;