diff mbox series

[ovs-dev,v1,6/9] conntrack: Do not schedule zero ms timers

Message ID 397eadf064f25300cc1833f159ce77205c7d9efe.1613557616.git.grive@u256.net
State New
Headers show
Series conntrack: improve multithread scalability | expand

Commit Message

Gaëtan Rivet Feb. 17, 2021, 4:34 p.m. UTC
When ct_sweep() is far behind on its work, the 'next_wake' returned can
be before the moment it started. When it happens, the thread schedules a
zero ms timer that is logged as an error.

Instead, mark the thread for immediate wake in the next poll_block().

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
---
 lib/conntrack.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

William Tu Feb. 23, 2021, 10:56 p.m. UTC | #1
On Wed, Feb 17, 2021 at 8:34 AM Gaetan Rivet <grive@u256.net> wrote:
>
> When ct_sweep() is far behind on its work, the 'next_wake' returned can
> be before the moment it started. When it happens, the thread schedules a
> zero ms timer that is logged as an error.
>
> Instead, mark the thread for immediate wake in the next poll_block().
>
> Signed-off-by: Gaetan Rivet <grive@u256.net>
> Reviewed-by: Eli Britstein <elibr@nvidia.com>
> ---

Looks ok to me.
I guess previously we don't want to clean too often, so there is
a minimal CT_CLEAN_MIN_INTERVAL.
With this change, we might end up busy doing ct_sweep() and
hit 100% cpu?



>  lib/conntrack.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 5aad64994..71f79a790 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -1628,6 +1628,8 @@ clean_thread_main(void *f_)
>          next_wake = conntrack_clean(ct, now);
>
>          if (next_wake < now) {
> +            poll_immediate_wake();
> +        } else if (next_wake < now + CT_CLEAN_MIN_INTERVAL) {
>              poll_timer_wait_until(now + CT_CLEAN_MIN_INTERVAL);
>          } else {
>              poll_timer_wait_until(MAX(next_wake, now + CT_CLEAN_INTERVAL));
> --
> 2.30.0
>
Gaëtan Rivet Feb. 24, 2021, 12:34 a.m. UTC | #2
On Tue, Feb 23, 2021, at 22:56, William Tu wrote:
> On Wed, Feb 17, 2021 at 8:34 AM Gaetan Rivet <grive@u256.net> wrote:
> >
> > When ct_sweep() is far behind on its work, the 'next_wake' returned can
> > be before the moment it started. When it happens, the thread schedules a
> > zero ms timer that is logged as an error.
> >
> > Instead, mark the thread for immediate wake in the next poll_block().
> >
> > Signed-off-by: Gaetan Rivet <grive@u256.net>
> > Reviewed-by: Eli Britstein <elibr@nvidia.com>
> > ---
> 
> Looks ok to me.
> I guess previously we don't want to clean too often, so there is
> a minimal CT_CLEAN_MIN_INTERVAL.
> With this change, we might end up busy doing ct_sweep() and
> hit 100% cpu?
> 

Yes, this patch and the next will remove CT_CLEAN_MIN_INTERVAL.
The thread will still call poll_block() and quiesce, but it has the potential to hog the core if work is still needed.

Is it an issue? If so instead it could yield for a short time.
The main idea is to avoid waiting for 200 ms, as it is not useful now.

> 
> 
> >  lib/conntrack.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/lib/conntrack.c b/lib/conntrack.c
> > index 5aad64994..71f79a790 100644
> > --- a/lib/conntrack.c
> > +++ b/lib/conntrack.c
> > @@ -1628,6 +1628,8 @@ clean_thread_main(void *f_)
> >          next_wake = conntrack_clean(ct, now);
> >
> >          if (next_wake < now) {
> > +            poll_immediate_wake();
> > +        } else if (next_wake < now + CT_CLEAN_MIN_INTERVAL) {
> >              poll_timer_wait_until(now + CT_CLEAN_MIN_INTERVAL);
> >          } else {
> >              poll_timer_wait_until(MAX(next_wake, now + CT_CLEAN_INTERVAL));
> > --
> > 2.30.0
> >
>
William Tu Feb. 24, 2021, 5:02 p.m. UTC | #3
On Tue, Feb 23, 2021 at 4:35 PM Gaëtan Rivet <grive@u256.net> wrote:
>
> On Tue, Feb 23, 2021, at 22:56, William Tu wrote:
> > On Wed, Feb 17, 2021 at 8:34 AM Gaetan Rivet <grive@u256.net> wrote:
> > >
> > > When ct_sweep() is far behind on its work, the 'next_wake' returned can
> > > be before the moment it started. When it happens, the thread schedules a
> > > zero ms timer that is logged as an error.
> > >
> > > Instead, mark the thread for immediate wake in the next poll_block().
> > >
> > > Signed-off-by: Gaetan Rivet <grive@u256.net>
> > > Reviewed-by: Eli Britstein <elibr@nvidia.com>
> > > ---
> >
> > Looks ok to me.
> > I guess previously we don't want to clean too often, so there is
> > a minimal CT_CLEAN_MIN_INTERVAL.
> > With this change, we might end up busy doing ct_sweep() and
> > hit 100% cpu?
> >
>
> Yes, this patch and the next will remove CT_CLEAN_MIN_INTERVAL.
> The thread will still call poll_block() and quiesce, but it has the potential to hog the core if work is still needed.
>
> Is it an issue? If so instead it could yield for a short time.
> The main idea is to avoid waiting for 200 ms, as it is not useful now.
>
I don't have a strong opinion on this.
It's just sometimes customers complain OVS is using too much CPU.
And they start to look at which process/thread is using 100%.
Maybe we can find a balance between a reasonable backlog, accuracy of
timeout, and cpu utilizaion.

William
diff mbox series

Patch

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 5aad64994..71f79a790 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1628,6 +1628,8 @@  clean_thread_main(void *f_)
         next_wake = conntrack_clean(ct, now);
 
         if (next_wake < now) {
+            poll_immediate_wake();
+        } else if (next_wake < now + CT_CLEAN_MIN_INTERVAL) {
             poll_timer_wait_until(now + CT_CLEAN_MIN_INTERVAL);
         } else {
             poll_timer_wait_until(MAX(next_wake, now + CT_CLEAN_INTERVAL));