diff mbox series

[ovs-dev] rcu: reduce RCU block time

Message ID 1544162607-26672-1-git-send-email-jerry.lilijun@huawei.com
State Changes Requested
Headers show
Series [ovs-dev] rcu: reduce RCU block time | expand

Commit Message

Lilijun (Jerry, Cloud Networking) Dec. 7, 2018, 6:03 a.m. UTC
When calling ovsrcu_synchronize(), it always block 1000ms because we poll
block until elapsed time become greater than
warning_threshold(1000).That's too long for some configuration commands.
So this patch reduces warning_threshold's default value to 100 and print
logs after it have elapsed more than 1000ms.

Signed-off-by: Lilijun <jerry.lilijun@huawei.com>
---
 lib/ovs-rcu.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Comments

Ben Pfaff Dec. 7, 2018, 2:37 p.m. UTC | #1
On Fri, Dec 07, 2018 at 02:03:27PM +0800, Lilijun wrote:
> When calling ovsrcu_synchronize(), it always block 1000ms because we poll
> block until elapsed time become greater than
> warning_threshold(1000).That's too long for some configuration commands.
> So this patch reduces warning_threshold's default value to 100 and print
> logs after it have elapsed more than 1000ms.
> 
> Signed-off-by: Lilijun <jerry.lilijun@huawei.com>

This only makes any sense if the seq_wait() doesn't work.  Do you have
evidence that is the case?
Lilijun (Jerry, Cloud Networking) Dec. 10, 2018, 1:50 a.m. UTC | #2
Hi, Ben

Thanks for your reply.

Do you mean that the function poll_timer_wait_until(start + warning_threshold) call only  wakeup to do some need log messages and ovsrcu_synchronize() will wait for seq_wait()?

According to my understanding, ovsrcu_synchronize() will return until every thread in ovsrcu_threads list have change their seqno to target_seqno.  When some thread call ovsrcu_quiesce(), global_seqno is changed.  Then seq_wait() will check if global_seqno is changed and call poll_immediate_wake_at() to finish the poll_block()'s waiting.

We assume that the first call of seq_wait() in this loop check global_seqno failed for other thread hasn't been ready to call ovsrcu_quiesce(). Then the loop will sleep 1000ms(set by warning_threshold) by poll_block() and check the thread's seqno or global_seqno again when no other threads finish the poll_block()'s waiting in early.

If we set warning_threshold to 100ms, the look will check the seqno more frequently and may break earlier.  I have test it myself, when warning_threshold is set to 100 as default, the loop 's time indeed reduce to 100ms, that's maybe some evidence.



> -----Original Message-----
> From: Ben Pfaff [mailto:blp@ovn.org]
> Sent: Friday, December 07, 2018 10:37 PM
> To: Lilijun (Jerry, Cloud Networking) <jerry.lilijun@huawei.com>
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] [PATCH] rcu: reduce RCU block time
> 
> On Fri, Dec 07, 2018 at 02:03:27PM +0800, Lilijun wrote:
> > When calling ovsrcu_synchronize(), it always block 1000ms because we
> > poll block until elapsed time become greater than
> > warning_threshold(1000).That's too long for some configuration commands.
> > So this patch reduces warning_threshold's default value to 100 and
> > print logs after it have elapsed more than 1000ms.
> >
> > Signed-off-by: Lilijun <jerry.lilijun@huawei.com>
> 
> This only makes any sense if the seq_wait() doesn't work.  Do you have
> evidence that is the case?
Ben Pfaff Dec. 10, 2018, 2:08 a.m. UTC | #3
If that's the case, then there's something wrong with seq_wait() or the
use of it in this context.  The timer logic here is just to issue log
messages.

It would be better to figure out the real problem, because that would
reduce the 100 ms (or 1000 ms) latency to near-zero.

On Mon, Dec 10, 2018 at 01:50:41AM +0000, Lilijun (Jerry, Cloud Networking) wrote:
> Hi, Ben
> 
> Thanks for your reply.
> 
> Do you mean that the function poll_timer_wait_until(start + warning_threshold) call only  wakeup to do some need log messages and ovsrcu_synchronize() will wait for seq_wait()?
> 
> According to my understanding, ovsrcu_synchronize() will return until every thread in ovsrcu_threads list have change their seqno to target_seqno.  When some thread call ovsrcu_quiesce(), global_seqno is changed.  Then seq_wait() will check if global_seqno is changed and call poll_immediate_wake_at() to finish the poll_block()'s waiting.
> 
> We assume that the first call of seq_wait() in this loop check global_seqno failed for other thread hasn't been ready to call ovsrcu_quiesce(). Then the loop will sleep 1000ms(set by warning_threshold) by poll_block() and check the thread's seqno or global_seqno again when no other threads finish the poll_block()'s waiting in early.
> 
> If we set warning_threshold to 100ms, the look will check the seqno more frequently and may break earlier.  I have test it myself, when warning_threshold is set to 100 as default, the loop 's time indeed reduce to 100ms, that's maybe some evidence.
> 
> 
> 
> > -----Original Message-----
> > From: Ben Pfaff [mailto:blp@ovn.org]
> > Sent: Friday, December 07, 2018 10:37 PM
> > To: Lilijun (Jerry, Cloud Networking) <jerry.lilijun@huawei.com>
> > Cc: dev@openvswitch.org
> > Subject: Re: [ovs-dev] [PATCH] [PATCH] rcu: reduce RCU block time
> > 
> > On Fri, Dec 07, 2018 at 02:03:27PM +0800, Lilijun wrote:
> > > When calling ovsrcu_synchronize(), it always block 1000ms because we
> > > poll block until elapsed time become greater than
> > > warning_threshold(1000).That's too long for some configuration commands.
> > > So this patch reduces warning_threshold's default value to 100 and
> > > print logs after it have elapsed more than 1000ms.
> > >
> > > Signed-off-by: Lilijun <jerry.lilijun@huawei.com>
> > 
> > This only makes any sense if the seq_wait() doesn't work.  Do you have
> > evidence that is the case?
Lilijun (Jerry, Cloud Networking) Dec. 10, 2018, 7:30 a.m. UTC | #4
Hi Ben,

Do you mean that there some threads in in ovsrcu_threads list were wrongly using seq_wait or rcu functions?

Thanks, I will do some more research on this problem.

> -----Original Message-----
> From: Ben Pfaff [mailto:blp@ovn.org]
> Sent: Monday, December 10, 2018 10:08 AM
> To: Lilijun (Jerry, Cloud Networking) <jerry.lilijun@huawei.com>
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] [PATCH] rcu: reduce RCU block time
> 
> If that's the case, then there's something wrong with seq_wait() or the use
> of it in this context.  The timer logic here is just to issue log messages.
> 
> It would be better to figure out the real problem, because that would reduce
> the 100 ms (or 1000 ms) latency to near-zero.
> 
> On Mon, Dec 10, 2018 at 01:50:41AM +0000, Lilijun (Jerry, Cloud Networking)
> wrote:
> > Hi, Ben
> >
> > Thanks for your reply.
> >
> > Do you mean that the function poll_timer_wait_until(start +
> warning_threshold) call only  wakeup to do some need log messages and
> ovsrcu_synchronize() will wait for seq_wait()?
> >
> > According to my understanding, ovsrcu_synchronize() will return until
> every thread in ovsrcu_threads list have change their seqno to target_seqno.
> When some thread call ovsrcu_quiesce(), global_seqno is changed.  Then
> seq_wait() will check if global_seqno is changed and call
> poll_immediate_wake_at() to finish the poll_block()'s waiting.
> >
> > We assume that the first call of seq_wait() in this loop check global_seqno
> failed for other thread hasn't been ready to call ovsrcu_quiesce(). Then the
> loop will sleep 1000ms(set by warning_threshold) by poll_block() and check
> the thread's seqno or global_seqno again when no other threads finish the
> poll_block()'s waiting in early.
> >
> > If we set warning_threshold to 100ms, the look will check the seqno more
> frequently and may break earlier.  I have test it myself, when
> warning_threshold is set to 100 as default, the loop 's time indeed reduce to
> 100ms, that's maybe some evidence.
> >
> >
> >
> > > -----Original Message-----
> > > From: Ben Pfaff [mailto:blp@ovn.org]
> > > Sent: Friday, December 07, 2018 10:37 PM
> > > To: Lilijun (Jerry, Cloud Networking) <jerry.lilijun@huawei.com>
> > > Cc: dev@openvswitch.org
> > > Subject: Re: [ovs-dev] [PATCH] [PATCH] rcu: reduce RCU block time
> > >
> > > On Fri, Dec 07, 2018 at 02:03:27PM +0800, Lilijun wrote:
> > > > When calling ovsrcu_synchronize(), it always block 1000ms because
> > > > we poll block until elapsed time become greater than
> > > > warning_threshold(1000).That's too long for some configuration
> commands.
> > > > So this patch reduces warning_threshold's default value to 100 and
> > > > print logs after it have elapsed more than 1000ms.
> > > >
> > > > Signed-off-by: Lilijun <jerry.lilijun@huawei.com>
> > >
> > > This only makes any sense if the seq_wait() doesn't work.  Do you
> > > have evidence that is the case?
Ben Pfaff Dec. 10, 2018, 3:49 p.m. UTC | #5
I mean that the seq_wait() call should cause ovsrcu_synchronize() to
wake up immediately when it can return.  If it doesn't, something is
wrong.

On Mon, Dec 10, 2018 at 07:30:09AM +0000, Lilijun (Jerry, Cloud Networking) wrote:
> Hi Ben,
> 
> Do you mean that there some threads in in ovsrcu_threads list were wrongly using seq_wait or rcu functions?
> 
> Thanks, I will do some more research on this problem.
> 
> > -----Original Message-----
> > From: Ben Pfaff [mailto:blp@ovn.org]
> > Sent: Monday, December 10, 2018 10:08 AM
> > To: Lilijun (Jerry, Cloud Networking) <jerry.lilijun@huawei.com>
> > Cc: dev@openvswitch.org
> > Subject: Re: [ovs-dev] [PATCH] [PATCH] rcu: reduce RCU block time
> > 
> > If that's the case, then there's something wrong with seq_wait() or the use
> > of it in this context.  The timer logic here is just to issue log messages.
> > 
> > It would be better to figure out the real problem, because that would reduce
> > the 100 ms (or 1000 ms) latency to near-zero.
> > 
> > On Mon, Dec 10, 2018 at 01:50:41AM +0000, Lilijun (Jerry, Cloud Networking)
> > wrote:
> > > Hi, Ben
> > >
> > > Thanks for your reply.
> > >
> > > Do you mean that the function poll_timer_wait_until(start +
> > warning_threshold) call only  wakeup to do some need log messages and
> > ovsrcu_synchronize() will wait for seq_wait()?
> > >
> > > According to my understanding, ovsrcu_synchronize() will return until
> > every thread in ovsrcu_threads list have change their seqno to target_seqno.
> > When some thread call ovsrcu_quiesce(), global_seqno is changed.  Then
> > seq_wait() will check if global_seqno is changed and call
> > poll_immediate_wake_at() to finish the poll_block()'s waiting.
> > >
> > > We assume that the first call of seq_wait() in this loop check global_seqno
> > failed for other thread hasn't been ready to call ovsrcu_quiesce(). Then the
> > loop will sleep 1000ms(set by warning_threshold) by poll_block() and check
> > the thread's seqno or global_seqno again when no other threads finish the
> > poll_block()'s waiting in early.
> > >
> > > If we set warning_threshold to 100ms, the look will check the seqno more
> > frequently and may break earlier.  I have test it myself, when
> > warning_threshold is set to 100 as default, the loop 's time indeed reduce to
> > 100ms, that's maybe some evidence.
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Ben Pfaff [mailto:blp@ovn.org]
> > > > Sent: Friday, December 07, 2018 10:37 PM
> > > > To: Lilijun (Jerry, Cloud Networking) <jerry.lilijun@huawei.com>
> > > > Cc: dev@openvswitch.org
> > > > Subject: Re: [ovs-dev] [PATCH] [PATCH] rcu: reduce RCU block time
> > > >
> > > > On Fri, Dec 07, 2018 at 02:03:27PM +0800, Lilijun wrote:
> > > > > When calling ovsrcu_synchronize(), it always block 1000ms because
> > > > > we poll block until elapsed time become greater than
> > > > > warning_threshold(1000).That's too long for some configuration
> > commands.
> > > > > So this patch reduces warning_threshold's default value to 100 and
> > > > > print logs after it have elapsed more than 1000ms.
> > > > >
> > > > > Signed-off-by: Lilijun <jerry.lilijun@huawei.com>
> > > >
> > > > This only makes any sense if the seq_wait() doesn't work.  Do you
> > > > have evidence that is the case?
Lilijun (Jerry, Cloud Networking) Dec. 13, 2018, 7:07 a.m. UTC | #6
Hi, Ben

You're right, thanks for your help.

I have got the cause of this problem:   there is some mistake in my code that causes the current thread's waiter can't be inserted into &waiter->thread->waiters in the function seq_wait__(). So it can't be woke up when other threads have changed the global_seqno.

> -----Original Message-----
> From: Ben Pfaff [mailto:blp@ovn.org]
> Sent: Monday, December 10, 2018 11:49 PM
> To: Lilijun (Jerry, Cloud Networking) <jerry.lilijun@huawei.com>
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] [PATCH] rcu: reduce RCU block time
> 
> I mean that the seq_wait() call should cause ovsrcu_synchronize() to wake
> up immediately when it can return.  If it doesn't, something is wrong.
> 
> On Mon, Dec 10, 2018 at 07:30:09AM +0000, Lilijun (Jerry, Cloud Networking)
> wrote:
> > Hi Ben,
> >
> > Do you mean that there some threads in in ovsrcu_threads list were
> wrongly using seq_wait or rcu functions?
> >
> > Thanks, I will do some more research on this problem.
> >
> > > -----Original Message-----
> > > From: Ben Pfaff [mailto:blp@ovn.org]
> > > Sent: Monday, December 10, 2018 10:08 AM
> > > To: Lilijun (Jerry, Cloud Networking) <jerry.lilijun@huawei.com>
> > > Cc: dev@openvswitch.org
> > > Subject: Re: [ovs-dev] [PATCH] [PATCH] rcu: reduce RCU block time
> > >
> > > If that's the case, then there's something wrong with seq_wait() or
> > > the use of it in this context.  The timer logic here is just to issue log
> messages.
> > >
> > > It would be better to figure out the real problem, because that
> > > would reduce the 100 ms (or 1000 ms) latency to near-zero.
> > >
> > > On Mon, Dec 10, 2018 at 01:50:41AM +0000, Lilijun (Jerry, Cloud
> > > Networking)
> > > wrote:
> > > > Hi, Ben
> > > >
> > > > Thanks for your reply.
> > > >
> > > > Do you mean that the function poll_timer_wait_until(start +
> > > warning_threshold) call only  wakeup to do some need log messages
> > > and
> > > ovsrcu_synchronize() will wait for seq_wait()?
> > > >
> > > > According to my understanding, ovsrcu_synchronize() will return
> > > > until
> > > every thread in ovsrcu_threads list have change their seqno to
> target_seqno.
> > > When some thread call ovsrcu_quiesce(), global_seqno is changed.
> > > Then
> > > seq_wait() will check if global_seqno is changed and call
> > > poll_immediate_wake_at() to finish the poll_block()'s waiting.
> > > >
> > > > We assume that the first call of seq_wait() in this loop check
> > > > global_seqno
> > > failed for other thread hasn't been ready to call ovsrcu_quiesce().
> > > Then the loop will sleep 1000ms(set by warning_threshold) by
> > > poll_block() and check the thread's seqno or global_seqno again when
> > > no other threads finish the poll_block()'s waiting in early.
> > > >
> > > > If we set warning_threshold to 100ms, the look will check the
> > > > seqno more
> > > frequently and may break earlier.  I have test it myself, when
> > > warning_threshold is set to 100 as default, the loop 's time indeed
> > > reduce to 100ms, that's maybe some evidence.
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Ben Pfaff [mailto:blp@ovn.org]
> > > > > Sent: Friday, December 07, 2018 10:37 PM
> > > > > To: Lilijun (Jerry, Cloud Networking) <jerry.lilijun@huawei.com>
> > > > > Cc: dev@openvswitch.org
> > > > > Subject: Re: [ovs-dev] [PATCH] [PATCH] rcu: reduce RCU block
> > > > > time
> > > > >
> > > > > On Fri, Dec 07, 2018 at 02:03:27PM +0800, Lilijun wrote:
> > > > > > When calling ovsrcu_synchronize(), it always block 1000ms
> > > > > > because we poll block until elapsed time become greater than
> > > > > > warning_threshold(1000).That's too long for some configuration
> > > commands.
> > > > > > So this patch reduces warning_threshold's default value to 100
> > > > > > and print logs after it have elapsed more than 1000ms.
> > > > > >
> > > > > > Signed-off-by: Lilijun <jerry.lilijun@huawei.com>
> > > > >
> > > > > This only makes any sense if the seq_wait() doesn't work.  Do
> > > > > you have evidence that is the case?
diff mbox series

Patch

diff --git a/lib/ovs-rcu.c b/lib/ovs-rcu.c
index ebc8120..af52cc5 100644
--- a/lib/ovs-rcu.c
+++ b/lib/ovs-rcu.c
@@ -191,7 +191,7 @@  ovsrcu_is_quiescent(void)
 void
 ovsrcu_synchronize(void)
 {
-    unsigned int warning_threshold = 1000;
+    unsigned int warning_threshold = 100;
     uint64_t target_seqno;
     long long int start;
 
@@ -226,8 +226,10 @@  ovsrcu_synchronize(void)
 
         elapsed = time_msec() - start;
         if (elapsed >= warning_threshold) {
-            VLOG_WARN("blocked %u ms waiting for %s to quiesce",
-                      elapsed, stalled_thread);
+            if (warning_threshold >= 1000) {
+                VLOG_WARN("blocked %u ms waiting for %s to quiesce",
+                          elapsed, stalled_thread);
+            }
             warning_threshold *= 2;
         }
         poll_timer_wait_until(start + warning_threshold);