diff mbox

Long delays creating a netns after deleting one (possibly RCU related)

Message ID CAM_iQpUDnPfVayQv7Q+ZtC_2_ZbX9MRX=514gFpF3E6XY+GtSA@mail.gmail.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Cong Wang Nov. 12, 2016, 12:55 a.m. UTC
On Fri, Nov 11, 2016 at 4:23 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
>
> Ah!  This net_mutex is different than RTNL.  Should synchronize_net() be
> modified to check for net_mutex being held in addition to the current
> checks for RTNL being held?
>

Good point!

Like commit be3fc413da9eb17cce0991f214ab0, checking
for net_mutex for this case seems to be an optimization, I assume
synchronize_rcu_expedited() and synchronize_rcu() have the same
behavior...

Comments

Cong Wang Nov. 14, 2016, 6:47 a.m. UTC | #1
On Fri, Nov 11, 2016 at 4:55 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Fri, Nov 11, 2016 at 4:23 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
>>
>> Ah!  This net_mutex is different than RTNL.  Should synchronize_net() be
>> modified to check for net_mutex being held in addition to the current
>> checks for RTNL being held?
>>
>
> Good point!
>
> Like commit be3fc413da9eb17cce0991f214ab0, checking
> for net_mutex for this case seems to be an optimization, I assume
> synchronize_rcu_expedited() and synchronize_rcu() have the same
> behavior...

Thinking a bit more, I think commit be3fc413da9eb17cce0991f
gets wrong on rtnl_is_locked(), the lock could be locked by other
process not by the current one, therefore it should be
lockdep_rtnl_is_held() which, however, is defined only when LOCKDEP
is enabled... Sigh.

I don't see any better way than letting callers decide if they want the
expedited version or not, but this requires changes of all callers of
synchronize_net(). Hm.
Paul E. McKenney Nov. 14, 2016, 4:24 p.m. UTC | #2
On Sun, Nov 13, 2016 at 10:47:01PM -0800, Cong Wang wrote:
> On Fri, Nov 11, 2016 at 4:55 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Fri, Nov 11, 2016 at 4:23 PM, Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:
> >>
> >> Ah!  This net_mutex is different than RTNL.  Should synchronize_net() be
> >> modified to check for net_mutex being held in addition to the current
> >> checks for RTNL being held?
> >>
> >
> > Good point!
> >
> > Like commit be3fc413da9eb17cce0991f214ab0, checking
> > for net_mutex for this case seems to be an optimization, I assume
> > synchronize_rcu_expedited() and synchronize_rcu() have the same
> > behavior...
> 
> Thinking a bit more, I think commit be3fc413da9eb17cce0991f
> gets wrong on rtnl_is_locked(), the lock could be locked by other
> process not by the current one, therefore it should be
> lockdep_rtnl_is_held() which, however, is defined only when LOCKDEP
> is enabled... Sigh.
> 
> I don't see any better way than letting callers decide if they want the
> expedited version or not, but this requires changes of all callers of
> synchronize_net(). Hm.

I must confess that I don't understand how it would help to use an
expedited grace period when some other process is holding RTNL.
In contrast, I do well understand how it helps when the current process
is holding RTNL.

So what am I missing here?

							Thanx, Paul
Hannes Frederic Sowa Nov. 14, 2016, 5:29 p.m. UTC | #3
Hi Cong,

On Sat, Nov 12, 2016, at 01:55, Cong Wang wrote:
> On Fri, Nov 11, 2016 at 4:23 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> >
> > Ah!  This net_mutex is different than RTNL.  Should synchronize_net() be
> > modified to check for net_mutex being held in addition to the current
> > checks for RTNL being held?
> >
> 
> Good point!
> 
> Like commit be3fc413da9eb17cce0991f214ab0, checking
> for net_mutex for this case seems to be an optimization, I assume
> synchronize_rcu_expedited() and synchronize_rcu() have the same
> behavior...
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index eaad4c2..3415b6b 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -7762,7 +7762,7 @@ EXPORT_SYMBOL(free_netdev);
>  void synchronize_net(void)
>  {
>         might_sleep();
> -       if (rtnl_is_locked())
> +       if (rtnl_is_locked() || lockdep_is_held(&net_mutex))
>                 synchronize_rcu_expedited();

I don't think we should depend on lockdep for this check but rather use
mutex_is_locked here (I think it would fail to build like this without
CONFIG_LOCKDEP).

Bye,
Hannes
Cong Wang Nov. 14, 2016, 5:44 p.m. UTC | #4
On Mon, Nov 14, 2016 at 8:24 AM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Sun, Nov 13, 2016 at 10:47:01PM -0800, Cong Wang wrote:
>> On Fri, Nov 11, 2016 at 4:55 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> > On Fri, Nov 11, 2016 at 4:23 PM, Paul E. McKenney
>> > <paulmck@linux.vnet.ibm.com> wrote:
>> >>
>> >> Ah!  This net_mutex is different than RTNL.  Should synchronize_net() be
>> >> modified to check for net_mutex being held in addition to the current
>> >> checks for RTNL being held?
>> >>
>> >
>> > Good point!
>> >
>> > Like commit be3fc413da9eb17cce0991f214ab0, checking
>> > for net_mutex for this case seems to be an optimization, I assume
>> > synchronize_rcu_expedited() and synchronize_rcu() have the same
>> > behavior...
>>
>> Thinking a bit more, I think commit be3fc413da9eb17cce0991f
>> gets wrong on rtnl_is_locked(), the lock could be locked by other
>> process not by the current one, therefore it should be
>> lockdep_rtnl_is_held() which, however, is defined only when LOCKDEP
>> is enabled... Sigh.
>>
>> I don't see any better way than letting callers decide if they want the
>> expedited version or not, but this requires changes of all callers of
>> synchronize_net(). Hm.
>
> I must confess that I don't understand how it would help to use an
> expedited grace period when some other process is holding RTNL.
> In contrast, I do well understand how it helps when the current process
> is holding RTNL.

Yeah, this is exactly my point. And same for ASSERT_RTNL() which checks
rtnl_is_locked(), clearly we need to assert "it is held by the current process"
rather than "it is locked by whatever process".

But given *_is_held() is always defined by LOCKDEP, so we probably need
mutex to provide such a helper directly, mutex->owner is not always defined
either. :-/
Paul E. McKenney Nov. 14, 2016, 6:14 p.m. UTC | #5
On Mon, Nov 14, 2016 at 09:44:35AM -0800, Cong Wang wrote:
> On Mon, Nov 14, 2016 at 8:24 AM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Sun, Nov 13, 2016 at 10:47:01PM -0800, Cong Wang wrote:
> >> On Fri, Nov 11, 2016 at 4:55 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >> > On Fri, Nov 11, 2016 at 4:23 PM, Paul E. McKenney
> >> > <paulmck@linux.vnet.ibm.com> wrote:
> >> >>
> >> >> Ah!  This net_mutex is different than RTNL.  Should synchronize_net() be
> >> >> modified to check for net_mutex being held in addition to the current
> >> >> checks for RTNL being held?
> >> >>
> >> >
> >> > Good point!
> >> >
> >> > Like commit be3fc413da9eb17cce0991f214ab0, checking
> >> > for net_mutex for this case seems to be an optimization, I assume
> >> > synchronize_rcu_expedited() and synchronize_rcu() have the same
> >> > behavior...
> >>
> >> Thinking a bit more, I think commit be3fc413da9eb17cce0991f
> >> gets wrong on rtnl_is_locked(), the lock could be locked by other
> >> process not by the current one, therefore it should be
> >> lockdep_rtnl_is_held() which, however, is defined only when LOCKDEP
> >> is enabled... Sigh.
> >>
> >> I don't see any better way than letting callers decide if they want the
> >> expedited version or not, but this requires changes of all callers of
> >> synchronize_net(). Hm.
> >
> > I must confess that I don't understand how it would help to use an
> > expedited grace period when some other process is holding RTNL.
> > In contrast, I do well understand how it helps when the current process
> > is holding RTNL.
> 
> Yeah, this is exactly my point. And same for ASSERT_RTNL() which checks
> rtnl_is_locked(), clearly we need to assert "it is held by the current process"
> rather than "it is locked by whatever process".
> 
> But given *_is_held() is always defined by LOCKDEP, so we probably need
> mutex to provide such a helper directly, mutex->owner is not always defined
> either. :-/

There is always the option of making acquisition and release set a per-task
variable that can be tested.  (Where did I put that asbestos suit, anyway?)

							Thanx, Paul
Eric W. Biederman Nov. 14, 2016, 10:12 p.m. UTC | #6
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:

> On Mon, Nov 14, 2016 at 09:44:35AM -0800, Cong Wang wrote:
>> On Mon, Nov 14, 2016 at 8:24 AM, Paul E. McKenney
>> <paulmck@linux.vnet.ibm.com> wrote:
>> > On Sun, Nov 13, 2016 at 10:47:01PM -0800, Cong Wang wrote:
>> >> On Fri, Nov 11, 2016 at 4:55 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> >> > On Fri, Nov 11, 2016 at 4:23 PM, Paul E. McKenney
>> >> > <paulmck@linux.vnet.ibm.com> wrote:
>> >> >>
>> >> >> Ah!  This net_mutex is different than RTNL.  Should synchronize_net() be
>> >> >> modified to check for net_mutex being held in addition to the current
>> >> >> checks for RTNL being held?
>> >> >>
>> >> >
>> >> > Good point!
>> >> >
>> >> > Like commit be3fc413da9eb17cce0991f214ab0, checking
>> >> > for net_mutex for this case seems to be an optimization, I assume
>> >> > synchronize_rcu_expedited() and synchronize_rcu() have the same
>> >> > behavior...
>> >>
>> >> Thinking a bit more, I think commit be3fc413da9eb17cce0991f
>> >> gets wrong on rtnl_is_locked(), the lock could be locked by other
>> >> process not by the current one, therefore it should be
>> >> lockdep_rtnl_is_held() which, however, is defined only when LOCKDEP
>> >> is enabled... Sigh.
>> >>
>> >> I don't see any better way than letting callers decide if they want the
>> >> expedited version or not, but this requires changes of all callers of
>> >> synchronize_net(). Hm.
>> >
>> > I must confess that I don't understand how it would help to use an
>> > expedited grace period when some other process is holding RTNL.
>> > In contrast, I do well understand how it helps when the current process
>> > is holding RTNL.
>> 
>> Yeah, this is exactly my point. And same for ASSERT_RTNL() which checks
>> rtnl_is_locked(), clearly we need to assert "it is held by the current process"
>> rather than "it is locked by whatever process".
>> 
>> But given *_is_held() is always defined by LOCKDEP, so we probably need
>> mutex to provide such a helper directly, mutex->owner is not always defined
>> either. :-/
>
> There is always the option of making acquisition and release set a per-task
> variable that can be tested.  (Where did I put that asbestos suit, anyway?)
>
> 							Thanx, Paul

synchronize_rcu_expidited is not enough if you have multiple network
devices in play.

Looking at the code it comes down to this commit, and it appears there
is a promise add rcu grace period combining by Eric Dumazet.

Eric since people are hitting noticable stalls because of the rcu grace
period taking a long time do you think you could look at this code path
a bit more?

commit 93d05d4a320cb16712bb3d57a9658f395d8cecb9
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Nov 18 06:31:03 2015 -0800

    net: provide generic busy polling to all NAPI drivers
    
    NAPI drivers no longer need to observe a particular protocol
    to benefit from busy polling (CONFIG_NET_RX_BUSY_POLL=y)
    
    napi_hash_add() and napi_hash_del() are automatically called
    from core networking stack, respectively from
    netif_napi_add() and netif_napi_del()
    
    This patch depends on free_netdev() and netif_napi_del() being
    called from process context, which seems to be the norm.
    
    Drivers might still prefer to call napi_hash_del() on their
    own, since they might combine all the rcu grace periods into
    a single one, knowing their NAPI structures lifetime, while
    core networking stack has no idea of a possible combining.
    
    Once this patch proves to not bring serious regressions,
    we will cleanup drivers to either remove napi_hash_del()
    or provide appropriate rcu grace periods combining.
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Eric
Eric Dumazet Nov. 14, 2016, 10:46 p.m. UTC | #7
On Mon, 2016-11-14 at 16:12 -0600, Eric W. Biederman wrote:

> synchronize_rcu_expidited is not enough if you have multiple network
> devices in play.
> 
> Looking at the code it comes down to this commit, and it appears there
> is a promise add rcu grace period combining by Eric Dumazet.
> 
> Eric since people are hitting noticable stalls because of the rcu grace
> period taking a long time do you think you could look at this code path
> a bit more?
> 
> commit 93d05d4a320cb16712bb3d57a9658f395d8cecb9
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Wed Nov 18 06:31:03 2015 -0800

Absolutely, I will take a loop asap.

Thanks Eric.
diff mbox

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index eaad4c2..3415b6b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7762,7 +7762,7 @@  EXPORT_SYMBOL(free_netdev);
 void synchronize_net(void)
 {
        might_sleep();
-       if (rtnl_is_locked())
+       if (rtnl_is_locked() || lockdep_is_held(&net_mutex))
                synchronize_rcu_expedited();
        else
                synchronize_rcu();