Patchwork net: deadlock during net device unregistration - V2

login
register
mail settings
Submitter Benjamin Thery
Date Oct. 1, 2008, 2:14 p.m.
Message ID <20081001141432.470106980@theryb.frec.bull.fr>
Download mbox | patch
Permalink /patch/2252/
State Superseded
Delegated to: David Miller
Headers show

Comments

Benjamin Thery - Oct. 1, 2008, 2:14 p.m.
This is the second version of a patch aimed at fixing a deadlock that
can occur when unregistering net devices. 

This new version of the patch ensures the garbage collector 
dst_gc_task() is run when waiting in netdev_wait_allrefs(), by calling
it in the netdevice notifier dst_dev_event() when receiving a 
NETDEV_UNREGISTER event.

Thanks to Jarek Poplawski for proposing this fix.

(The previous version proposed to replace in linkwatch_event()
the call to rntl_unlock() by __rtnl_lock())

Short description of the deadlock
- - - - - - - - - - - - - - - - -

When reaching netdev_wait_allrefs() some routes may still hold the 
device refcount. dst_gc_task() should 'garbage collect' them soon 
(and decrease the refcount). dst_gc_task() is scheduled with a delay in
the "events" workqueue.
But, as some linkwatch events are also pending, netdev_wait_allrefs()
schedules (without delay) a linkwatch_event() in the same workqueue.
linkwatch_event runs before dst_gc_task() and blocks in  netdev_run_todo
(called by rtnl_unlock()), blocking the whole "events" workqueue.

netdev_wait_allrefs() waiting for           dt_gc_task() waiting
dst_gc_task() to decrease the device -----> linkwatch_event() to return
refcount. It holds the mutex                and unblock the "events" 
netdev_run_todo_mutex                       workqueue
       ^                                        |
       |                                        |
       |                                        |
       |                                        |
       |        linkwatch_event() waiting for <-+
       +------- net_dev_run_todo (who called 
                netdev_wait_allrefs()) to release 
                netdev_run_todo_mutex

Long description:
- - - - - - - - -

In some circumstances, linkwatch_event() blocks the whole "events" 
workqueue while blocking in rtnl_unlock(), and prevent dst_gc_task() 
(enqueued in the same workqueue) to be run:

1. Unregister a device, the following routines are called:

-> unregister_netdev
  -> rtnl_lock
  -> unregister_netdevice
  -> rtnl_unlock
    -> netdev_run_todo
      -> netdev_wait_allrefs

2. In netdev_wait_allrefs(), the device's refcount is greater than 0
  because there are still some routes to be garbage collected later.

3. Also, some link watch events are pending. netdev_wait_allrefs()
  will run the linkwatch event queue, calls linkwatch_run_queue().


Both the route garbage collector dst_gc_task() and the linkwatch task
linkwatch_event() are queued in the same generic workqueue: "events".


4. linkwatch_event() is enqueued earlier in the queue. It will grab
  rtnl_lock(), deliver the link watch events pending, and then call
  rtnl_unlock(). 
  rtnl_unlock() will then call netdev_run_todo() and block on
  mutex_lock(&net_todo_run_mutex).

  At this point, the workqueue "events" is _blocked_ until the
  netdev_wait_allrefs() call above returns when the device refcount 
  reaches 0.

  Problem: it will never happens if dst_gc_task() was enqueued behind
  linkwatch_event() in the "events" workqueue as the queue is now 
  blocked.


Thread foo                            Thread "events"
----------                            ---------------

free IPv6 routes                    |
-> schedule_delayed_work(dst_gc_task)
...                                 |
unregister_netdev                   |
 -> rtln_unlock                     | 
   -> netdev_run_todo               | ...
     -> grab net_todo_run_mutex     |
     -> netdev_wait_allrefs (loop)  | -> run__workqueue()
        ->  linkwatch_run_queue()   | ....
            enqueue linkwatch_event |
            in "events"             |
                                    | -> run_workqueue()
        "BLOCKED"                   |   -> linkwatch_event()
                                    |     -> rtnl_lock()
                                    |     -> __linkwatch_run_queue 
                                    |     ->rtnl_unlock()
                                    |       -> netdev_run_todo
                                    |          -> try to grab net_todo_run_mutex
                                               BLOCKED

Deadlock: 
  
# BLOCKED waiting for dst_gc_task()  # BLOCKED waiting for netdev_wait_allrefs
to decrement the device refcnt         to release net_todo_run_mutex

                  # dst_gst_task() waiting to be run by "events"


This deadlock can be observed sometimes when the loopback is 
unregistered when a network namespaces exits.

Benjamin

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
---
 net/core/dst.c |    4 ++++
 1 file changed, 4 insertions(+)
Jarek Poplawski - Oct. 1, 2008, 7:48 p.m.
On Wed, Oct 01, 2008 at 04:14:35PM +0200, Benjamin Thery wrote:
> This is the second version of a patch aimed at fixing a deadlock that
> can occur when unregistering net devices. 
> 
> This new version of the patch ensures the garbage collector 
> dst_gc_task() is run when waiting in netdev_wait_allrefs(), by calling
> it in the netdevice notifier dst_dev_event() when receiving a 
> NETDEV_UNREGISTER event.
> 
> Thanks to Jarek Poplawski for proposing this fix.

Not at all! Actually, I'm sorry for this mess... (Read below.)

> (The previous version proposed to replace in linkwatch_event()
> the call to rntl_unlock() by __rtnl_lock())
...
> --- net-next-2.6.orig/net/core/dst.c
> +++ net-next-2.6/net/core/dst.c
> @@ -328,6 +328,10 @@ static int dst_dev_event(struct notifier
>  			dst_ifdown(dst, dev, event != NETDEV_DOWN);
>  		}
>  		mutex_unlock(&dst_gc_mutex);
> +
> +		if (event == NETDEV_UNREGISTER &&
> +		    cancel_delayed_work(&dst_gc_work))
> +			dst_gc_task(&dst_gc_work.work);

Hmm... It seems this shouldn't work yet: cancel_delayed_work() can only
kill this while on timer, but not when queued and maybe blocked already.
Probably cancel_delayed_work_sync() should be considered instead, but
since this needs more checking, and David is waiting for this, I think
it's safer to use this previous (__rtnl_unlock) patch for now
(especially for -stable).

Thanks,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Lezcano - Oct. 1, 2008, 9:06 p.m.
Jarek Poplawski wrote:
> On Wed, Oct 01, 2008 at 04:14:35PM +0200, Benjamin Thery wrote:
>> This is the second version of a patch aimed at fixing a deadlock that
>> can occur when unregistering net devices. 
>>
>> This new version of the patch ensures the garbage collector 
>> dst_gc_task() is run when waiting in netdev_wait_allrefs(), by calling
>> it in the netdevice notifier dst_dev_event() when receiving a 
>> NETDEV_UNREGISTER event.
>>
>> Thanks to Jarek Poplawski for proposing this fix.
> 
> Not at all! Actually, I'm sorry for this mess... (Read below.)
> 
>> (The previous version proposed to replace in linkwatch_event()
>> the call to rntl_unlock() by __rtnl_lock())
> ...
>> --- net-next-2.6.orig/net/core/dst.c
>> +++ net-next-2.6/net/core/dst.c
>> @@ -328,6 +328,10 @@ static int dst_dev_event(struct notifier
>>  			dst_ifdown(dst, dev, event != NETDEV_DOWN);
>>  		}
>>  		mutex_unlock(&dst_gc_mutex);
>> +
>> +		if (event == NETDEV_UNREGISTER &&
>> +		    cancel_delayed_work(&dst_gc_work))
>> +			dst_gc_task(&dst_gc_work.work);
> 
> Hmm... It seems this shouldn't work yet: cancel_delayed_work() can only
> kill this while on timer, but not when queued and maybe blocked already.

Perhaps, I am misunderstanding but, dst_gc_work is always called with 
schedule_delayed_work. If the task is not running, we can cancel it and 
if the task is running, no need to trigger the gc, no ?

> Probably cancel_delayed_work_sync() should be considered instead, but
> since this needs more checking, and David is waiting for this, I think
> it's safer to use this previous (__rtnl_unlock) patch for now
> (especially for -stable).



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski - Oct. 1, 2008, 9:52 p.m.
On Wed, Oct 01, 2008 at 11:06:22PM +0200, Daniel Lezcano wrote:
> Jarek Poplawski wrote:
>> On Wed, Oct 01, 2008 at 04:14:35PM +0200, Benjamin Thery wrote:
...
>>> --- net-next-2.6.orig/net/core/dst.c
>>> +++ net-next-2.6/net/core/dst.c
>>> @@ -328,6 +328,10 @@ static int dst_dev_event(struct notifier
>>>  			dst_ifdown(dst, dev, event != NETDEV_DOWN);
>>>  		}
>>>  		mutex_unlock(&dst_gc_mutex);
>>> +
>>> +		if (event == NETDEV_UNREGISTER &&
>>> +		    cancel_delayed_work(&dst_gc_work))
>>> +			dst_gc_task(&dst_gc_work.work);
>>
>> Hmm... It seems this shouldn't work yet: cancel_delayed_work() can only
>> kill this while on timer, but not when queued and maybe blocked already.
>
> Perhaps, I am misunderstanding but, dst_gc_work is always called with  
> schedule_delayed_work. If the task is not running, we can cancel it and  
> if the task is running, no need to trigger the gc, no ?

Right, but cancel_delayed_work() can't cancel it reliably even if the
task is not running (but e.g. queued already just after us - linkwatch).
Of course, this should be OK most of the time, especially with such long
schedule delays, but, on the other hand, not as sure as the first patch
with __rtnl_unlock. So doing this right could take more time.

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski - Oct. 1, 2008, 11:31 p.m.
On Wed, Oct 01, 2008 at 11:06:22PM +0200, Daniel Lezcano wrote:
> Jarek Poplawski wrote:
>> On Wed, Oct 01, 2008 at 04:14:35PM +0200, Benjamin Thery wrote:
...
>>> --- net-next-2.6.orig/net/core/dst.c
>>> +++ net-next-2.6/net/core/dst.c
>>> @@ -328,6 +328,10 @@ static int dst_dev_event(struct notifier
>>>  			dst_ifdown(dst, dev, event != NETDEV_DOWN);
>>>  		}
>>>  		mutex_unlock(&dst_gc_mutex);
>>> +
>>> +		if (event == NETDEV_UNREGISTER &&
>>> +		    cancel_delayed_work(&dst_gc_work))
>>> +			dst_gc_task(&dst_gc_work.work);
>>
>> Hmm... It seems this shouldn't work yet: cancel_delayed_work() can only
>> kill this while on timer, but not when queued and maybe blocked already.

Hmm#2... Then maybe something like this?:

		if (event == NETDEV_UNREGISTER &&
	    	    (cancel_delayed_work(&dst_gc_work) ||
		     delayed_work_pending(&dst_gc_work)))
			dst_gc_task(&dst_gc_work.work);


Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Thery - Oct. 2, 2008, 3:23 p.m.
Jarek Poplawski wrote:
> On Wed, Oct 01, 2008 at 11:06:22PM +0200, Daniel Lezcano wrote:
>> Jarek Poplawski wrote:
>>> On Wed, Oct 01, 2008 at 04:14:35PM +0200, Benjamin Thery wrote:
> ...
>>>> --- net-next-2.6.orig/net/core/dst.c
>>>> +++ net-next-2.6/net/core/dst.c
>>>> @@ -328,6 +328,10 @@ static int dst_dev_event(struct notifier
>>>>  			dst_ifdown(dst, dev, event != NETDEV_DOWN);
>>>>  		}
>>>>  		mutex_unlock(&dst_gc_mutex);
>>>> +
>>>> +		if (event == NETDEV_UNREGISTER &&
>>>> +		    cancel_delayed_work(&dst_gc_work))
>>>> +			dst_gc_task(&dst_gc_work.work);
>>> Hmm... It seems this shouldn't work yet: cancel_delayed_work() can only
>>> kill this while on timer, but not when queued and maybe blocked already.

You're right.

> 
> Hmm#2... Then maybe something like this?:
> 
> 		if (event == NETDEV_UNREGISTER &&
> 	    	    (cancel_delayed_work(&dst_gc_work) ||
> 		     delayed_work_pending(&dst_gc_work)))
> 			dst_gc_task(&dst_gc_work.work);

Hmmm... I'm not sure I understand what this change do?

OK, I see this ensure we will run dst_gc_task() even if
cancel_delayed_work() failed and the work is still pending (ie. the 
timer has expired and dst_gc_work is already in the queue).

But what if the work was not pending at all at beginning?
We still need to run dst_gc_task().

Is something like this better?
(code expanded to be more readable)

	if (event == NETDEV_UNREGISTER) {
	    if (!delayed_work_pending(&dst_gc_work))
	        /* work is not scheduled (no timer, not in queue) */
		dst_gc_task(&dst_gc_work.work);
	    else if (cancel_delayed_work(&dst_gc_work) ||
  	             delayed_work_pending(&dst_gc_work)))
                 /* work was scheduled (and may be blocked) */
  	        dst_gc_task(&dst_gc_work.work);
	    else
                 /* dst_gc_task() is running, do nothing */
	}

Benjamin



> 
> 
> Jarek P.
> 
>
Jarek Poplawski - Oct. 2, 2008, 6:38 p.m.
On Thu, Oct 02, 2008 at 05:23:30PM +0200, Benjamin Thery wrote:
> Jarek Poplawski wrote:
>> On Wed, Oct 01, 2008 at 11:06:22PM +0200, Daniel Lezcano wrote:
>>> Jarek Poplawski wrote:
>>>> On Wed, Oct 01, 2008 at 04:14:35PM +0200, Benjamin Thery wrote:
>> ...
>>>>> --- net-next-2.6.orig/net/core/dst.c
>>>>> +++ net-next-2.6/net/core/dst.c
>>>>> @@ -328,6 +328,10 @@ static int dst_dev_event(struct notifier
>>>>>  			dst_ifdown(dst, dev, event != NETDEV_DOWN);
>>>>>  		}
>>>>>  		mutex_unlock(&dst_gc_mutex);
>>>>> +
>>>>> +		if (event == NETDEV_UNREGISTER &&
>>>>> +		    cancel_delayed_work(&dst_gc_work))
>>>>> +			dst_gc_task(&dst_gc_work.work);
>>>> Hmm... It seems this shouldn't work yet: cancel_delayed_work() can only
>>>> kill this while on timer, but not when queued and maybe blocked already.
>
> You're right.
>
>>
>> Hmm#2... Then maybe something like this?:
>>
>> 		if (event == NETDEV_UNREGISTER &&
>> 	    	    (cancel_delayed_work(&dst_gc_work) ||
>> 		     delayed_work_pending(&dst_gc_work)))
>> 			dst_gc_task(&dst_gc_work.work);
>
> Hmmm... I'm not sure I understand what this change do?
>
> OK, I see this ensure we will run dst_gc_task() even if
> cancel_delayed_work() failed and the work is still pending (ie. the  
> timer has expired and dst_gc_work is already in the queue).

I think, this covers exactly the case of blocking you described, plus
more... (the work is queued but not blocked). 

>
> But what if the work was not pending at all at beginning?
> We still need to run dst_gc_task().

Maybe I miss something, but if this is really needed here then it
looks like we are fixing more than "the blocking" bug BTW.

>
> Is something like this better?
> (code expanded to be more readable)
>
> 	if (event == NETDEV_UNREGISTER) {
> 	    if (!delayed_work_pending(&dst_gc_work))
> 	        /* work is not scheduled (no timer, not in queue) */

		/* may be running too */

> 		dst_gc_task(&dst_gc_work.work);
> 	    else if (cancel_delayed_work(&dst_gc_work) ||
>  	             delayed_work_pending(&dst_gc_work)))
>                 /* work was scheduled (and may be blocked) */

		/* actually could be both running and pending here:
		 * if it's after rearming
		 */

>  	        dst_gc_task(&dst_gc_work.work);
> 	    else
>                 /* dst_gc_task() is running, do nothing */

So again !delayed_work_pending() - there could be the change of state
while checking - but then looks a bit inconsistent. I think this should
be OK too.

As a matter of fact I've thought about something even simpler, which
probably should help for all above concerns too:

	if (event == NETDEV_UNREGISTER)
  	        dst_gc_task(&dst_gc_work.work);

dst_gc_task() locking allows for this, and running this two times in
a row could be even faster than trying to cancel the unnecessary run.

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Thery - Oct. 2, 2008, 7:55 p.m.
Quoting Jarek Poplawski <jarkao2@gmail.com>:

> On Thu, Oct 02, 2008 at 05:23:30PM +0200, Benjamin Thery wrote:
>> Jarek Poplawski wrote:
>>> On Wed, Oct 01, 2008 at 11:06:22PM +0200, Daniel Lezcano wrote:
>>>> Jarek Poplawski wrote:
>>>>> On Wed, Oct 01, 2008 at 04:14:35PM +0200, Benjamin Thery wrote:
>>> ...
>>>>>> --- net-next-2.6.orig/net/core/dst.c
>>>>>> +++ net-next-2.6/net/core/dst.c
>>>>>> @@ -328,6 +328,10 @@ static int dst_dev_event(struct notifier
>>>>>>  			dst_ifdown(dst, dev, event != NETDEV_DOWN);
>>>>>>  		}
>>>>>>  		mutex_unlock(&dst_gc_mutex);
>>>>>> +
>>>>>> +		if (event == NETDEV_UNREGISTER &&
>>>>>> +		    cancel_delayed_work(&dst_gc_work))
>>>>>> +			dst_gc_task(&dst_gc_work.work);
>>>>> Hmm... It seems this shouldn't work yet: cancel_delayed_work() can only
>>>>> kill this while on timer, but not when queued and maybe blocked already.
>>
>> You're right.
>>
>>>
>>> Hmm#2... Then maybe something like this?:
>>>
>>> 		if (event == NETDEV_UNREGISTER &&
>>> 	    	    (cancel_delayed_work(&dst_gc_work) ||
>>> 		     delayed_work_pending(&dst_gc_work)))
>>> 			dst_gc_task(&dst_gc_work.work);
>>
>> Hmmm... I'm not sure I understand what this change do?
>>
>> OK, I see this ensure we will run dst_gc_task() even if
>> cancel_delayed_work() failed and the work is still pending (ie. the
>> timer has expired and dst_gc_work is already in the queue).
>
> I think, this covers exactly the case of blocking you described, plus
> more... (the work is queued but not blocked).
>
>>
>> But what if the work was not pending at all at beginning?
>> We still need to run dst_gc_task().
>
> Maybe I miss something, but if this is really needed here then it
> looks like we are fixing more than "the blocking" bug BTW.
>
>>
>> Is something like this better?
>> (code expanded to be more readable)
>>
>> 	if (event == NETDEV_UNREGISTER) {
>> 	    if (!delayed_work_pending(&dst_gc_work))
>> 	        /* work is not scheduled (no timer, not in queue) */
>
> 		/* may be running too */
>
>> 		dst_gc_task(&dst_gc_work.work);
>> 	    else if (cancel_delayed_work(&dst_gc_work) ||
>>  	             delayed_work_pending(&dst_gc_work)))
>>                 /* work was scheduled (and may be blocked) */
>
> 		/* actually could be both running and pending here:
> 		 * if it's after rearming
> 		 */
>
>>  	        dst_gc_task(&dst_gc_work.work);
>> 	    else
>>                 /* dst_gc_task() is running, do nothing */
>
> So again !delayed_work_pending() - there could be the change of state
> while checking - but then looks a bit inconsistent. I think this should
> be OK too.
>
> As a matter of fact I've thought about something even simpler, which
> probably should help for all above concerns too:
>
> 	if (event == NETDEV_UNREGISTER)
>   	        dst_gc_task(&dst_gc_work.work);
>
> dst_gc_task() locking allows for this, and running this two times in
> a row could be even faster than trying to cancel the unnecessary run.

I've thought a bit more about my last proposal and come to the same
conclusion as you, hmm, almost. I thought we could call
cancel_delayed_work() unconditionally and then dst_gc_task().

  	if (event == NETDEV_UNREGISTER) {
                 cancel_delayed_work(&dst_gc_work);
    	        dst_gc_task(&dst_gc_work.work);
         }

But, you're right, calling only dst_gc_task() seems fine to me.

I'll run some tests tomorrow and send a new patch.

Do we agree that this fix (calling dst_gc_task() in dst_dev_event())
is better/more appropriate than the first one (replacing rtnl_unlock()
by the non-blocking __rtnl_unlock() in linkwatch_event())?

Thanks,
Benjamin

>
> Jarek P.
>
>



----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski - Oct. 2, 2008, 8:34 p.m.
On Thu, Oct 02, 2008 at 09:55:02PM +0200, Benjamin Thery  wrote:
> Quoting Jarek Poplawski <jarkao2@gmail.com>:
...
>> As a matter of fact I've thought about something even simpler, which
>> probably should help for all above concerns too:
>>
>> 	if (event == NETDEV_UNREGISTER)
>>   	        dst_gc_task(&dst_gc_work.work);
>>
>> dst_gc_task() locking allows for this, and running this two times in
>> a row could be even faster than trying to cancel the unnecessary run.
>
> I've thought a bit more about my last proposal and come to the same
> conclusion as you, hmm, almost. I thought we could call
> cancel_delayed_work() unconditionally and then dst_gc_task().
>
>  	if (event == NETDEV_UNREGISTER) {
>                 cancel_delayed_work(&dst_gc_work);
>    	        dst_gc_task(&dst_gc_work.work);
>         }
>
> But, you're right, calling only dst_gc_task() seems fine to me.

I'm OK with any of these versions.

>
> I'll run some tests tomorrow and send a new patch.
>
> Do we agree that this fix (calling dst_gc_task() in dst_dev_event())
> is better/more appropriate than the first one (replacing rtnl_unlock()
> by the non-blocking __rtnl_unlock() in linkwatch_event())?

I agree it's more appropriate, because:
a) the job is done according to the rules (in the comment) during
   the notification and not in some future,
b) it removes some hidden dependencies between processes/works, which,
   even if they currently don't exist except this one in the linkwatch,
   could be extremly hard to diagnose, if added accidentally in the
   future.

On the other hand, until we are not sure of the reasons, why something
like this (the full destroying) was avoided in the past, and until it's
heavily tested (with lockdep), your first patch looks to me much safer
if applying to any -stable is considered.

Thanks,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski - Oct. 4, 2008, 7:42 a.m.
On Thu, Oct 02, 2008 at 10:34:35PM +0200, Jarek Poplawski wrote:
> On Thu, Oct 02, 2008 at 09:55:02PM +0200, Benjamin Thery  wrote:
> > Quoting Jarek Poplawski <jarkao2@gmail.com>:
> ...
> >> As a matter of fact I've thought about something even simpler, which
> >> probably should help for all above concerns too:
> >>
> >> 	if (event == NETDEV_UNREGISTER)
> >>   	        dst_gc_task(&dst_gc_work.work);
> >>
> >> dst_gc_task() locking allows for this, and running this two times in
> >> a row could be even faster than trying to cancel the unnecessary run.
> >
> > I've thought a bit more about my last proposal and come to the same
> > conclusion as you, hmm, almost. I thought we could call
> > cancel_delayed_work() unconditionally and then dst_gc_task().
> >
> >  	if (event == NETDEV_UNREGISTER) {
> >                 cancel_delayed_work(&dst_gc_work);
> >    	        dst_gc_task(&dst_gc_work.work);
> >         }
> >
> > But, you're right, calling only dst_gc_task() seems fine to me.
> 
> I'm OK with any of these versions.
> 
> >
> > I'll run some tests tomorrow and send a new patch.
> >
> > Do we agree that this fix (calling dst_gc_task() in dst_dev_event())
> > is better/more appropriate than the first one (replacing rtnl_unlock()
> > by the non-blocking __rtnl_unlock() in linkwatch_event())?

Hmm... You can kill me, but after more looking at this I've changed my
mind.

> I agree it's more appropriate, because:
> a) the job is done according to the rules (in the comment) during
>    the notification and not in some future,

But, since the rules are wrong or protocols buggy, the solution isn't
effective: there should be no deadlock after this, but it's also hard
to predict how many dst_gc_task() runs are needed to free all refs.
During this all other works would be unnecessarily blocked by the
linkwatch, what looks a bit too hoggish.

My current idea is we should move linkwatch or dst_gc_task() (or both)
to it's own workqueue.

> b) it removes some hidden dependencies between processes/works, which,
>    even if they currently don't exist except this one in the linkwatch,
>    could be extremly hard to diagnose, if added accidentally in the
>    future.
> 
> On the other hand, until we are not sure of the reasons, why something

BTW, sorry, of course:
"On the other hand, until we are sure of the reasons, why something"

> like this (the full destroying) was avoided in the past, and until it's
> heavily tested (with lockdep), your first patch looks to me much safer
> if applying to any -stable is considered.

And this first patch looks better and better each time...

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski - Oct. 4, 2008, 7:52 a.m.
On Sat, Oct 04, 2008 at 09:42:48AM +0200, Jarek Poplawski wrote:
...
> During this all other works would be unnecessarily blocked by the
> linkwatch, what looks a bit too hoggish.

Or even more: can't some of these blockeds task hold our refs too?

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

Index: net-next-2.6/net/core/dst.c
===================================================================
--- net-next-2.6.orig/net/core/dst.c
+++ net-next-2.6/net/core/dst.c
@@ -328,6 +328,10 @@  static int dst_dev_event(struct notifier
 			dst_ifdown(dst, dev, event != NETDEV_DOWN);
 		}
 		mutex_unlock(&dst_gc_mutex);
+
+		if (event == NETDEV_UNREGISTER &&
+		    cancel_delayed_work(&dst_gc_work))
+			dst_gc_task(&dst_gc_work.work);
 		break;
 	}
 	return NOTIFY_DONE;