diff mbox

net/sched: latent livelock in dev_deactivate_many() due to yield() usage

Message ID 1491362429.4536.77.camel@gmx.de
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Mike Galbraith April 5, 2017, 3:20 a.m. UTC
On Tue, 2017-04-04 at 15:39 -0700, Cong Wang wrote:

> Thanks for the report! Looks like a quick solution here is to replace
> this yield() with cond_resched(), it is harder to really wait for
> all qdisc's to transmit all packets.

No, cond_resched() won't help.  What I did is below, but I suspect net
wizards will do something better.

---
 net/sched/sch_generic.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Cong Wang April 5, 2017, 5:25 a.m. UTC | #1
On Tue, Apr 4, 2017 at 8:20 PM, Mike Galbraith <efault@gmx.de> wrote:
> -               while (some_qdisc_is_busy(dev))
> -                       yield();
> +               swait_event_timeout(swait, !some_qdisc_is_busy(dev), 1);
>  }

I don't see why this is an improvement even if I don't care about the
hardcoded timeout for now... Why the scheduler can make a better
decision with swait_event_timeout() than with cond_resched()?
Mike Galbraith April 5, 2017, 6:12 a.m. UTC | #2
On Tue, 2017-04-04 at 22:25 -0700, Cong Wang wrote:
> On Tue, Apr 4, 2017 at 8:20 PM, Mike Galbraith <efault@gmx.de> wrote:
> > -               while (some_qdisc_is_busy(dev))
> > -                       yield();
> > +               swait_event_timeout(swait,
> > !some_qdisc_is_busy(dev), 1);
> >  }
> 
> I don't see why this is an improvement even if I don't care about the
> hardcoded timeout for now... Why the scheduler can make a better
> decision with swait_event_timeout() than with cond_resched()?

Because sleeping gets you out of the way?  There is no other decision
the scheduler can make while a SCHED_FIFO task is trying to yield when
it is the one and only task at it's priority.  The scheduler is doing
exactly what it is supposed to do, problem is people calling yield()
tend to think it does something it does not do, which is why it is
decorated with "if you think you want yield(), think again"

Yes, yield semantics suck rocks, basically don't exist.  Hop in your
time machine and slap whoever you find claiming responsibility :)

	-Mike
Cong Wang April 5, 2017, 11:55 p.m. UTC | #3
On Tue, Apr 4, 2017 at 11:12 PM, Mike Galbraith <efault@gmx.de> wrote:
> On Tue, 2017-04-04 at 22:25 -0700, Cong Wang wrote:
>> On Tue, Apr 4, 2017 at 8:20 PM, Mike Galbraith <efault@gmx.de> wrote:
>> > -               while (some_qdisc_is_busy(dev))
>> > -                       yield();
>> > +               swait_event_timeout(swait,
>> > !some_qdisc_is_busy(dev), 1);
>> >  }
>>
>> I don't see why this is an improvement even if I don't care about the
>> hardcoded timeout for now... Why the scheduler can make a better
>> decision with swait_event_timeout() than with cond_resched()?
>
> Because sleeping gets you out of the way?  There is no other decision
> the scheduler can make while a SCHED_FIFO task is trying to yield when
> it is the one and only task at it's priority.  The scheduler is doing
> exactly what it is supposed to do, problem is people calling yield()
> tend to think it does something it does not do, which is why it is
> decorated with "if you think you want yield(), think again"
>
> Yes, yield semantics suck rocks, basically don't exist.  Hop in your
> time machine and slap whoever you find claiming responsibility :)

I am not trying to defend for yield(), I am trying to understand when
cond_resched() is not a right solution to replace yield() and when it is.
For me, the dev_deactivate_many() case is, because I interpret
"be nice" differently.

Thanks.
Mike Galbraith April 6, 2017, 1:08 a.m. UTC | #4
On Wed, 2017-04-05 at 16:55 -0700, Cong Wang wrote:
> On Tue, Apr 4, 2017 at 11:12 PM, Mike Galbraith <efault@gmx.de> wrote:
> > On Tue, 2017-04-04 at 22:25 -0700, Cong Wang wrote:
> > > On Tue, Apr 4, 2017 at 8:20 PM, Mike Galbraith <efault@gmx.de> wrote:
> > > > -               while (some_qdisc_is_busy(dev))
> > > > -                       yield();
> > > > +               swait_event_timeout(swait,
> > > > !some_qdisc_is_busy(dev), 1);
> > > >  }
> > > 
> > > I don't see why this is an improvement even if I don't care about the
> > > hardcoded timeout for now... Why the scheduler can make a better
> > > decision with swait_event_timeout() than with cond_resched()?
> > 
> > Because sleeping gets you out of the way?  There is no other decision
> > the scheduler can make while a SCHED_FIFO task is trying to yield when
> > it is the one and only task at it's priority.  The scheduler is doing
> > exactly what it is supposed to do, problem is people calling yield()
> > tend to think it does something it does not do, which is why it is
> > decorated with "if you think you want yield(), think again"
> > 
> > Yes, yield semantics suck rocks, basically don't exist.  Hop in your
> > time machine and slap whoever you find claiming responsibility :)
> 
> I am not trying to defend for yield(), I am trying to understand when
> cond_resched() is not a right solution to replace yield() and when it is.
> For me, the dev_deactivate_many() case is, because I interpret
> "be nice" differently.

Yeah, I know you weren't defending it, just as I know that the net-fu
masters don't need that comment held close to their noses in order to
be able to read it.. waving it about wasn't for their benefit ;-)

	-Mike
Peter Zijlstra April 6, 2017, 10:26 a.m. UTC | #5
On Tue, Apr 04, 2017 at 10:25:19PM -0700, Cong Wang wrote:
> On Tue, Apr 4, 2017 at 8:20 PM, Mike Galbraith <efault@gmx.de> wrote:
> > -               while (some_qdisc_is_busy(dev))
> > -                       yield();
> > +               swait_event_timeout(swait, !some_qdisc_is_busy(dev), 1);
> >  }
> 
> I don't see why this is an improvement even if I don't care about the
> hardcoded timeout for now... Why the scheduler can make a better
> decision with swait_event_timeout() than with cond_resched()?

cond_resched() might be a no-op.

and doing yield() will result in a priority inversion deadlock. Imagine
the task doing yield() being the top priority (fifo99) task in the
system. Then it will simply spin forever, not giving whatever task is
required to make your condition true time to run.
diff mbox

Patch

--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -16,6 +16,7 @@ 
 #include <linux/types.h>
 #include <linux/kernel.h>
 #include <linux/sched.h>
+#include <linux/swait.h>
 #include <linux/string.h>
 #include <linux/errno.h>
 #include <linux/netdevice.h>
@@ -901,6 +902,7 @@  static bool some_qdisc_is_busy(struct ne
  */
 void dev_deactivate_many(struct list_head *head)
 {
+	DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(swait);
 	struct net_device *dev;
 	bool sync_needed = false;
 
@@ -924,8 +926,7 @@  void dev_deactivate_many(struct list_hea
 
 	/* Wait for outstanding qdisc_run calls. */
 	list_for_each_entry(dev, head, close_list)
-		while (some_qdisc_is_busy(dev))
-			yield();
+		swait_event_timeout(swait, !some_qdisc_is_busy(dev), 1);
 }
 
 void dev_deactivate(struct net_device *dev)