Message ID: 1265657560.4236.80.camel@edumazet-laptop
State: Accepted, archived
Delegated to: David Miller
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 08 Feb 2010 20:32:40 +0100

> [PATCH] dst: call cond_resched() in dst_gc_task()
>
> On some workloads, it is quite possible to get a huge dst list to
> process in dst_gc_task(), and trigger soft lockup detection.
>
> Fix is to call cond_resched(), as we run in process context.
>
> Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
> Tested-by: Pawel Staszewski <pstaszewski@itcare.pl>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied and queued up to -stable.

When fixing bugs with kernel bugzilla entries, please mention them in
the commit message.  I fixed this up for you but please take care of
it next time.

Thanks!

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 08 Feb 2010 20:32:40 +0100 Eric Dumazet <eric.dumazet@gmail.com> wrote:

> [PATCH] dst: call cond_resched() in dst_gc_task()
>
> On some workloads, it is quite possible to get a huge dst list to
> process in dst_gc_task(), and trigger soft lockup detection.
>
> Fix is to call cond_resched(), as we run in process context.
>
> Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
> Tested-by: Pawel Staszewski <pstaszewski@itcare.pl>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>
> diff --git a/net/core/dst.c b/net/core/dst.c
> index 57bc4d5..cb1b348 100644
> --- a/net/core/dst.c
> +++ b/net/core/dst.c
> @@ -17,6 +17,7 @@
>  #include <linux/string.h>
>  #include <linux/types.h>
>  #include <net/net_namespace.h>
> +#include <linux/sched.h>
>
>  #include <net/dst.h>
>
> @@ -79,6 +80,7 @@ loop:
>  	while ((dst = next) != NULL) {
>  		next = dst->next;
>  		prefetch(&next->next);
> +		cond_resched();
>  		if (likely(atomic_read(&dst->__refcnt))) {
>  			last->next = dst;
>  			last = dst;

Gad.  Am I understanding this right?  The softlockup threshold is
sixty seconds!

I assume that this function spends most of its time walking over busy
entries?  Is a more powerful data structure needed?
From: Andrew Morton <akpm@linux-foundation.org>
Date: Mon, 8 Feb 2010 15:26:06 -0800

> I assume that this function spends most of its time walking over busy
> entries?  Is a more powerful data structure needed?

When you're getting pounded with millions of packets per second, all
mostly to different destinations (and thus resolving to different
routing cache entries), this is what happens.

For a busy router, really, this is normal behavior.
On Mon, 08 Feb 2010 15:34:06 -0800 (PST) David Miller <davem@davemloft.net> wrote:

> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Mon, 8 Feb 2010 15:26:06 -0800
>
> > I assume that this function spends most of its time walking over busy
> > entries?  Is a more powerful data structure needed?
>
> When you're getting pounded with millions of packets per second,
> all mostly to different destinations (and thus resolving to
> different routing cache entries), this is what happens.
>
> For a busy router, really, this is normal behavior.

Is the cache a net win in that scenario?
From: Andrew Morton <akpm@linux-foundation.org>
Date: Mon, 8 Feb 2010 15:37:44 -0800

> On Mon, 08 Feb 2010 15:34:06 -0800 (PST)
> David Miller <davem@davemloft.net> wrote:
>
>> For a busy router, really, this is normal behavior.
>
> Is the cache a net win in that scenario?

Absolutely.
On Mon, 8 Feb 2010 15:37:44 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:

> On Mon, 08 Feb 2010 15:34:06 -0800 (PST)
> David Miller <davem@davemloft.net> wrote:
>
> > From: Andrew Morton <akpm@linux-foundation.org>
> > Date: Mon, 8 Feb 2010 15:26:06 -0800
> >
> > > I assume that this function spends most of its time walking over busy
> > > entries?  Is a more powerful data structure needed?
> >
> > When you're getting pounded with millions of packets per second,
> > all mostly to different destinations (and thus resolving to
> > different routing cache entries), this is what happens.
> >
> > For a busy router, really, this is normal behavior.
>
> Is the cache a net win in that scenario?

No, the cache doesn't help.

Robert, who is the expert in this area, runs with FIB TRIE and
no routing cache.
On Monday 08 February 2010 at 15:50 -0800, Stephen Hemminger wrote:

> No, cache doesn't help.
>
> Robert who is the expert in this area, runs with FIB TRIE and
> no routing cache.

Who knows; it probably depends on many factors.  I always run with the
cache enabled, because it saves cycles under moderate load.  FIB_TRIE
is unrelated here: if the routing table is very small, it fits either
HASH or TRIE.

Pawel hit the bug with tunables that basically enabled the cache, but
in an unhelpful way (filling the list of busy dst entries).  User error
combined with a lazy kernel function :)

Please note that the conversion from softirq to workqueue, without a
scheduling point, might well (probably does) use the same CPU for
handling network irqs and for running dst_gc_task().  On big routers,
admins usually set irq affinities, so there can be very little CPU
time left on those CPUs for running other tasks.

After this patch, I believe that the scheduler is allowed to migrate
dst_gc_task() to an idle cpu.

Another point to address (for 2.6.34) is the dst_gc_mutex, which can
delay NETDEV_UNREGISTER/NETDEV_DOWN events for a long period.
On Monday 08 February 2010 at 15:01 -0800, David Miller wrote:

> When fixing bugs with kernel bugzilla entries, please
> mention them in the commit message.  I fixed this up for
> you but please take care of it next time.
>
> Thanks!

Sorry Dave, I was not aware of the bugzilla entry.
On Tue, 09 Feb 2010 07:06:38 +0100 Eric Dumazet <eric.dumazet@gmail.com> wrote:

> After this patch, I believe that scheduler is allowed to migrate
> dst_gc_task() to an idle cpu.

No, keventd threads are each pinned to a single CPU (kthread_bind() in
start_workqueue_thread()), so dst_gc_task() gets run on the CPU which
ran schedule_delayed_work() and no other.
On Monday 08 February 2010 at 22:35 -0800, Andrew Morton wrote:

> On Tue, 09 Feb 2010 07:06:38 +0100 Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> > After this patch, I believe that scheduler is allowed to migrate
> > dst_gc_task() to an idle cpu.
>
> No, keventd threads are each pinned to a single CPU (kthread_bind() in
> start_workqueue_thread()), so dst_gc_task() gets run on the CPU which
> ran schedule_delayed_work() and no other.

Ah OK, thanks Andrew for the clarification.

I suppose offlining a cpu migrates its works to another (online) cpu ?
On Tue, 09 Feb 2010 08:20:36 +0100 Eric Dumazet <eric.dumazet@gmail.com> wrote:
> I suppose offlining a cpu migrates its works to another (online) cpu ?
Sort of, effectively. The workqueue code runs all the pending works on
the to-be-offlined CPU and then it's done.
schedule_delayed_work() starts out with a timer, and the timer code
_does_ perform migration off the going-away CPU.