Message ID | 1411261290.26859.103.camel@edumazet-glaptop2.roam.corp.google.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Sat, 20 Sep 2014 18:01:30 -0700 > From: Eric Dumazet <edumazet@google.com> > > While using a MQ + NETEM setup, I had confirmation that the default > timer migration ( /proc/sys/kernel/timer_migration ) is killing us. > > Installing this on a receiver side of a TCP_STREAM test, (NIC has 8 TX > queues) : ... > We can see that timers get migrated into a single cpu, presumably idle > at the time timers are set up. > Then all qdisc dequeues run from this cpu and huge lock contention > happens. This single cpu is stuck in softirq mode and cannot dequeue > fast enough. > > 39.24% [kernel] [k] _raw_spin_lock > 2.65% [kernel] [k] netem_enqueue > 1.80% [kernel] [k] netem_dequeue > 1.63% [kernel] [k] copy_user_enhanced_fast_string > 1.45% [kernel] [k] _raw_spin_lock_bh > > By pinning qdisc timers on the cpu running the qdisc, we respect proper > XPS setting and remove this lock contention. > > 5.84% [kernel] [k] netem_enqueue > 4.83% [kernel] [k] _raw_spin_lock > 2.92% [kernel] [k] copy_user_enhanced_fast_string > > Current Qdiscs that benefit from this change are : > > netem, cbq, fq, hfsc, tbf, htb. > > Signed-off-by: Eric Dumazet <edumazet@google.com> Looks great, applied, thanks Eric. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 26/09/14 07:27, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Sat, 20 Sep 2014 18:01:30 -0700 > >> From: Eric Dumazet <edumazet@google.com> >> >> While using a MQ + NETEM setup, I had confirmation that the default >> timer migration ( /proc/sys/kernel/timer_migration ) is killing us. >> >> Installing this on a receiver side of a TCP_STREAM test, (NIC has 8 TX >> queues) : > ... >> Current Qdiscs that benefit from this change are : >> >> netem, cbq, fq, hfsc, tbf, htb. >> >> Signed-off-by: Eric Dumazet <edumazet@google.com> > Looks great, applied, thanks Eric. > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Hi Eric, I saw that this patch didn't make it's way into the stable branches. So I have two questions: - Would it be safe to apply to linux-3.12.x stable? - If yes, would there be any [noticeable] efects on a [pretty complex] HTB setup? (I know, test and I'll see, but if theory sais I won't, then there would be no point to the test, would there?) Thank you!
On Wed, 2015-01-07 at 12:06 +0200, Cosmin GIRADU wrote: > Hi Eric, > > I saw that this patch didn't make it's way into the stable branches. > So I have two questions: > - Would it be safe to apply to linux-3.12.x stable? > - If yes, would there be any [noticeable] efects on a [pretty > complex] HTB setup? (I know, test and I'll see, > but if theory sais I won't, then there would be no point to > the test, would there?) > It is safe to backport. If you want to test an equivalent, without kernel patch, you simply can do : echo 0 >/proc/sys/kernel/timer_migration -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index ca6248345937..15e7beee266c 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -586,7 +586,7 @@ static enum hrtimer_restart qdisc_watchdog(struct hrtimer *timer) void qdisc_watchdog_init(struct qdisc_watchdog *wd, struct Qdisc *qdisc) { - hrtimer_init(&wd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS); + hrtimer_init(&wd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED); wd->timer.function = qdisc_watchdog; wd->qdisc = qdisc; } @@ -602,7 +602,7 @@ void qdisc_watchdog_schedule_ns(struct qdisc_watchdog *wd, u64 expires) hrtimer_start(&wd->timer, ns_to_ktime(expires), - HRTIMER_MODE_ABS); + HRTIMER_MODE_ABS_PINNED); } EXPORT_SYMBOL(qdisc_watchdog_schedule_ns); diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c index a3244a800501..d2cd981ba60d 100644 --- a/net/sched/sch_cbq.c +++ b/net/sched/sch_cbq.c @@ -617,7 +617,7 @@ static enum hrtimer_restart cbq_undelay(struct hrtimer *timer) time = ktime_set(0, 0); time = ktime_add_ns(time, PSCHED_TICKS2NS(now + delay)); - hrtimer_start(&q->delay_timer, time, HRTIMER_MODE_ABS); + hrtimer_start(&q->delay_timer, time, HRTIMER_MODE_ABS_PINNED); } qdisc_unthrottled(sch); @@ -1386,7 +1386,7 @@ static int cbq_init(struct Qdisc *sch, struct nlattr *opt) q->link.minidle = -0x7FFFFFFF; qdisc_watchdog_init(&q->watchdog, sch); - hrtimer_init(&q->delay_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS); + hrtimer_init(&q->delay_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED); q->delay_timer.function = cbq_undelay; q->toplevel = TC_CBQ_MAXLEVEL; q->now = psched_get_time(); diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c index 14408f262143..063e953d9848 100644 --- a/net/sched/sch_htb.c +++ b/net/sched/sch_htb.c @@ -932,7 +932,7 @@ ok: ktime_t time = ns_to_ktime(next_event); qdisc_throttled(q->watchdog.qdisc); hrtimer_start(&q->watchdog.timer, time, - HRTIMER_MODE_ABS); + HRTIMER_MODE_ABS_PINNED); } } else { schedule_work(&q->work);