From patchwork Wed Mar 18 19:07:19 2009
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 24623
X-Patchwork-Delegate: davem@davemloft.net
Message-ID: <49C14667.2040806@cosmosbay.com>
Date: Wed, 18 Mar 2009 20:07:19 +0100
From: Eric Dumazet
To: Vernon Mauery
Cc: netdev, LKML, rt-users
Subject: Re: High contention on the sk_buff_head.lock
In-Reply-To: <49C12E64.1000301@us.ibm.com>
References: <49C12E64.1000301@us.ibm.com>

Vernon Mauery wrote:
> I have been beating on network throughput in the -rt kernel for some
> time now. After digging down through the send path of UDP packets, I
> found that the sk_buff_head.lock is under some very high contention.
> This lock is acquired each time a packet is enqueued on a qdisc and
> then acquired again to dequeue the packet. Under high networking
> loads, the enqueueing processes are not only contending among each
> other for the lock, but also with the net-tx softirq. This makes for
> some very high contention on this one lock. My testcase is running
> varying numbers of concurrent netperf instances pushing UDP traffic
> to another machine. As the count goes from 1 to 2, the network
> performance increases. But from 2 to 4 and from 4 to 8, we see a big
> decline, with 8 instances pushing about half of what a single thread
> can do.
>
> Running 2.6.29-rc6-rt3 on an 8-way machine with a 10GbE card (I have
> tried both NetXen and Broadcom, with very similar results), I can
> only push about 1200 Mb/s, whereas with the mainline 2.6.29-rc8
> kernel I can push nearly 6000 Mb/s, though still not as much as I
> think is possible. I was curious and decided to see if the mainline
> kernel was hitting the same lock; according to /proc/lock_stat, it
> is hitting the sk_buff_head.lock as well (it was the number one
> contended lock).
>
> So while this issue really hits -rt kernels hard, it has a real
> effect on mainline kernels as well. The contention of the spinlocks
> is amplified when they get turned into rt-mutexes, which causes a
> double context switch.
>
> Below is the top of the lock_stat for 2.6.29-rc8. This was captured
> from a 1-minute network stress test. The next highest contender had
> two orders of magnitude fewer contentions.
> Think of the throughput increase if we could ease this contention a
> bit. We might even be able to saturate a 10GbE link.
>
> lock_stat version 0.3
> -----------------------------------------------------------------------------------------------
> class name      con-bounces  contentions  waittime-min  waittime-max  waittime-total  acq-bounces  acquisitions  holdtime-min  holdtime-max  holdtime-total
> -----------------------------------------------------------------------------------------------
>
> &list->lock#3:     24517307     24643791          0.71       1286.62     56516392.42     34834296      44904018          0.60        164.79     31314786.02
> -------------
> &list->lock#3      15596927   [] dev_queue_xmit+0x2ea/0x468
> &list->lock#3       9046864   [] __qdisc_run+0x11b/0x1ef
> -------------
> &list->lock#3       6525300   [] __qdisc_run+0x11b/0x1ef
> &list->lock#3      18118491   [] dev_queue_xmit+0x2ea/0x468
>
> The story is the same for -rt kernels, only the waittime and
> holdtime are both orders of magnitude greater.
>
> I am not exactly clear on the solution, but if I understand
> correctly, in the past there has been some discussion of batched
> enqueueing and dequeueing. Is anyone else working on this problem
> right now who has just not yet posted anything for review?
> Questions, comments, flames?

Yes, we have a known contention point here, but before adding more
complex code, could you try the following patch, please?

[PATCH] net: Reorder fields of struct Qdisc

dev_queue_xmit() needs to dirty the fields "state" and "q". On the
x86_64 arch, they currently span two cache lines, causing more
cache-line ping-pong than necessary.

Before patch:

offsetof(struct Qdisc, state)=0x38
offsetof(struct Qdisc, q)=0x48
offsetof(struct Qdisc, dev_queue)=0x60

After patch:

offsetof(struct Qdisc, dev_queue)=0x38
offsetof(struct Qdisc, state)=0x48
offsetof(struct Qdisc, q)=0x50

Signed-off-by: Eric Dumazet
---

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index f8c4742..e24feeb 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -51,10 +51,11 @@ struct Qdisc
 	u32			handle;
 	u32			parent;
 	atomic_t		refcnt;
-	unsigned long		state;
+	struct netdev_queue	*dev_queue;
+	struct sk_buff		*gso_skb;
+	unsigned long		state;
 	struct sk_buff_head	q;
-	struct netdev_queue	*dev_queue;
 	struct Qdisc		*next_sched;
 	struct list_head	list;
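
To see numerically what the changelog claims: with 64-byte cache lines
and a line-aligned object (qdisc allocation is cache-line aligned),
offset/64 gives the line a field lands in. Below is a minimal userspace
sketch using padded stand-in structs that reproduce the offsets quoted
above; these are not the real struct Qdisc, just the layout arithmetic:

/* Stand-in layouts with offsets copied from the changelog above;
 * NOT the real struct Qdisc.  With 64-byte cache lines and a
 * line-aligned object, offset/64 is the cache line a field uses. */
#include <stdio.h>
#include <stddef.h>

struct before {                 /* layout per "Before patch" offsets */
	char pad0[0x38];
	unsigned long state;    /* offset 0x38 -> cache line 0 */
	char pad1[0x48 - 0x38 - sizeof(unsigned long)];
	char q[24];             /* offset 0x48 -> cache line 1 */
};

struct after {                  /* layout per "After patch" offsets */
	char pad0[0x48];
	unsigned long state;    /* offset 0x48 -> cache line 1 */
	char q[24];             /* offset 0x50 -> cache line 1 */
};

#define LINE(type, f) (offsetof(struct type, f) / 64)

int main(void)
{
	printf("before: state on line %zu, q on line %zu\n",
	       LINE(before, state), LINE(before, q));
	printf("after:  state on line %zu, q on line %zu\n",
	       LINE(after, state), LINE(after, q));
	return 0;
}

On x86_64 this should print state on line 0 and q on line 1 for the old
layout, and both fields on line 1 for the new one: the two dirtied
fields end up on a single cache line, which is the ping-pong the patch
is meant to remove.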
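
And to make the contention itself concrete: what Vernon describes is a
single spinlock serializing both the enqueue side (dev_queue_xmit) and
the dequeue side (__qdisc_run). Here is a rough userspace analogue with
pthreads; it is purely illustrative, not kernel code, and the queue and
thread counts are made up (build with: gcc -O2 -pthread sketch.c):

/* Userspace sketch of the single-lock producer/consumer pattern
 * behind sk_buff_head: every enqueue and every dequeue takes the
 * same spinlock, so all senders plus the tx softirq serialize on
 * one lock.  Illustrative only -- the names are invented. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NPROD     8         /* 8 senders, like the 8-way netperf test */
#define PER_PROD  1000000

struct node { struct node *next; };

struct queue {
	pthread_spinlock_t lock;   /* plays the role of sk_buff_head.lock */
	struct node *head, *tail;
	unsigned long qlen;
};

static struct queue q;

static void enqueue(struct node *n)     /* cf. enqueue in dev_queue_xmit */
{
	pthread_spin_lock(&q.lock);
	n->next = NULL;
	if (q.tail)
		q.tail->next = n;
	else
		q.head = n;
	q.tail = n;
	q.qlen++;
	pthread_spin_unlock(&q.lock);
}

static struct node *dequeue(void)       /* cf. dequeue in __qdisc_run */
{
	struct node *n;

	pthread_spin_lock(&q.lock);
	n = q.head;
	if (n) {
		q.head = n->next;
		if (!q.head)
			q.tail = NULL;
		q.qlen--;
	}
	pthread_spin_unlock(&q.lock);
	return n;
}

static void *producer(void *arg)
{
	(void)arg;
	for (int i = 0; i < PER_PROD; i++)
		enqueue(malloc(sizeof(struct node)));
	return NULL;
}

static void *consumer(void *arg)        /* busy-polls, like a tight softirq */
{
	unsigned long got = 0;
	(void)arg;
	while (got < (unsigned long)NPROD * PER_PROD) {
		struct node *n = dequeue();
		if (n) {
			free(n);
			got++;
		}
	}
	return NULL;
}

int main(void)
{
	pthread_t prod[NPROD], cons;

	pthread_spin_init(&q.lock, PTHREAD_PROCESS_PRIVATE);
	pthread_create(&cons, NULL, consumer, NULL);
	for (int i = 0; i < NPROD; i++)
		pthread_create(&prod[i], NULL, producer, NULL);
	for (int i = 0; i < NPROD; i++)
		pthread_join(prod[i], NULL);
	pthread_join(cons, NULL);
	printf("done, qlen=%lu\n", q.qlen);
	return 0;
}

Profiling a toy like this should show the same picture lock_stat gave
above: one hot lock bounced between every producer and the consumer,
with throughput falling as more producers are added.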