
High contention on the sk_buff_head.lock

Message ID 20090320232943.GA3024@ami.dom.local
State RFC, archived
Delegated to: David Miller

Commit Message

Jarek Poplawski March 20, 2009, 11:29 p.m. UTC
Vernon Mauery wrote, On 03/18/2009 09:17 PM:
...
> This patch does seem to reduce the number of contentions by about 10%.  That is
> a good start (and a good catch on the cacheline bounces).  But, like I mentioned
> above, this lock still has 2 orders of magnitude greater contention than the
> next lock, so even a large decrease like 10% makes little difference in the
> overall contention characteristics.
> 
> So we will have to do something more.  Whether it needs to be more complex or
> not is still up in the air.  Batched enqueueing/dequeueing are just two options
> and the former would be a *lot* less complex than the latter.
> 
> If anyone else has any ideas they have been holding back, now would be a great
> time to get them out in the open.

I think it would be interesting to check another idea around this
contention: not all contenders are equal here. One thread is doing
qdisc_run() and owns the transmit queue (even after releasing the TX
lock). So while it waits for the qdisc lock, the NIC, if not
multiqueue, sits idle. Some handicap like in the patch below could
probably make a difference in throughput; alas, I didn't test it.

Jarek P.
---

 net/core/dev.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Dumazet March 23, 2009, 8:32 a.m. UTC | #1
Jarek Poplawski wrote:
> Vernon Mauery wrote, On 03/18/2009 09:17 PM:
> ...
>> This patch does seem to reduce the number of contentions by about 10%.  That is
>> a good start (and a good catch on the cacheline bounces).  But, like I mentioned
>> above, this lock still has 2 orders of magnitude greater contention than the
>> next lock, so even a large decrease like 10% makes little difference in the
>> overall contention characteristics.
>>
>> So we will have to do something more.  Whether it needs to be more complex or
>> not is still up in the air.  Batched enqueueing/dequeueing are just two options
>> and the former would be a *lot* less complex than the latter.
>>
>> If anyone else has any ideas they have been holding back, now would be a great
>> time to get them out in the open.
> 
> I think it would be interesting to check another idea around this
> contention: not all contenders are equal here. One thread is doing
> qdisc_run() and owns the transmit queue (even after releasing the TX
> lock). So while it waits for the qdisc lock, the NIC, if not
> multiqueue, sits idle. Some handicap like in the patch below could
> probably make a difference in throughput; alas, I didn't test it.
> 
> Jarek P.
> ---
> 
>  net/core/dev.c |    6 +++++-
>  1 files changed, 5 insertions(+), 1 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index f112970..d5ad808 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1852,7 +1852,11 @@ gso:
>  	if (q->enqueue) {
>  		spinlock_t *root_lock = qdisc_lock(q);
>  
> -		spin_lock(root_lock);
> +		while (!spin_trylock(root_lock)) {
> +			do {
> +				cpu_relax();
> +			} while (spin_is_locked(root_lock));
> +		}
>  
>  		if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {
>  			kfree_skb(skb);
> 
> 

I don't understand: doesn't this defeat the ticket spinlock and its fairness?

The thread doing __qdisc_run() already owns the __QDISC_STATE_RUNNING bit.

Trying or taking the spinlock has the same effect, since either forces
a cache line ping-pong, and that is the real problem.

David Miller March 23, 2009, 8:37 a.m. UTC | #2
From: Eric Dumazet <dada1@cosmosbay.com>
Date: Mon, 23 Mar 2009 09:32:39 +0100

> I don't understand: doesn't this defeat the ticket spinlock and its fairness?
> 
> The thread doing __qdisc_run() already owns the __QDISC_STATE_RUNNING bit.

Right.

Remember, the way this is designed is that if there is a busy
cpu taking packets out of the queue and putting them into the
device then other cpus will simply add to the queue and immediately
return.  This effectively keeps the queue runner there, processing
all the new work that other cpus are adding to the qdisc.

Those other cpus make these decisions by looking at that
__QDISC_STATE_RUNNING bit, which the queue runner grabs before
it does any work.
Jarek Poplawski March 23, 2009, 8:50 a.m. UTC | #3
On Mon, Mar 23, 2009 at 01:37:49AM -0700, David Miller wrote:
> From: Eric Dumazet <dada1@cosmosbay.com>
> Date: Mon, 23 Mar 2009 09:32:39 +0100
> 
> > I don't understand: doesn't this defeat the ticket spinlock and its fairness?
> > 
> > The thread doing __qdisc_run() already owns the __QDISC_STATE_RUNNING bit.
> 
> Right.
> 
> Remember, the way this is designed is that if there is a busy
> cpu taking packets out of the queue and putting them into the
> device then other cpus will simply add to the queue and immediately
> return.

But this "busy cpu" can't take packets out of the queue when it's
waiting on the contended spinlock. Anyway, it's only for testing,
and I didn't say it has to be right.

Jarek P.
Herbert Xu April 2, 2009, 2:13 p.m. UTC | #4
David Miller <davem@davemloft.net> wrote:
> From: Eric Dumazet <dada1@cosmosbay.com>
> Date: Mon, 23 Mar 2009 09:32:39 +0100
> 
>> I don't understand: doesn't this defeat the ticket spinlock and its fairness?
>> 
>> The thread doing __qdisc_run() already owns the __QDISC_STATE_RUNNING bit.
> 
> Right.
> 
> Remember, the way this is designed is that if there is a busy
> cpu taking packets out of the queue and putting them into the
> device then other cpus will simply add to the queue and immediately
> return.  This effectively keeps the queue runner there, processing
> all the new work that other cpus are adding to the qdisc.
> 
> Those other cpus make these decisions by looking at that
> __QDISC_STATE_RUNNING bit, which the queue runner grabs before
> it does any work.

Come on guys, if this lock is a problem, go out and buy a proper
NIC that supports multiqueue TX!

Cheers,
Herbert Xu April 2, 2009, 2:15 p.m. UTC | #5
On Thu, Apr 02, 2009 at 10:13:42PM +0800, Herbert Xu wrote:
>
> Come on guys, if this lock is a problem, go out and buy a proper
> NIC that supports multiqueue TX!

Nevermind, I was reading old emails again :)

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index f112970..d5ad808 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1852,7 +1852,11 @@  gso:
 	if (q->enqueue) {
 		spinlock_t *root_lock = qdisc_lock(q);
 
-		spin_lock(root_lock);
+		while (!spin_trylock(root_lock)) {
+			do {
+				cpu_relax();
+			} while (spin_is_locked(root_lock));
+		}
 
 		if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {
 			kfree_skb(skb);