
[2/2] udp: RCU handling for Unicast packets.

Message ID 4908DEDE.5030706@cosmosbay.com
State RFC, archived
Delegated to: David Miller

Commit Message

Eric Dumazet Oct. 29, 2008, 10:08 p.m. UTC
Paul E. McKenney wrote:
> On Wed, Oct 29, 2008 at 09:00:13PM +0100, Eric Dumazet wrote:
>> Hmm... Another way of handling all those cases and avoiding memory barriers
>> would be to have different "NULL" pointers.
>>
>> Each hash chain should have a unique "NULL" pointer (in the case of UDP, these
>> can be the 128 values [(void *)0 .. (void *)127]).
>>
>> Then, when performing a lookup, a reader should check that the "NULL" pointer
>> it gets at the end of its lookup is the "hash" value of its chain.
>>
>> If not -> restart the loop, aka "goto begin;" :)
>>
>> We could avoid memory barriers then.
>>
>> In the two cases Corey mentioned, this trick could let us avoid memory
>> barriers.
>> (The existing barrier in sk_add_node_rcu(sk, &hslot->head) should be enough.)
>>
>> What do you think ?
> 
> Kinky!!!  ;-)
> 
> Then the rcu_dereference() would be supplying the needed memory barriers.
> 
> Hmmm...  I guess that the only confusion would be if the element got
> removed and then added to the same list.  But then if its pointer was
> pseudo-NULL, then that would mean that all subsequent elements had been
> removed, and all preceding ones added after the scan started.
> 
> Which might well be harmless, but I must defer to you on this one at
> the moment.
> 
> If you need a larger hash table, another approach would be to set the
> pointer's low-order bit, allowing the upper bits to be a full-sized
> index -- or even a pointer to the list header.  Just make very sure
> to clear the pointer when freeing, or an element on the freelist
> could end up looking like a legitimate end of list...  Which again
> might well be safe, but why inflict this on oneself?

Well, in the UDP case the hash table size will be <= 65536 anyway, so we can
assume no dynamic kernel memory lies in the range [0 .. 65535].

Here is a patch (untested yet, it's really time for me to get some sleep ;) )

[PATCH] udp: Introduce special NULL pointers for hlist termination

In order to safely detect changes in chains, we would like to have distinct
'NULL' pointers. Each chain in the hash table is terminated by a unique
'NULL' value, so that lockless readers can detect when their lookup has
strayed from its starting chain.

We define 'NULL' values as pointers satisfying
((unsigned long)value < UDP_HTABLE_SIZE)

This removes the memory barrier (a read barrier to fetch 'next' pointers
*before* checking key values) added in commit
96631ed16c514cf8b28fab991a076985ce378c26 (udp: introduce
sk_for_each_rcu_safenext()).

This also addresses a missing memory barrier spotted by Corey Minyard
(a write barrier in udp_lib_get_port(), between the sk_hash update and the
->next update).

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---
include/linux/list.h    |   32 ++++++++++++++++++++++++++++++++
include/linux/rculist.h |   15 ++++++++-------
include/net/sock.h      |    9 +++++++--
net/ipv4/udp.c          |   19 +++++++++++++------
net/ipv6/udp.c          |   16 ++++++++++++----
5 files changed, 72 insertions(+), 19 deletions(-)
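
To make the scheme concrete, here is a minimal single-threaded user-space
sketch of the idea (hypothetical code, not the patch itself: HTABLE_SIZE
stands in for UDP_HTABLE_SIZE, and the final check mirrors the patch's
'goto begin' restart):

#include <stdio.h>

#define HTABLE_SIZE 128	/* stand-in for UDP_HTABLE_SIZE */

struct node {
	struct node *next;
	unsigned int key;
};

static struct node *htable[HTABLE_SIZE];

/* Any pointer value below HTABLE_SIZE is a pseudo-NULL terminator. */
static int is_pseudo_null(const struct node *p)
{
	return (unsigned long)p < HTABLE_SIZE;
}

static void htable_init(void)
{
	unsigned long i;

	/* Terminate each chain with its own slot index, not plain NULL. */
	for (i = 0; i < HTABLE_SIZE; i++)
		htable[i] = (struct node *)i;
}

static void htable_insert(struct node *n, unsigned int key)
{
	unsigned int hash = key % HTABLE_SIZE;

	n->key = key;
	n->next = htable[hash];
	htable[hash] = n;
}

static struct node *htable_lookup(unsigned int key)
{
	unsigned int hash = key % HTABLE_SIZE;
	struct node *p;

begin:
	/* The kernel patch also re-validates each item's hash; omitted here. */
	for (p = htable[hash]; !is_pseudo_null(p); p = p->next)
		if (p->key == key)
			return p;
	/*
	 * We fell off the chain.  If the pseudo-NULL we reached is not our
	 * own slot index, a concurrent writer moved an element to another
	 * chain while we walked it: restart from our own chain head.
	 * (Single-threaded here, so this never fires; under RCU this check
	 * is what replaces the read barriers.)
	 */
	if ((unsigned long)p != hash)
		goto begin;
	return NULL;
}

int main(void)
{
	struct node a;

	htable_init();
	htable_insert(&a, 1000);
	printf("lookup(1000) found it: %d\n", htable_lookup(1000) == &a);
	return 0;
}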

Comments

Corey Minyard Oct. 30, 2008, 3:22 a.m. UTC | #1
Eric Dumazet wrote:
> [ earlier discussion and patch description snipped ]
>
> This also addresses a missing memory barrier spotted by Corey Minyard
> (a write barrier in udp_lib_get_port(), between the sk_hash update and
> the ->next update)
I think you are right; this will certainly perform a lot better without 
the read barrier in the list traversal.  I haven't seen any problems 
with this approach, though it's unusual enough to perhaps warrant some 
extra comments in the code.

You do need to modify udp_lib_unhash(), as sk_del_node_init_rcu() will 
do a NULL check on the ->next value, so you will need a special version 
of that as well.

-corey
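
(A sketch of what such a special version might look like under the
value-below-nullval scheme; hypothetical code, not the actual fix:)

/*
 * Hypothetical nulls-aware unlink.  Identical to the usual
 * __hlist_del(), except the "is there a next node?" test must not
 * mistake a pseudo-NULL terminator (any value < nullval) for a real
 * node and dereference it.
 */
static inline void __hlist_del_nulls(struct hlist_node *n,
				     unsigned long nullval)
{
	struct hlist_node *next = n->next;
	struct hlist_node **pprev = n->pprev;

	*pprev = next;
	if ((unsigned long)next >= nullval)	/* a real node, not a terminator */
		next->pprev = pprev;
}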
David Miller Oct. 30, 2008, 5:40 a.m. UTC | #2
From: Eric Dumazet <dada1@cosmosbay.com>
Date: Wed, 29 Oct 2008 23:08:30 +0100

> @@ -1746,7 +1753,7 @@ void __init udp_table_init(struct udp_table *table)
>  	int i;
>  
>  	for (i = 0; i < UDP_HTABLE_SIZE; i++) {
> -		INIT_HLIST_HEAD(&table->hash[i].head);
> +		table->hash[i].head.first = (struct hlist_node *)i;

Please hide this behind some list.h interface macro, even something
as simple as INIT_HLIST_HEAD_NULLS(X, index) would suffice.

And as Corey said, the code needs more comments for something as
clever as this! :-)

Thanks.
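
Something along these lines would presumably do (a sketch of the suggested
wrapper, under the value-below-UDP_HTABLE_SIZE scheme):

/* Sketch: terminate the chain with its slot index instead of NULL. */
#define INIT_HLIST_HEAD_NULLS(h, index) \
	((h)->first = (struct hlist_node *)(unsigned long)(index))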
Eric Dumazet Oct. 30, 2008, 5:51 a.m. UTC | #3
David Miller wrote:
> From: Eric Dumazet <dada1@cosmosbay.com>
> Date: Wed, 29 Oct 2008 23:08:30 +0100
> 
>> @@ -1746,7 +1753,7 @@ void __init udp_table_init(struct udp_table *table)
>>  	int i;
>>  
>>  	for (i = 0; i < UDP_HTABLE_SIZE; i++) {
>> -		INIT_HLIST_HEAD(&table->hash[i].head);
>> +		table->hash[i].head.first = (struct hlist_node *)i;
> 
> Please hide this behind some list.h interface macro, even something
> as simple as INIT_HLIST_HEAD_NULLS(X, index) would suffice.
> 
> And as Corey said, the code needs more comments for something as
> clever as this! :-)
> 

Yes, I agree 100%. Please give me one day to prepare a real patch,
or else akpm will kill us :)
Eric Dumazet Oct. 30, 2008, 7:04 a.m. UTC | #4
Eric Dumazet wrote:
> David Miller wrote:
>> [ quoted patch hunk snipped ]
>>
>> Please hide this behind some list.h interface macro, even something
>> as simple as INIT_HLIST_HEAD_NULLS(X, index) would suffice.
>>
>> And as Corey said, the code needs more comments for something as
>> clever as this! :-)
>>
> 
> Yes, I agree 100%. Please give me one day to prepare a real patch,
> or else akpm will kill us :)
> 

If we design something that could be reused, say for TCP sockets, we need to
be able to handle a very large number of 'NULL' pointers, say up to 64*1024*1024.

So let's use the low-order bit: set to one for 'NULL' pointers, and zero for
regular pointers.

This gives us 31 bits (or 63 bits) to store any valuable info :)

Then none of the ..._nulls() macros would need to know the max value
(UDP_HTABLE_SIZE in the UDP case), since all they have to do is test
((unsigned long)pos & 1).

At iterator exit, pos would contain the 'index' value (pos >> 1), hiding this
implementation detail.


David Miller Oct. 30, 2008, 7:05 a.m. UTC | #5
From: Eric Dumazet <dada1@cosmosbay.com>
Date: Thu, 30 Oct 2008 08:04:42 +0100

> [ low-order-bit proposal snipped ]

This sounds fine to me.  Quite an improvement, in fact :)

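A sketch of the proposed encoding (illustrative only; these names are
assumptions, not an existing interface):

/*
 * The low-order bit tags chain terminators; real node pointers are
 * at least word-aligned, so their bit 0 is always clear.  The
 * remaining 31 (or 63) bits carry the chain index.
 */
#define is_a_nulls(ptr)		((unsigned long)(ptr) & 1)
#define get_nulls_value(ptr)	(((unsigned long)(ptr)) >> 1)
#define make_nulls_ptr(value)	\
	((struct hlist_node *)((((unsigned long)(value)) << 1) | 1UL))

A lookup would then stop when is_a_nulls(pos) is true and compare
get_nulls_value(pos) against its own chain index to decide whether to
restart, with no macro ever needing to know UDP_HTABLE_SIZE.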

Patch

diff --git a/include/linux/list.h b/include/linux/list.h
index 969f6e9..a3d5dd1 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -654,6 +654,22 @@  static inline void hlist_move_list(struct hlist_head *old,
 	     pos && ({ prefetch(pos->next); 1;}) &&			 \
 		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
 	     pos = pos->next)
+/**
+ * hlist_for_each_entry_nulls	- iterate over list of given type
+ * @tpos:	the type * to use as a loop cursor.
+ * @pos:	the &struct hlist_node to use as a loop cursor.
+ * @head:	the head for your list.
+ * @member:	the name of the hlist_node within the struct.
+ * @nullval: the iteration should stop if a pointer is < nullval
+ *
+ * Special version of hlist_for_each_entry where the end pointer
+ * can be NULL but also any value < nullval.
+ */
+#define hlist_for_each_entry_nulls(tpos, pos, head, member, nullval)	 \
+	for (pos = (head)->first;					 \
+	     ((unsigned long)pos >= nullval) && ({ prefetch(pos->next); 1;}) && \
+		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
+	     pos = pos->next)
 
 /**
  * hlist_for_each_entry_continue - iterate over a hlist continuing after current point
@@ -679,6 +695,22 @@  static inline void hlist_move_list(struct hlist_head *old,
 	     pos = pos->next)
 
 /**
+ * hlist_for_each_entry_from_nulls - iterate over a hlist continuing from current point
+ * @tpos:	the type * to use as a loop cursor.
+ * @pos:	the &struct hlist_node to use as a loop cursor.
+ * @member:	the name of the hlist_node within the struct.
+ * @nullval: the iteration should stop if a pointer is < nullval
+ *
+ * Special version of hlist_for_each_entry_from where the end pointer
+ * can be NULL but also any value < nullval.
+ */
+#define hlist_for_each_entry_from_nulls(tpos, pos, member, nullval)	\
+	for (; ((unsigned long)pos >= nullval) && \
+		({ prefetch(pos->next); 1;})   && \
+		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
+	     pos = pos->next)
+
+/**
  * hlist_for_each_entry_safe - iterate over list of given type safe against removal of list entry
  * @tpos:	the type * to use as a loop cursor.
  * @pos:	the &struct hlist_node to use as a loop cursor.
diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 3ba2998..6f78e2b 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -384,21 +384,22 @@  static inline void hlist_add_after_rcu(struct hlist_node *prev,
 		pos = rcu_dereference(pos->next))
 
 /**
- * hlist_for_each_entry_rcu_safenext - iterate over rcu list of given type
+ * hlist_for_each_entry_rcu_nulls - iterate over rcu list of given type
  * @tpos:	the type * to use as a loop cursor.
  * @pos:	the &struct hlist_node to use as a loop cursor.
  * @head:	the head for your list.
  * @member:	the name of the hlist_node within the struct.
- * @next:       the &struct hlist_node to use as a next cursor
+ * @nullval:       the iteration should stop if a pointer is < nullval
  *
- * Special version of hlist_for_each_entry_rcu that make sure
- * each next pointer is fetched before each iteration.
+ * Special version of hlist_for_each_entry_rcu where the end pointer
+ * can be NULL but also any value < nullval.
  */
-#define hlist_for_each_entry_rcu_safenext(tpos, pos, head, member, next) \
+#define hlist_for_each_entry_rcu_nulls(tpos, pos, head, member, nullval) \
 	for (pos = rcu_dereference((head)->first);			 \
-		pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) &&	\
+		((unsigned long)pos >= nullval) && 			\
+		({ prefetch(pos->next); 1; }) &&			\
 		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; }); \
-		pos = rcu_dereference(next))
+		pos = rcu_dereference(pos->next))
 
 #endif	/* __KERNEL__ */
 #endif
diff --git a/include/net/sock.h b/include/net/sock.h
index a4f6d3f..efe4def 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -419,11 +419,16 @@  static __inline__ void sk_add_bind_node(struct sock *sk,
 
 #define sk_for_each(__sk, node, list) \
 	hlist_for_each_entry(__sk, node, list, sk_node)
-#define sk_for_each_rcu_safenext(__sk, node, list, next) \
-	hlist_for_each_entry_rcu_safenext(__sk, node, list, sk_node, next)
+#define sk_for_each_nulls(__sk, node, list, nullval) \
+	hlist_for_each_entry_nulls(__sk, node, list, sk_node, nullval)
+#define sk_for_each_rcu_nulls(__sk, node, list, nullval) \
+	hlist_for_each_entry_rcu_nulls(__sk, node, list, sk_node, nullval)
 #define sk_for_each_from(__sk, node) \
 	if (__sk && ({ node = &(__sk)->sk_node; 1; })) \
 		hlist_for_each_entry_from(__sk, node, sk_node)
+#define sk_for_each_from_nulls(__sk, node, nullval) \
+	if (__sk && ({ node = &(__sk)->sk_node; 1; })) \
+		hlist_for_each_entry_from_nulls(__sk, node, sk_node, nullval)
 #define sk_for_each_continue(__sk, node) \
 	if (__sk && ({ node = &(__sk)->sk_node; 1; })) \
 		hlist_for_each_entry_continue(__sk, node, sk_node)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c3ecec8..a61fe89 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -129,7 +129,7 @@  static int udp_lib_lport_inuse(struct net *net, __u16 num,
 	struct sock *sk2;
 	struct hlist_node *node;
 
-	sk_for_each(sk2, node, &hslot->head)
+	sk_for_each_nulls(sk2, node, &hslot->head, UDP_HTABLE_SIZE)
 		if (net_eq(sock_net(sk2), net)			&&
 		    sk2 != sk					&&
 		    sk2->sk_hash == num				&&
@@ -256,7 +256,7 @@  static struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 		int dif, struct udp_table *udptable)
 {
 	struct sock *sk, *result;
-	struct hlist_node *node, *next;
+	struct hlist_node *node;
 	unsigned short hnum = ntohs(dport);
 	unsigned int hash = udp_hashfn(net, hnum);
 	struct udp_hslot *hslot = &udptable->hash[hash];
@@ -266,7 +266,7 @@  static struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 begin:
 	result = NULL;
 	badness = -1;
-	sk_for_each_rcu_safenext(sk, node, &hslot->head, next) {
+	sk_for_each_rcu_nulls(sk, node, &hslot->head, UDP_HTABLE_SIZE) {
 		/*
 		 * lockless reader, and SLAB_DESTROY_BY_RCU items:
 		 * We must check this item was not moved to another chain
@@ -280,6 +280,13 @@  begin:
 			badness = score;
 		}
 	}
+	/*
+	 * if the 'NULL' pointer we got at the end of this lookup is
+	 * not the expected one, we must restart lookup.
+	 */
+	if ((unsigned long)node != hash)
+		goto begin;
+
 	if (result) {
 		if (unlikely(!atomic_inc_not_zero(&result->sk_refcnt)))
 			result = NULL;
@@ -324,7 +331,7 @@  static inline struct sock *udp_v4_mcast_next(struct sock *sk,
 	struct sock *s = sk;
 	unsigned short hnum = ntohs(loc_port);
 
-	sk_for_each_from(s, node) {
+	sk_for_each_from_nulls(s, node, UDP_HTABLE_SIZE) {
 		struct inet_sock *inet = inet_sk(s);
 
 		if (s->sk_hash != hnum					||
@@ -1556,7 +1563,7 @@  static struct sock *udp_get_first(struct seq_file *seq, int start)
 		struct hlist_node *node;
 		struct udp_hslot *hslot = &state->udp_table->hash[state->bucket];
 		spin_lock_bh(&hslot->lock);
-		sk_for_each(sk, node, &hslot->head) {
+		sk_for_each_nulls(sk, node, &hslot->head, UDP_HTABLE_SIZE) {
 			if (!net_eq(sock_net(sk), net))
 				continue;
 			if (sk->sk_family == state->family)
@@ -1746,7 +1753,7 @@  void __init udp_table_init(struct udp_table *table)
 	int i;
 
 	for (i = 0; i < UDP_HTABLE_SIZE; i++) {
-		INIT_HLIST_HEAD(&table->hash[i].head);
+		table->hash[i].head.first = (struct hlist_node *)i;
 		spin_lock_init(&table->hash[i].lock);
 	}
 }
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 32d914d..13635ef 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -98,7 +98,7 @@  static struct sock *__udp6_lib_lookup(struct net *net,
 				      int dif, struct udp_table *udptable)
 {
 	struct sock *sk, *result;
-	struct hlist_node *node, *next;
+	struct hlist_node *node;
 	unsigned short hnum = ntohs(dport);
 	unsigned int hash = udp_hashfn(net, hnum);
 	struct udp_hslot *hslot = &udptable->hash[hash];
@@ -108,19 +108,27 @@  static struct sock *__udp6_lib_lookup(struct net *net,
 begin:
 	result = NULL;
 	badness = -1;
-	sk_for_each_rcu_safenext(sk, node, &hslot->head, next) {
+	sk_for_each_rcu_nulls(sk, node, &hslot->head, UDP_HTABLE_SIZE) {
 		/*
 		 * lockless reader, and SLAB_DESTROY_BY_RCU items:
 		 * We must check this item was not moved to another chain
 		 */
 		if (udp_hashfn(net, sk->sk_hash) != hash)
 			goto begin;
-		score = compute_score(sk, net, hnum, saddr, sport, daddr, dport, dif);
+		score = compute_score(sk, net, hnum, saddr, sport,
+				      daddr, dport, dif);
 		if (score > badness) {
 			result = sk;
 			badness = score;
 		}
 	}
+	/*
+	 * if the 'NULL' pointer we got at the end of this lookup is
+	 * not the expected one, we must restart lookup.
+	 */
+	if ((unsigned long)node != hash)
+		goto begin;
+
 	if (result) {
 		if (unlikely(!atomic_inc_not_zero(&result->sk_refcnt)))
 			result = NULL;
@@ -364,7 +372,7 @@  static struct sock *udp_v6_mcast_next(struct sock *sk,
 	struct sock *s = sk;
 	unsigned short num = ntohs(loc_port);
 
-	sk_for_each_from(s, node) {
+	sk_for_each_from_nulls(s, node, UDP_HTABLE_SIZE) {
 		struct inet_sock *inet = inet_sk(s);
 
 		if (sock_net(s) != sock_net(sk))