netfilter: use per-cpu spinlock rather than RCU

Stephen Hemminger a écrit :
> On Tue, 14 Apr 2009 16:23:33 +0200
> Eric Dumazet <dada1@cosmosbay.com> wrote:
> 
>> Patrick McHardy a écrit :
>>> Stephen Hemminger wrote:
>>>> This is an alternative version of ip/ip6/arp tables locking using
>>>> per-cpu locks.  This avoids the overhead of synchronize_net() during
>>>> update but still removes the expensive rwlock in earlier versions.
>>>>
>>>> The idea for this came from an earlier version done by Eric Duzamet.
>>>> Locking is done per-cpu, the fast path locks on the current cpu
>>>> and updates counters.  The slow case involves acquiring the locks on
>>>> all cpu's.
>>>>
>>>> The mutex that was added for 2.6.30 in xt_table is unnecessary since
>>>> there already is a mutex for xt[af].mutex that is held.
>>>>
>>>> Tested basic functionality (add/remove/list), but don't have test cases
>>>> for stress, ip6tables or arptables.
>>> Thanks Stephen, I'll do some testing with ip6tables.
>> Here is the patch I cooked on top of Stephen one to get proper locking.
> 
> I see no demonstrated problem with locking in my version.

Yes, I did not crash any machine around there, should we wait for a bug report ? :)

> The reader/writer race is already handled. On replace the race of
> 
> CPU 0                          CPU 1
>                            lock (iptables(1))
>                            refer to oldinfo
> swap in new info
> foreach CPU
>    lock iptables(i)
>    (spin)                  unlock(iptables(1))
>    read oldinfo
>    unlock
> ...
> 
> The point is my locking works, you just seem to feel more comfortable with
> a global "stop all CPU's" solution.

Oh right, I missed that xt_replace_table() was *followed* by a get_counters()
call, but I am pretty sure something is needed in xt_replace_table().

A memory barrier at least (smp_wmb())

As soon as we do "table->private = newinfo;", other cpus might fetch incorrect
values for newinfo->fields.

In the past, we had a write_lock_bh()/write_unlock_bh() pair that was
doing this for us.
Then we had rcu_assign_pointer() that also had this memory barrier implied.

Even if vmalloc() calls we do before calling xt_replace_table() probably
already force barriers, add one for reference, just in case we change callers
logic to call kmalloc() instead of vmalloc() or whatever...

> 
>> In the "iptables -L" case, we freeze updates on all cpus to get previous
>> RCU behavior (not sure it is mandatory, but anyway...)
> 
> No, it isn't. Because the code in get_counters will fetch all CPU's.

Previous to RCU conversion, we had a rwlock.

Doing a write_lock_bh() on it while reading counters (iptables -L)
*did* stop all cpus from doing their read_lock_bh() and counters updates.

After RCU and your last patch, an "iptables -L" locks each table one by one.

This is correct, since a cpu wont update its table while we are fetching it,
but we lost previous "rwlock freeze all" behavior, and some apps/users could
complain about it, this is why I said "not sure it is mandatory"...

Here is an updated patch ontop of yours, with the smp_wmb() in xt_replace_table() :

Thank you

 include/linux/netfilter/x_tables.h |    5 ++
 net/ipv4/netfilter/arp_tables.c    |   20 +++------
 net/ipv4/netfilter/ip_tables.c     |   24 ++++-------
 net/ipv6/netfilter/ip6_tables.c    |   24 ++++-------
 net/netfilter/x_tables.c           |   57 ++++++++++++++++++++++++++-
 5 files changed, 86 insertions(+), 44 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Message ID	49E4B0A5.70404@cosmosbay.com
State	Not Applicable, archived
Delegated to:	David Miller
Headers	show Return-Path: <netdev-owner@vger.kernel.org> X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.176.167]) by ozlabs.org (Postfix) with ESMTP id 0697CDE0DF for <patchwork-incoming@ozlabs.org>; Wed, 15 Apr 2009 01:51:55 +1000 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754811AbZDNPvn (ORCPT <rfc822;patchwork-incoming@ozlabs.org>); Tue, 14 Apr 2009 11:51:43 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1754062AbZDNPvn (ORCPT <rfc822; netdev-outgoing>); Tue, 14 Apr 2009 11:51:43 -0400 Received: from gw1.cosmosbay.com ([212.99.114.194]:52506 "EHLO gw1.cosmosbay.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753581AbZDNPvl convert rfc822-to-8bit (ORCPT <rfc822;netdev@vger.kernel.org>); Tue, 14 Apr 2009 11:51:41 -0400 Received: from [127.0.0.1] (localhost [127.0.0.1]) by gw1.cosmosbay.com (8.13.7/8.13.7) with ESMTP id n3EFnvl6020974; Tue, 14 Apr 2009 17:49:58 +0200 Message-ID: <49E4B0A5.70404@cosmosbay.com> Date: Tue, 14 Apr 2009 17:49:57 +0200 From: Eric Dumazet <dada1@cosmosbay.com> User-Agent: Thunderbird 2.0.0.21 (Windows/20090302) MIME-Version: 1.0 To: Stephen Hemminger <shemminger@vyatta.com> CC: Patrick McHardy <kaber@trash.net>, paulmck@linux.vnet.ibm.com, David Miller <davem@davemloft.net>, paulus@samba.org, mingo@elte.hu, torvalds@linux-foundation.org, laijs@cn.fujitsu.com, jeff.chua.linux@gmail.com, jengelh@medozas.de, r000n@r000n.net, linux-kernel@vger.kernel.org, netfilter-devel@vger.kernel.org, netdev@vger.kernel.org, benh@kernel.crashing.org Subject: Re: [PATCH] netfilter: use per-cpu spinlock rather than RCU References: <20090411174801.GG6822@linux.vnet.ibm.com> <18913.53699.544083.320542@cargo.ozlabs.ibm.com> <20090412173108.GO6822@linux.vnet.ibm.com> <20090412.181330.23529546.davem@davemloft.net> <20090413040413.GQ6822@linux.vnet.ibm.com> <20090413095309.631cf395@nehalam> <49E48136.5060700@trash.net> <49E49C65.8060808@cosmosbay.com> <20090414074554.7fa73e2f@nehalam> In-Reply-To: <20090414074554.7fa73e2f@nehalam> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8BIT X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.6 (gw1.cosmosbay.com [0.0.0.0]); Tue, 14 Apr 2009 17:50:01 +0200 (CEST) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: <netdev.vger.kernel.org> X-Mailing-List: netdev@vger.kernel.org

netfilter: use per-cpu spinlock rather than RCU

Commit Message

Comments

Patch