From patchwork Thu Dec 16 16:53:56 2010
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 75780
X-Patchwork-Delegate: davem@davemloft.net
Subject: [PATCH v2 net-next-2.6] netfilter: ip_tables: dont block BH while reading counters
From: Eric Dumazet
To: Jesper Dangaard Brouer
Cc: Patrick McHardy, Arnaldo Carvalho de Melo, Steven Rostedt,
 Alexander Duyck, Stephen Hemminger, netfilter-devel, netdev,
 Peter P Waskiewicz Jr
In-Reply-To: <1292515625.2883.336.camel@edumazet-laptop>
Date: Thu, 16 Dec 2010 17:53:56 +0100
Message-ID: <1292518436.2883.393.camel@edumazet-laptop>
X-Mailing-List: netdev@vger.kernel.org

On Thursday, 16 December 2010 at 17:07 +0100, Eric Dumazet wrote:
> Here is a tested version: no need for a (buggy in previous patch)
> memset() if we use vzalloc()
>
> Note: We miss a this_cpu_write_seqcount_begin() interface.
> I'll bug lkml to get it asap.
Well, we have a faster solution: add the seqcount to "struct xt_info_lock",
so that the write_seqcount_begin()/write_seqcount_end() pair is done once
per table traversal, not once per rule. And since we already have the
seqcount address at hand, there is no need for a
this_cpu_write_seqcount_begin() interface.

[PATCH v2 net-next-2.6] netfilter: ip_tables: dont block BH while reading counters

Using "iptables -L" with a lot of rules can keep BH disabled for too long.
Jesper mentioned ~6 ms and worried about frame drops.

Switch to a per_cpu seqcount scheme, so that taking a snapshot of the
counters does not need to block BH (neither on this cpu nor on the other
cpus).

Reported-by: Jesper Dangaard Brouer
Signed-off-by: Eric Dumazet
---
 include/linux/netfilter/x_tables.h |    9 ++++-
 net/ipv4/netfilter/ip_tables.c     |   45 ++++++++-------------------
 2 files changed, 21 insertions(+), 33 deletions(-)
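
For readers who have not used seqcounts before, here is a minimal user-space
sketch of the retry scheme (illustrative only, not kernel code: the demo_*
names are invented for the example, and SMP memory barriers are omitted since
write_seqcount_begin()/read_seqcount_retry() take care of them in the kernel).
A writer makes the sequence odd before updating the counters and even again
when done; a reader retries its snapshot until it observes the same even
sequence before and after.

#include <stdint.h>
#include <stdio.h>

struct demo_counters {
	unsigned int seq;	/* even: stable, odd: writer in progress */
	uint64_t bcnt;		/* byte count */
	uint64_t pcnt;		/* packet count */
};

/* Writer side: what xt_info_rdlock_bh()/xt_info_rdunlock_bh() bracket. */
static void demo_write(struct demo_counters *c, uint64_t bytes)
{
	c->seq++;		/* like write_seqcount_begin(): seq becomes odd */
	c->bcnt += bytes;
	c->pcnt++;
	c->seq++;		/* like write_seqcount_end(): seq becomes even */
}

/* Reader side: the retry loop get_counters() now uses per entry. */
static void demo_read(const struct demo_counters *c,
		      uint64_t *bcnt, uint64_t *pcnt)
{
	unsigned int start;

	do {
		start = c->seq;		/* like read_seqcount_begin() */
		*bcnt = c->bcnt;
		*pcnt = c->pcnt;
		/* retry if a writer was active, or ran meanwhile */
	} while ((start & 1) || c->seq != start);
}

int main(void)
{
	struct demo_counters c = { 0, 0, 0 };
	uint64_t b, p;

	demo_write(&c, 1500);
	demo_write(&c, 40);
	demo_read(&c, &b, &p);
	printf("packets=%llu bytes=%llu\n",
	       (unsigned long long)p, (unsigned long long)b);
	return 0;
}
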
diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index 742bec0..7027762 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -473,6 +473,7 @@ extern void xt_free_table_info(struct xt_table_info *info);
  */
 struct xt_info_lock {
 	spinlock_t lock;
+	seqcount_t seq;
 	unsigned char readers;
 };
 DECLARE_PER_CPU(struct xt_info_lock, xt_info_locks);
@@ -496,16 +497,20 @@ static inline void xt_info_rdlock_bh(void)
 
 	local_bh_disable();
 	lock = &__get_cpu_var(xt_info_locks);
-	if (likely(!lock->readers++))
+	if (likely(!lock->readers++)) {
 		spin_lock(&lock->lock);
+		write_seqcount_begin(&lock->seq);
+	}
 }
 
 static inline void xt_info_rdunlock_bh(void)
 {
 	struct xt_info_lock *lock = &__get_cpu_var(xt_info_locks);
 
-	if (likely(!--lock->readers))
+	if (likely(!--lock->readers)) {
+		write_seqcount_end(&lock->seq);
 		spin_unlock(&lock->lock);
+	}
 	local_bh_enable();
 }
 
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index a846d63..7fe3d7c 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -884,42 +884,25 @@ get_counters(const struct xt_table_info *t,
 	struct ipt_entry *iter;
 	unsigned int cpu;
 	unsigned int i;
-	unsigned int curcpu = get_cpu();
-
-	/* Instead of clearing (by a previous call to memset())
-	 * the counters and using adds, we set the counters
-	 * with data used by 'current' CPU.
-	 *
-	 * Bottom half has to be disabled to prevent deadlock
-	 * if new softirq were to run and call ipt_do_table
-	 */
-	local_bh_disable();
-	i = 0;
-	xt_entry_foreach(iter, t->entries[curcpu], t->size) {
-		SET_COUNTER(counters[i], iter->counters.bcnt,
-			    iter->counters.pcnt);
-		++i;
-	}
-	local_bh_enable();
-	/* Processing counters from other cpus, we can let bottom half enabled,
-	 * (preemption is disabled)
-	 */
 
 	for_each_possible_cpu(cpu) {
-		if (cpu == curcpu)
-			continue;
+		seqcount_t *seq = &per_cpu(xt_info_locks, cpu).seq;
+
 		i = 0;
-		local_bh_disable();
-		xt_info_wrlock(cpu);
 		xt_entry_foreach(iter, t->entries[cpu], t->size) {
-			ADD_COUNTER(counters[i], iter->counters.bcnt,
-				    iter->counters.pcnt);
+			u64 bcnt, pcnt;
+			unsigned int start;
+
+			do {
+				start = read_seqcount_begin(seq);
+				bcnt = iter->counters.bcnt;
+				pcnt = iter->counters.pcnt;
+			} while (read_seqcount_retry(seq, start));
+
+			ADD_COUNTER(counters[i], bcnt, pcnt);
 			++i; /* macro does multi eval of i */
 		}
-		xt_info_wrunlock(cpu);
-		local_bh_enable();
 	}
-	put_cpu();
 }
 
 static struct xt_counters *alloc_counters(const struct xt_table *table)
@@ -932,7 +915,7 @@ static struct xt_counters *alloc_counters(const struct xt_table *table)
 	   (other than comefrom, which userspace doesn't care about). */
 	countersize = sizeof(struct xt_counters) * private->number;
-	counters = vmalloc(countersize);
+	counters = vzalloc(countersize);
 
 	if (counters == NULL)
 		return ERR_PTR(-ENOMEM);
@@ -1203,7 +1186,7 @@ __do_replace(struct net *net, const char *name, unsigned int valid_hooks,
 	struct ipt_entry *iter;
 
 	ret = 0;
-	counters = vmalloc(num_counters * sizeof(struct xt_counters));
+	counters = vzalloc(num_counters * sizeof(struct xt_counters));
 	if (!counters) {
 		ret = -ENOMEM;
 		goto out;
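
To spell out why the vmalloc() -> vzalloc() conversions are needed (this is
the memset() point from the quoted mail above): get_counters() used to seed
counters[] with SET_COUNTER() on the local CPU before folding in the other
CPUs, so stale vmalloc() contents were always overwritten. With this patch
every CPU, including the local one, is accumulated with ADD_COUNTER(), so the
snapshot buffer must start zeroed.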