
[RFC] percpu: convert SNMP mibs to new infra

Message ID 49D3B61F.8010507@cosmosbay.com
State RFC, archived
Delegated to: David Miller

Commit Message

Eric Dumazet April 1, 2009, 6:44 p.m. UTC
Eric Dumazet wrote:
> Ingo Molnar wrote:
>>
>> [ Btw., it's definitely cool that you will make heavy use of it for 
>>   SNMP mib statistics - please share with us your experiences with 
>>   the facilities - good or bad experiences alike! ]
> 
> I tried, but I'm missing some kind of indirect percpu_add() function,
> 
> because with net namespaces, mibs are dynamically allocated, and the
> current percpu_add() works on static percpu variables only (because of
> the added per_cpu__ prefix):
> 
> #define percpu_add(var, val)   percpu_to_op("add", per_cpu__##var, val)
> 
> I tried adding :
> 
> #define dyn_percpu_add(var, val)   percpu_to_op("add", var, val)
> 
> But I don't know if this is the plan. Should we get rid of the
> "per_cpu__" prefix and use a special ELF section/marker instead?
> 

Here is a preliminary patch for SNMP mibs that seems to work well on x86_32.

[RFC] percpu: convert SNMP mibs to new infra

Some arches can use the percpu infrastructure for safe changes to mibs
(percpu_add() is safe against preemption and interrupts), but
we want the real thing (a single instruction), not an emulation.

On arches still using an emulation, it's better to keep the two views
per mib and the preemption disable/enable.

This shrinks the size of the mibs by 50%, and also shrinks the vmlinux
text size (minimal IPv4 config):

$ size vmlinux.old vmlinux.new
   text    data     bss     dec     hex filename
4308458  561092 1728512 6598062  64adae vmlinux.old
4303834  561092 1728512 6593438  649b9e vmlinux.new



Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---
 arch/x86/include/asm/percpu.h |    3 +++
 include/net/snmp.h            |   27 ++++++++++++++++++++++-----
 net/ipv4/af_inet.c            |   28 +++++++++++++++++++---------
 3 files changed, 44 insertions(+), 14 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Tejun Heo April 2, 2009, 12:13 a.m. UTC | #1
Hello, Eric, Ingo.

Eric Dumazet wrote:
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index aee103b..6b82f6b 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -135,6 +135,9 @@ do {							\
>  #define percpu_read(var)	percpu_from_op("mov", per_cpu__##var)
>  #define percpu_write(var, val)	percpu_to_op("mov", per_cpu__##var, val)
>  #define percpu_add(var, val)	percpu_to_op("add", per_cpu__##var, val)
> +#define indir_percpu_add(var, val)	percpu_to_op("add", *(var), val)
> +#define indir_percpu_inc(var)       percpu_to_op("add", *(var), 1)
> +#define indir_percpu_dec(var)       percpu_to_op("add", *(var), -1)
>  #define percpu_sub(var, val)	percpu_to_op("sub", per_cpu__##var, val)
>  #define percpu_and(var, val)	percpu_to_op("and", per_cpu__##var, val)
>  #define percpu_or(var, val)	percpu_to_op("or", per_cpu__##var, val)

The final goal is to unify static and dynamic accesses but we aren't
there yet, so, for the time being, we'll need some interim solutions.
I would prefer percpu_ptr_add() tho.

Thanks.
Ingo Molnar April 2, 2009, 4:05 a.m. UTC | #2
* Tejun Heo <htejun@gmail.com> wrote:

> Hello, Eric, Ingo.
> 
> Eric Dumazet wrote:
> > diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> > index aee103b..6b82f6b 100644
> > --- a/arch/x86/include/asm/percpu.h
> > +++ b/arch/x86/include/asm/percpu.h
> > @@ -135,6 +135,9 @@ do {							\
> >  #define percpu_read(var)	percpu_from_op("mov", per_cpu__##var)
> >  #define percpu_write(var, val)	percpu_to_op("mov", per_cpu__##var, val)
> >  #define percpu_add(var, val)	percpu_to_op("add", per_cpu__##var, val)
> > +#define indir_percpu_add(var, val)	percpu_to_op("add", *(var), val)
> > +#define indir_percpu_inc(var)       percpu_to_op("add", *(var), 1)
> > +#define indir_percpu_dec(var)       percpu_to_op("add", *(var), -1)
> >  #define percpu_sub(var, val)	percpu_to_op("sub", per_cpu__##var, val)
> >  #define percpu_and(var, val)	percpu_to_op("and", per_cpu__##var, val)
> >  #define percpu_or(var, val)	percpu_to_op("or", per_cpu__##var, val)
> 
> The final goal is to unify static and dynamic accesses but we 
> aren't there yet, so, for the time being, we'll need some interim 
> solutions. I would prefer percpu_ptr_add() tho.

Yep, that's the standard naming scheme for new APIs: generic to 
specific, left to right.

	Ingo
Rusty Russell April 2, 2009, 5:04 a.m. UTC | #3
On Thursday 02 April 2009 05:14:47 Eric Dumazet wrote:
> Here is a preliminary patch for SNMP mibs that seems to work well on x86_32
> 
> [RFC] percpu: convert SNMP mibs to new infra

OK, I have a whole heap of "convert to dynamic per-cpu" patches waiting in
the wings too, once Tejun's conversion is complete.

Also, what is optimal depends on the arch: we had a long discussion on this
(it's what local_t was supposed to do, with cpu_local_inc() etc: see
Subject: local_add_return 2008-12-16 thread).

eg. on S/390, atomic_inc is a win over the two-counter version.  On Sparc,
two-counter wins.  On x86, inc wins (obviously).

But efforts to create a single primitive have been problematic: maybe
open-coding it like this is the Right Thing.

Cheers,
Rusty.
Eric Dumazet April 2, 2009, 5:19 a.m. UTC | #4
Rusty Russell wrote:
> On Thursday 02 April 2009 05:14:47 Eric Dumazet wrote:
>> Here is a preliminary patch for SNMP mibs that seems to work well on x86_32
>>
>> [RFC] percpu: convert SNMP mibs to new infra
> 
> OK, I have a whole heap of "convert to dynamic per-cpu" patches waiting in
> the wings too, once Tejun's conversion is complete.
> 
> Also, what is optimal depends on the arch: we had a long discussion on this
> (it's what local_t was supposed to do, with cpu_local_inc() etc: see
> Subject: local_add_return 2008-12-16 thread).
> 
> eg. on S/390, atomic_inc is a win over the two-counter version.  On Sparc,
> two-counter wins.  On x86, inc wins (obviously).
> 
> But efforts to create a single primitive have been problematic: maybe
> open-coding it like this is the Right Thing.
> 

I tried to find a generic CONFIG_ define that would announce that an arch
has a fast percpu_add() implementation (faster than __raw_get_cpu_var,
for example, when we are already in a preempt-disabled section).

Any idea?


For example, net/ipv4/route.c has:

static DEFINE_PER_CPU(struct rt_cache_stat, rt_cache_stat);
#define RT_CACHE_STAT_INC(field) \
        (__raw_get_cpu_var(rt_cache_stat).field++)

We could use percpu_add(rt_cache_stat.field, 1) instead, only if percpu_add()
is not the generic one.

#define __percpu_generic_to_op(var, val, op)                            \
do {                                                                    \
        get_cpu_var(var) op val;                                        \
        put_cpu_var(var);                                               \
} while (0)
#ifndef percpu_add
# define percpu_add(var, val)           __percpu_generic_to_op(var, (val), +=)
#endif

Rusty Russell April 2, 2009, 11:46 a.m. UTC | #5
On Thursday 02 April 2009 15:49:19 Eric Dumazet wrote:
> Rusty Russell wrote:
> > eg. on S/390, atomic_inc is a win over the two-counter version.  On Sparc,
> > two-counter wins.  On x86, inc wins (obviously).
> > 
> > But efforts to create a single primitive have been problematic: maybe
> > open-coding it like this is the Right Thing.
> 
> I tried to find a generic CONFIG_ define that would announce that an arch
> has a fast percpu_add() implementation (faster than __raw_get_cpu_var,
> for example, when we are already in a preempt-disabled section).

Nope, we don't have one.  It was supposed to work like this:
	DEFINE_PER_CPU(local_t, counter);

	cpu_local_inc(counter);

That would do an incl on x86; local_t could even be a long[3] (one for hardirq,
one for softirq, one for user context).  But there were issues:

1) It didn't work on dynamic percpu allocs, which was much of the interesting
   use (Tejun is fixing this bit right now)
2) The x86 version wasn't optimized anyway,
3) Everyone did atomic_long_inc(), so the ftrace code assumed it would be nmi
   safe (tho atomic_t isn't nmi-safe on some archs anyway), so the long[3]
   method would break them,
4) The long[3] version was overkill for networking, which doesn't need hardirq
   so we'd want another variant of local_t plus all the ops,
5) Some people didn't want long: Christoph had a more generic but more complex
   version,
6) It's still not used anywhere in the tree (tho local_t is), so there's no
   reason to stick to the current semantics.

> For example, net/ipv4/route.c has :
> 
> static DEFINE_PER_CPU(struct rt_cache_stat, rt_cache_stat);
> #define RT_CACHE_STAT_INC(field) \
>         (__raw_get_cpu_var(rt_cache_stat).field++)
> 
> We could use percpu_add(rt_cache_stat.field, 1) instead, only if percpu_add()
> is not the generic one.

Yep, but this one is different from the SNMP stats, which need softirq vs
user context safety.  This is where I start wondering how many interfaces
we're going to have...

Sorry to add more questions than answers :(
Rusty.

Patch

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index aee103b..6b82f6b 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -135,6 +135,9 @@  do {							\
 #define percpu_read(var)	percpu_from_op("mov", per_cpu__##var)
 #define percpu_write(var, val)	percpu_to_op("mov", per_cpu__##var, val)
 #define percpu_add(var, val)	percpu_to_op("add", per_cpu__##var, val)
+#define indir_percpu_add(var, val)	percpu_to_op("add", *(var), val)
+#define indir_percpu_inc(var)       percpu_to_op("add", *(var), 1)
+#define indir_percpu_dec(var)       percpu_to_op("add", *(var), -1)
 #define percpu_sub(var, val)	percpu_to_op("sub", per_cpu__##var, val)
 #define percpu_and(var, val)	percpu_to_op("and", per_cpu__##var, val)
 #define percpu_or(var, val)	percpu_to_op("or", per_cpu__##var, val)
diff --git a/include/net/snmp.h b/include/net/snmp.h
index 57c9362..ef9ed31 100644
--- a/include/net/snmp.h
+++ b/include/net/snmp.h
@@ -123,15 +123,31 @@  struct linux_xfrm_mib {
 };
 
 /* 
- * FIXME: On x86 and some other CPUs the split into user and softirq parts
+ * On x86 and some other CPUs the split into user and softirq parts
  * is not needed because addl $1,memory is atomic against interrupts (but 
- * atomic_inc would be overkill because of the lock cycles). Wants new 
- * nonlocked_atomic_inc() primitives -AK
+ * atomic_inc would be overkill because of the lock cycles).
  */ 
+#ifdef CONFIG_X86
+# define SNMP_ARRAY_SZ 1
+#else
+# define SNMP_ARRAY_SZ 2
+#endif
+
 #define DEFINE_SNMP_STAT(type, name)	\
-	__typeof__(type) *name[2]
+	__typeof__(type) *name[SNMP_ARRAY_SZ]
 #define DECLARE_SNMP_STAT(type, name)	\
-	extern __typeof__(type) *name[2]
+	extern __typeof__(type) *name[SNMP_ARRAY_SZ]
+
+#if SNMP_ARRAY_SZ == 1
+#define SNMP_INC_STATS(mib, field)	indir_percpu_inc(&mib[0]->mibs[field])
+#define SNMP_INC_STATS_BH(mib, field)	SNMP_INC_STATS(mib, field)
+#define SNMP_INC_STATS_USER(mib, field) SNMP_INC_STATS(mib, field)
+#define SNMP_DEC_STATS(mib, field)	indir_percpu_dec(&mib[0]->mibs[field])
+#define SNMP_ADD_STATS_BH(mib, field, addend) 	\
+				indir_percpu_add(&mib[0]->mibs[field], addend)
+#define SNMP_ADD_STATS_USER(mib, field, addend) 	\
+				indir_percpu_add(&mib[0]->mibs[field], addend)
+#else
 
 #define SNMP_STAT_BHPTR(name)	(name[0])
 #define SNMP_STAT_USRPTR(name)	(name[1])
@@ -160,5 +176,6 @@  struct linux_xfrm_mib {
 		per_cpu_ptr(mib[1], get_cpu())->mibs[field] += addend; \
 		put_cpu(); \
 	} while (0)
+#endif
 
 #endif
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 7f03373..badb568 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1366,27 +1366,37 @@  unsigned long snmp_fold_field(void *mib[], int offt)
 
 	for_each_possible_cpu(i) {
 		res += *(((unsigned long *) per_cpu_ptr(mib[0], i)) + offt);
+#if SNMP_ARRAY_SZ == 2
 		res += *(((unsigned long *) per_cpu_ptr(mib[1], i)) + offt);
+#endif
 	}
 	return res;
 }
 EXPORT_SYMBOL_GPL(snmp_fold_field);
 
-int snmp_mib_init(void *ptr[2], size_t mibsize)
+int snmp_mib_init(void *ptr[SNMP_ARRAY_SZ], size_t mibsize)
 {
 	BUG_ON(ptr == NULL);
 	ptr[0] = __alloc_percpu(mibsize, __alignof__(unsigned long long));
 	if (!ptr[0])
-		goto err0;
+		return -ENOMEM;
+#if SNMP_ARRAY_SZ == 2
 	ptr[1] = __alloc_percpu(mibsize, __alignof__(unsigned long long));
-	if (!ptr[1])
-		goto err1;
+	if (!ptr[1]) {
+		free_percpu(ptr[0]);
+		ptr[0] = NULL;
+		return -ENOMEM;
+	}
+#endif
+	{
+	int i;
+	printk(KERN_INFO "snmp_mib_init(%u) %p ", (unsigned int)mibsize, ptr[0]);
+	for_each_possible_cpu(i) {
+		printk(KERN_INFO "%p ", per_cpu_ptr(ptr[0], i));
+		}
+	printk(KERN_INFO "\n");
+	}
 	return 0;
-err1:
-	free_percpu(ptr[0]);
-	ptr[0] = NULL;
-err0:
-	return -ENOMEM;
 }
 EXPORT_SYMBOL_GPL(snmp_mib_init);