Poll about irqsafe_cpu_add and others

Message ID 1300380801.6315.306.camel@edumazet-laptop
State RFC, archived
Delegated to: David Miller

Commit Message

Eric Dumazet March 17, 2011, 4:53 p.m. UTC
On Thursday, March 17, 2011 at 10:18 -0500, Christoph Lameter wrote:
> On Thu, 17 Mar 2011, David Miller wrote:
> 
> >
> > I had been meaning to bring this up from another perspective.
> >
> > In networking, we often only ever access objects in base or
> > BH context.  Therefore in BH context cases we can do just
> > normal counter bumps without any of the special atomic or
> > IRQ disabling code at all.
> 
> We have the __ functions for that purpose. __this_cpu_inc f.e. falls back
> to a simple ++ operation if the arch cannot provide something better.
> irqsafe_xx are only used if the context does not provide any protection
> and if there is the potential of the counter being incremented from an
> interrupt context.
> 

What David and I have in mind is to use one array per mib instead of
two. This is an old idea.

https://patchwork.kernel.org/patch/15883/

When we know we run from BH context, we can use __this_cpu_inc(), but if
we don't know, or run from user/process context, we would need the
irqsafe_inc variant.

For x86 both map to the same single instruction, but for other arches the
irqsafe variant might be too expensive.
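
To illustrate the direction, here is a minimal untested sketch of what the
single-array macros could look like (the bodies are only meant to show
which percpu op would be used in each context, not the actual patch):

#define SNMP_INC_STATS_BH(mib, field)	/* known BH context */		\
	__this_cpu_inc((mib)[0]->mibs[field])

#define SNMP_INC_STATS_USER(mib, field)	/* process context */		\
	irqsafe_cpu_inc((mib)[0]->mibs[field])

#define SNMP_INC_STATS(mib, field)	/* context not known */		\
	irqsafe_cpu_inc((mib)[0]->mibs[field])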

BTW, I think the following patch is possible, to save some text and useless
tests (on 64bit platforms at least)

size vmlinux.old vmlinux

Thanks

[PATCH] snmp: SNMP_UPD_PO_STATS_BH() always called from softirq

We don't need to test whether we run from softirq context.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
---
 include/net/snmp.h |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)




Comments

Christoph Lameter (Ampere) March 17, 2011, 5:46 p.m. UTC | #1
On Thu, 17 Mar 2011, Eric Dumazet wrote:

> When we know we run from BH context, we can use __this_cpu_inc(), but if
> we don't know, or run from user/process context, we would need the
> irqsafe_inc variant.

If the BH context is the only one where we care about performance then it's
ok I would think.
Eric Dumazet March 17, 2011, 6:29 p.m. UTC | #2
On Thursday, March 17, 2011 at 12:46 -0500, Christoph Lameter wrote:
> On Thu, 17 Mar 2011, Eric Dumazet wrote:
> 
> > When we know we run from BH context, we can use __this_cpu_inc(), but if
> > we don't know, or run from user/process context, we would need the
> > irqsafe_inc variant.
> 
> If the BH context is the only one where we care about performance then it's
> ok I would think.

Hmm... yes.

By the way, I noticed :

DECLARE_PER_CPU(u64, xt_u64);
__this_cpu_add(xt_u64, 2) translates to the following x86_32 code:

	mov	$xt_u64,%eax
	add	%fs:0x0,%eax
	addl	$0x2,(%eax)
	adcl	$0x0,0x4(%eax)


I wonder why we don't use:

	addl	$0x2,%fs:xt_u64
	adcl	$0x0,%fs:xt_u64+4



Christoph Lameter (Ampere) March 17, 2011, 6:42 p.m. UTC | #3
On Thu, 17 Mar 2011, Eric Dumazet wrote:

> By the way, I noticed :
>
> DECLARE_PER_CPU(u64, xt_u64);
> __this_cpu_add(xt_u64, 2) translates to the following x86_32 code:
>
> 	mov	$xt_u64,%eax
> 	add	%fs:0x0,%eax
> 	addl	$0x2,(%eax)
> 	adcl	$0x0,0x4(%eax)
>
>
> I wonder why we don't use:
>
> 	addl	$0x2,%fs:xt_u64
> 	adcl	$0x0,%fs:xt_u64+4

The compiler is fed the following

	*__this_cpu_ptr(xt_u64) += 2

__this_cpu_ptr makes it:

	*(xt_u64 + __my_cpu_offset) += 2

So the compiler calculates the address first and then increments it.

The compiler could optimize this I think. Wonder why that does not happen.




Eric Dumazet March 17, 2011, 6:55 p.m. UTC | #4
On Thursday, March 17, 2011 at 13:42 -0500, Christoph Lameter wrote:
> On Thu, 17 Mar 2011, Eric Dumazet wrote:
> 
> > By the way, I noticed :
> >
> > DECLARE_PER_CPU(u64, xt_u64);
> > __this_cpu_add(xt_u64, 2) translates to the following x86_32 code:
> >
> > 	mov	$xt_u64,%eax
> > 	add	%fs:0x0,%eax
> > 	addl	$0x2,(%eax)
> > 	adcl	$0x0,0x4(%eax)
> >
> >
> > I wonder why we don't use:
> >
> > 	addl	$0x2,%fs:xt_u64
> > 	adcl	$0x0,%fs:xt_u64+4
> 
> The compiler is fed the following
> 
> 	*__this_cpu_ptr(xt_u64) += 2
> 
> __this_cpu_ptr makes it:
> 
> 	*(xt_u64 + __my_cpu_offset) += 2
> 
> So the compiler calculates the address first and then increments it.
> 
> The compiler could optimize this I think. Wonder why that does not happen.

The compiler is really forced to compute the address, that's why:
__this_cpu_ptr() builds the pointer through RELOC_HIDE()/inline asm, so gcc
only sees an opaque value and cannot fold the offset back into a %fs:
addressing mode.

Hmm, we should not fall back to generic ops I think, but tweak

percpu_add_op() {
...
	case 8:
#ifdef CONFIG_X86_64_SMP
                if (pao_ID__ == 1)                                      \
                        asm("incq "__percpu_arg(0) : "+m" (var));       \
                else if (pao_ID__ == -1)                                \
                        asm("decq "__percpu_arg(0) : "+m" (var));       \
                else                                                    \
                        asm("addq %1, "__percpu_arg(0)                  \
                            : "+m" (var)                                \
                            : "re" ((pao_T__)(val)));                   \
                break;                                                  \

#else
                asm("addl %1, "__percpu_arg(0)                          \
                    : "+m" (var)                                        \
                    : "ri" ((u32)(val)));                               \
                asm("adcl %1, "__percpu_arg(0)                          \
                    : "+m" (*((u32 *)&(var) + 1))                       \
                    : "ri" ((u32)((u64)(val) >> 32)));                  \
                break;                                                  \

#endif
....
}






Christoph Lameter (Ampere) March 17, 2011, 7:21 p.m. UTC | #5
On Thu, 17 Mar 2011, Eric Dumazet wrote:

> > > I wonder why we don't use:
> > >
> > > 	addl	$0x2,%fs:xt_u64
> > > 	adcl	$0x0,%fs:xt_u64+4
> >
> > The compiler is fed the following
> >
> > 	*__this_cpu_ptr(xt_u64) += 2
> >
> > __this_cpu_ptr makes it:
> >
> > 	*(xt_u64 + __my_cpu_offset) += 2
> >
> > So the compiler calculates the address first and then increments it.
> >
> > The compiler could optimize this I think. Wonder why that does not happen.
>
> The compiler is really forced to compute the address, that's why.
>
> Hmm, we should not fall back to generic ops I think, but tweak
>
> percpu_add_op() {

percpu_add_op() is not used. This is a 64 bit operation on a 32 bit
machine, thus we fall back to __this_cpu_generic_to_op():

#define __this_cpu_generic_to_op(pcp, val, op)                          \
do {                                                                    \
        *__this_cpu_ptr(&(pcp)) op val;                                 \
} while (0)
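
For reference, the size dispatch looks roughly like this (simplified, not
the verbatim percpu.h code); x86_32 only overrides the 1, 2 and 4 byte
cases, so the 8 byte add keeps the generic default:

#define __pcpu_size_call(stem, variable, ...)				\
do {									\
	switch (sizeof(variable)) {					\
	case 1: stem##1(variable, __VA_ARGS__); break;			\
	case 2: stem##2(variable, __VA_ARGS__); break;			\
	case 4: stem##4(variable, __VA_ARGS__); break;			\
	case 8: stem##8(variable, __VA_ARGS__); break;			\
	default: __bad_size_call_parameter(); break;			\
	}								\
} while (0)

#define __this_cpu_add(pcp, val)  __pcpu_size_call(__this_cpu_add_, (pcp), (val))

/* no x86_32 override for the 8 byte case, so this default remains: */
#define __this_cpu_add_8(pcp, val) __this_cpu_generic_to_op((pcp), (val), +=)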



Patch

diff --git a/include/net/snmp.h b/include/net/snmp.h
index 762e2ab..be2424d 100644
--- a/include/net/snmp.h
+++ b/include/net/snmp.h
@@ -149,8 +149,8 @@  struct linux_xfrm_mib {
 	} while (0)
 #define SNMP_UPD_PO_STATS_BH(mib, basefield, addend)	\
 	do { \
-		__typeof__(*mib[0]) *ptr = \
-			__this_cpu_ptr((mib)[!in_softirq()]); \
+		__typeof__(*mib[0]) *ptr = __this_cpu_ptr((mib)[0]);	\
+									\
 		ptr->mibs[basefield##PKTS]++; \
 		ptr->mibs[basefield##OCTETS] += addend;\
 	} while (0)