| Message ID | 1300380801.6315.306.camel@edumazet-laptop |
|---|---|
| State | RFC, archived |
| Delegated to | David Miller |
On Thu, 17 Mar 2011, Eric Dumazet wrote:

> When we know we run from BH context, we can use __this_cpu_inc(), but if
> we don't know, or run from user/process context, we would need the
> irqsafe_inc variant.

If the BH context is the only one where we care about performance then it's
ok I would think.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thursday 17 March 2011 at 12:46 -0500, Christoph Lameter wrote:
> On Thu, 17 Mar 2011, Eric Dumazet wrote:
>
> > When we know we run from BH context, we can use __this_cpu_inc(), but if
> > we don't know, or run from user/process context, we would need the
> > irqsafe_inc variant.
>
> If the BH context is the only one where we care about performance then it's
> ok I would think.

Hmm... yes.

By the way, I noticed:

	DECLARE_PER_CPU(u64, xt_u64);

__this_cpu_add(xt_u64, 2) translates to the following x86_32 code:

	mov  $xt_u64,%eax
	add  %fs:0x0,%eax
	addl $0x2,(%eax)
	adcl $0x0,0x4(%eax)

I wonder why we don't use:

	addl $0x2,%fs:xt_u64
	adcl $0x0,%fs:xt_u64+4
On Thu, 17 Mar 2011, Eric Dumazet wrote:

> By the way, I noticed:
>
>	DECLARE_PER_CPU(u64, xt_u64);
>
> __this_cpu_add(xt_u64, 2) translates to the following x86_32 code:
>
>	mov  $xt_u64,%eax
>	add  %fs:0x0,%eax
>	addl $0x2,(%eax)
>	adcl $0x0,0x4(%eax)
>
> I wonder why we don't use:
>
>	addl $0x2,%fs:xt_u64
>	adcl $0x0,%fs:xt_u64+4

The compiler is fed the following:

	*__this_cpu_ptr(&xt_u64) += 2

__this_cpu_ptr makes it:

	*(&xt_u64 + __my_cpu_offset) += 2

So the compiler calculates the address first and then increments it.

The compiler could optimize this I think. Wonder why that does not happen.
On Thursday 17 March 2011 at 13:42 -0500, Christoph Lameter wrote:
> On Thu, 17 Mar 2011, Eric Dumazet wrote:
>
> > By the way, I noticed:
> >
> >	DECLARE_PER_CPU(u64, xt_u64);
> >
> > __this_cpu_add(xt_u64, 2) translates to the following x86_32 code:
> >
> >	mov  $xt_u64,%eax
> >	add  %fs:0x0,%eax
> >	addl $0x2,(%eax)
> >	adcl $0x0,0x4(%eax)
> >
> > I wonder why we don't use:
> >
> >	addl $0x2,%fs:xt_u64
> >	adcl $0x0,%fs:xt_u64+4
>
> The compiler is fed the following:
>
>	*__this_cpu_ptr(&xt_u64) += 2
>
> __this_cpu_ptr makes it:
>
>	*(&xt_u64 + __my_cpu_offset) += 2
>
> So the compiler calculates the address first and then increments it.
>
> The compiler could optimize this I think. Wonder why that does not happen.

The compiler is really forced to compute the address, that's why.

Hmm, we should not fall back to the generic ops I think, but tweak
percpu_add_op():

	percpu_add_op() {
	...
	case 8:
#ifdef CONFIG_X86_64_SMP
		if (pao_ID__ == 1)					\
			asm("incq "__percpu_arg(0) : "+m" (var));	\
		else if (pao_ID__ == -1)				\
			asm("decq "__percpu_arg(0) : "+m" (var));	\
		else							\
			asm("addq %1, "__percpu_arg(0)			\
			    : "+m" (var)				\
			    : "re" ((pao_T__)(val)));			\
		break;							\
#else
		asm("addl %1, "__percpu_arg(0)			\
		    : "+m" (var)				\
		    : "ri" ((u32)(val)));			\
		asm("adcl %1, "__percpu_arg(0)			\
		    : "+m" (*((u32 *)&(var) + 1))		\
		    : "ri" ((u32)((u64)(val) >> 32)));		\
		break;						\
#endif
	...
	}
On Thu, 17 Mar 2011, Eric Dumazet wrote:

> > > I wonder why we don't use:
> > >
> > >	addl $0x2,%fs:xt_u64
> > >	adcl $0x0,%fs:xt_u64+4
> >
> > The compiler is fed the following:
> >
> >	*__this_cpu_ptr(&xt_u64) += 2
> >
> > __this_cpu_ptr makes it:
> >
> >	*(&xt_u64 + __my_cpu_offset) += 2
> >
> > So the compiler calculates the address first and then increments it.
> >
> > The compiler could optimize this I think. Wonder why that does not happen.
>
> The compiler is really forced to compute the address, that's why.
>
> Hmm, we should not fall back to the generic ops I think, but tweak
> percpu_add_op()

percpu_add_op() is not used. This is a 64 bit operation on a 32 bit
machine, thus we fall back to __this_cpu_generic_to_op():

#define __this_cpu_generic_to_op(pcp, val, op)				\
do {									\
	*__this_cpu_ptr(&(pcp)) op val;					\
} while (0)
diff --git a/include/net/snmp.h b/include/net/snmp.h
index 762e2ab..be2424d 100644
--- a/include/net/snmp.h
+++ b/include/net/snmp.h
@@ -149,8 +149,8 @@ struct linux_xfrm_mib {
 	} while (0)
 #define SNMP_UPD_PO_STATS_BH(mib, basefield, addend)	\
 	do { \
-		__typeof__(*mib[0]) *ptr = \
-			__this_cpu_ptr((mib)[!in_softirq()]); \
+		__typeof__(*mib[0]) *ptr = __this_cpu_ptr((mib)[0]); \
+		\
 		ptr->mibs[basefield##PKTS]++; \
 		ptr->mibs[basefield##OCTETS] += addend;\
 	} while (0)