Message ID: 491D5725.50006@cosmosbay.com
State: Accepted, archived
Delegated to: David Miller
On Fri, Nov 14, 2008 at 11:47:01AM +0100, Eric Dumazet wrote:
> Alexey Dobriyan wrote:
>> On Fri, Nov 14, 2008 at 10:04:24AM +0100, Eric Dumazet wrote:
>>> David Miller wrote:
>>>> From: Eric Dumazet <dada1@cosmosbay.com>
>>>> Date: Fri, 14 Nov 2008 09:09:31 +0100
>>>>
>>>>> During tbench/oprofile sessions, I found that dst_release() was in third position.
>>>> ...
>>>>> Instead of first checking the refcount value, then decrementing it,
>>>>> we use atomic_dec_return() to help the CPU make the right memory transaction
>>>>> (ie getting the cache line in exclusive mode).
>>>> ...
>>>>> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
>>>> This looks great, applied, thanks Eric.
>>>>
>>> Thanks David
>>>
>>> I think I understood some regressions here on 32 bits:
>>>
>>> offsetof(struct dst_entry, __refcnt) is 0x7c again!
>>>
>>> This is really bad for performance.
>>>
>>> I believe this comes from a patch from Alexey Dobriyan
>>> (commit def8b4faff5ca349beafbbfeb2c51f3602a6ef3a
>>> net: reduce structures when XFRM=n)
>>
>> Ick.
>
> Well, your patch is a good thing, we only need to make adjustments.
>
>>> This kills the effort from Zhang Yanmin (and me...)
>>>
>>> (commit f1dd9c379cac7d5a76259e7dffcd5f8edc697d17
>>> [NET]: Fix tbench regression in 2.6.25-rc1)
>>>
>>> Really we must find something so that this damned __refcnt starts at 0x80.
>>
>> Make it the last member?
>
> Yes, it will help tbench, but not machines that stress the IP route cache
> (dst_use() must dirty the three fields __refcnt, __use and lastuse).
>
> Also, the 'next' pointer should be in the same cache line, to speed up
> route cache lookups.

Knowledge taken.

> Next problem is that the offsets depend on the architecture being 32 or 64 bits.
>
> On 64 bits, offsetof(struct dst_entry, __refcnt) is 0xb0: not very good...

I think all these constraints can be satisfied with clever rearranging of
dst_entry. Let me come up with an alternative patch which still reduces the
dst slab size.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Alexey Dobriyan wrote:
> On Fri, Nov 14, 2008 at 11:47:01AM +0100, Eric Dumazet wrote:
>> Alexey Dobriyan wrote:
>>> On Fri, Nov 14, 2008 at 10:04:24AM +0100, Eric Dumazet wrote:
>>>> David Miller wrote:
>>>>> From: Eric Dumazet <dada1@cosmosbay.com>
>>>>> Date: Fri, 14 Nov 2008 09:09:31 +0100
>>>>>
>>>>>> During tbench/oprofile sessions, I found that dst_release() was in third position.
>>>>> ...
>>>>>> Instead of first checking the refcount value, then decrementing it,
>>>>>> we use atomic_dec_return() to help the CPU make the right memory transaction
>>>>>> (ie getting the cache line in exclusive mode).
>>>>> ...
>>>>>> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
>>>>> This looks great, applied, thanks Eric.
>>>>>
>>>> Thanks David
>>>>
>>>> I think I understood some regressions here on 32 bits:
>>>>
>>>> offsetof(struct dst_entry, __refcnt) is 0x7c again!
>>>>
>>>> This is really bad for performance.
>>>>
>>>> I believe this comes from a patch from Alexey Dobriyan
>>>> (commit def8b4faff5ca349beafbbfeb2c51f3602a6ef3a
>>>> net: reduce structures when XFRM=n)
>>> Ick.
>> Well, your patch is a good thing, we only need to make adjustments.
>>
>>>> This kills the effort from Zhang Yanmin (and me...)
>>>>
>>>> (commit f1dd9c379cac7d5a76259e7dffcd5f8edc697d17
>>>> [NET]: Fix tbench regression in 2.6.25-rc1)
>>>>
>>>> Really we must find something so that this damned __refcnt starts at 0x80.
>>> Make it the last member?
>> Yes, it will help tbench, but not machines that stress the IP route cache
>> (dst_use() must dirty the three fields __refcnt, __use and lastuse).
>>
>> Also, the 'next' pointer should be in the same cache line, to speed up
>> route cache lookups.
>
> Knowledge taken.
>
>> Next problem is that the offsets depend on the architecture being 32 or 64 bits.
>>
>> On 64 bits, offsetof(struct dst_entry, __refcnt) is 0xb0: not very good...
>
> I think all these constraints can be satisfied with clever rearranging of
> dst_entry. Let me come up with an alternative patch which still reduces the
> dst slab size.

You cannot reduce the size, and it doesn't matter: we use dst_entry inside
rtable, and rtable uses a SLAB_HWCACHE_ALIGN kmem_cache, so we have many
bytes available.

After the patch, on 32 bits:

sizeof(struct rtable) = 244 (12 bytes left)

Same for the other containers.
On Fri, Nov 14, 2008 at 12:43:06PM +0100, Eric Dumazet wrote:
> Alexey Dobriyan wrote:
>> On Fri, Nov 14, 2008 at 11:47:01AM +0100, Eric Dumazet wrote:
>>> Alexey Dobriyan wrote:
>>>> On Fri, Nov 14, 2008 at 10:04:24AM +0100, Eric Dumazet wrote:
>>>>> David Miller wrote:
>>>>>> From: Eric Dumazet <dada1@cosmosbay.com>
>>>>>> Date: Fri, 14 Nov 2008 09:09:31 +0100
>>>>>>
>>>>>>> During tbench/oprofile sessions, I found that dst_release() was in third position.
>>>>>> ...
>>>>>>> Instead of first checking the refcount value, then decrementing it,
>>>>>>> we use atomic_dec_return() to help the CPU make the right memory transaction
>>>>>>> (ie getting the cache line in exclusive mode).
>>>>>> ...
>>>>>>> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
>>>>>> This looks great, applied, thanks Eric.
>>>>>>
>>>>> Thanks David
>>>>>
>>>>> I think I understood some regressions here on 32 bits:
>>>>>
>>>>> offsetof(struct dst_entry, __refcnt) is 0x7c again!
>>>>>
>>>>> This is really bad for performance.
>>>>>
>>>>> I believe this comes from a patch from Alexey Dobriyan
>>>>> (commit def8b4faff5ca349beafbbfeb2c51f3602a6ef3a
>>>>> net: reduce structures when XFRM=n)
>>>> Ick.
>>> Well, your patch is a good thing, we only need to make adjustments.
>>>
>>>>> This kills the effort from Zhang Yanmin (and me...)
>>>>>
>>>>> (commit f1dd9c379cac7d5a76259e7dffcd5f8edc697d17
>>>>> [NET]: Fix tbench regression in 2.6.25-rc1)
>>>>>
>>>>> Really we must find something so that this damned __refcnt starts at 0x80.
>>>> Make it the last member?
>>> Yes, it will help tbench, but not machines that stress the IP route cache
>>> (dst_use() must dirty the three fields __refcnt, __use and lastuse).
>>>
>>> Also, the 'next' pointer should be in the same cache line, to speed up
>>> route cache lookups.
>>
>> Knowledge taken.
>>
>>> Next problem is that the offsets depend on the architecture being 32 or 64 bits.
>>>
>>> On 64 bits, offsetof(struct dst_entry, __refcnt) is 0xb0: not very good...
>>
>> I think all these constraints can be satisfied with clever rearranging of
>> dst_entry. Let me come up with an alternative patch which still reduces the
>> dst slab size.
>
> You cannot reduce the size, and it doesn't matter: we use dst_entry inside
> rtable, and rtable uses a SLAB_HWCACHE_ALIGN kmem_cache, so we have many
> bytes available.
>
> After the patch, on 32 bits:
>
> sizeof(struct rtable) = 244 (12 bytes left)
>
> Same for the other containers.

Hmm, indeed.

I tried moving __refcnt et al. to the very beginning, but it seems to make
things worse (on x86_64, almost within statistical error).

And there is no way to use offsetof() inside a struct definition. :-(
Alexey Dobriyan wrote:
> Hmm, indeed.
>
> I tried moving __refcnt et al. to the very beginning, but it seems to make
> things worse (on x86_64, almost within statistical error).
>
> And there is no way to use offsetof() inside a struct definition. :-(

Yes, it is important that the beginning of the structure contains read-mostly
fields.

__refcnt being the most written field (incremented/decremented for each
packet), it is really important to move it outside of the first 128 bytes
(192 bytes on 64-bit arches) of dst_entry.

I wonder if some really hot dst entries could be split (one copy per stream),
to reduce cache-line ping-pongs.
From: Eric Dumazet <dada1@cosmosbay.com>
Date: Fri, 14 Nov 2008 11:47:01 +0100

> [PATCH] net: make sure struct dst_entry refcount is aligned on 64 bytes

Applied to net-next-2.6, thanks Eric.
diff --git a/include/net/dst.h b/include/net/dst.h
index 65a60fa..6c77879 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -61,6 +61,8 @@ struct dst_entry
 	struct hh_cache *hh;
 #ifdef CONFIG_XFRM
 	struct xfrm_state *xfrm;
+#else
+	void *__pad1;
 #endif
 	int (*input)(struct sk_buff*);
 	int (*output)(struct sk_buff*);
@@ -71,8 +73,20 @@ struct dst_entry
 
 #ifdef CONFIG_NET_CLS_ROUTE
 	__u32 tclassid;
+#else
+	__u32 __pad2;
 #endif
+
+	/*
+	 * Align __refcnt to a 64 bytes alignment
+	 * (L1_CACHE_SIZE would be too much)
+	 */
+#ifdef CONFIG_64BIT
+	long __pad_to_align_refcnt[2];
+#else
+	long __pad_to_align_refcnt[1];
+#endif
 	/*
 	 * __refcnt wants to be on a different cache line from
 	 * input/output/ops or performance tanks badly
@@ -157,6 +171,11 @@ dst_metric_locked(struct dst_entry *dst, int metric)
 
 static inline void dst_hold(struct dst_entry * dst)
 {
+	/*
+	 * If your kernel compilation stops here, please check
+	 * __pad_to_align_refcnt declaration in struct dst_entry
+	 */
+	BUILD_BUG_ON(offsetof(struct dst_entry, __refcnt) & 63);
 	atomic_inc(&dst->__refcnt);
 }