| Message ID | 1343383283.2626.12691.camel@edumazet-glaptop |
|---|---|
| State | Changes Requested, archived |
| Delegated to | David Miller |
Eric Dumazet <eric.dumazet@gmail.com> writes:

> Now IP route cache is removed, we should make sure fib structures
> cant share cache lines with possibly often dirtied objects.
>
> On x86, kmalloc-96 cache can be source of such problems.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

> +static inline void *fib_zalloc(size_t size)
> +{
> +	/* We want to avoid possible false sharing */
> +	return kzalloc(max_t(size_t, 128, size), GFP_KERNEL);

Why the hard coded 128 here?

It seems more portable and obvious to do
	return kzalloc(round_up(size, L1_CACHE_BYTES), GFP_KERNEL);

> +}

Eric
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2012-07-27 at 07:53 -0700, Eric W. Biederman wrote:
> Eric Dumazet <eric.dumazet@gmail.com> writes:
>
> > Now IP route cache is removed, we should make sure fib structures
> > cant share cache lines with possibly often dirtied objects.
> >
> > On x86, kmalloc-96 cache can be source of such problems.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
>
> > +static inline void *fib_zalloc(size_t size)
> > +{
> > +	/* We want to avoid possible false sharing */
> > +	return kzalloc(max_t(size_t, 128, size), GFP_KERNEL);
>
> Why the hard coded 128 here?
>
> It seems more portable and obvious to do
> 	return kzalloc(round_up(size, L1_CACHE_BYTES), GFP_KERNEL);
>

It's not that obvious, because some machines have an apparent
L1_CACHE_BYTES of 64, but hardware prefetching to 128 bytes.

But using 2*L1_CACHE_BYTES as minimum allocation size might be overkill
on some arches with 256 bytes cache lines.
Eric Dumazet <eric.dumazet@gmail.com> writes:

> On Fri, 2012-07-27 at 07:53 -0700, Eric W. Biederman wrote:
>> Eric Dumazet <eric.dumazet@gmail.com> writes:
>>
>> > Now IP route cache is removed, we should make sure fib structures
>> > cant share cache lines with possibly often dirtied objects.
>> >
>> > On x86, kmalloc-96 cache can be source of such problems.
>> >
>> > Signed-off-by: Eric Dumazet <edumazet@google.com>
>>
>> > +static inline void *fib_zalloc(size_t size)
>> > +{
>> > +	/* We want to avoid possible false sharing */
>> > +	return kzalloc(max_t(size_t, 128, size), GFP_KERNEL);
>>
>> Why the hard coded 128 here?
>>
>> It seems more portable and obvious to do
>> 	return kzalloc(round_up(size, L1_CACHE_BYTES), GFP_KERNEL);
>>
>
> It's not that obvious, because some machines have an apparent
> L1_CACHE_BYTES of 64, but hardware prefetching to 128 bytes.

I am familiar.  But does hardware prefetching make a difference
if your object is less than 64 bytes?

I don't believe only allocating 64 bytes will be a problem,
as no one else will be dirtying your cache line.

I suppose you could run into pathologies where your object
is 3*64 bytes in size, but your expression doesn't handle
that case either.

> But using 2*L1_CACHE_BYTES as minimum allocation size might be overkill
> on some arches with 256 bytes cache lines.

The other alternative to guarantee very good cache behavior is
to ensure you are allocating a power of two size up to some limit,
perhaps page size.

My point is the magic 128 likely requires an explanatory comment, and I
think the net result is you have encoded something fragile that is good
for testing but that will in the fullness of time do strange things that
will be easy to overlook.

Eric
On Fri, 2012-07-27 at 09:23 -0700, Eric W. Biederman wrote:

> I am familiar.  But does hardware prefetching make a difference
> if your object is less than 64 bytes?
>

Apparently yes, if the prefetch touches a dirtied neighbour cache line.

> I don't believe only allocating 64 bytes will be a problem,
> as no one else will be dirtying your cache line.
>
> I suppose you could run into pathologies where your object
> is 3*64 bytes in size, but your expression doesn't handle
> that case either.
>

Sure, but in most cases fib objects are under 128 bytes.

> The other alternative to guarantee very good cache behavior is
> to ensure you are allocating a power of two size up to some limit,
> perhaps page size.
>

Good idea.

> My point is the magic 128 likely requires an explanatory comment and I
> think the net result is you have encoded something fragile that is good
> for testing but that will in the fullness of time do strange things that
> will be easy to overlook.

Sure, I'll send a v2, thanks.
On Fri, Jul 27, 2012 at 9:28 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2012-07-27 at 09:23 -0700, Eric W. Biederman wrote:
>
>> I am familiar.  But does hardware prefetching make a difference
>> if your object is less than 64 bytes?
>>
>
> Apparently yes, if the prefetch touches a dirtied neighbour cache line.
>
>> I don't believe only allocating 64 bytes will be a problem,
>> as no one else will be dirtying your cache line.
>>
>> I suppose you could run into pathologies where your object
>> is 3*64 bytes in size, but your expression doesn't handle
>> that case either.
>>
>
> Sure, but in most cases fib objects are under 128 bytes.
>
>> The other alternative to guarantee very good cache behavior is
>> to ensure you are allocating a power of two size up to some limit,
>> perhaps page size.
>>
>
> Good idea.
>
>> My point is the magic 128 likely requires an explanatory comment and I
>> think the net result is you have encoded something fragile that is good
>> for testing but that will in the fullness of time do strange things that
>> will be easy to overlook.
>
> Sure, I'll send a v2, thanks.
>

I tested out v1 of your patch and didn't see much of a change in my test
environment.  I figure I will spend most of today going through the code
trying to figure out what is causing the issues I am seeing and why your
changes had no effect on my setup.

Thanks,

Alex
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index e69c3a4..5b6a9b3 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -295,6 +295,12 @@ extern void fib_select_multipath(struct fib_result *res);
 /* Exported by fib_trie.c */
 extern void fib_trie_init(void);
 extern struct fib_table *fib_trie_table(u32 id);
+static inline void *fib_zalloc(size_t size)
+{
+	/* We want to avoid possible false sharing */
+	return kzalloc(max_t(size_t, 128, size), GFP_KERNEL);
+}
+
 static inline void fib_combine_itag(u32 *itag, const struct fib_result *res)
 {
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 8732cc7..a3d0285 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1089,10 +1089,7 @@ static int __net_init ip_fib_net_init(struct net *net)
 	int err;
 	size_t size = sizeof(struct hlist_head) * FIB_TABLE_HASHSZ;
 
-	/* Avoid false sharing : Use at least a full cache line */
-	size = max_t(size_t, size, L1_CACHE_BYTES);
-
-	net->ipv4.fib_table_hash = kzalloc(size, GFP_KERNEL);
+	net->ipv4.fib_table_hash = fib_zalloc(size);
 	if (net->ipv4.fib_table_hash == NULL)
 		return -ENOMEM;
 
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index da0cc2e..a9f5e87 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -651,17 +651,17 @@ static inline unsigned int fib_laddr_hashfn(__be32 val)
 		((__force u32)val >> 14)) & mask;
 }
 
-static struct hlist_head *fib_info_hash_alloc(int bytes)
+static struct hlist_head *fib_info_hash_alloc(size_t bytes)
 {
 	if (bytes <= PAGE_SIZE)
-		return kzalloc(bytes, GFP_KERNEL);
+		return fib_zalloc(bytes);
 	else
 		return (struct hlist_head *)
 			__get_free_pages(GFP_KERNEL | __GFP_ZERO,
 					 get_order(bytes));
 }
 
-static void fib_info_hash_free(struct hlist_head *hash, int bytes)
+static void fib_info_hash_free(struct hlist_head *hash, size_t bytes)
 {
 	if (!hash)
 		return;
@@ -678,7 +678,8 @@ static void fib_info_hash_move(struct hlist_head *new_info_hash,
 {
 	struct hlist_head *old_info_hash, *old_laddrhash;
 	unsigned int old_size = fib_info_hash_size;
-	unsigned int i, bytes;
+	unsigned int i;
+	size_t bytes;
 
 	spin_lock_bh(&fib_info_lock);
 	old_info_hash = fib_info_hash;
@@ -766,10 +767,10 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
 		unsigned int new_size = fib_info_hash_size << 1;
 		struct hlist_head *new_info_hash;
 		struct hlist_head *new_laddrhash;
-		unsigned int bytes;
+		size_t bytes;
 
 		if (!new_size)
-			new_size = 1;
+			new_size = 16;
 		bytes = new_size * sizeof(struct hlist_head *);
 		new_info_hash = fib_info_hash_alloc(bytes);
 		new_laddrhash = fib_info_hash_alloc(bytes);
@@ -783,7 +784,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
 		goto failure;
 	}
 
-	fi = kzalloc(sizeof(*fi)+nhs*sizeof(struct fib_nh), GFP_KERNEL);
+	fi = fib_zalloc(sizeof(*fi) + nhs*sizeof(struct fib_nh));
 	if (fi == NULL)
 		goto failure;
 	if (cfg->fc_mx) {
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 18cbc15..664152c 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -379,7 +379,7 @@ static inline void free_leaf_info(struct leaf_info *leaf)
 static struct tnode *tnode_alloc(size_t size)
 {
 	if (size <= PAGE_SIZE)
-		return kzalloc(size, GFP_KERNEL);
+		return fib_zalloc(size);
 	else
 		return vzalloc(size);
 }
@@ -449,7 +449,7 @@ static struct leaf *leaf_new(void)
 
 static struct leaf_info *leaf_info_new(int plen)
 {
-	struct leaf_info *li = kmalloc(sizeof(struct leaf_info), GFP_KERNEL);
+	struct leaf_info *li = fib_zalloc(sizeof(struct leaf_info));
 	if (li) {
 		li->plen = plen;
 		li->mask_plen = ntohl(inet_make_mask(plen));
@@ -1969,32 +1969,24 @@ void __init fib_trie_init(void)
 {
 	fn_alias_kmem = kmem_cache_create("ip_fib_alias",
 					  sizeof(struct fib_alias),
-					  0, SLAB_PANIC, NULL);
+					  0, SLAB_PANIC | SLAB_HWCACHE_ALIGN, NULL);
 
 	trie_leaf_kmem = kmem_cache_create("ip_fib_trie",
 					   max(sizeof(struct leaf),
 					       sizeof(struct leaf_info)),
-					   0, SLAB_PANIC, NULL);
+					   0, SLAB_PANIC | SLAB_HWCACHE_ALIGN, NULL);
 }
 
 struct fib_table *fib_trie_table(u32 id)
 {
 	struct fib_table *tb;
-	struct trie *t;
-
-	tb = kmalloc(sizeof(struct fib_table) + sizeof(struct trie),
-		     GFP_KERNEL);
-	if (tb == NULL)
-		return NULL;
-
-	tb->tb_id = id;
-	tb->tb_default = -1;
-	tb->tb_num_default = 0;
-
-	t = (struct trie *) tb->tb_data;
-	memset(t, 0, sizeof(*t));
+	tb = fib_zalloc(sizeof(struct fib_table) + sizeof(struct trie));
+	if (tb) {
+		tb->tb_id = id;
+		tb->tb_default = -1;
+	}
 
 	return tb;
 }