Message ID | 20120726.140601.1137230112117936793.davem@davemloft.net |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Thu, Jul 26, 2012 at 2:06 PM, David Miller <davem@davemloft.net> wrote: > From: Alexander Duyck <alexander.duyck@gmail.com> > Date: Thu, 26 Jul 2012 11:26:26 -0700 > >> The previous results were with a slight modifications to your earlier >> patch. With this patch applied I am seeing 10.4Mpps with 8 queues, >> reaching a maximum of 11.6Mpps with 9 queues. > > For fun you might want to see what this patch does for your tests, > it should cut the number of fib_table_lookup() calls roughly in half. So with your patch, Eric's patch, and this most recent patch we are now at 11.8Mpps with 8 or 9 queues. At this point I am staring to hit the hardware limits since 82599 will typically max out at about 12Mpps w/ 9 queues. Here is the latest perf results with all of these patches in place. As you predicted your patch essentially cut the lookup overhead in half: 10.65% [k] ixgbe_poll 7.77% [k] fib_table_lookup 6.21% [k] ixgbe_xmit_frame_ring 6.08% [k] __netif_receive_skb 4.41% [k] _raw_spin_lock 3.95% [k] kmem_cache_free 3.30% [k] build_skb 3.17% [k] memcpy 2.96% [k] dev_queue_xmit 2.79% [k] ip_finish_output 2.66% [k] kmem_cache_alloc 2.57% [k] check_leaf 2.52% [k] ip_route_input_noref 2.50% [k] netdev_alloc_frag 2.17% [k] ip_rcv 2.16% [k] __phys_addr I will probably do some more poking around over the next few days in order to get my head around the fib_table_lookup overhead. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 26 Jul 2012 15:03:39 -0700 Alexander Duyck <alexander.duyck@gmail.com> wrote: > On Thu, Jul 26, 2012 at 2:06 PM, David Miller <davem@davemloft.net> wrote: > > From: Alexander Duyck <alexander.duyck@gmail.com> > > Date: Thu, 26 Jul 2012 11:26:26 -0700 > > > >> The previous results were with a slight modifications to your earlier > >> patch. With this patch applied I am seeing 10.4Mpps with 8 queues, > >> reaching a maximum of 11.6Mpps with 9 queues. > > > > For fun you might want to see what this patch does for your tests, > > it should cut the number of fib_table_lookup() calls roughly in half. > > So with your patch, Eric's patch, and this most recent patch we are > now at 11.8Mpps with 8 or 9 queues. At this point I am staring to hit > the hardware limits since 82599 will typically max out at about 12Mpps > w/ 9 queues. > > Here is the latest perf results with all of these patches in place. > As you predicted your patch essentially cut the lookup overhead in > half: > 10.65% [k] ixgbe_poll > 7.77% [k] fib_table_lookup > 6.21% [k] ixgbe_xmit_frame_ring > 6.08% [k] __netif_receive_skb > 4.41% [k] _raw_spin_lock > 3.95% [k] kmem_cache_free > 3.30% [k] build_skb > 3.17% [k] memcpy > 2.96% [k] dev_queue_xmit > 2.79% [k] ip_finish_output > 2.66% [k] kmem_cache_alloc > 2.57% [k] check_leaf > 2.52% [k] ip_route_input_noref > 2.50% [k] netdev_alloc_frag > 2.17% [k] ip_rcv > 2.16% [k] __phys_addr > > I will probably do some more poking around over the next few days in > order to get my head around the fib_table_lookup overhead. > > Thanks, > > Alex > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html The fib trie stats are global, you may want to either disable CONFIG_IP_FIB_TRIE_STATS or convert them to per-cpu. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2012-07-26 at 15:13 -0700, Stephen Hemminger wrote: > The fib trie stats are global, you may want to either disable CONFIG_IP_FIB_TRIE_STATS > or convert them to per-cpu. I guess its already disabled in Alex case ;) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Stephen Hemminger <shemminger@vyatta.com> Date: Thu, 26 Jul 2012 15:13:12 -0700 > The fib trie stats are global, you may want to either disable > CONFIG_IP_FIB_TRIE_STATS or convert them to per-cpu. Oh yeah, turn that stuff off when testing :-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Alexander Duyck <alexander.duyck@gmail.com> Date: Thu, 26 Jul 2012 15:03:39 -0700 > Here is the latest perf results with all of these patches in place. > As you predicted your patch essentially cut the lookup overhead in > half: Ok good. That patch is hard to make legitimate, I'd have to do a bit or work before we could realize that change. We can only combine the LOCAL and MAIN tables like that so long as there are no overlaps in the routes covered by the two tables. We'd also have to be sure to report the routes properly in dumps too. I really wish we had never segregated these two tables, it's completely pointless and hurts performance. But now we have to accomodate this legacy. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jul 26, 2012 at 3:53 PM, David Miller <davem@davemloft.net> wrote: > From: Alexander Duyck <alexander.duyck@gmail.com> > Date: Thu, 26 Jul 2012 15:03:39 -0700 > >> Here is the latest perf results with all of these patches in place. >> As you predicted your patch essentially cut the lookup overhead in >> half: > > Ok good. > > That patch is hard to make legitimate, I'd have to do a bit or work > before we could realize that change. > > We can only combine the LOCAL and MAIN tables like that so long as > there are no overlaps in the routes covered by the two tables. We'd > also have to be sure to report the routes properly in dumps too. > > I really wish we had never segregated these two tables, it's > completely pointless and hurts performance. But now we have to > accomodate this legacy. Any idea why these look-ups are so expensive in the first place? When I dump fib_trie it doesn't look like I have much there. I would have thought the table would be pretty static with just 8 flows all going to the same destination address, but it seems like I was getting hit with cache misses for some reason. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Alexander Duyck <alexander.duyck@gmail.com> Date: Thu, 26 Jul 2012 19:14:55 -0700 > Any idea why these look-ups are so expensive in the first place? When > I dump fib_trie it doesn't look like I have much there. I would have > thought the table would be pretty static with just 8 flows all going > to the same destination address, but it seems like I was getting hit > with cache misses for some reason. A lot of the overhead comes from write traffic that results from filling in the "fib_result" structure onto the callers stack. The return value from a fib_lookup() has far too many components. But simplifying things is not easy. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: David Miller <davem@davemloft.net> Date: Thu, 26 Jul 2012 20:08:46 -0700 (PDT) > A lot of the overhead comes from write traffic that results from > filling in the "fib_result" structure onto the callers stack. Here's the longer analysis of how things are now. There are several components to a route lookup result, and struct fib_result tries to encapsulate all of this. Another aspect is that our route tables are broken up into different datas tructures which reference each other, in order to save space. So the actual objects in the FIB trie are fib_alias structures, and those point to fib_info. There is a many to one relationship between FIB trie nodes and fib_info objects. The idea is that many routes have the same set of nexthops, metrics, preferred source address, etc. So one thing we return in the fib_result is a pointer to the fib_info and an index into the nexthop array (nh_sel). That's why we have all of these funny accessor's FIB_RES_X(res) which essentially provide res.fi->fib_nh[res.nh_sel].X Therefore one area of simplification would be to just return a pointer to the FIB nexthop, rather than the fib_info pointer and the nexthop index. We can get to the fib_info, if we need to, via the nh_parent pointer of the nexthop. It seems also that the res->scope value can be cribbed from the fib_info as well. res->type is embedded in the fib_alias we select hanging off of the FIB trie node. And the res->prefixlen is taken from the FIB trie node. res->tclassid is problematic, because it comes from the FIB rules tables rather than the FIB trie. We used to store a full FIB rules pointer in the fib_result, but I reduced it down to just the u32 tclassid. This whole area, as well as the FIB trie lookup itself, is an area ripe for a large number of small micro-optimizations that in the end make it's overhead much more reasonable. Another thing I haven't mentioned is that another part of FIB trie's overhead is that it does backtracking. The shorter prefixes sit at the top of the trie, so when it traverses down it does so until it can't get a match, then it walks back up to the root until it does have a match. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: David Miller <davem@davemloft.net> Date: Thu, 26 Jul 2012 23:02:46 -0700 (PDT) > Therefore one area of simplification would be to just return a pointer > to the FIB nexthop, rather than the fib_info pointer and the nexthop > index. We can get to the fib_info, if we need to, via the nh_parent > pointer of the nexthop. So I'm about to post an RFC set of patches which show this kind of simplification. It gets fib_result down to two members: u32 tclassid; struct fib_nh *nh; If I could get rid of that tclassid it would be really nice. But that's hard because the tclassid is fetched from the fib_rule and all of that lookup path is abstracted behind a common layer that's shared between ipv4 and ipv6 so it's a bit of work changing arg conventions. These changes help, but only ever so slightly, in my testing. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jul 27, 2012 at 9:15 PM, David Miller <davem@davemloft.net> wrote: > From: David Miller <davem@davemloft.net> > Date: Thu, 26 Jul 2012 23:02:46 -0700 (PDT) > >> Therefore one area of simplification would be to just return a pointer >> to the FIB nexthop, rather than the fib_info pointer and the nexthop >> index. We can get to the fib_info, if we need to, via the nh_parent >> pointer of the nexthop. > > So I'm about to post an RFC set of patches which show this kind > of simplification. It gets fib_result down to two members: > > u32 tclassid; > struct fib_nh *nh; > > If I could get rid of that tclassid it would be really nice. But > that's hard because the tclassid is fetched from the fib_rule and > all of that lookup path is abstracted behind a common layer > that's shared between ipv4 and ipv6 so it's a bit of work changing > arg conventions. > > These changes help, but only ever so slightly, in my testing. I probably won't be able to do any real testing for the patches until Monday, or at least I may let somebody else test the patches first just to make sure they don't panic the system before I do anything with them. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h index b64a19c..fc7eade 100644 --- a/include/net/ip_fib.h +++ b/include/net/ip_fib.h @@ -204,9 +204,7 @@ static inline struct fib_table *fib_get_table(struct net *net, u32 id) { struct hlist_head *ptr; - ptr = id == RT_TABLE_LOCAL ? - &net->ipv4.fib_table_hash[TABLE_LOCAL_INDEX] : - &net->ipv4.fib_table_hash[TABLE_MAIN_INDEX]; + ptr = &net->ipv4.fib_table_hash[TABLE_MAIN_INDEX]; return hlist_entry(ptr->first, struct fib_table, tb_hlist); } @@ -220,10 +218,6 @@ static inline int fib_lookup(struct net *net, const struct flowi4 *flp, { struct fib_table *table; - table = fib_get_table(net, RT_TABLE_LOCAL); - if (!fib_table_lookup(table, flp, res, FIB_LOOKUP_NOREF)) - return 0; - table = fib_get_table(net, RT_TABLE_MAIN); if (!fib_table_lookup(table, flp, res, FIB_LOOKUP_NOREF)) return 0; @@ -245,10 +239,6 @@ static inline int fib_lookup(struct net *net, struct flowi4 *flp, { if (!net->ipv4.fib_has_custom_rules) { res->tclassid = 0; - if (net->ipv4.fib_local && - !fib_table_lookup(net->ipv4.fib_local, flp, res, - FIB_LOOKUP_NOREF)) - return 0; if (net->ipv4.fib_main && !fib_table_lookup(net->ipv4.fib_main, flp, res, FIB_LOOKUP_NOREF)) diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index 44bf82e..bdc0231 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -160,7 +160,7 @@ struct net_device *__ip_dev_find(struct net *net, __be32 addr, bool devref) /* Fallback to FIB local table so that communication * over loopback subnets work. */ - local = fib_get_table(net, RT_TABLE_LOCAL); + local = fib_get_table(net, RT_TABLE_MAIN); if (local && !fib_table_lookup(local, &fl4, &res, FIB_LOOKUP_NOREF) && res.type == RTN_LOCAL) diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index c1fde53..ddfe398 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -52,16 +52,10 @@ static int __net_init fib4_rules_init(struct net *net) { struct fib_table *local_table, *main_table; - local_table = fib_trie_table(RT_TABLE_LOCAL); - if (local_table == NULL) - return -ENOMEM; - main_table = fib_trie_table(RT_TABLE_MAIN); if (main_table == NULL) goto fail; - hlist_add_head_rcu(&local_table->tb_hlist, - &net->ipv4.fib_table_hash[TABLE_LOCAL_INDEX]); hlist_add_head_rcu(&main_table->tb_hlist, &net->ipv4.fib_table_hash[TABLE_MAIN_INDEX]); return 0; @@ -169,7 +163,7 @@ static inline unsigned int __inet_dev_addr_type(struct net *net, if (ipv4_is_multicast(addr)) return RTN_MULTICAST; - local_table = fib_get_table(net, RT_TABLE_LOCAL); + local_table = fib_get_table(net, RT_TABLE_MAIN); if (local_table) { ret = RTN_UNICAST; rcu_read_lock(); @@ -712,11 +706,7 @@ static void fib_magic(int cmd, int type, __be32 dst, int dst_len, struct in_ifad }, }; - if (type == RTN_UNICAST) - tb = fib_new_table(net, RT_TABLE_MAIN); - else - tb = fib_new_table(net, RT_TABLE_LOCAL); - + tb = fib_new_table(net, RT_TABLE_MAIN); if (tb == NULL) return; diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c index a83d74e..65135dd 100644 --- a/net/ipv4/fib_rules.c +++ b/net/ipv4/fib_rules.c @@ -284,9 +284,6 @@ static int fib_default_rules_init(struct fib_rules_ops *ops) { int err; - err = fib_default_rule_add(ops, 0, RT_TABLE_LOCAL, 0); - if (err < 0) - return err; err = fib_default_rule_add(ops, 0x7FFE, RT_TABLE_MAIN, 0); if (err < 0) return err;