Message ID | 1411464288-18069-3-git-send-email-pablo@netfilter.org |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
On Tue, 2014-09-23 at 11:24 +0200, Pablo Neira Ayuso wrote: > The sets are released from the rcu callback, after the rule is removed > from the chain list, which implies that nfnetlink cannot update the > rbtree and no packets are walking on the set anymore. Thus, we can get > rid of the spinlock in the set destroy path there. > > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> > Reviewied-by: Thomas Graf <tgraf@suug.ch> > --- > net/netfilter/nft_rbtree.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/net/netfilter/nft_rbtree.c b/net/netfilter/nft_rbtree.c > index e1836ff..46214f2 100644 > --- a/net/netfilter/nft_rbtree.c > +++ b/net/netfilter/nft_rbtree.c > @@ -234,13 +234,11 @@ static void nft_rbtree_destroy(const struct nft_set *set) > struct nft_rbtree_elem *rbe; > struct rb_node *node; > > - spin_lock_bh(&nft_rbtree_lock); > while ((node = priv->root.rb_node) != NULL) { > rb_erase(node, &priv->root); > rbe = rb_entry(node, struct nft_rbtree_elem, node); > nft_rbtree_elem_destroy(set, rbe); > } > - spin_unlock_bh(&nft_rbtree_lock); > } > > static bool nft_rbtree_estimate(const struct nft_set_desc *desc, u32 features, BTW, do you know if destroying an rbtree is faster this way, or using rb_first() ? Most cases I see in the kernel use a rb_first instead of taking the root. Examples : (its not an exhaustive list) net/netfilter/xt_connlimit.c:402 net/sched/sch_netem.c:380 net/sched/sch_fq.c:519 drivers/infiniband/hw/mlx4/cm.c:439 drivers/iommu/iova.c:324 drivers/md/dm-thin.c:1491 drivers/mtd/mtdswap.c:625 drivers/mtd/ubi/attach.c:636 This might be better for large trees, to get better cache locality, but I have no experimental data. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Sep 23, 2014 at 02:52:37AM -0700, Eric Dumazet wrote: > On Tue, 2014-09-23 at 11:24 +0200, Pablo Neira Ayuso wrote: > > The sets are released from the rcu callback, after the rule is removed > > from the chain list, which implies that nfnetlink cannot update the > > rbtree and no packets are walking on the set anymore. Thus, we can get > > rid of the spinlock in the set destroy path there. > > > > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> > > Reviewied-by: Thomas Graf <tgraf@suug.ch> > > --- > > net/netfilter/nft_rbtree.c | 2 -- > > 1 file changed, 2 deletions(-) > > > > diff --git a/net/netfilter/nft_rbtree.c b/net/netfilter/nft_rbtree.c > > index e1836ff..46214f2 100644 > > --- a/net/netfilter/nft_rbtree.c > > +++ b/net/netfilter/nft_rbtree.c > > @@ -234,13 +234,11 @@ static void nft_rbtree_destroy(const struct nft_set *set) > > struct nft_rbtree_elem *rbe; > > struct rb_node *node; > > > > - spin_lock_bh(&nft_rbtree_lock); > > while ((node = priv->root.rb_node) != NULL) { > > rb_erase(node, &priv->root); > > rbe = rb_entry(node, struct nft_rbtree_elem, node); > > nft_rbtree_elem_destroy(set, rbe); > > } > > - spin_unlock_bh(&nft_rbtree_lock); > > } > > > > static bool nft_rbtree_estimate(const struct nft_set_desc *desc, u32 features, > > BTW, do you know if destroying an rbtree is faster this way, or using > rb_first() ? > > Most cases I see in the kernel use a rb_first instead of taking the > root. > > Examples : (its not an exhaustive list) > > net/netfilter/xt_connlimit.c:402 > net/sched/sch_netem.c:380 > net/sched/sch_fq.c:519 > drivers/infiniband/hw/mlx4/cm.c:439 > drivers/iommu/iova.c:324 > drivers/md/dm-thin.c:1491 > drivers/mtd/mtdswap.c:625 > drivers/mtd/ubi/attach.c:636 > > This might be better for large trees, to get better cache locality, > but I have no experimental data. I'll send a follow up patch for nf-next to use rb_first() in that patch. Thanks Eric. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2014-09-23 at 13:01 +0200, Pablo Neira Ayuso wrote: > I'll send a follow up patch for nf-next to use rb_first() in that > patch. Thanks Eric. I did a test, and its indeed a bit faster to use rb_first(), by about 5% Real win is to be able to build a chain using rb_first()/rb_next(), (leaving the tree as is), then deleting the items in the chain, and simply reset rb_root. This only needs to reuse one pointer to store the item->next pointer. This is then about ~50% faster, because we do not constantly rebalance tree for every removed item. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Sep 23, 2014 at 04:54:05AM -0700, Eric Dumazet wrote: > On Tue, 2014-09-23 at 13:01 +0200, Pablo Neira Ayuso wrote: > > > I'll send a follow up patch for nf-next to use rb_first() in that > > patch. Thanks Eric. > > I did a test, and its indeed a bit faster to use rb_first(), by about 5% > > Real win is to be able to build a chain using rb_first()/rb_next(), > (leaving the tree as is), then deleting the items in the chain, and > simply reset rb_root. > > This only needs to reuse one pointer to store the item->next pointer. > > This is then about ~50% faster, because we do not constantly rebalance > tree for every removed item. Indeed. struct nft_rbtree_elem { struct rb_node node; u16 flags; struct nft_data key; struct nft_data data[]; }; Actually, I could add to nft_data a pointer in the union area, but I'm not very confortable with adding it for this specific case. At this moment we're releasing this from rcu_callback which is "hiding" the deletion time from the netlink interface. But I'll keep this back in my head if we later on have some pointer candidate to be reused in a nice way. I'll send a patch to make the rb_first()/rb_next() conversion though. Thanks for your comments! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2014-09-23 at 18:10 +0200, Pablo Neira Ayuso wrote: > Actually, I could add to nft_data a pointer in the union area, but I'm > not very confortable with adding it for this specific case. At this > moment we're releasing this from rcu_callback which is "hiding" the > deletion time from the netlink interface. > > But I'll keep this back in my head if we later on have some pointer > candidate to be reused in a nice way. > > I'll send a patch to make the rb_first()/rb_next() conversion though. Ah, forget what I said, its actually slower to build a list by 7%, I had an error in my test. So the rb_first() thing is the easy and fast way. Thanks -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/netfilter/nft_rbtree.c b/net/netfilter/nft_rbtree.c index e1836ff..46214f2 100644 --- a/net/netfilter/nft_rbtree.c +++ b/net/netfilter/nft_rbtree.c @@ -234,13 +234,11 @@ static void nft_rbtree_destroy(const struct nft_set *set) struct nft_rbtree_elem *rbe; struct rb_node *node; - spin_lock_bh(&nft_rbtree_lock); while ((node = priv->root.rb_node) != NULL) { rb_erase(node, &priv->root); rbe = rb_entry(node, struct nft_rbtree_elem, node); nft_rbtree_elem_destroy(set, rbe); } - spin_unlock_bh(&nft_rbtree_lock); } static bool nft_rbtree_estimate(const struct nft_set_desc *desc, u32 features,
The sets are released from the rcu callback, after the rule is removed from the chain list, which implies that nfnetlink cannot update the rbtree and no packets are walking on the set anymore. Thus, we can get rid of the spinlock in the set destroy path there. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Reviewied-by: Thomas Graf <tgraf@suug.ch> --- net/netfilter/nft_rbtree.c | 2 -- 1 file changed, 2 deletions(-)