Message ID: 20190416142047.3453-1-vladbu@mellanox.com
State: RFC
Delegated to: David Miller
Series: [RFC,net-next] net: sched: flower: refactor reoffload for concurrent access
On Tue, 16 Apr 2019 17:20:47 +0300, Vlad Buslov wrote:
> @@ -1551,6 +1558,10 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
>  		goto errout_mask;
>
>  	if (!tc_skip_hw(fnew->flags)) {
> +		spin_lock(&tp->lock);
> +		list_add(&fnew->hw_list, &head->hw_filters);
> +		spin_unlock(&tp->lock);
> +
>  		err = fl_hw_replace_filter(tp, fnew, rtnl_held, extack);
>  		if (err)
>  			goto errout_ht;

Duplicated deletes should be fine, but I'm not sure same is true for
adds. Won't seeing an add with the same cookie twice confuse drivers?

There's also the minor issue of offloaded count being off in that
case :)
On Wed 17 Apr 2019 at 00:49, Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
> On Tue, 16 Apr 2019 17:20:47 +0300, Vlad Buslov wrote:
>> @@ -1551,6 +1558,10 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
>>  		goto errout_mask;
>>
>>  	if (!tc_skip_hw(fnew->flags)) {
>> +		spin_lock(&tp->lock);
>> +		list_add(&fnew->hw_list, &head->hw_filters);
>> +		spin_unlock(&tp->lock);
>> +
>>  		err = fl_hw_replace_filter(tp, fnew, rtnl_held, extack);
>>  		if (err)
>>  			goto errout_ht;
>
> Duplicated deletes should be fine, but I'm not sure same is true for
> adds. Won't seeing an add with the same cookie twice confuse drivers?
>
> There's also the minor issue of offloaded count being off in that
> case :)

Hmmm, okay. Rejecting duplicate cookies should be a trivial change to
drivers though. Do you see any faults with this approach in general?
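For illustration only, a minimal sketch of the driver-side check discussed above; it reuses the cookie lookup that the delete path already needs, and the foo_* names are hypothetical placeholders rather than code from any real driver:

/* Minimal sketch, assuming a driver-private flow table keyed by the TC
 * cookie.  struct foo_priv, struct foo_flow, foo_flow_lookup() and
 * foo_flow_install() are hypothetical.
 */
static int foo_flower_replace(struct foo_priv *priv,
			      struct tc_cls_flower_offload *f)
{
	struct foo_flow *flow;

	/* The delete path already needs cookie -> flow lookup; reuse it. */
	flow = foo_flow_lookup(priv, f->cookie);
	if (flow)
		return -EEXIST;	/* same cookie seen twice, e.g. from a replay */

	return foo_flow_install(priv, f);
}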
On Wed, 17 Apr 2019 07:29:36 +0000, Vlad Buslov wrote:
> On Wed 17 Apr 2019 at 00:49, Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
> > On Tue, 16 Apr 2019 17:20:47 +0300, Vlad Buslov wrote:
> >> @@ -1551,6 +1558,10 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
> >>  		goto errout_mask;
> >>
> >>  	if (!tc_skip_hw(fnew->flags)) {
> >> +		spin_lock(&tp->lock);
> >> +		list_add(&fnew->hw_list, &head->hw_filters);
> >> +		spin_unlock(&tp->lock);
> >> +
> >>  		err = fl_hw_replace_filter(tp, fnew, rtnl_held, extack);
> >>  		if (err)
> >>  			goto errout_ht;
> >
> > Duplicated deletes should be fine, but I'm not sure same is true for
> > adds. Won't seeing an add with the same cookie twice confuse drivers?
> >
> > There's also the minor issue of offloaded count being off in that
> > case :)
>
> Hmmm, okay. Rejecting duplicate cookies should be a trivial change to
> drivers though. Do you see any faults with this approach in general?

Trivial or not it adds up, the stack should make driver authors' job as
easy as possible. The simplest thing to do would be to add a mutex
around the HW calls. But that obviously doesn't work for you, cause
you want multiple outstanding requests to the FW for a single tp,
right?

How about a RW lock, that would take R on normal add/replace/del paths
and W on replays? That should scale, no?
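A rough sketch of the RW-lock suggestion above, not part of the posted patch; it assumes struct cls_fl_head grows a hypothetical rw_semaphore field (hw_sem), taken for read on the per-filter update paths and for write around the reoffload replay:

/* Sketch only: the hw_sem field and the *_sketch() wrappers are
 * hypothetical.  Update paths may run in parallel with each other but
 * are excluded while a replay is in progress.
 */
static int fl_hw_replace_filter_sketch(struct tcf_proto *tp,
				       struct cls_fl_filter *f,
				       bool rtnl_held,
				       struct netlink_ext_ack *extack)
{
	struct cls_fl_head *head = fl_head_dereference(tp);
	int err;

	down_read(&head->hw_sem);	/* many add/replace/del in parallel */
	err = fl_hw_replace_filter(tp, f, rtnl_held, extack);
	up_read(&head->hw_sem);

	return err;
}

static int fl_reoffload_sketch(struct tcf_proto *tp, bool add,
			       tc_setup_cb_t *cb, void *cb_priv,
			       struct netlink_ext_ack *extack)
{
	struct cls_fl_head *head = fl_head_dereference(tp);
	int err;

	down_write(&head->hw_sem);	/* replay excludes all update paths */
	err = fl_reoffload(tp, add, cb, cb_priv, extack);
	up_write(&head->hw_sem);

	return err;
}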
On Wed 17 Apr 2019 at 19:34, Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
> On Wed, 17 Apr 2019 07:29:36 +0000, Vlad Buslov wrote:
>> On Wed 17 Apr 2019 at 00:49, Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
>> > On Tue, 16 Apr 2019 17:20:47 +0300, Vlad Buslov wrote:
>> >> @@ -1551,6 +1558,10 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
>> >>  		goto errout_mask;
>> >>
>> >>  	if (!tc_skip_hw(fnew->flags)) {
>> >> +		spin_lock(&tp->lock);
>> >> +		list_add(&fnew->hw_list, &head->hw_filters);
>> >> +		spin_unlock(&tp->lock);
>> >> +
>> >>  		err = fl_hw_replace_filter(tp, fnew, rtnl_held, extack);
>> >>  		if (err)
>> >>  			goto errout_ht;
>> >
>> > Duplicated deletes should be fine, but I'm not sure same is true for
>> > adds. Won't seeing an add with the same cookie twice confuse drivers?
>> >
>> > There's also the minor issue of offloaded count being off in that
>> > case :)
>>
>> Hmmm, okay. Rejecting duplicate cookies should be a trivial change to
>> drivers though. Do you see any faults with this approach in general?
>
> Trivial or not it adds up, the stack should make driver authors' job as
> easy as possible. The simplest thing to do would be to add a mutex

Agree. However, all driver flower offload implementations already have
all necessary functionality to lookup flow by cookie because they need
it to implement flow deletion.

> around the HW calls. But that obviously doesn't work for you, cause
> you want multiple outstanding requests to the FW for a single tp,
> right?

Right.

>
> How about a RW lock, that would take R on normal add/replace/del paths
> and W on replays? That should scale, no?

Yes. But I would prefer to avoid adding another sleeping lock on rule
update path of every filter (including non-offloaded use cases when
reoffload is not used at all).

Jiri, what approach would you prefer?
On Wed 17 Apr 2019 at 19:34, Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
> On Wed, 17 Apr 2019 07:29:36 +0000, Vlad Buslov wrote:
>> On Wed 17 Apr 2019 at 00:49, Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
>> > On Tue, 16 Apr 2019 17:20:47 +0300, Vlad Buslov wrote:
>> >> @@ -1551,6 +1558,10 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
>> >>  		goto errout_mask;
>> >>
>> >>  	if (!tc_skip_hw(fnew->flags)) {
>> >> +		spin_lock(&tp->lock);
>> >> +		list_add(&fnew->hw_list, &head->hw_filters);
>> >> +		spin_unlock(&tp->lock);
>> >> +
>> >>  		err = fl_hw_replace_filter(tp, fnew, rtnl_held, extack);
>> >>  		if (err)
>> >>  			goto errout_ht;
>> >
>> > Duplicated deletes should be fine, but I'm not sure same is true for
>> > adds. Won't seeing an add with the same cookie twice confuse drivers?
>> >
>> > There's also the minor issue of offloaded count being off in that
>> > case :)
>>
>> Hmmm, okay. Rejecting duplicate cookies should be a trivial change to
>> drivers though. Do you see any faults with this approach in general?
>
> Trivial or not it adds up, the stack should make driver authors' job as
> easy as possible. The simplest thing to do would be to add a mutex
> around the HW calls. But that obviously doesn't work for you, cause
> you want multiple outstanding requests to the FW for a single tp,
> right?
>
> How about a RW lock, that would take R on normal add/replace/del paths
> and W on replays? That should scale, no?

I've been thinking some more about possible ways to mitigate the
problem. First of all I tried to implement POC of rwlock in flower and
it isn't straightforward because of lock ordering. Observe that
fl_reoffload() is always called with rtnl lock taken (I didn't do any
work to unlock bind/unbind), but fl_change() can be called without rtnl
lock and needs to obtain it before offloading rules. This means that we
have deadlock here, if fl_change() obtains locks in order
rwlock ---> rtnl_lock and fl_reoffload() obtains locks in order
rtnl_lock ---> rwlock.

Considering this, I tried to improve my solution to remove possibility
of multiple adds of same filter and it seems to me that it would be
enough to move hw_filters list management in flower offloads functions:
add filter to list while holding rtnl lock in fl_hw_replace_filter() and
remove it from list while holding rtnl lock in fl_hw_destroy_filter().
What do you think?
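A sketch of the alternative described above, not the posted patch: the hw_filters list is updated from inside the offload helpers, which already run under rtnl around the driver callbacks, so fl_reoffload() (also under rtnl) can never see the same filter added twice. The helper names below are hypothetical:

/* Sketch only: hypothetical helpers called from fl_hw_replace_filter()
 * and fl_hw_destroy_filter() while rtnl is held.
 */
static void fl_hw_list_add(struct tcf_proto *tp, struct cls_fl_filter *f)
{
	struct cls_fl_head *head = fl_head_dereference(tp);

	ASSERT_RTNL();	/* rtnl serializes against fl_reoffload() */
	list_add(&f->hw_list, &head->hw_filters);
}

static void fl_hw_list_del(struct cls_fl_filter *f)
{
	ASSERT_RTNL();	/* rtnl serializes against fl_reoffload() */
	if (!list_empty(&f->hw_list))
		list_del_init(&f->hw_list);
}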
On Thu, 18 Apr 2019 16:33:22 +0000, Vlad Buslov wrote:
> Considering this, I tried to improve my solution to remove possibility
> of multiple adds of same filter and it seems to me that it would be
> enough to move hw_filters list management in flower offloads functions:
> add filter to list while holding rtnl lock in fl_hw_replace_filter() and
> remove it from list while holding rtnl lock in fl_hw_destroy_filter().
> What do you think?

Sounds good for now, but I presume the plan is to get rid of rtnl
around the driver call.. at which point we would switch to a rwlock? :)
On Thu 18 Apr 2019 at 20:46, Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
> On Thu, 18 Apr 2019 16:33:22 +0000, Vlad Buslov wrote:
>> Considering this, I tried to improve my solution to remove possibility
>> of multiple adds of same filter and it seems to me that it would be
>> enough to move hw_filters list management in flower offloads functions:
>> add filter to list while holding rtnl lock in fl_hw_replace_filter() and
>> remove it from list while holding rtnl lock in fl_hw_destroy_filter().
>> What do you think?
>
> Sounds good for now, but I presume the plan is to get rid of rtnl
> around the driver call.. at which point we would switch to a rwlock? :)

Yes, but I would like the lock to be in cls hw offloads API
(tc_setup_cb_replace(), etc) and not in flower itself. That would also
solve deadlock issue and make code reusable for any further unlocked
classifiers implementations.
On Thu, 18 Apr 2019 17:58:26 +0000, Vlad Buslov wrote:
> On Thu 18 Apr 2019 at 20:46, Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
> > On Thu, 18 Apr 2019 16:33:22 +0000, Vlad Buslov wrote:
> >> Considering this, I tried to improve my solution to remove possibility
> >> of multiple adds of same filter and it seems to me that it would be
> >> enough to move hw_filters list management in flower offloads functions:
> >> add filter to list while holding rtnl lock in fl_hw_replace_filter() and
> >> remove it from list while holding rtnl lock in fl_hw_destroy_filter().
> >> What do you think?
> >
> > Sounds good for now, but I presume the plan is to get rid of rtnl
> > around the driver call.. at which point we would switch to a rwlock? :)
>
> Yes, but I would like the lock to be in cls hw offloads API
> (tc_setup_cb_replace(), etc) and not in flower itself. That would also
> solve deadlock issue and make code reusable for any further unlocked
> classifiers implementations.

And then the HW list goes along with it into the common code?
That'd be quite nice.
On Thu 18 Apr 2019 at 21:02, Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
> On Thu, 18 Apr 2019 17:58:26 +0000, Vlad Buslov wrote:
>> On Thu 18 Apr 2019 at 20:46, Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
>> > On Thu, 18 Apr 2019 16:33:22 +0000, Vlad Buslov wrote:
>> >> Considering this, I tried to improve my solution to remove possibility
>> >> of multiple adds of same filter and it seems to me that it would be
>> >> enough to move hw_filters list management in flower offloads functions:
>> >> add filter to list while holding rtnl lock in fl_hw_replace_filter() and
>> >> remove it from list while holding rtnl lock in fl_hw_destroy_filter().
>> >> What do you think?
>> >
>> > Sounds good for now, but I presume the plan is to get rid of rtnl
>> > around the driver call.. at which point we would switch to a rwlock? :)
>>
>> Yes, but I would like the lock to be in cls hw offloads API
>> (tc_setup_cb_replace(), etc) and not in flower itself. That would also
>> solve deadlock issue and make code reusable for any further unlocked
>> classifiers implementations.
>
> And then the HW list goes along with it into the common code?
> That'd be quite nice.

The goal is to have a lock in tcf_block and use it synchronize cb_list
and all related counters (offloadcnt, etc). Now I also want to use it to
protect hw_filters list and prevent the double-add issue. Meanwhile rtnl
lock can do the job.
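A very rough sketch of that longer-term direction, not existing code: a lock embedded in struct tcf_block (the cb_lock field is hypothetical), taken for read by per-filter offload helpers such as the planned tc_setup_cb_replace() and for write around reoffload, so the same lock can also cover cb_list, offloadcnt and the classifier's hw list. Both helper names below are assumptions about the future API, not functions that exist today:

/* Sketch only: block->cb_lock, tc_setup_cb_replace() and
 * tc_setup_cb_reoffload() are hypothetical.
 */
int tc_setup_cb_replace(struct tcf_block *block, enum tc_setup_type type,
			void *type_data, bool err_stop)
{
	int ok_count;

	down_read(&block->cb_lock);	/* parallel filter updates allowed */
	ok_count = tcf_block_cb_call(block, type, type_data, err_stop);
	up_read(&block->cb_lock);

	return ok_count;
}

int tc_setup_cb_reoffload(struct tcf_block *block, struct tcf_proto *tp,
			  bool add, tc_setup_cb_t *cb, void *cb_priv,
			  struct netlink_ext_ack *extack)
{
	int err;

	down_write(&block->cb_lock);	/* replay excludes filter updates */
	err = tp->ops->reoffload(tp, add, cb, cb_priv, extack);
	up_write(&block->cb_lock);

	return err;
}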
On Thu, 18 Apr 2019 18:13:37 +0000, Vlad Buslov wrote:
> On Thu 18 Apr 2019 at 21:02, Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
> > On Thu, 18 Apr 2019 17:58:26 +0000, Vlad Buslov wrote:
> >> On Thu 18 Apr 2019 at 20:46, Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
> >> > On Thu, 18 Apr 2019 16:33:22 +0000, Vlad Buslov wrote:
> >> >> Considering this, I tried to improve my solution to remove possibility
> >> >> of multiple adds of same filter and it seems to me that it would be
> >> >> enough to move hw_filters list management in flower offloads functions:
> >> >> add filter to list while holding rtnl lock in fl_hw_replace_filter() and
> >> >> remove it from list while holding rtnl lock in fl_hw_destroy_filter().
> >> >> What do you think?
> >> >
> >> > Sounds good for now, but I presume the plan is to get rid of rtnl
> >> > around the driver call.. at which point we would switch to a rwlock? :)
> >>
> >> Yes, but I would like the lock to be in cls hw offloads API
> >> (tc_setup_cb_replace(), etc) and not in flower itself. That would also
> >> solve deadlock issue and make code reusable for any further unlocked
> >> classifiers implementations.
> >
> > And then the HW list goes along with it into the common code?
> > That'd be quite nice.
>
> The goal is to have a lock in tcf_block and use it synchronize cb_list
> and all related counters (offloadcnt, etc). Now I also want to use it to
> protect hw_filters list and prevent the double-add issue. Meanwhile rtnl
> lock can do the job.

SGTM 👍
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 4b5585358699..e0f921c537ab 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -90,6 +90,7 @@ struct cls_fl_head {
 	struct rhashtable ht;
 	spinlock_t masks_lock; /* Protect masks list */
 	struct list_head masks;
+	struct list_head hw_filters;
 	struct rcu_work rwork;
 	struct idr handle_idr;
 };
@@ -102,6 +103,7 @@ struct cls_fl_filter {
 	struct tcf_result res;
 	struct fl_flow_key key;
 	struct list_head list;
+	struct list_head hw_list;
 	u32 handle;
 	u32 flags;
 	u32 in_hw_count;
@@ -315,6 +317,7 @@ static int fl_init(struct tcf_proto *tp)
 
 	spin_lock_init(&head->masks_lock);
 	INIT_LIST_HEAD_RCU(&head->masks);
+	INIT_LIST_HEAD(&head->hw_filters);
 	rcu_assign_pointer(tp->root, head);
 	idr_init(&head->handle_idr);
 
@@ -485,12 +488,15 @@ static struct cls_fl_head *fl_head_dereference(struct tcf_proto *tp)
 	return rcu_dereference_raw(tp->root);
 }
 
-static void __fl_put(struct cls_fl_filter *f)
+static void __fl_put(struct tcf_proto *tp, struct cls_fl_filter *f)
 {
 	if (!refcount_dec_and_test(&f->refcnt))
 		return;
 
-	WARN_ON(!f->deleted);
+	spin_lock(&tp->lock);
+	if (!list_empty(&f->hw_list))
+		list_del(&f->hw_list);
+	spin_unlock(&tp->lock);
 
 	if (tcf_exts_get_net(&f->exts))
 		tcf_queue_work(&f->rwork, fl_destroy_filter_work);
@@ -554,7 +560,7 @@ static int __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
 	if (!tc_skip_hw(f->flags))
 		fl_hw_destroy_filter(tp, f, rtnl_held, extack);
 	tcf_unbind_filter(tp, &f->res);
-	__fl_put(f);
+	__fl_put(tp, f);
 
 	return 0;
 }
@@ -595,7 +601,7 @@ static void fl_put(struct tcf_proto *tp, void *arg)
 {
 	struct cls_fl_filter *f = arg;
 
-	__fl_put(f);
+	__fl_put(tp, f);
 }
 
 static void *fl_get(struct tcf_proto *tp, u32 handle)
@@ -1522,6 +1528,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 		err = -ENOBUFS;
 		goto errout_tb;
 	}
+	INIT_LIST_HEAD(&fnew->hw_list);
 	refcount_set(&fnew->refcnt, 1);
 
 	err = tcf_exts_init(&fnew->exts, net, TCA_FLOWER_ACT, 0);
@@ -1551,6 +1558,10 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 		goto errout_mask;
 
 	if (!tc_skip_hw(fnew->flags)) {
+		spin_lock(&tp->lock);
+		list_add(&fnew->hw_list, &head->hw_filters);
+		spin_unlock(&tp->lock);
+
 		err = fl_hw_replace_filter(tp, fnew, rtnl_held, extack);
 		if (err)
 			goto errout_ht;
@@ -1569,7 +1580,6 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 		goto errout_hw;
 	}
 
-	refcount_inc(&fnew->refcnt);
 	if (fold) {
 		/* Fold filter was deleted concurrently. Retry lookup. */
 		if (fold->deleted) {
@@ -1591,6 +1601,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 			in_ht = true;
 		}
 
+		refcount_inc(&fnew->refcnt);
 		rhashtable_remove_fast(&fold->mask->ht,
 				       &fold->ht_node,
 				       fold->mask->filter_ht_params);
@@ -1608,7 +1619,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 		 * after this.
 		 */
 		refcount_dec(&fold->refcnt);
-		__fl_put(fold);
+		__fl_put(tp, fold);
 	} else {
 		if (handle) {
 			/* user specifies a handle and it doesn't exist */
@@ -1631,6 +1642,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 		if (err)
 			goto errout_hw;
 
+		refcount_inc(&fnew->refcnt);
 		fnew->handle = handle;
 		list_add_tail_rcu(&fnew->list, &fnew->mask->filters);
 		spin_unlock(&tp->lock);
@@ -1642,26 +1654,27 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 	kfree(mask);
 	return 0;
 
+errout_ht:
+	spin_lock(&tp->lock);
 errout_hw:
+	fnew->deleted = true;
 	spin_unlock(&tp->lock);
 	if (!tc_skip_hw(fnew->flags))
 		fl_hw_destroy_filter(tp, fnew, rtnl_held, NULL);
-errout_ht:
 	if (in_ht)
 		rhashtable_remove_fast(&fnew->mask->ht, &fnew->ht_node,
 				       fnew->mask->filter_ht_params);
 errout_mask:
 	fl_mask_put(head, fnew->mask);
 errout:
-	tcf_exts_get_net(&fnew->exts);
-	tcf_queue_work(&fnew->rwork, fl_destroy_filter_work);
+	__fl_put(tp, fnew);
 errout_tb:
 	kfree(tb);
 errout_mask_alloc:
 	kfree(mask);
 errout_fold:
 	if (fold)
-		__fl_put(fold);
+		__fl_put(tp, fold);
 	return err;
 }
 
@@ -1675,7 +1688,7 @@ static int fl_delete(struct tcf_proto *tp, void *arg, bool *last,
 
 	err = __fl_delete(tp, f, &last_on_mask, rtnl_held, extack);
 	*last = list_empty(&head->masks);
-	__fl_put(f);
+	__fl_put(tp, f);
 
 	return err;
 }
@@ -1689,33 +1702,61 @@ static void fl_walk(struct tcf_proto *tp, struct tcf_walker *arg,
 
 	while ((f = fl_get_next_filter(tp, &arg->cookie)) != NULL) {
 		if (arg->fn(tp, f, arg) < 0) {
-			__fl_put(f);
+			__fl_put(tp, f);
 			arg->stop = 1;
 			break;
 		}
-		__fl_put(f);
+		__fl_put(tp, f);
 		arg->cookie++;
 		arg->count++;
 	}
 }
 
+static struct cls_fl_filter *
+fl_get_next_hw_filter(struct tcf_proto *tp, struct cls_fl_filter *f, bool add)
+{
+	struct cls_fl_head *head = fl_head_dereference(tp);
+
+	spin_lock(&tp->lock);
+	if (!f) {
+		if (list_empty(&head->hw_filters)) {
+			spin_unlock(&tp->lock);
+			return NULL;
+		}
+
+		f = list_first_entry(&head->hw_filters, struct cls_fl_filter,
+				     hw_list);
+	} else {
+		f = list_next_entry(f, hw_list);
+	}
+
+	list_for_each_entry_from(f, &head->hw_filters, hw_list) {
+		if (!(add && f->deleted) && refcount_inc_not_zero(&f->refcnt)) {
+			spin_unlock(&tp->lock);
+			return f;
+		}
+	}
+
+	spin_unlock(&tp->lock);
+	return NULL;
+}
+
 static int fl_reoffload(struct tcf_proto *tp, bool add, tc_setup_cb_t *cb,
 			void *cb_priv, struct netlink_ext_ack *extack)
 {
 	struct tc_cls_flower_offload cls_flower = {};
 	struct tcf_block *block = tp->chain->block;
-	unsigned long handle = 0;
-	struct cls_fl_filter *f;
+	struct cls_fl_filter *f = NULL;
 	int err;
 
-	while ((f = fl_get_next_filter(tp, &handle))) {
+	while ((f = fl_get_next_hw_filter(tp, f, add))) {
 		if (tc_skip_hw(f->flags))
 			goto next_flow;
 
 		cls_flower.rule =
			flow_rule_alloc(tcf_exts_num_actions(&f->exts));
 		if (!cls_flower.rule) {
-			__fl_put(f);
+			__fl_put(tp, f);
 			return -ENOMEM;
 		}
 
@@ -1733,7 +1774,7 @@ static int fl_reoffload(struct tcf_proto *tp, bool add, tc_setup_cb_t *cb,
 			kfree(cls_flower.rule);
 			if (tc_skip_sw(f->flags)) {
 				NL_SET_ERR_MSG_MOD(extack, "Failed to setup flow action");
-				__fl_put(f);
+				__fl_put(tp, f);
 				return err;
 			}
 			goto next_flow;
@@ -1746,7 +1787,7 @@ static int fl_reoffload(struct tcf_proto *tp, bool add, tc_setup_cb_t *cb,
 
 		if (err) {
 			if (add && tc_skip_sw(f->flags)) {
-				__fl_put(f);
+				__fl_put(tp, f);
 				return err;
 			}
 			goto next_flow;
@@ -1757,8 +1798,7 @@ static int fl_reoffload(struct tcf_proto *tp, bool add, tc_setup_cb_t *cb,
 					  add);
 		spin_unlock(&tp->lock);
 next_flow:
-		handle++;
-		__fl_put(f);
+		__fl_put(tp, f);
 	}
 
 	return 0;
Recent changes that introduced unlocked flower did not properly account
for the case when reoffload is initiated concurrently with filter
updates. To fix the issue, extend flower with a 'hw_filters' list that
is used to store filters that don't have the 'skip_hw' flag set. A
filter is added to the list before it is inserted into hardware and is
only removed from it after the last reference to the filter has been
released. This ensures that concurrent reoffload can still access a
filter that is being deleted and removes the race condition where a
driver callback can be removed while the filter is no longer accessible
through the idr, but is still present in hardware.

Refactor fl_change() to insert the filter into hw_list before
offloading it to hardware and to properly clean up the filter in case
of error with __fl_put(). This allows concurrent access to the filter
from fl_reoffload() and protects it with reference counting.

Refactor fl_reoffload() to iterate over the hw_filters list instead of
the idr. Implement the fl_get_next_hw_filter() helper function that is
used to iterate over the hw_filters list with reference counting and
skips filters that are being concurrently deleted.

Fixes: 92149190067d ("net: sched: flower: set unlocked flag for flower proto ops")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---
 net/sched/cls_flower.c | 82 +++++++++++++++++++++++++++++++-----------
 1 file changed, 61 insertions(+), 21 deletions(-)