
[net-next,09/12] net: sched: flower: handle concurrent tcf proto deletion

Message ID 20190214074712.17846-10-vladbu@mellanox.com
State Changes Requested
Delegated to: David Miller
Series Refactor flower classifier to remove dependency on rtnl lock

Commit Message

Vlad Buslov Feb. 14, 2019, 7:47 a.m. UTC
Without rtnl lock protection, tcf proto can be deleted concurrently. Check
the tcf proto 'deleting' flag after taking the tcf spinlock to verify that
no concurrent deletion is in progress. Return an EAGAIN error if concurrent
deletion is detected, which will cause the caller to retry and possibly
create a new instance of tcf proto.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_flower.c | 8 ++++++++
 1 file changed, 8 insertions(+)
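
As a point of reference, the mechanism described in the commit message can
be modelled with a few lines of plain userspace C: the 'deleting' flag is
checked under the per-instance lock, -EAGAIN tells the caller that the
instance is going away, and the caller is expected to retry against a fresh
instance. This is only a toy sketch with made-up names (struct proto,
proto_change(), proto_delete()), not kernel code:

/* Toy userspace model of the pattern described in the commit message;
 * all names are invented and the spinlock is a stand-in for tp->lock.
 */
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

struct proto {
	pthread_spinlock_t lock;
	bool deleting;
};

static int proto_change(struct proto *tp)
{
	int err = 0;

	pthread_spin_lock(&tp->lock);
	if (tp->deleting) {
		/* concurrent deletion in progress: ask the caller to retry */
		err = -EAGAIN;
	}
	/* else: commit the new/updated filter while still holding the lock */
	pthread_spin_unlock(&tp->lock);
	return err;
}

static void proto_delete(struct proto *tp)
{
	pthread_spin_lock(&tp->lock);
	tp->deleting = true;	/* later proto_change() calls now fail with -EAGAIN */
	pthread_spin_unlock(&tp->lock);
	/* ... actual teardown happens here ... */
}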

Comments

Cong Wang Feb. 18, 2019, 8:47 p.m. UTC | #1
On Wed, Feb 13, 2019 at 11:47 PM Vlad Buslov <vladbu@mellanox.com> wrote:
>
> Without rtnl lock protection, tcf proto can be deleted concurrently. Check
> the tcf proto 'deleting' flag after taking the tcf spinlock to verify that
> no concurrent deletion is in progress. Return an EAGAIN error if concurrent
> deletion is detected, which will cause the caller to retry and possibly
> create a new instance of tcf proto.
>

Please state the reason why you prefer retrying over locking the whole
tp without retrying, that is, why and how is it better?

Personally I always prefer non-retry logic, because it is very easy
to understand and justify its correctness.

As you prefer otherwise, please share your reasoning in changelog.

Thanks!
Vlad Buslov Feb. 19, 2019, 2:08 p.m. UTC | #2
On Mon 18 Feb 2019 at 20:47, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Wed, Feb 13, 2019 at 11:47 PM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>> Without rtnl lock protection, tcf proto can be deleted concurrently. Check
>> the tcf proto 'deleting' flag after taking the tcf spinlock to verify that
>> no concurrent deletion is in progress. Return an EAGAIN error if concurrent
>> deletion is detected, which will cause the caller to retry and possibly
>> create a new instance of tcf proto.
>>
>
> Please state the reason why you prefer retrying over locking the whole
> tp without retrying, that is, why and how is it better?
>
> Personally I always prefer non-retry logic, because it is very easy
> to understand and justify its correctness.
>
> As you prefer otherwise, please share your reasoning in changelog.
>
> Thanks!

At the moment, filter removal code is implemented by cls API in the
following fashion:

1) tc_del_tfilter() obtains an opaque void pointer to the filter by calling
tp->ops->get().

2) It passes the filter pointer to tfilter_del_notify(), which prepares an
skb with all the necessary info about the filter that is being removed
and...

3) ... calls tp->ops->delete() to actually delete the filter.

Between 1) and 3) the filter can be removed concurrently, and there is
nothing we can do about it in flower besides accounting for that with some
kind of retry logic. I will explain why I prefer the cls API not to simply
lock the whole classifier instance whenever it is modified in the
discussion of the cls API patch "net: sched: protect filter_chain list
with filter_chain_lock mutex".
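
To make the window concrete, here is a stand-alone illustration of the
three steps above; struct filter, struct proto_ops and del_filter() are
stand-ins invented for this sketch, not the real cls_api/tcf_proto
definitions:

/* Illustration only: the three steps described above, with the race
 * window marked.  The types and ops are stand-ins, not kernel code.
 */
#include <errno.h>
#include <stdbool.h>

struct filter;

struct proto_ops {
	struct filter *(*get)(unsigned long handle);	/* step 1 */
	int (*delete)(struct filter *f, bool *last);	/* step 3 */
};

static int del_filter(const struct proto_ops *ops, unsigned long handle)
{
	struct filter *f = ops->get(handle);	/* 1) opaque filter lookup */
	bool last;

	if (!f)
		return -ENOENT;

	/* 2) the notification skb is built here; nothing stops another
	 *    task from deleting the same filter (or the whole tp) in this
	 *    window, so step 3 has to detect that and report it.
	 */

	return ops->delete(f, &last);		/* 3) may observe the race */
}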

Patch

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 114cb7876133..bfef7d6c597d 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -1497,6 +1497,14 @@  static int fl_change(struct net *net, struct sk_buff *in_skb,
 	if (!tc_in_hw(fnew->flags))
 		fnew->flags |= TCA_CLS_FLAGS_NOT_IN_HW;
 
+	/* tp was deleted concurrently. EAGAIN will cause caller to lookup proto
+	 * again or create new one, if necessary.
+	 */
+	if (tp->deleting) {
+		err = -EAGAIN;
+		goto errout_hw;
+	}
+
 	refcount_inc(&fnew->refcnt);
 	if (fold) {
 		/* Fold filter was deleted concurrently. Retry lookup. */
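
For completeness, the caller-side reaction to -EAGAIN can be sketched as
follows. This is an assumption about the overall shape of the replay logic
in cls API, not part of the patch and not a copy of tc_new_tfilter();
lookup_or_create_proto(), proto_change() and proto_put() are hypothetical
helpers:

/* Sketch of the caller-side retry (hypothetical helpers, not the actual
 * tc_new_tfilter() code): on -EAGAIN drop the reference to the dying
 * instance and replay the whole lookup/create + change sequence.
 */
#include <errno.h>

struct proto;

struct proto *lookup_or_create_proto(void);	/* hypothetical helper */
int proto_change(struct proto *tp);		/* may return -EAGAIN */
void proto_put(struct proto *tp);		/* hypothetical: drop reference */

int add_filter(void)
{
	for (;;) {
		struct proto *tp = lookup_or_create_proto();
		int err;

		if (!tp)
			return -ENOMEM;

		err = proto_change(tp);
		proto_put(tp);		/* drop our lookup reference either way */
		if (err != -EAGAIN)
			return err;
		/* tp was deleted concurrently; replay with a fresh instance */
	}
}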