
[net-next] net_sched: act_bpf: remove spinlock in fast path

Message ID 1438664952-24712-1-git-send-email-ast@plumgrid.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Alexei Starovoitov Aug. 4, 2015, 5:09 a.m. UTC
Similar to act_gact/act_mirred, act_bpf can be lockless in packet processing.

Also similar to gact/mirred, there is a race between prog->filter and
prog->tcf_action: the program being replaced may use the previous
default action if it happens to return TC_ACT_UNSPEC.
The act_mirred race between tcf_action and tcfm_dev is similar.
In all cases the race is harmless.
Long term we may want to improve the situation by replacing the whole
struct tc_action via a single pointer instead of updating inner fields one by one.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 net/sched/act_bpf.c |   15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

Comments

Daniel Borkmann Aug. 4, 2015, 8:55 a.m. UTC | #1
On 08/04/2015 07:09 AM, Alexei Starovoitov wrote:
> Similar to act_gact/act_mirred, act_bpf can be lockless in packet processing.
>
> Also similar to gact/mirred there is a race between prog->filter and
> prog->tcf_action. Meaning that the program being replaced may use
> previous default action if it happened to return TC_ACT_UNSPEC.
> The act_mirred race between tcf_action and tcfm_dev is similar.
> In all cases the race is harmless.

Okay, what happens however, when we have an action attached to a
classifier and do a replace on that action, meaning one CPU is still
executing the filter inside tcf_bpf(), while another one is already
running tcf_bpf_cfg_cleanup() on that prog? Afaik, the schedule_work()
that's called during freeing maps/progs might 'mitigate' this race,
but doesn't give a hard guarantee, right?

> Long term we may want to improve the situation by replacing the whole
> struct tc_action as a single pointer instead of updating inner fields one by one.
>
> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov Aug. 4, 2015, 4:35 p.m. UTC | #2
On 8/4/15 1:55 AM, Daniel Borkmann wrote:
> Okay, what happens however, when we have an action attached to a
> classifier and do a replace on that action, meaning one CPU is still
> executing the filter inside tcf_bpf(), while another one is already
> running tcf_bpf_cfg_cleanup() on that prog? Afaik, the schedule_work()
> that's called during freeing maps/progs might 'mitigate' this race,
> but doesn't give a hard guarantee, right?

ahh, yes, that's a completely different race. tcf_bpf_cfg_cleanup should
be doing call_rcu.
Will respin the patch.


Patch

diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index 1b97dabc621a..2b8c47200152 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -43,10 +43,8 @@  static int tcf_bpf(struct sk_buff *skb, const struct tc_action *act,
 	if (unlikely(!skb_mac_header_was_set(skb)))
 		return TC_ACT_UNSPEC;
 
-	spin_lock(&prog->tcf_lock);
-
-	prog->tcf_tm.lastuse = jiffies;
-	bstats_update(&prog->tcf_bstats, skb);
+	tcf_lastuse_update(&prog->tcf_tm);
+	bstats_cpu_update(this_cpu_ptr(prog->common.cpu_bstats), skb);
 
 	/* Needed here for accessing maps. */
 	rcu_read_lock();
@@ -77,7 +75,7 @@  static int tcf_bpf(struct sk_buff *skb, const struct tc_action *act,
 		break;
 	case TC_ACT_SHOT:
 		action = filter_res;
-		prog->tcf_qstats.drops++;
+		qstats_drop_inc(this_cpu_ptr(prog->common.cpu_qstats));
 		break;
 	case TC_ACT_UNSPEC:
 		action = prog->tcf_action;
@@ -87,7 +85,6 @@  static int tcf_bpf(struct sk_buff *skb, const struct tc_action *act,
 		break;
 	}
 
-	spin_unlock(&prog->tcf_lock);
 	return action;
 }
 
@@ -294,7 +291,7 @@  static int tcf_bpf_init(struct net *net, struct nlattr *nla,
 
 	if (!tcf_hash_check(parm->index, act, bind)) {
 		ret = tcf_hash_create(parm->index, est, act,
-				      sizeof(*prog), bind, false);
+				      sizeof(*prog), bind, true);
 		if (ret < 0)
 			return ret;
 
@@ -325,7 +322,7 @@  static int tcf_bpf_init(struct net *net, struct nlattr *nla,
 		goto out;
 
 	prog = to_bpf(act);
-	spin_lock_bh(&prog->tcf_lock);
+	ASSERT_RTNL();
 
 	if (ret != ACT_P_CREATED)
 		tcf_bpf_prog_fill_cfg(prog, &old);
@@ -341,8 +338,6 @@  static int tcf_bpf_init(struct net *net, struct nlattr *nla,
 	prog->tcf_action = parm->action;
 	prog->filter = cfg.filter;
 
-	spin_unlock_bh(&prog->tcf_lock);
-
 	if (res == ACT_P_CREATED)
 		tcf_hash_insert(act);
 	else