Message ID | 20171106214730.24421-1-xiyou.wangcong@gmail.com |
---|---|
Headers | show |
Series | net_sched: close the race between call_rcu() and cleanup_net() | expand |
From: Cong Wang <xiyou.wangcong@gmail.com> Date: Mon, 6 Nov 2017 13:47:17 -0800 > This patchset tries to fix the race between call_rcu() and > cleanup_net() again. Without holding the netns refcnt the > tc_action_net_exit() in netns workqueue could be called before > filter destroy works in tc filter workqueue. This patchset > moves the netns refcnt from tc actions to tcf_exts, without > breaking per-netns tc actions. > > Patch 1 reverts the previous fix, patch 2 introduces two new > API's to help to address the bug and the rest patches switch > to the new API's. Please see each patch for details. > > I was not able to reproduce this bug, but now after adding > some delay in filter destroy work I manage to trigger the > crash. After this patchset, the crash is not reproducible > any more and the debugging printk's show the order is expected > too. > > Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions") > Reported-by: Lucas Bates <lucasb@mojatatu.com> > Cc: Lucas Bates <lucasb@mojatatu.com> > Cc: Jamal Hadi Salim <jhs@mojatatu.com> > Cc: Jiri Pirko <jiri@resnulli.us> > Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> I have to say this was a lot of churn so late in the release cycle.... but I ended up pulling anyways. I cannot guarantee that I will be able to push this to Linus in time for 4.14-final. Thanks.
On Wed, Nov 8, 2017 at 6:26 PM, David Miller <davem@davemloft.net> wrote: > From: Cong Wang <xiyou.wangcong@gmail.com> > Date: Mon, 6 Nov 2017 13:47:17 -0800 > >> This patchset tries to fix the race between call_rcu() and >> cleanup_net() again. Without holding the netns refcnt the >> tc_action_net_exit() in netns workqueue could be called before >> filter destroy works in tc filter workqueue. This patchset >> moves the netns refcnt from tc actions to tcf_exts, without >> breaking per-netns tc actions. >> >> Patch 1 reverts the previous fix, patch 2 introduces two new >> API's to help to address the bug and the rest patches switch >> to the new API's. Please see each patch for details. >> >> I was not able to reproduce this bug, but now after adding >> some delay in filter destroy work I manage to trigger the >> crash. After this patchset, the crash is not reproducible >> any more and the debugging printk's show the order is expected >> too. >> >> Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions") >> Reported-by: Lucas Bates <lucasb@mojatatu.com> >> Cc: Lucas Bates <lucasb@mojatatu.com> >> Cc: Jamal Hadi Salim <jhs@mojatatu.com> >> Cc: Jiri Pirko <jiri@resnulli.us> >> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> > > I have to say this was a lot of churn so late in the release > cycle.... but I ended up pulling anyways. > > I cannot guarantee that I will be able to push this to Linus > in time for 4.14-final. > Yeah, it is kinda too late. I'd suggest you to just push the revert to Linus for 4.14 and defer the rest to 4.15, if it's doable... Thanks!
This patchset tries to fix the race between call_rcu() and cleanup_net() again. Without holding the netns refcnt the tc_action_net_exit() in netns workqueue could be called before filter destroy works in tc filter workqueue. This patchset moves the netns refcnt from tc actions to tcf_exts, without breaking per-netns tc actions. Patch 1 reverts the previous fix, patch 2 introduces two new API's to help to address the bug and the rest patches switch to the new API's. Please see each patch for details. I was not able to reproduce this bug, but now after adding some delay in filter destroy work I manage to trigger the crash. After this patchset, the crash is not reproducible any more and the debugging printk's show the order is expected too. Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions") Reported-by: Lucas Bates <lucasb@mojatatu.com> Cc: Lucas Bates <lucasb@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Cong Wang (13): Revert "net_sched: hold netns refcnt for each action" net_sched: introduce tcf_exts_get_net() and tcf_exts_put_net() cls_basic: use tcf_exts_get_net() before call_rcu() cls_bpf: use tcf_exts_get_net() before call_rcu() cls_cgroup: use tcf_exts_get_net() before call_rcu() cls_flow: use tcf_exts_get_net() before call_rcu() cls_flower: use tcf_exts_get_net() before call_rcu() cls_fw: use tcf_exts_get_net() before call_rcu() cls_matchall: use tcf_exts_get_net() before call_rcu() cls_route: use tcf_exts_get_net() before call_rcu() cls_rsvp: use tcf_exts_get_net() before call_rcu() cls_tcindex: use tcf_exts_get_net() before call_rcu() cls_u32: use tcf_exts_get_net() before call_rcu() include/net/act_api.h | 4 +--- include/net/pkt_cls.h | 24 ++++++++++++++++++++++++ net/sched/act_api.c | 2 -- net/sched/act_bpf.c | 2 +- net/sched/act_connmark.c | 2 +- net/sched/act_csum.c | 2 +- net/sched/act_gact.c | 2 +- net/sched/act_ife.c | 2 +- net/sched/act_ipt.c | 4 ++-- net/sched/act_mirred.c | 2 +- net/sched/act_nat.c | 2 +- net/sched/act_pedit.c | 2 +- net/sched/act_police.c | 2 +- net/sched/act_sample.c | 2 +- net/sched/act_simple.c | 2 +- net/sched/act_skbedit.c | 2 +- net/sched/act_skbmod.c | 2 +- net/sched/act_tunnel_key.c | 2 +- net/sched/act_vlan.c | 2 +- net/sched/cls_api.c | 1 + net/sched/cls_basic.c | 20 +++++++++++++++----- net/sched/cls_bpf.c | 7 ++++++- net/sched/cls_cgroup.c | 24 ++++++++++++++++++------ net/sched/cls_flow.c | 24 ++++++++++++++++++------ net/sched/cls_flower.c | 16 +++++++++++++--- net/sched/cls_fw.c | 17 ++++++++++++++--- net/sched/cls_matchall.c | 15 ++++++++++++--- net/sched/cls_route.c | 17 ++++++++++++++--- net/sched/cls_rsvp.h | 15 ++++++++++++--- net/sched/cls_tcindex.c | 33 ++++++++++++++++++++++++++------- net/sched/cls_u32.c | 8 +++++++- 31 files changed, 198 insertions(+), 63 deletions(-)