Message ID | 1550804401-16232-1-git-send-email-wenxu@ucloud.cn |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Headers | show |
Series | [net-next,v2] ip_tunnel: Add dst_cache management lwtunnel_state of ip tunnel | expand |
On 2/21/19 10:00 PM, wenxu@ucloud.cn wrote: > From: wenxu <wenxu@ucloud.cn> > > The lwtunnel_state is not init the dst_cache Which make the > ip_md_tunnel_xmit can't use the dst_cache. It will lookup > route table every packets. > > Signed-off-by: wenxu <wenxu@ucloud.cn> > --- > net/ipv4/ip_tunnel_core.c | 14 ++++++++++++++ > 1 file changed, 14 insertions(+) > > diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c > index 9a0e67b..f91ef7a 100644 > --- a/net/ipv4/ip_tunnel_core.c > +++ b/net/ipv4/ip_tunnel_core.c > @@ -252,6 +252,12 @@ static int ip_tun_build_state(struct nlattr *attr, > > tun_info = lwt_tun_info(new_state); > > + err = dst_cache_init(&tun_info->dst_cache, GFP_ATOMIC); build_state is called with rtnl held. Unless I am missing something, you don't need ATOMIC here.
On 2/22/2019 11:39 AM, David Ahern wrote: > On 2/21/19 10:00 PM, wenxu@ucloud.cn wrote: >> From: wenxu <wenxu@ucloud.cn> >> >> The lwtunnel_state is not init the dst_cache Which make the >> ip_md_tunnel_xmit can't use the dst_cache. It will lookup >> route table every packets. >> >> Signed-off-by: wenxu <wenxu@ucloud.cn> >> --- >> net/ipv4/ip_tunnel_core.c | 14 ++++++++++++++ >> 1 file changed, 14 insertions(+) >> >> diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c >> index 9a0e67b..f91ef7a 100644 >> --- a/net/ipv4/ip_tunnel_core.c >> +++ b/net/ipv4/ip_tunnel_core.c >> @@ -252,6 +252,12 @@ static int ip_tun_build_state(struct nlattr *attr, >> >> tun_info = lwt_tun_info(new_state); >> >> + err = dst_cache_init(&tun_info->dst_cache, GFP_ATOMIC); > build_state is called with rtnl held. Unless I am missing something, you > don't need ATOMIC here. If set GFP_KERNEL shows following: Feb 22 07:07:03 A1_V5_02 kernel: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:1301 Feb 22 07:07:03 A1_V5_02 kernel: in_atomic(): 1, irqs_disabled(): 0, pid: 7950, name: ip Feb 22 07:07:03 A1_V5_02 kernel: CPU: 21 PID: 7950 Comm: ip Kdump: loaded Not tainted 5.0.0-rc6-next-20190215 #14 Feb 22 07:07:03 A1_V5_02 kernel: Hardware name: Inspur SA5112M5/YZMB-00870-101, BIOS 4.0.3 05/22/2018 Feb 22 07:07:03 A1_V5_02 kernel: Call Trace: Feb 22 07:07:03 A1_V5_02 kernel: dump_stack+0x5a/0x73 Feb 22 07:07:03 A1_V5_02 kernel: ___might_sleep+0xfc/0x120 Feb 22 07:07:03 A1_V5_02 kernel: mutex_lock_killable+0x1c/0x42 Feb 22 07:07:03 A1_V5_02 kernel: pcpu_alloc+0x390/0x6f0 Feb 22 07:07:03 A1_V5_02 kernel: ? __nla_parse+0xf8/0x160 Feb 22 07:07:03 A1_V5_02 kernel: dst_cache_init+0x1d/0x40 Feb 22 07:07:03 A1_V5_02 kernel: ip_tun_build_state+0x75/0x140 Feb 22 07:07:03 A1_V5_02 kernel: lwtunnel_build_state+0x92/0xf0 Feb 22 07:07:03 A1_V5_02 kernel: fib_create_info+0x74e/0x1020 build_state in the rcu_read_lock and disable the preempt rcu_read_lock(); ops = rcu_dereference(lwtun_encaps[encap_type]); if (likely(ops && ops->build_state && try_module_get(ops->owner))) { found = true; ret = ops->build_state(encap, family, cfg, lws, extack); if (ret) module_put(ops->owner); } rcu_read_unlock();
On 2/21/19 11:14 PM, wenxu wrote: > build_state in the rcu_read_lock and disable the preempt > > rcu_read_lock(); > ops = rcu_dereference(lwtun_encaps[encap_type]); > if (likely(ops && ops->build_state && try_module_get(ops->owner))) { > found = true; > ret = ops->build_state(encap, family, cfg, lws, extack); > if (ret) > module_put(ops->owner); > } > rcu_read_unlock(); > Missed that. Once a reference is taken the rcu_read_lock can be dropped before calling build_state allowing the allocations to be GFP_KERNEL.
On 2019/2/22 下午11:03, David Ahern wrote: > On 2/21/19 11:14 PM, wenxu wrote: >> build_state in the rcu_read_lock and disable the preempt >> >> rcu_read_lock(); >> ops = rcu_dereference(lwtun_encaps[encap_type]); >> if (likely(ops && ops->build_state && try_module_get(ops->owner))) { >> found = true; >> ret = ops->build_state(encap, family, cfg, lws, extack); >> if (ret) >> module_put(ops->owner); >> } >> rcu_read_unlock(); >> > Missed that. > > Once a reference is taken the rcu_read_lock can be dropped before > calling build_state allowing the allocations to be GFP_KERNEL. > I don't think the rcu_read_lock can be dropped. The whole operation of the reference should be protect in the rcu_read_lock that ensure the ops will not be destroy (if it will be)
On 2/22/19 10:21 AM, wenxu wrote: > On 2019/2/22 下午11:03, David Ahern wrote: >> On 2/21/19 11:14 PM, wenxu wrote: >>> build_state in the rcu_read_lock and disable the preempt >>> >>> rcu_read_lock(); >>> ops = rcu_dereference(lwtun_encaps[encap_type]); >>> if (likely(ops && ops->build_state && try_module_get(ops->owner))) { >>> found = true; >>> ret = ops->build_state(encap, family, cfg, lws, extack); >>> if (ret) >>> module_put(ops->owner); >>> } >>> rcu_read_unlock(); >>> >> Missed that. >> >> Once a reference is taken the rcu_read_lock can be dropped before >> calling build_state allowing the allocations to be GFP_KERNEL. >> > I don't think the rcu_read_lock can be dropped. The whole operation of the reference should be protect > > in the rcu_read_lock that ensure the ops will not be destroy (if it will be) > If 'try_module_get(ops->owner)' succeeds then a reference is held so the ops will not go away.
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c index 9a0e67b..f91ef7a 100644 --- a/net/ipv4/ip_tunnel_core.c +++ b/net/ipv4/ip_tunnel_core.c @@ -252,6 +252,12 @@ static int ip_tun_build_state(struct nlattr *attr, tun_info = lwt_tun_info(new_state); + err = dst_cache_init(&tun_info->dst_cache, GFP_ATOMIC); + if (err) { + lwtstate_free(new_state); + return err; + } + if (tb[LWTUNNEL_IP_ID]) tun_info->key.tun_id = nla_get_be64(tb[LWTUNNEL_IP_ID]); @@ -278,6 +284,13 @@ static int ip_tun_build_state(struct nlattr *attr, return 0; } +static void ip_tun_destroy_state(struct lwtunnel_state *lwtstate) +{ + struct ip_tunnel_info *tun_info = lwt_tun_info(lwtstate); + + dst_cache_destroy(&tun_info->dst_cache); +} + static int ip_tun_fill_encap_info(struct sk_buff *skb, struct lwtunnel_state *lwtstate) { @@ -313,6 +326,7 @@ static int ip_tun_cmp_encap(struct lwtunnel_state *a, struct lwtunnel_state *b) static const struct lwtunnel_encap_ops ip_tun_lwt_ops = { .build_state = ip_tun_build_state, + .destroy_state = ip_tun_destroy_state, .fill_encap = ip_tun_fill_encap_info, .get_encap_size = ip_tun_encap_nlsize, .cmp_encap = ip_tun_cmp_encap,