[net-next,v2] ip_tunnel: Add dst_cache management to lwtunnel_state of ip tunnel

Message ID 1550804401-16232-1-git-send-email-wenxu@ucloud.cn
State Changes Requested
Delegated to: David Miller

Commit Message

wenxu Feb. 22, 2019, 3 a.m. UTC
From: wenxu <wenxu@ucloud.cn>

The lwtunnel_state does not initialize the dst_cache, which means
ip_md_tunnel_xmit can't use the dst_cache and has to look up the
route table for every packet.

Signed-off-by: wenxu <wenxu@ucloud.cn>
---
 net/ipv4/ip_tunnel_core.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

David Ahern Feb. 22, 2019, 3:39 a.m. UTC | #1
On 2/21/19 10:00 PM, wenxu@ucloud.cn wrote:
> From: wenxu <wenxu@ucloud.cn>
> 
> The lwtunnel_state does not initialize the dst_cache, which means
> ip_md_tunnel_xmit can't use the dst_cache and has to look up the
> route table for every packet.
> 
> Signed-off-by: wenxu <wenxu@ucloud.cn>
> ---
>  net/ipv4/ip_tunnel_core.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
> index 9a0e67b..f91ef7a 100644
> --- a/net/ipv4/ip_tunnel_core.c
> +++ b/net/ipv4/ip_tunnel_core.c
> @@ -252,6 +252,12 @@ static int ip_tun_build_state(struct nlattr *attr,
>  
>  	tun_info = lwt_tun_info(new_state);
>  
> +	err = dst_cache_init(&tun_info->dst_cache, GFP_ATOMIC);

build_state is called with rtnl held. Unless I am missing something, you
don't need ATOMIC here.
wenxu Feb. 22, 2019, 4:14 a.m. UTC | #2
On 2/22/2019 11:39 AM, David Ahern wrote:
> On 2/21/19 10:00 PM, wenxu@ucloud.cn wrote:
>> From: wenxu <wenxu@ucloud.cn>
>>
>> The lwtunnel_state does not initialize the dst_cache, which means
>> ip_md_tunnel_xmit can't use the dst_cache and has to look up the
>> route table for every packet.
>>
>> Signed-off-by: wenxu <wenxu@ucloud.cn>
>> ---
>>  net/ipv4/ip_tunnel_core.c | 14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
>> index 9a0e67b..f91ef7a 100644
>> --- a/net/ipv4/ip_tunnel_core.c
>> +++ b/net/ipv4/ip_tunnel_core.c
>> @@ -252,6 +252,12 @@ static int ip_tun_build_state(struct nlattr *attr,
>>  
>>  	tun_info = lwt_tun_info(new_state);
>>  
>> +	err = dst_cache_init(&tun_info->dst_cache, GFP_ATOMIC);
> build_state is called with rtnl held. Unless I am missing something, you
> don't need ATOMIC here.

If GFP_KERNEL is used, the following shows up:

Feb 22 07:07:03 A1_V5_02 kernel: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:1301
Feb 22 07:07:03 A1_V5_02 kernel: in_atomic(): 1, irqs_disabled(): 0, pid: 7950, name: ip
Feb 22 07:07:03 A1_V5_02 kernel: CPU: 21 PID: 7950 Comm: ip Kdump: loaded Not tainted 5.0.0-rc6-next-20190215 #14
Feb 22 07:07:03 A1_V5_02 kernel: Hardware name: Inspur SA5112M5/YZMB-00870-101, BIOS 4.0.3 05/22/2018
Feb 22 07:07:03 A1_V5_02 kernel: Call Trace:
Feb 22 07:07:03 A1_V5_02 kernel: dump_stack+0x5a/0x73
Feb 22 07:07:03 A1_V5_02 kernel: ___might_sleep+0xfc/0x120
Feb 22 07:07:03 A1_V5_02 kernel: mutex_lock_killable+0x1c/0x42
Feb 22 07:07:03 A1_V5_02 kernel: pcpu_alloc+0x390/0x6f0
Feb 22 07:07:03 A1_V5_02 kernel: ? __nla_parse+0xf8/0x160
Feb 22 07:07:03 A1_V5_02 kernel: dst_cache_init+0x1d/0x40
Feb 22 07:07:03 A1_V5_02 kernel: ip_tun_build_state+0x75/0x140
Feb 22 07:07:03 A1_V5_02 kernel: lwtunnel_build_state+0x92/0xf0

Feb 22 07:07:03 A1_V5_02 kernel: fib_create_info+0x74e/0x1020


build_state is called inside rcu_read_lock, which disables preemption:

 rcu_read_lock();
    ops = rcu_dereference(lwtun_encaps[encap_type]);
    if (likely(ops && ops->build_state && try_module_get(ops->owner))) {
        found = true;
        ret = ops->build_state(encap, family, cfg, lws, extack);
        if (ret)
            module_put(ops->owner);
    }  
    rcu_read_unlock();
David Ahern Feb. 22, 2019, 3:03 p.m. UTC | #3
On 2/21/19 11:14 PM, wenxu wrote:
> build_state is called inside rcu_read_lock, which disables preemption:
> 
>  rcu_read_lock();
>     ops = rcu_dereference(lwtun_encaps[encap_type]);
>     if (likely(ops && ops->build_state && try_module_get(ops->owner))) {
>         found = true;
>         ret = ops->build_state(encap, family, cfg, lws, extack);
>         if (ret)
>             module_put(ops->owner);
>     }  
>     rcu_read_unlock();
> 

Missed that.

Once a reference is taken the rcu_read_lock can be dropped before
calling build_state allowing the allocations to be GFP_KERNEL.
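
A minimal sketch of that reordering, assuming the lwtunnel_build_state()
body quoted above (illustrative only, not the actual committed code):

 rcu_read_lock();
 ops = rcu_dereference(lwtun_encaps[encap_type]);
 /* take the module reference while still under RCU protection */
 if (likely(ops && ops->build_state && try_module_get(ops->owner)))
     found = true;
 rcu_read_unlock();

 if (found) {
     /* no rcu_read_lock held here, so sleeping GFP_KERNEL allocations are fine */
     ret = ops->build_state(encap, family, cfg, lws, extack);
     if (ret)
         module_put(ops->owner);
 }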
wenxu Feb. 22, 2019, 3:21 p.m. UTC | #4
On 2019/2/22 11:03 PM, David Ahern wrote:
> On 2/21/19 11:14 PM, wenxu wrote:
>> build_state is called inside rcu_read_lock, which disables preemption:
>>
>>  rcu_read_lock();
>>     ops = rcu_dereference(lwtun_encaps[encap_type]);
>>     if (likely(ops && ops->build_state && try_module_get(ops->owner))) {
>>         found = true;
>>         ret = ops->build_state(encap, family, cfg, lws, extack);
>>         if (ret)
>>             module_put(ops->owner);
>>     }  
>>     rcu_read_unlock();
>>
> Missed that.
>
> Once a reference is taken the rcu_read_lock can be dropped before
> calling build_state allowing the allocations to be GFP_KERNEL.
>
I don't think the rcu_read_lock can be dropped. The whole operation that
takes the reference should be protected by the rcu_read_lock, to ensure
the ops will not be destroyed (if it ever is).
David Ahern Feb. 22, 2019, 10:16 p.m. UTC | #5
On 2/22/19 10:21 AM, wenxu wrote:
> On 2019/2/22 11:03 PM, David Ahern wrote:
>> On 2/21/19 11:14 PM, wenxu wrote:
>>> build_state is called inside rcu_read_lock, which disables preemption:
>>>
>>>  rcu_read_lock();
>>>     ops = rcu_dereference(lwtun_encaps[encap_type]);
>>>     if (likely(ops && ops->build_state && try_module_get(ops->owner))) {
>>>         found = true;
>>>         ret = ops->build_state(encap, family, cfg, lws, extack);
>>>         if (ret)
>>>             module_put(ops->owner);
>>>     }  
>>>     rcu_read_unlock();
>>>
>> Missed that.
>>
>> Once a reference is taken the rcu_read_lock can be dropped before
>> calling build_state allowing the allocations to be GFP_KERNEL.
>>
> I don't think the rcu_read_lock can be dropped. The whole operation that
> takes the reference should be protected by the rcu_read_lock, to ensure
> the ops will not be destroyed (if it ever is).
> 

If 'try_module_get(ops->owner)' succeeds, then a reference is held, so the
ops will not go away.

Patch

diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 9a0e67b..f91ef7a 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -252,6 +252,12 @@  static int ip_tun_build_state(struct nlattr *attr,
 
 	tun_info = lwt_tun_info(new_state);
 
+	err = dst_cache_init(&tun_info->dst_cache, GFP_ATOMIC);
+	if (err) {
+		lwtstate_free(new_state);
+		return err;
+	}
+
 	if (tb[LWTUNNEL_IP_ID])
 		tun_info->key.tun_id = nla_get_be64(tb[LWTUNNEL_IP_ID]);
 
@@ -278,6 +284,13 @@  static int ip_tun_build_state(struct nlattr *attr,
 	return 0;
 }
 
+static void ip_tun_destroy_state(struct lwtunnel_state *lwtstate)
+{
+	struct ip_tunnel_info *tun_info = lwt_tun_info(lwtstate);
+
+	dst_cache_destroy(&tun_info->dst_cache);
+}
+
 static int ip_tun_fill_encap_info(struct sk_buff *skb,
 				  struct lwtunnel_state *lwtstate)
 {
@@ -313,6 +326,7 @@  static int ip_tun_cmp_encap(struct lwtunnel_state *a, struct lwtunnel_state *b)
 
 static const struct lwtunnel_encap_ops ip_tun_lwt_ops = {
 	.build_state = ip_tun_build_state,
+	.destroy_state = ip_tun_destroy_state,
 	.fill_encap = ip_tun_fill_encap_info,
 	.get_encap_size = ip_tun_encap_nlsize,
 	.cmp_encap = ip_tun_cmp_encap,
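
For context, a rough sketch of how a metadata tunnel transmit path could
consult this cache once it is initialized, using the dst_cache_get_ip4()
and dst_cache_set_ip4() helpers from include/net/dst_cache.h (the locals
net, fl4 and tun_info are assumed from the surrounding xmit code; this is
not a quote of ip_md_tunnel_xmit):

 struct rtable *rt;
 __be32 saddr;

 /* fast path: reuse the route cached in the lwtunnel state */
 rt = dst_cache_get_ip4(&tun_info->dst_cache, &saddr);
 if (!rt) {
     /* slow path: full FIB lookup, then remember the result */
     rt = ip_route_output_key(net, &fl4);
     if (IS_ERR(rt))
         return PTR_ERR(rt);
     dst_cache_set_ip4(&tun_info->dst_cache, &rt->dst, fl4.saddr);
     saddr = fl4.saddr;
 }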