diff mbox series

[net] net: vrf: remove redundant vrf neigh entry

Message ID a0fb851b-f9a2-1424-f7f0-2800bb3b469c@huawei.com
State Rejected
Delegated to: David Miller
Headers show
Series [net] net: vrf: remove redundant vrf neigh entry | expand

Commit Message

Miaohe Lin March 22, 2019, 2:10 p.m. UTC
From: Miaohe Lin <linmiaohe@huawei.com>

When vrf->rth is created, it wouldn't change in his lifetime.And in
func vrf_finish_output, we always create a neigh with ip_hdr(skb)->daddr
as key because rth->rt_gateway is equal to 0. But I think we only need
one vrf neigh entry because we strip the ethernet header and reset the
dst_entry in vrf_process_v4_outbound.
So I set rth->rt_gateway to loopback addr(It's ok to set any other
valid ip address, just choose one.). And we would only create one vrf
neigh entry. This helps saving some memory and improving the hitting
rate of neigh lookup.
If there is something I misunderstand, it's very nice of you to
let me know. Thanks a lot.

Signed-off-by: linmiaohe <linmiaohe@huawei.com>
---
 drivers/net/vrf.c | 1 +
 1 file changed, 1 insertion(+)

Comments

David Ahern March 22, 2019, 3:50 p.m. UTC | #1
On 3/22/19 3:10 PM, linmiaohe wrote:
> From: Miaohe Lin <linmiaohe@huawei.com>
> 
> When vrf->rth is created, it wouldn't change in his lifetime.And in
> func vrf_finish_output, we always create a neigh with ip_hdr(skb)->daddr
> as key because rth->rt_gateway is equal to 0. But I think we only need
> one vrf neigh entry because we strip the ethernet header and reset the
> dst_entry in vrf_process_v4_outbound.
> So I set rth->rt_gateway to loopback addr(It's ok to set any other
> valid ip address, just choose one.). And we would only create one vrf
> neigh entry. This helps saving some memory and improving the hitting
> rate of neigh lookup.
> If there is something I misunderstand, it's very nice of you to
> let me know. Thanks a lot.
> 
> Signed-off-by: linmiaohe <linmiaohe@huawei.com>
> ---
>  drivers/net/vrf.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
> index 7c1430ed0244..2b0227fb8f53 100644
> --- a/drivers/net/vrf.c
> +++ b/drivers/net/vrf.c
> @@ -738,6 +738,7 @@ static int vrf_rtable_create(struct net_device *dev)
>  		return -ENOMEM;
> 
>  	rth->dst.output	= vrf_output;
> +	rth->rt_gateway = htonl(INADDR_LOOPBACK);
> 
>  	rcu_assign_pointer(vrf->rth, rth);
> 

Did you investigate how neighbor entries are getting created? The vrf
device has IFF_NOARP set, so neigh entries should not be created.
Miaohe Lin March 23, 2019, 1:58 a.m. UTC | #2
On 2019/3/22 23:50, David Ahern wrote:
> On 3/22/19 3:10 PM, linmiaohe wrote:
>> From: Miaohe Lin <linmiaohe@huawei.com>
>>
>> When vrf->rth is created, it wouldn't change in his lifetime.And in
>> func vrf_finish_output, we always create a neigh with ip_hdr(skb)->daddr
>> as key because rth->rt_gateway is equal to 0. But I think we only need
>> one vrf neigh entry because we strip the ethernet header and reset the
>> dst_entry in vrf_process_v4_outbound.
>> So I set rth->rt_gateway to loopback addr(It's ok to set any other
>> valid ip address, just choose one.). And we would only create one vrf
>> neigh entry. This helps saving some memory and improving the hitting
>> rate of neigh lookup.
>> If there is something I misunderstand, it's very nice of you to
>> let me know. Thanks a lot.
>>
>> Signed-off-by: linmiaohe <linmiaohe@huawei.com>
>> ---
>>  drivers/net/vrf.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
>> index 7c1430ed0244..2b0227fb8f53 100644
>> --- a/drivers/net/vrf.c
>> +++ b/drivers/net/vrf.c
>> @@ -738,6 +738,7 @@ static int vrf_rtable_create(struct net_device *dev)
>>  		return -ENOMEM;
>>
>>  	rth->dst.output	= vrf_output;
>> +	rth->rt_gateway = htonl(INADDR_LOOPBACK);
>>
>>  	rcu_assign_pointer(vrf->rth, rth);
>>
> 
> Did you investigate how neighbor entries are getting created? The vrf
> device has IFF_NOARP set, so neigh entries should not be created.
> 
> .
> 

Thanks a lot. I searched for how IFF_NOARP works, but I haven't
investigated how neighbor entries are getting created. I will pay
more effort to study neighbor subsystem and thanks for pointing out
that.
Miaohe Lin April 11, 2019, 3:39 a.m. UTC | #3
On 2019/3/22 23:50, David Ahern wrote:
> On 3/22/19 3:10 PM, linmiaohe wrote:
>> From: Miaohe Lin <linmiaohe@huawei.com>
>>
>> When vrf->rth is created, it wouldn't change in his lifetime.And in
>> func vrf_finish_output, we always create a neigh with ip_hdr(skb)->daddr
>> as key because rth->rt_gateway is equal to 0. But I think we only need
>> one vrf neigh entry because we strip the ethernet header and reset the
>> dst_entry in vrf_process_v4_outbound.
>> So I set rth->rt_gateway to loopback addr(It's ok to set any other
>> valid ip address, just choose one.). And we would only create one vrf
>> neigh entry. This helps saving some memory and improving the hitting
>> rate of neigh lookup.
>> If there is something I misunderstand, it's very nice of you to
>> let me know. Thanks a lot.
>>
>> Signed-off-by: linmiaohe <linmiaohe@huawei.com>
>> ---
>>  drivers/net/vrf.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
>> index 7c1430ed0244..2b0227fb8f53 100644
>> --- a/drivers/net/vrf.c
>> +++ b/drivers/net/vrf.c
>> @@ -738,6 +738,7 @@ static int vrf_rtable_create(struct net_device *dev)
>>  		return -ENOMEM;
>>
>>  	rth->dst.output	= vrf_output;
>> +	rth->rt_gateway = htonl(INADDR_LOOPBACK);
>>
>>  	rcu_assign_pointer(vrf->rth, rth);
>>
> 
> Did you investigate how neighbor entries are getting created? The vrf
> device has IFF_NOARP set, so neigh entries should not be created.
> 
> .
> 
Hi,David A.,I investigate how neighbor entries are getting created recently.
But I can't find where neigh entries is not created when vrf device has
IFF_NOARP set.
So I add some printk info,and I ping the different host, here is the output:

[root@localhost ~]# ip vrf exec vrf1 ping 10.0.0.2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=1.78 ms
^C
--- 10.0.0.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.776/1.776/1.776/0.000 ms
[root@localhost ~]# ip vrf exec vrf1 ping 11.0.0.2
PING 11.0.0.2 (11.0.0.2) 56(84) bytes of data.
64 bytes from 11.0.0.2: icmp_seq=1 ttl=64 time=1.59 ms
^C
--- 11.0.0.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.591/1.591/1.591/0.000 ms
[root@localhost ~]# ip vrf exec vrf1 ping 11.0.0.3
PING 11.0.0.3 (11.0.0.3) 56(84) bytes of data.
^C
--- 11.0.0.3 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

Apr 11 11:01:48 localhost kernel: [  337.311270] VRF: IFF_NOARP is set
Apr 11 11:01:48 localhost kernel: [  337.311279] VRF: nexthop = 200000a
Apr 11 11:01:48 localhost kernel: [  337.311284] VRF: neigh =           (null) after lookup
Apr 11 11:01:48 localhost kernel: [  337.311294] VRF: we create a neigh 000000001e8acd79
Apr 11 11:01:51 localhost kernel: [  340.026623] VRF: IFF_NOARP is set
Apr 11 11:01:51 localhost kernel: [  340.026627] VRF: nexthop = 200000b
Apr 11 11:01:51 localhost kernel: [  340.026631] VRF: neigh =           (null) after lookup
Apr 11 11:01:51 localhost kernel: [  340.026637] VRF: we create a neigh 00000000a0ad96da
Apr 11 11:01:56 localhost kernel: [  345.157529] VRF: IFF_NOARP is set
Apr 11 11:01:56 localhost kernel: [  345.157539] VRF: nexthop = 300000b
Apr 11 11:01:56 localhost kernel: [  345.157544] VRF: neigh =           (null) after lookup
Apr 11 11:01:56 localhost kernel: [  345.157556] VRF: we create a neigh 00000000a5167b56

And here is the printk code:

if (vrf_dev->flags & IFF_NOARP) {
    printk(KERN_ERR "VRF: IFF_NOARP is set\n");
    rth = rcu_dereference(vrf->rth);
    nexthop = (__force u32)rt_nexthop(rth, ip_hdr(skb)->daddr);
    printk(KERN_ERR "VRF: nexthop = %x\n", nexthop);
    neigh = __ipv4_neigh_lookup_noref(vrf_dev, nexthop);
    printk(KERN_ERR "VRF: neigh = %p after lookup\n", (void *)neigh);
    if (unlikely(!neigh)) {
        neigh = __neigh_create(&arp_tbl, &nexthop, vrf_dev, false);
        printk(KERN_ERR "VRF: we create a neigh %p\n", (void *)neigh);
    }
}

Could you please tell me if I was misunderstanding something again? It's very nice
of you if you can figure me out that. Thanks a lot.I am looking forward to your reply.
David Ahern April 15, 2019, 7:05 p.m. UTC | #4
On 4/10/19 9:39 PM, linmiaohe wrote:
> 
> On 2019/3/22 23:50, David Ahern wrote:
>> On 3/22/19 3:10 PM, linmiaohe wrote:
>>> From: Miaohe Lin <linmiaohe@huawei.com>
>>>
>>> When vrf->rth is created, it wouldn't change in his lifetime.And in
>>> func vrf_finish_output, we always create a neigh with ip_hdr(skb)->daddr
>>> as key because rth->rt_gateway is equal to 0. But I think we only need
>>> one vrf neigh entry because we strip the ethernet header and reset the
>>> dst_entry in vrf_process_v4_outbound.
>>> So I set rth->rt_gateway to loopback addr(It's ok to set any other
>>> valid ip address, just choose one.). And we would only create one vrf
>>> neigh entry. This helps saving some memory and improving the hitting
>>> rate of neigh lookup.
>>> If there is something I misunderstand, it's very nice of you to
>>> let me know. Thanks a lot.
>>>
>>> Signed-off-by: linmiaohe <linmiaohe@huawei.com>
>>> ---
>>>  drivers/net/vrf.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
>>> index 7c1430ed0244..2b0227fb8f53 100644
>>> --- a/drivers/net/vrf.c
>>> +++ b/drivers/net/vrf.c
>>> @@ -738,6 +738,7 @@ static int vrf_rtable_create(struct net_device *dev)
>>>  		return -ENOMEM;
>>>
>>>  	rth->dst.output	= vrf_output;
>>> +	rth->rt_gateway = htonl(INADDR_LOOPBACK);
>>>
>>>  	rcu_assign_pointer(vrf->rth, rth);
>>>
>>
>> Did you investigate how neighbor entries are getting created? The vrf
>> device has IFF_NOARP set, so neigh entries should not be created.
>>
>> .
>>
> Hi,David A.,I investigate how neighbor entries are getting created recently.
> But I can't find where neigh entries is not created when vrf device has
> IFF_NOARP set.
> So I add some printk info,and I ping the different host, here is the output:
> 
> [root@localhost ~]# ip vrf exec vrf1 ping 10.0.0.2
> PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
> 64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=1.78 ms
> ^C
> --- 10.0.0.2 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 1.776/1.776/1.776/0.000 ms
> [root@localhost ~]# ip vrf exec vrf1 ping 11.0.0.2
> PING 11.0.0.2 (11.0.0.2) 56(84) bytes of data.
> 64 bytes from 11.0.0.2: icmp_seq=1 ttl=64 time=1.59 ms
> ^C
> --- 11.0.0.2 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 1.591/1.591/1.591/0.000 ms
> [root@localhost ~]# ip vrf exec vrf1 ping 11.0.0.3
> PING 11.0.0.3 (11.0.0.3) 56(84) bytes of data.
> ^C
> --- 11.0.0.3 ping statistics ---
> 1 packets transmitted, 0 received, 100% packet loss, time 0ms
> 
> Apr 11 11:01:48 localhost kernel: [  337.311270] VRF: IFF_NOARP is set
> Apr 11 11:01:48 localhost kernel: [  337.311279] VRF: nexthop = 200000a
> Apr 11 11:01:48 localhost kernel: [  337.311284] VRF: neigh =           (null) after lookup
> Apr 11 11:01:48 localhost kernel: [  337.311294] VRF: we create a neigh 000000001e8acd79
> Apr 11 11:01:51 localhost kernel: [  340.026623] VRF: IFF_NOARP is set
> Apr 11 11:01:51 localhost kernel: [  340.026627] VRF: nexthop = 200000b
> Apr 11 11:01:51 localhost kernel: [  340.026631] VRF: neigh =           (null) after lookup
> Apr 11 11:01:51 localhost kernel: [  340.026637] VRF: we create a neigh 00000000a0ad96da
> Apr 11 11:01:56 localhost kernel: [  345.157529] VRF: IFF_NOARP is set
> Apr 11 11:01:56 localhost kernel: [  345.157539] VRF: nexthop = 300000b
> Apr 11 11:01:56 localhost kernel: [  345.157544] VRF: neigh =           (null) after lookup
> Apr 11 11:01:56 localhost kernel: [  345.157556] VRF: we create a neigh 00000000a5167b56
> 
> And here is the printk code:
> 
> if (vrf_dev->flags & IFF_NOARP) {
>     printk(KERN_ERR "VRF: IFF_NOARP is set\n");
>     rth = rcu_dereference(vrf->rth);
>     nexthop = (__force u32)rt_nexthop(rth, ip_hdr(skb)->daddr);
>     printk(KERN_ERR "VRF: nexthop = %x\n", nexthop);
>     neigh = __ipv4_neigh_lookup_noref(vrf_dev, nexthop);
>     printk(KERN_ERR "VRF: neigh = %p after lookup\n", (void *)neigh);
>     if (unlikely(!neigh)) {
>         neigh = __neigh_create(&arp_tbl, &nexthop, vrf_dev, false);
>         printk(KERN_ERR "VRF: we create a neigh %p\n", (void *)neigh);
>     }
> }
> 
> Could you please tell me if I was misunderstanding something again? It's very nice
> of you if you can figure me out that. Thanks a lot.I am looking forward to your reply.
> 

In the above statements you are passing vrf_dev to neigh_create, so of
course it is creating an entry against that device. That should not be
happening. When dst->dev is a vrf device it is for local traffic only
(locally originated to a local address) and there are no neighbor
entries for that. Otherwise, it is an enslaved device and the
neigh_create is always done with it.
diff mbox series

Patch

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 7c1430ed0244..2b0227fb8f53 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -738,6 +738,7 @@  static int vrf_rtable_create(struct net_device *dev)
 		return -ENOMEM;

 	rth->dst.output	= vrf_output;
+	rth->rt_gateway = htonl(INADDR_LOOPBACK);

 	rcu_assign_pointer(vrf->rth, rth);