diff mbox

[net-next] vxlan: revert "vxlan: Bypass encapsulation if the destination is local"

Message ID 1365724799.3563.5.camel@sridhar.usor.ibm.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Sridhar Samudrala April 11, 2013, 11:59 p.m. UTC
On Thu, 2013-04-11 at 01:55 -0400, Cong Wang wrote:
> 
> ----- Original Message -----
> > On 4/10/2013 7:10 PM, Cong Wang wrote:
> > >> - when source and destination endpoints belonging to different vni's
> > >>    are on 2 different bridges on the same host. encap bypass is done
> > >>    in this scenario by checking if rt_flags has RTCF_LOCAL set. I think
> > >>    you must be hitting this path and the following patch should fix
> > >>    it by only doing bypass if the source and dest devices belong to
> > >>    the same net. Can you try it and see if it fixes your tests?
> > > I just tested it, unfortunately it doesn't work, the bug still exists.
> > >
> > > If you need any other info, please let me know.
> > So does it mean that you are hitting the if condition that does encap
> > bypass
> > even afterthe net_eq() check? Do the tests pass If you comment out the
> > 'if' block?
> 
> Yes, after adding a printk inside the 'if' block, I got:
> 
> [   71.456329] vxlan: dev: vxlan0, dst: 224.8.8.8, dst dev: veth0
> [   71.596551] vxlan: dev: vxlan0, dst: 224.8.8.8, dst dev: veth1
> [   72.028574] vxlan: dev: vxlan0, dst: 224.8.8.8, dst dev: veth0
> [   72.436384] vxlan: dev: vxlan0, dst: 224.8.8.8, dst dev: veth1
> [   73.028576] vxlan: dev: vxlan0, dst: 224.8.8.8, dst dev: veth0
> [   73.185134] vxlan: dev: vxlan0, dst: 224.8.8.8, dst dev: veth0
> [   73.436582] vxlan: dev: vxlan0, dst: 224.8.8.8, dst dev: veth1
> [   74.184251] vxlan: dev: vxlan0, dst: 224.8.8.8, dst dev: veth0
> 
> It seems the dst dev is the dev which vxlan0 setup on, so
> there is no way to know if the packet is targeted for a different netns
> on the same host, at least I don't find such RTCF_* flag.

OK. i was able to setup vxlan between 2 net namespaces and reproduce
the issue.

The following patch fixes the issue for me. can you try it out?



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Amerigo Wang April 12, 2013, 8:05 a.m. UTC | #1
----- Original Message -----
> 
> The following patch fixes the issue for me. can you try it out?
> 
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 9a64715..d6509de 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -1013,7 +1013,7 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb,
> struct net_device *dev,
>  	}
>  
>  	/* Bypass encapsulation if the destination is local */
> -	if (rt->rt_flags & RTCF_LOCAL) {
> +	if (rt->dst.dev->flags & IFF_LOOPBACK) {
>  		struct vxlan_dev *dst_vxlan;
>  
>  		ip_rt_put(rt);
> 

It almost surely can fix the problem, but do you really just want to bypass
encap for loopback devcie? Not all local devices?

The title of your original commit "vxlan: Bypass encapsulation if the destination is local"
is confusing...

(Sorry for the delay, I am on vacation.)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 12, 2013, 7:19 p.m. UTC | #2
From: Sridhar Samudrala <sri@us.ibm.com>
Date: Thu, 11 Apr 2013 16:59:59 -0700

> @@ -1013,7 +1013,7 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>  	}
>  
>  	/* Bypass encapsulation if the destination is local */
> -	if (rt->rt_flags & RTCF_LOCAL) {
> +	if (rt->dst.dev->flags & IFF_LOOPBACK) {
>  		struct vxlan_dev *dst_vxlan;

This looks terrible, and ad-hoc.  You need to find a cleaner
way to express exactly what you're trying to do otherwise I'm
reverting your change.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sridhar Samudrala April 12, 2013, 11:07 p.m. UTC | #3
On 4/12/2013 12:19 PM, David Miller wrote:
> From: Sridhar Samudrala <sri@us.ibm.com>
> Date: Thu, 11 Apr 2013 16:59:59 -0700
>
>> @@ -1013,7 +1013,7 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>>   	}
>>   
>>   	/* Bypass encapsulation if the destination is local */
>> -	if (rt->rt_flags & RTCF_LOCAL) {
>> +	if (rt->dst.dev->flags & IFF_LOOPBACK) {
>>   		struct vxlan_dev *dst_vxlan;
> This looks terrible, and ad-hoc.  You need to find a cleaner
> way to express exactly what you're trying to do otherwise I'm
> reverting your change.
The idea is to bypass encap if the destination vxlan endpoint is on the same
host. To do this, i thought it is good enough to check if the route to reach
the destination ip is a local route.(RTCF_LOCAL is set).
This works fine for all the cases where destination ip is assigned to a
normal ethernet device. rt->dst.dev points to loopback device.

In case of veth(veth0,veth1), where the peer(veth1) and the destination 
vxlan
endpoint are on a different netns, ip_route_output for veth1's ipis 
returning
a route entry that has RTCF_LOCAL set and rt->dst.dev pointing to veth0.
Is it a bug that RTCF_LOCAL is set here when veth0's peer is moved to a
different netns?
We don't want to bypass encap in this scenario.

To address this behavior seen with veth, i had to change the if 
condition to
check for rt->dst.dev->flags rather than rt->rt_flags.

Thanks
Sridhar






--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 12, 2013, 11:17 p.m. UTC | #4
From: Sridhar Samudrala <sri@us.ibm.com>
Date: Fri, 12 Apr 2013 16:07:41 -0700

> Is it a bug that RTCF_LOCAL is set here when veth0's peer is moved
> to a different netns?

Nope.

RTCF_LOCAL is set in several situations.

For example, it would be set on a broadcast route for a subnet we are
on.  It'd also be set on a multicast route for a multicast group the
local system is subscribed to.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Amerigo Wang April 15, 2013, 2:31 a.m. UTC | #5
On Fri, 2013-04-12 at 16:07 -0700, Sridhar Samudrala wrote:
> To address this behavior seen with veth, i had to change the if 
> condition to
> check for rt->dst.dev->flags rather than rt->rt_flags. 

There is no specific IFF_ flag for veth, nor I think introducing one is
a correct fix.

I think we can just revert it for now, since it is not very easy to fix.
You can, of course, send a bug-free version later. This regression
blocks my VXLAN IPv6 support patches, by the way.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sridhar Samudrala April 15, 2013, 4:20 a.m. UTC | #6
On 4/14/2013 7:31 PM, Cong Wang wrote:
> On Fri, 2013-04-12 at 16:07 -0700, Sridhar Samudrala wrote:
>> To address this behavior seen with veth, i had to change the if
>> condition to
>> check for rt->dst.dev->flags rather than rt->rt_flags.
> There is no specific IFF_ flag for veth, nor I think introducing one is
> a correct fix.
>
> I think we can just revert it for now, since it is not very easy to fix.
> You can, of course, send a bug-free version later. This regression
> blocks my VXLAN IPv6 support patches, by the way.
>
>
I am not suggesting adding a new IFF_ flag for veth. I was referring to the
IFF_LOOPBACKflag and it should work fine for your setup.

However, i think Mike Rapaport't patch that adds a check to test for 
multicast/
broadcast routeisa better fix. Did you try that patch?
I tried it in my setup using both vxlan/DOVE configuration between VMs 
in the
same networknamespace on 2 different bridges as well as the veth 
configuration
between 2 network namespsaces.

Could you try his patch in your configuration. I think it will work and 
if so
we should go withthat patch.

Thanks
Sridhar

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 9a64715..d6509de 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1013,7 +1013,7 @@  static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 	}
 
 	/* Bypass encapsulation if the destination is local */
-	if (rt->rt_flags & RTCF_LOCAL) {
+	if (rt->dst.dev->flags & IFF_LOOPBACK) {
 		struct vxlan_dev *dst_vxlan;
 
 		ip_rt_put(rt);