diff mbox

[net-next,v2,6/6] vxlan: allow multiple VXLANs with same VNI for IPv6 link-local addresses

Message ID 95533e3ca4958ba2eca3efcc661713638feea220.1492187126.git.mschiffer@universe-factory.net
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Matthias Schiffer April 14, 2017, 4:44 p.m. UTC
As link-local addresses are only valid for a single interface, we can allow
to use the same VNI for multiple independent VXLANs, as long as the used
interfaces are distinct. This way, VXLANs can always be used as a drop-in
replacement for VLANs with greater ID space.

This also extends VNI lookup to respect the ifindex when link-local IPv6
addresses are used, so using the same VNI on multiple interfaces can
actually work.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
---

v2: use VXLAN_F_IPV6_LINKLOCAL in vxlan_vs_find_vni; rebase.

 drivers/net/vxlan.c | 73 ++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 47 insertions(+), 26 deletions(-)

Comments

Stephen Hemminger April 14, 2017, 5:38 p.m. UTC | #1
On Fri, 14 Apr 2017 18:44:46 +0200
Matthias Schiffer <mschiffer@universe-factory.net> wrote:

> As link-local addresses are only valid for a single interface, we can allow
> to use the same VNI for multiple independent VXLANs, as long as the used
> interfaces are distinct. This way, VXLANs can always be used as a drop-in
> replacement for VLANs with greater ID space.
> 
> This also extends VNI lookup to respect the ifindex when link-local IPv6
> addresses are used, so using the same VNI on multiple interfaces can
> actually work.
> 
> Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>

Why does this have to be IPv6 specific?

What about the case where VXLAN is not bound to an interface?
If that is used then that could be a problem.
Matthias Schiffer April 16, 2017, 3:15 p.m. UTC | #2
On 04/14/2017 07:38 PM, Stephen Hemminger wrote:
> On Fri, 14 Apr 2017 18:44:46 +0200
> Matthias Schiffer <mschiffer@universe-factory.net> wrote:
> 
>> As link-local addresses are only valid for a single interface, we can allow
>> to use the same VNI for multiple independent VXLANs, as long as the used
>> interfaces are distinct. This way, VXLANs can always be used as a drop-in
>> replacement for VLANs with greater ID space.
>>
>> This also extends VNI lookup to respect the ifindex when link-local IPv6
>> addresses are used, so using the same VNI on multiple interfaces can
>> actually work.
>>
>> Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
> 
> Why does this have to be IPv6 specific?

I'm not familar with IPv4 link-local addresses and how route lookup works
for them. vxlan_get_route() sets flowi4_oif to the outgoing interface; does
__ip_route_output_key_hash() do the right thing for link-local addresses
when such addresses are used on multiple interfaces? I see some special
casing for multicast destinations, but none for link-local ones.

> 
> What about the case where VXLAN is not bound to an interface?
> If that is used then that could be a problem.
> 

With patch 4/6, link-local IPv6 addresses can't be configured without an
interface anymore.

Matthias
Matthias Schiffer June 8, 2017, 6:05 p.m. UTC | #3
On 04/16/2017 05:15 PM, Matthias Schiffer wrote:
> On 04/14/2017 07:38 PM, Stephen Hemminger wrote:
>> On Fri, 14 Apr 2017 18:44:46 +0200
>> Matthias Schiffer <mschiffer@universe-factory.net> wrote:
>>
>>> As link-local addresses are only valid for a single interface, we can allow
>>> to use the same VNI for multiple independent VXLANs, as long as the used
>>> interfaces are distinct. This way, VXLANs can always be used as a drop-in
>>> replacement for VLANs with greater ID space.
>>>
>>> This also extends VNI lookup to respect the ifindex when link-local IPv6
>>> addresses are used, so using the same VNI on multiple interfaces can
>>> actually work.
>>>
>>> Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
>>
>> Why does this have to be IPv6 specific?
> 
> I'm not familar with IPv4 link-local addresses and how route lookup works
> for them. vxlan_get_route() sets flowi4_oif to the outgoing interface; does
> __ip_route_output_key_hash() do the right thing for link-local addresses
> when such addresses are used on multiple interfaces? I see some special
> casing for multicast destinations, but none for link-local ones.
> 

Getting back to this (sorry for the delay, I got caught up in other
projects), I'm seeing the following pros and cons regarding the support of
VXLAN over IPv4 link-local addresses:

+ There should be no technical reason not to support it; as everything is
in the kernel, the usual problems with IPv4 LL (userspace APIs not
supporting passing a scope ID as part of the IP address) don't apply here

+ The code needed to support IPv4 LL should be easy to add

- IPv4 LL semantics aren't as well-defined as for IPv6. While IPv4 LL
addresses are usually in the 169.254.x.y range, the Linux kernel allows
setting the address scope independently of the range for IPv4. In contrast
to this, we need to judge the validity of the configuration based on
syntactic properties of the IP addresses (at least if we don't want to add
a lot of more compexity to the validation, and probably other parts of the
code.) Generally, code that checks for the 169.254.x.y range is uncommon in
the kernel (I think I only found a single instance, somewhere in the SCTP
implementation.)

- IPv4 LL addresses are mostly used for zeroconf; I don't really see a
usecase for zeroconf addresses + VXLANs

- Personally, I have no interest in IPv4


I probably forgot a few more arguments... All in all, I'd like the VXLAN
maintainers to decide if we do want IPv4 LL support or not, and if the
verdict is to support it, I'll implement it in the next revision of my
patchset.

Matthias
diff mbox

Patch

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 2995b57a1551..9054245f0770 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -224,7 +224,8 @@  static struct vxlan_sock *vxlan_find_sock(struct net *net, sa_family_t family,
 	return NULL;
 }
 
-static struct vxlan_dev *vxlan_vs_find_vni(struct vxlan_sock *vs, __be32 vni)
+static struct vxlan_dev *vxlan_vs_find_vni(struct vxlan_sock *vs, int ifindex,
+					   __be32 vni)
 {
 	struct vxlan_dev *vxlan;
 
@@ -233,17 +234,27 @@  static struct vxlan_dev *vxlan_vs_find_vni(struct vxlan_sock *vs, __be32 vni)
 		vni = 0;
 
 	hlist_for_each_entry_rcu(vxlan, vni_head(vs, vni), hlist) {
-		if (vxlan->default_dst.remote_vni == vni)
-			return vxlan;
+		if (vxlan->default_dst.remote_vni != vni)
+			continue;
+
+		if (IS_ENABLED(CONFIG_IPV6)) {
+			const struct vxlan_config *cfg = &vxlan->cfg;
+
+			if ((cfg->flags & VXLAN_F_IPV6_LINKLOCAL) &&
+			    cfg->remote_ifindex != ifindex)
+				continue;
+		}
+
+		return vxlan;
 	}
 
 	return NULL;
 }
 
 /* Look up VNI in a per net namespace table */
-static struct vxlan_dev *vxlan_find_vni(struct net *net, __be32 vni,
-					sa_family_t family, __be16 port,
-					u32 flags)
+static struct vxlan_dev *vxlan_find_vni(struct net *net, int ifindex,
+					__be32 vni, sa_family_t family,
+					__be16 port, u32 flags)
 {
 	struct vxlan_sock *vs;
 
@@ -251,7 +262,7 @@  static struct vxlan_dev *vxlan_find_vni(struct net *net, __be32 vni,
 	if (!vs)
 		return NULL;
 
-	return vxlan_vs_find_vni(vs, vni);
+	return vxlan_vs_find_vni(vs, ifindex, vni);
 }
 
 /* Fill in neighbour message in skbuff. */
@@ -1342,7 +1353,7 @@  static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
 
 	vni = vxlan_vni(vxlan_hdr(skb)->vx_vni);
 
-	vxlan = vxlan_vs_find_vni(vs, vni);
+	vxlan = vxlan_vs_find_vni(vs, skb->dev->ifindex, vni);
 	if (!vxlan)
 		goto drop;
 
@@ -2006,8 +2017,10 @@  static void vxlan_encap_bypass(struct sk_buff *skb, struct vxlan_dev *src_vxlan,
 }
 
 static int encap_bypass_if_local(struct sk_buff *skb, struct net_device *dev,
-				 struct vxlan_dev *vxlan, union vxlan_addr *daddr,
-				 __be16 dst_port, __be32 vni, struct dst_entry *dst,
+				 struct vxlan_dev *vxlan,
+				 union vxlan_addr *daddr,
+				 __be16 dst_port, int dst_ifindex, __be32 vni,
+				 struct dst_entry *dst,
 				 u32 rt_flags)
 {
 #if IS_ENABLED(CONFIG_IPV6)
@@ -2023,7 +2036,7 @@  static int encap_bypass_if_local(struct sk_buff *skb, struct net_device *dev,
 		struct vxlan_dev *dst_vxlan;
 
 		dst_release(dst);
-		dst_vxlan = vxlan_find_vni(vxlan->net, vni,
+		dst_vxlan = vxlan_find_vni(vxlan->net, dst_ifindex, vni,
 					   daddr->sa.sa_family, dst_port,
 					   vxlan->cfg.flags);
 		if (!dst_vxlan) {
@@ -2055,6 +2068,7 @@  static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 	struct dst_entry *ndst = NULL;
 	__be32 vni, label;
 	__u8 tos, ttl;
+	int ifindex;
 	int err;
 	u32 flags = vxlan->cfg.flags;
 	bool udp_sum = false;
@@ -2075,6 +2089,7 @@  static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 
 		dst_port = rdst->remote_port ? rdst->remote_port : vxlan->cfg.dst_port;
 		vni = (rdst->remote_vni) ? : default_vni;
+		ifindex = rdst->remote_ifindex;
 		local_ip = vxlan->cfg.saddr;
 		dst_cache = &rdst->dst_cache;
 		md->gbp = skb->mark;
@@ -2108,6 +2123,7 @@  static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 		dst = &remote_ip;
 		dst_port = info->key.tp_dst ? : vxlan->cfg.dst_port;
 		vni = tunnel_id_to_key32(info->key.tun_id);
+		ifindex = 0;
 		dst_cache = &info->dst_cache;
 		if (info->options_len)
 			md = ip_tunnel_info_opts(info);
@@ -2125,8 +2141,7 @@  static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 		struct rtable *rt;
 		__be16 df = 0;
 
-		rt = vxlan_get_route(vxlan, dev, sock4, skb,
-				     rdst ? rdst->remote_ifindex : 0, tos,
+		rt = vxlan_get_route(vxlan, dev, sock4, skb, ifindex, tos,
 				     dst->sin.sin_addr.s_addr,
 				     &local_ip.sin.sin_addr.s_addr,
 				     dst_port, src_port,
@@ -2139,8 +2154,8 @@  static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 		/* Bypass encapsulation if the destination is local */
 		if (!info) {
 			err = encap_bypass_if_local(skb, dev, vxlan, dst,
-						    dst_port, vni, &rt->dst,
-						    rt->rt_flags);
+						    dst_port, ifindex, vni,
+						    &rt->dst, rt->rt_flags);
 			if (err)
 				goto out_unlock;
 		} else if (info->key.tun_flags & TUNNEL_DONT_FRAGMENT) {
@@ -2162,8 +2177,7 @@  static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 	} else {
 		struct vxlan_sock *sock6 = rcu_dereference(vxlan->vn6_sock);
 
-		ndst = vxlan6_get_route(vxlan, dev, sock6, skb,
-					rdst ? rdst->remote_ifindex : 0, tos,
+		ndst = vxlan6_get_route(vxlan, dev, sock6, skb, ifindex, tos,
 					label, &dst->sin6.sin6_addr,
 					&local_ip.sin6.sin6_addr,
 					dst_port, src_port,
@@ -2178,8 +2192,8 @@  static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 			u32 rt6i_flags = ((struct rt6_info *)ndst)->rt6i_flags;
 
 			err = encap_bypass_if_local(skb, dev, vxlan, dst,
-						    dst_port, vni, ndst,
-						    rt6i_flags);
+						    dst_port, ifindex, vni,
+						    ndst, rt6i_flags);
 			if (err)
 				goto out_unlock;
 		}
@@ -2982,13 +2996,20 @@  static int vxlan_config_validate(struct net *src_net, struct vxlan_config *conf,
 		if (tmp == old)
 			continue;
 
-		if (tmp->cfg.vni == conf->vni &&
-		    tmp->cfg.dst_port == conf->dst_port &&
-		    (tmp->cfg.flags & (VXLAN_F_RCV_FLAGS | VXLAN_F_IPV6)) ==
-		    (conf->flags & (VXLAN_F_RCV_FLAGS | VXLAN_F_IPV6))) {
-			pr_info("duplicate VNI %u\n", be32_to_cpu(conf->vni));
-			return -EEXIST;
-		}
+		if (tmp->cfg.vni != conf->vni)
+			continue;
+		if (tmp->cfg.dst_port != conf->dst_port)
+			continue;
+		if ((tmp->cfg.flags & (VXLAN_F_RCV_FLAGS | VXLAN_F_IPV6)) !=
+		    (conf->flags & (VXLAN_F_RCV_FLAGS | VXLAN_F_IPV6)))
+			continue;
+
+		if ((conf->flags & VXLAN_F_IPV6_LINKLOCAL) &&
+		    tmp->cfg.remote_ifindex != conf->remote_ifindex)
+			continue;
+
+		pr_info("duplicate VNI %u\n", be32_to_cpu(conf->vni));
+		return -EEXIST;
 	}
 
 	return 0;