diff mbox

[net-next,v2] net: Add fib rules at vrf device create

Message ID 1449607685-8910-1-git-send-email-dsa@cumulusnetworks.com
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

David Ahern Dec. 8, 2015, 8:48 p.m. UTC
VRFs require ip rules for route lookups to work properly. Currently
creating a VRF means instantiating a device and then adding the 4 ip
and ip6 rules:

    ip link add vrf-${VRF} type vrf table ${TBID}
    ip ru add oif vrf-${VRF} table ${TBID}
    ip ru add iif vrf-${VRF} table ${TBID}
    ip -6 ru add oif vrf-${VRF} table $TBID
    ip -6 ru add iif vrf-${VRF} table $TBID

Since the table is required when the vrf device is created the rules can
be inserted automatically lightening the overhead and improving the
user experience (only the ip link add is needed).

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
v2
- addressed comments from Nik

 drivers/net/vrf.c       | 122 +++++++++++++++++++++++++++++++++++++++++++++++-
 include/net/fib_rules.h |   3 ++
 net/core/fib_rules.c    |   6 ++-
 3 files changed, 128 insertions(+), 3 deletions(-)

Comments

David Miller Dec. 9, 2015, 2:44 a.m. UTC | #1
From: David Ahern <dsa@cumulusnetworks.com>
Date: Tue,  8 Dec 2015 12:48:05 -0800

> VRFs require ip rules for route lookups to work properly. Currently
> creating a VRF means instantiating a device and then adding the 4 ip
> and ip6 rules:
> 
>     ip link add vrf-${VRF} type vrf table ${TBID}
>     ip ru add oif vrf-${VRF} table ${TBID}
>     ip ru add iif vrf-${VRF} table ${TBID}
>     ip -6 ru add oif vrf-${VRF} table $TBID
>     ip -6 ru add iif vrf-${VRF} table $TBID
> 
> Since the table is required when the vrf device is created the rules can
> be inserted automatically lightening the overhead and improving the
> user experience (only the ip link add is needed).
> 
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> ---
> v2
> - addressed comments from Nik

Unfortunately it's too late for this, you should have considered this
issue fully when VRF first went into an upstream release.

If I add your change, the user experience is _worse_.

Users on older kernels have to use the full sequence, then if they
upgrade their kernels to one with this patch then the 'ru add' et
al. commands will fail.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Ahern Dec. 9, 2015, 3:08 a.m. UTC | #2
On 12/8/15 7:44 PM, David Miller wrote:
> Unfortunately it's too late for this, you should have considered this
> issue fully when VRF first went into an upstream release.
>
> If I add your change, the user experience is _worse_.
>
> Users on older kernels have to use the full sequence, then if they
> upgrade their kernels to one with this patch then the 'ru add' et
> al. commands will fail.
>

Perhaps this is another fug (feature bug), they do not fail:

root@kenny-jessie2:~# ip ru ls
0:	from all lookup local
32760:	from all iif vrf-green lookup vrf-green
32761:	from all oif vrf-green lookup vrf-green
32762:	from all iif vrf-blue lookup vrf-blue
32763:	from all oif vrf-blue lookup vrf-blue
32764:	from all iif vrf-red lookup vrf-red
32765:	from all oif vrf-red lookup vrf-red
32766:	from all lookup main
32767:	from all lookup default

root@kenny-jessie2:~# ip ru add oif vrf-red lookup vrf-red

root@kenny-jessie2:~# ip ru ls
0:	from all lookup local
32759:	from all oif vrf-red lookup vrf-red
32760:	from all iif vrf-green lookup vrf-green
32761:	from all oif vrf-green lookup vrf-green
32762:	from all iif vrf-blue lookup vrf-blue
32763:	from all oif vrf-blue lookup vrf-blue
32764:	from all iif vrf-red lookup vrf-red
32765:	from all oif vrf-red lookup vrf-red
32766:	from all lookup main
32767:	from all lookup default
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Ahern Dec. 9, 2015, 3:21 a.m. UTC | #3
On 12/8/15 8:08 PM, David Ahern wrote:
> root@kenny-jessie2:~# ip ru add oif vrf-red lookup vrf-red
>
> root@kenny-jessie2:~# ip ru ls
> 0:    from all lookup local
> 32759:    from all oif vrf-red lookup vrf-red
> 32760:    from all iif vrf-green lookup vrf-green
> 32761:    from all oif vrf-green lookup vrf-green
> 32762:    from all iif vrf-blue lookup vrf-blue
> 32763:    from all oif vrf-blue lookup vrf-blue
> 32764:    from all iif vrf-red lookup vrf-red
> 32765:    from all oif vrf-red lookup vrf-red
> 32766:    from all lookup main
> 32767:    from all lookup default

d'oh. They don't fail in the sense of a user getting an error message 
but they add duplicate entries. So, if I fix the duplicity (ie., don't 
add a second one) would the patch be acceptable?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 9, 2015, 3:40 a.m. UTC | #4
From: David Ahern <dsa@cumulusnetworks.com>
Date: Tue, 8 Dec 2015 20:08:31 -0700

> On 12/8/15 7:44 PM, David Miller wrote:
>> Unfortunately it's too late for this, you should have considered this
>> issue fully when VRF first went into an upstream release.
>>
>> If I add your change, the user experience is _worse_.
>>
>> Users on older kernels have to use the full sequence, then if they
>> upgrade their kernels to one with this patch then the 'ru add' et
>> al. commands will fail.
>>
> 
> Perhaps this is another fug (feature bug), they do not fail:
> 
> root@kenny-jessie2:~# ip ru ls
> 0:	from all lookup local
> 32760:	from all iif vrf-green lookup vrf-green
> 32761:	from all oif vrf-green lookup vrf-green
> 32762:	from all iif vrf-blue lookup vrf-blue
> 32763:	from all oif vrf-blue lookup vrf-blue
> 32764:	from all iif vrf-red lookup vrf-red
> 32765:	from all oif vrf-red lookup vrf-red
> 32766:	from all lookup main
> 32767:	from all lookup default
> 
> root@kenny-jessie2:~# ip ru add oif vrf-red lookup vrf-red

Grrr...
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 9, 2015, 3:41 a.m. UTC | #5
From: David Ahern <dsa@cumulusnetworks.com>
Date: Tue, 8 Dec 2015 20:21:51 -0700

> On 12/8/15 8:08 PM, David Ahern wrote:
>> root@kenny-jessie2:~# ip ru add oif vrf-red lookup vrf-red
>>
>> root@kenny-jessie2:~# ip ru ls
>> 0:    from all lookup local
>> 32759:    from all oif vrf-red lookup vrf-red
>> 32760:    from all iif vrf-green lookup vrf-green
>> 32761:    from all oif vrf-green lookup vrf-green
>> 32762:    from all iif vrf-blue lookup vrf-blue
>> 32763:    from all oif vrf-blue lookup vrf-blue
>> 32764:    from all iif vrf-red lookup vrf-red
>> 32765:    from all oif vrf-red lookup vrf-red
>> 32766:    from all lookup main
>> 32767:    from all lookup default
> 
> d'oh. They don't fail in the sense of a user getting an error message
> but they add duplicate entries. So, if I fix the duplicity (ie., don't
> add a second one) would the patch be acceptable?

No, people need to issue all the commands in order for the configuration
to work in all kernels.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Roopa Prabhu Dec. 9, 2015, 5:17 a.m. UTC | #6
On 12/8/15, 7:41 PM, David Miller wrote:
> From: David Ahern <dsa@cumulusnetworks.com>
> Date: Tue, 8 Dec 2015 20:21:51 -0700
>
>> On 12/8/15 8:08 PM, David Ahern wrote:
>>> root@kenny-jessie2:~# ip ru add oif vrf-red lookup vrf-red
>>>
>>> root@kenny-jessie2:~# ip ru ls
>>> 0:    from all lookup local
>>> 32759:    from all oif vrf-red lookup vrf-red
>>> 32760:    from all iif vrf-green lookup vrf-green
>>> 32761:    from all oif vrf-green lookup vrf-green
>>> 32762:    from all iif vrf-blue lookup vrf-blue
>>> 32763:    from all oif vrf-blue lookup vrf-blue
>>> 32764:    from all iif vrf-red lookup vrf-red
>>> 32765:    from all oif vrf-red lookup vrf-red
>>> 32766:    from all lookup main
>>> 32767:    from all lookup default
>> d'oh. They don't fail in the sense of a user getting an error message
>> but they add duplicate entries. So, if I fix the duplicity (ie., don't
>> add a second one) would the patch be acceptable?
> No, people need to issue all the commands in order for the configuration
> to work in all kernels.
>
agreed. In which case, maybe the rules should not be added by default, but added only under a special vrf dev creation attribute or flag
(something like IFLA_VRF_AUTOCREATE_RULES  ?)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Graf Dec. 9, 2015, 1:04 p.m. UTC | #7
On 12/08/15 at 08:08pm, David Ahern wrote:
> On 12/8/15 7:44 PM, David Miller wrote:
> >Unfortunately it's too late for this, you should have considered this
> >issue fully when VRF first went into an upstream release.
> >
> >If I add your change, the user experience is _worse_.
> >
> >Users on older kernels have to use the full sequence, then if they
> >upgrade their kernels to one with this patch then the 'ru add' et
> >al. commands will fail.
> >
> 
> Perhaps this is another fug (feature bug), they do not fail:
> 
> root@kenny-jessie2:~# ip ru ls
> 0:	from all lookup local
> 32760:	from all iif vrf-green lookup vrf-green
> 32761:	from all oif vrf-green lookup vrf-green
> 32762:	from all iif vrf-blue lookup vrf-blue
> 32763:	from all oif vrf-blue lookup vrf-blue
> 32764:	from all iif vrf-red lookup vrf-red
> 32765:	from all oif vrf-red lookup vrf-red
> 32766:	from all lookup main
> 32767:	from all lookup default
> 
> root@kenny-jessie2:~# ip ru add oif vrf-red lookup vrf-red

This is not a bug. FIB rules can include forward jumps so it's
perfectly valid to have the same rule in the entire chain
multiple times. Just like it is for iptables and multiple chains.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 56abdf224d35..a1f9abb1183d 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -36,6 +36,7 @@ 
 #include <net/route.h>
 #include <net/addrconf.h>
 #include <net/l3mdev.h>
+#include <net/fib_rules.h>
 
 #define RT_FL_TOS(oldflp4) \
 	((oldflp4)->flowi4_tos & (IPTOS_RT_MASK | RTO_ONLINK))
@@ -50,6 +51,7 @@  struct net_vrf {
 	struct rtable           *rth;
 	struct rt6_info		*rt6;
 	u32                     tb_id;
+	u32                     pref;
 };
 
 struct pcpu_dstats {
@@ -809,6 +811,114 @@  static const struct ethtool_ops vrf_ethtool_ops = {
 	.get_drvinfo	= vrf_get_drvinfo,
 };
 
+static inline size_t vrf_fib_rule_nl_size(bool have_pref)
+{
+	size_t sz;
+
+	sz = NLMSG_ALIGN(sizeof(struct fib_rule_hdr))
+			 + nla_total_size(IFNAMSIZ)     /* FRA_{I,O}IFNAME */
+			 + nla_total_size(sizeof(u32)); /* FRA_TABLE */
+
+	if (have_pref)
+		sz += nla_total_size(sizeof(u32));      /* FRA_PRIORITY */
+
+	return sz;
+}
+
+static int vrf_fib_rule(const struct net_device *dev, __u8 family,
+			int if_type, bool add_it)
+{
+	const struct net_vrf *vrf = netdev_priv(dev);
+	struct fib_rule_hdr *frh;
+	struct nlmsghdr *nlh;
+	struct sk_buff *skb;
+	int err;
+
+	skb = nlmsg_new(vrf_fib_rule_nl_size(!!vrf->pref), GFP_KERNEL);
+	if (!skb)
+		return -ENOMEM;
+
+	nlh = nlmsg_put(skb, 0, 0, 0, sizeof(*frh), 0);
+	if (!nlh)
+		goto nla_put_failure;
+
+	frh = nlmsg_data(nlh);
+	memset(frh, 0, sizeof(*frh));
+	frh->family = family;
+	frh->action = FR_ACT_TO_TBL;
+
+	if (nla_put_u32(skb, FRA_TABLE, vrf->tb_id))
+		goto nla_put_failure;
+
+	if (nla_put_string(skb, if_type, dev->name))
+		goto nla_put_failure;
+
+	if (vrf->pref) {
+		if (nla_put_u32(skb, FRA_PRIORITY, vrf->pref))
+			goto nla_put_failure;
+	}
+
+	nlmsg_end(skb, nlh);
+
+	/* fib_nl_{new,del}rule handling looks for net from skb->sk */
+	skb->sk = dev_net(dev)->rtnl;
+	if (add_it) {
+		err = fib_nl_newrule(skb, nlh);
+	} else {
+		err = fib_nl_delrule(skb, nlh);
+		if (err == -ENOENT)
+			err = 0;
+	}
+
+	nlmsg_free(skb);
+
+	return err;
+
+nla_put_failure:
+	nlmsg_free(skb);
+
+	return -EMSGSIZE;
+}
+
+static void vrf_del_fib_rules(const struct net_device *dev)
+{
+	if (vrf_fib_rule(dev, AF_INET,  FRA_IIFNAME, 0) ||
+	    vrf_fib_rule(dev, AF_INET,  FRA_OIFNAME, 0) ||
+	    vrf_fib_rule(dev, AF_INET6, FRA_IIFNAME, 0) ||
+	    vrf_fib_rule(dev, AF_INET6, FRA_OIFNAME, 0)) {
+		netdev_err(dev, "Failed to delete FIB rules.\n");
+	}
+}
+
+static int vrf_add_fib_rules(const struct net_device *dev)
+{
+	int err;
+
+	err = vrf_fib_rule(dev, AF_INET,  FRA_IIFNAME, 1);
+	if (err < 0)
+		goto out_err;
+
+	err = vrf_fib_rule(dev, AF_INET,  FRA_OIFNAME, 1);
+	if (err < 0)
+		goto out_err;
+
+	err = vrf_fib_rule(dev, AF_INET6, FRA_IIFNAME, 1);
+	if (err < 0)
+		goto out_err;
+
+	err = vrf_fib_rule(dev, AF_INET6, FRA_OIFNAME, 1);
+	if (err < 0)
+		goto out_err;
+
+	return 0;
+
+out_err:
+	netdev_err(dev, "Failed to add FIB rules.\n");
+	vrf_del_fib_rules(dev);
+
+	return err;
+}
+
 static void vrf_setup(struct net_device *dev)
 {
 	ether_setup(dev);
@@ -842,6 +952,7 @@  static int vrf_validate(struct nlattr *tb[], struct nlattr *data[])
 
 static void vrf_dellink(struct net_device *dev, struct list_head *head)
 {
+	vrf_del_fib_rules(dev);
 	unregister_netdevice_queue(dev, head);
 }
 
@@ -849,6 +960,7 @@  static int vrf_newlink(struct net *src_net, struct net_device *dev,
 		       struct nlattr *tb[], struct nlattr *data[])
 {
 	struct net_vrf *vrf = netdev_priv(dev);
+	int err;
 
 	if (!data || !data[IFLA_VRF_TABLE])
 		return -EINVAL;
@@ -857,7 +969,15 @@  static int vrf_newlink(struct net *src_net, struct net_device *dev,
 
 	dev->priv_flags |= IFF_L3MDEV_MASTER;
 
-	return register_netdevice(dev);
+	err = register_netdevice(dev);
+	if (err)
+		goto out;
+
+	err = vrf_add_fib_rules(dev);
+	if (err)
+		unregister_netdevice(dev);
+out:
+	return err;
 }
 
 static size_t vrf_nl_getsize(const struct net_device *dev)
diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index 59160de702b6..0b76f81345c9 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -117,4 +117,7 @@  int fib_rules_lookup(struct fib_rules_ops *, struct flowi *, int flags,
 		     struct fib_lookup_arg *);
 int fib_default_rule_add(struct fib_rules_ops *, u32 pref, u32 table,
 			 u32 flags);
+
+int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr *nlh);
+int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr *nlh);
 #endif
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 365de66436ac..a5068c558bfb 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -265,7 +265,7 @@  static int validate_rulemsg(struct fib_rule_hdr *frh, struct nlattr **tb,
 	return err;
 }
 
-static int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr* nlh)
+int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct fib_rule_hdr *frh = nlmsg_data(nlh);
@@ -424,8 +424,9 @@  static int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr* nlh)
 	rules_ops_put(ops);
 	return err;
 }
+EXPORT_SYMBOL_GPL(fib_nl_newrule);
 
-static int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr* nlh)
+int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct fib_rule_hdr *frh = nlmsg_data(nlh);
@@ -536,6 +537,7 @@  static int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr* nlh)
 	rules_ops_put(ops);
 	return err;
 }
+EXPORT_SYMBOL_GPL(fib_nl_delrule);
 
 static inline size_t fib_rule_nlmsg_size(struct fib_rules_ops *ops,
 					 struct fib_rule *rule)