[ovs-dev,RFC] netdev-linux: support lwtunnel vxlan interfaces
diff mbox

Message ID 006eee1adf12134893566ede93a18c7bab009729.1443715703.git.jbenc@redhat.com
State RFC
Headers show

Commit Message

Jiri Benc Oct. 1, 2015, 4:08 p.m. UTC
This is for IPv4 only at this point. Only vxlan is recognized for now.

The usage is as follows:

ip l a type vxlan metadata dstport 4789
ovs-vsctl add-port ovsbridge vxlan0

Now, the encapsulation can be specified using ovs-ofctl, e.g.:

ovs-ofctl add-flow ovs0 'in_port=2 actions=set_field:2->tun_id,set_field:192.168.1.1->tun_dst,1'
ovs-ofctl add-flow ovs0 'in_port=1,tun_id=2 actions=2'

Note that this depends on iproute2 patches that are not merged yet to create
the metadata based vxlan interface.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
 lib/netdev-linux.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 75 insertions(+), 3 deletions(-)

Comments

Jiri Benc Oct. 1, 2015, 4:12 p.m. UTC | #1
On Thu,  1 Oct 2015 18:08:51 +0200, Jiri Benc wrote:
> This is for IPv4 only at this point. Only vxlan is recognized for now.
> 
> The usage is as follows:
> 
> ip l a type vxlan metadata dstport 4789
> ovs-vsctl add-port ovsbridge vxlan0
                     ^ should be "ovs0"

> Now, the encapsulation can be specified using ovs-ofctl, e.g.:
> 
> ovs-ofctl add-flow ovs0 'in_port=2 actions=set_field:2->tun_id,set_field:192.168.1.1->tun_dst,1'
> ovs-ofctl add-flow ovs0 'in_port=1,tun_id=2 actions=2'
> 
> Note that this depends on iproute2 patches that are not merged yet to create
> the metadata based vxlan interface.
Pravin B Shelar Oct. 2, 2015, 5:19 a.m. UTC | #2
On Thu, Oct 1, 2015 at 9:08 AM, Jiri Benc <jbenc@redhat.com> wrote:
> This is for IPv4 only at this point. Only vxlan is recognized for now.
>
> The usage is as follows:
>
> ip l a type vxlan metadata dstport 4789
> ovs-vsctl add-port ovsbridge vxlan0
>
> Now, the encapsulation can be specified using ovs-ofctl, e.g.:
>
> ovs-ofctl add-flow ovs0 'in_port=2 actions=set_field:2->tun_id,set_field:192.168.1.1->tun_dst,1'
> ovs-ofctl add-flow ovs0 'in_port=1,tun_id=2 actions=2'
>

Existing ovs-vsctl add-port command for tunnels should work with
lwtunnel inetrfaces. So tunnel vport create needs to switch between
compat vport and lwtunnel interface according to the kernel support.


> Note that this depends on iproute2 patches that are not merged yet to create
> the metadata based vxlan interface.
>
Jiri Benc Oct. 2, 2015, 6:22 a.m. UTC | #3
On Thu, 1 Oct 2015 22:19:10 -0700, Pravin Shelar wrote:
> On Thu, Oct 1, 2015 at 9:08 AM, Jiri Benc <jbenc@redhat.com> wrote:
> > This is for IPv4 only at this point. Only vxlan is recognized for now.
> >
> > The usage is as follows:
> >
> > ip l a type vxlan metadata dstport 4789
> > ovs-vsctl add-port ovsbridge vxlan0
> >
> > Now, the encapsulation can be specified using ovs-ofctl, e.g.:
> >
> > ovs-ofctl add-flow ovs0 'in_port=2 actions=set_field:2->tun_id,set_field:192.168.1.1->tun_dst,1'
> > ovs-ofctl add-flow ovs0 'in_port=1,tun_id=2 actions=2'
> >
> 
> Existing ovs-vsctl add-port command for tunnels should work with
> lwtunnel inetrfaces. So tunnel vport create needs to switch between
> compat vport and lwtunnel interface according to the kernel support.

I admit one of the intents of this patch was to provoke a discussion.
I expected a similar comment and thought about this; several things are
not clear to me:

1. Such interface would have completely different lifespan than the
   corresponding ovsdb entry. After reboot, ovsdb entry for the tunnel
   is present but not the interface. Who should create it? It can
   hardly be ovs-vswitchd, because on ovs-vswitchd restart, the
   interface is still there. Trying to recreate it if it doesn't exist
   doesn't work either because of name spaces and because there may be
   an unrelated interface with the same name but different parameters
   (port, especially) created meanwhile. Note this is very different to
   internal ports which are created by the kernel datapath.

2. How to distinguish whether the kernel supports IPv6 metadata based
   vxlan? Kernel 4.3 will include lwtunnel support but ovs IPv6 tunnel
   support will be only in 4.4. That means that under 4.3, the creation
   of the vxlan lwtunnel succeeds (and would actually do a wrong thing
   for IPv6 from the ovs point of view due to patches that are not
   going to be in 4.3), yet it won't work for IPv6.

The second point is solvable by introducing a new netlink attributes
that allow user space to query capabilities of the kernel data path.
The question is whether it's worth it, as I don't currently see any
reliable solution to the first problem.

On the other hand, delegating creation of the vxlan interface to the
user/admin/network startup scripts solves both problems. The current
way of configuration will still work, because the kernel compat nor
user space compat code is not going to go away. The current setups will
still work and those who need new features (like IPv6 support) will
need to switch to the new way of configuration, and that would be
creating the tunnel interface manually. I actually don't see any
problem with that.

But, if you have a good solution to the first problem while letting
ovs create the interface, I'm not at all opposed to it (although I'm
not sure I'll be the one to implement it).

Thanks,

 Jiri
Pravin B Shelar Oct. 2, 2015, 6:43 p.m. UTC | #4
On Thu, Oct 1, 2015 at 11:22 PM, Jiri Benc <jbenc@redhat.com> wrote:
> On Thu, 1 Oct 2015 22:19:10 -0700, Pravin Shelar wrote:
>> On Thu, Oct 1, 2015 at 9:08 AM, Jiri Benc <jbenc@redhat.com> wrote:
>> > This is for IPv4 only at this point. Only vxlan is recognized for now.
>> >
>> > The usage is as follows:
>> >
>> > ip l a type vxlan metadata dstport 4789
>> > ovs-vsctl add-port ovsbridge vxlan0
>> >
>> > Now, the encapsulation can be specified using ovs-ofctl, e.g.:
>> >
>> > ovs-ofctl add-flow ovs0 'in_port=2 actions=set_field:2->tun_id,set_field:192.168.1.1->tun_dst,1'
>> > ovs-ofctl add-flow ovs0 'in_port=1,tun_id=2 actions=2'
>> >
>>
>> Existing ovs-vsctl add-port command for tunnels should work with
>> lwtunnel inetrfaces. So tunnel vport create needs to switch between
>> compat vport and lwtunnel interface according to the kernel support.
>
> I admit one of the intents of this patch was to provoke a discussion.
> I expected a similar comment and thought about this; several things are
> not clear to me:
>
> 1. Such interface would have completely different lifespan than the
>    corresponding ovsdb entry. After reboot, ovsdb entry for the tunnel
>    is present but not the interface. Who should create it? It can
>    hardly be ovs-vswitchd, because on ovs-vswitchd restart, the
>    interface is still there. Trying to recreate it if it doesn't exist
>    doesn't work either because of name spaces and because there may be
>    an unrelated interface with the same name but different parameters
>    (port, especially) created meanwhile. Note this is very different to
>    internal ports which are created by the kernel datapath.
>

I do not think we need to worry about devices in other namespace here.
About name conflict I see it as general issue not specific to
lwtunnels support.
Anyways I do not see problem if interface is already created by
external process. I just wanted to point out that existing tunnel port
should work with new lwtunnel interface.

> 2. How to distinguish whether the kernel supports IPv6 metadata based
>    vxlan? Kernel 4.3 will include lwtunnel support but ovs IPv6 tunnel
>    support will be only in 4.4. That means that under 4.3, the creation
>    of the vxlan lwtunnel succeeds (and would actually do a wrong thing
>    for IPv6 from the ovs point of view due to patches that are not
>    going to be in 4.3), yet it won't work for IPv6.
>
I do not think it is really related to the issue. we need to solve
this in any case.

> The second point is solvable by introducing a new netlink attributes
> that allow user space to query capabilities of the kernel data path.
> The question is whether it's worth it, as I don't currently see any
> reliable solution to the first problem.
>
> On the other hand, delegating creation of the vxlan interface to the
> user/admin/network startup scripts solves both problems. The current
> way of configuration will still work, because the kernel compat nor
> user space compat code is not going to go away. The current setups will
> still work and those who need new features (like IPv6 support) will
> need to switch to the new way of configuration, and that would be
> creating the tunnel interface manually. I actually don't see any
> problem with that.
>

Even though the compat code is not going away, I do not expect new
features additions to new compat interface, e.g. IPv6 support is not
available using compat interface. So we need to use lwtunnel interface
for tunnel port.
second issue with this approach is there would two VXLAN vports in the
datapath if someone configure tunnels using existing and proposed
mechanism. This is really confusing when these two interface do not
have same features.

Patch
diff mbox

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 584e804b6c8a..93e8ba03906e 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -149,6 +149,9 @@  struct tpacket_auxdata {
 #ifndef IFLA_STATS64
 #define IFLA_STATS64 23
 #endif
+#ifndef IFLA_VXLAN_COLLECT_METADATA
+#define IFLA_VXLAN_COLLECT_METADATA 25
+#endif
 #define rtnl_link_stats64 rpl_rtnl_link_stats64
 struct rtnl_link_stats64 {
     uint64_t rx_packets;
@@ -470,6 +473,8 @@  struct netdev_linux {
 
     /* For devices of class netdev_tap_class only. */
     int tap_fd;
+
+    struct netdev_tunnel_config *tnl_cfg;
 };
 
 struct netdev_rxq_linux {
@@ -514,6 +519,8 @@  static bool netdev_linux_miimon_enabled(void);
 static void netdev_linux_miimon_run(void);
 static void netdev_linux_miimon_wait(void);
 static int netdev_linux_get_mtu__(struct netdev_linux *netdev, int *mtup);
+static struct netdev_tunnel_config *
+cache_tunnel_config(const struct netdev_linux *netdev);
 
 static bool
 is_netdev_linux_class(const struct netdev_class *netdev_class)
@@ -756,6 +763,8 @@  netdev_linux_construct(struct netdev *netdev_)
         }
     }
 
+    netdev->tnl_cfg = cache_tunnel_config(netdev);
+
     return 0;
 }
 
@@ -812,6 +821,8 @@  netdev_linux_destruct(struct netdev *netdev_)
 {
     struct netdev_linux *netdev = netdev_linux_cast(netdev_);
 
+    free(netdev->tnl_cfg);
+
     if (netdev->tc && netdev->tc->ops->tc_destroy) {
         netdev->tc->ops->tc_destroy(netdev->tc);
     }
@@ -836,6 +847,12 @@  netdev_linux_dealloc(struct netdev *netdev_)
     free(netdev);
 }
 
+static const struct netdev_tunnel_config *
+netdev_linux_get_tunnel_config(const struct netdev *netdev_)
+{
+    return netdev_linux_cast(netdev_)->tnl_cfg;
+}
+
 static struct netdev_rxq *
 netdev_linux_rxq_alloc(void)
 {
@@ -2771,8 +2788,8 @@  netdev_linux_update_flags(struct netdev *netdev_, enum netdev_flags off,
     return error;
 }
 
-#define NETDEV_LINUX_CLASS(NAME, CONSTRUCT, GET_STATS,          \
-                           GET_FEATURES, GET_STATUS)            \
+#define NETDEV_LINUX_CLASS(NAME, CONSTRUCT, TUNNEL_CONFIG,      \
+                           GET_STATS, GET_FEATURES, GET_STATUS) \
 {                                                               \
     NAME,                                                       \
                                                                 \
@@ -2786,7 +2803,7 @@  netdev_linux_update_flags(struct netdev *netdev_, enum netdev_flags off,
     netdev_linux_dealloc,                                       \
     NULL,                       /* get_config */                \
     NULL,                       /* set_config */                \
-    NULL,                       /* get_tunnel_config */         \
+    TUNNEL_CONFIG,                                              \
     NULL,                       /* build header */              \
     NULL,                       /* push header */               \
     NULL,                       /* pop header */                \
@@ -2846,6 +2863,7 @@  const struct netdev_class netdev_linux_class =
     NETDEV_LINUX_CLASS(
         "system",
         netdev_linux_construct,
+        netdev_linux_get_tunnel_config,
         netdev_linux_get_stats,
         netdev_linux_get_features,
         netdev_linux_get_status);
@@ -2854,6 +2872,7 @@  const struct netdev_class netdev_tap_class =
     NETDEV_LINUX_CLASS(
         "tap",
         netdev_linux_construct_tap,
+        NULL,
         netdev_tap_get_stats,
         netdev_linux_get_features,
         netdev_linux_get_status);
@@ -2862,6 +2881,7 @@  const struct netdev_class netdev_internal_class =
     NETDEV_LINUX_CLASS(
         "internal",
         netdev_linux_construct,
+        NULL,
         netdev_internal_get_stats,
         NULL,                  /* get_features */
         netdev_internal_get_status);
@@ -5431,6 +5451,58 @@  get_stats_via_netlink(const struct netdev *netdev_, struct netdev_stats *stats)
     return error;
 }
 
+static struct netdev_tunnel_config *
+cache_tunnel_config(const struct netdev_linux *netdev)
+{
+    struct ofpbuf request;
+    struct ofpbuf *reply;
+    struct netdev_tunnel_config *result = NULL;
+    int error;
+
+    ofpbuf_init(&request, 0);
+    nl_msg_put_nlmsghdr(&request,
+                        sizeof(struct ifinfomsg) + NL_ATTR_SIZE(IFNAMSIZ),
+                        RTM_GETLINK, NLM_F_REQUEST);
+    ofpbuf_put_zeros(&request, sizeof(struct ifinfomsg));
+    nl_msg_put_string(&request, IFLA_IFNAME, netdev_get_name(&netdev->up));
+    error = nl_transact(NETLINK_ROUTE, &request, &reply);
+    ofpbuf_uninit(&request);
+    if (error) {
+        return NULL;
+    }
+
+    if (ofpbuf_try_pull(reply, NLMSG_HDRLEN + sizeof(struct ifinfomsg))) {
+        const struct nlattr *linkinfo = nl_attr_find(reply, 0, IFLA_LINKINFO);
+        const struct nlattr *a, *infodata;
+        if (linkinfo) {
+            a = nl_attr_find_nested(linkinfo, IFLA_INFO_KIND);
+            if (a && !strcmp(nl_attr_get(a), "vxlan")) {
+                infodata = nl_attr_find_nested(linkinfo, IFLA_INFO_DATA);
+                if (infodata)
+                    a = nl_attr_find_nested(infodata, IFLA_VXLAN_COLLECT_METADATA);
+                if (infodata && a && nl_attr_get_u8(a) != 0) {
+                    a = nl_attr_find_nested(infodata, IFLA_VXLAN_PORT);
+                    result = xzalloc(sizeof *result);
+                    result->dst_port = nl_attr_get_be16(a);
+                    result->in_key_flow = true;
+                    result->out_key_flow = true;
+                    result->ip_src_flow = true;
+                    result->ip_dst_flow = true;
+                    result->ttl_inherit = true;
+                    result->tos_inherit = true;
+                } else {
+                    VLOG_WARN_RL(&rl, "vxlan interface not in metadata mode");
+                }
+            }
+        }
+    } else {
+        VLOG_WARN_RL(&rl, "short RTM_GETLINK reply");
+    }
+
+    ofpbuf_delete(reply);
+    return result;
+}
+
 static int
 get_flags(const struct netdev *dev, unsigned int *flags)
 {