[ovs-dev,v8] Bareudp Tunnel Support

Message ID 20201207033211.2621-1-martinvarghesenokia@gmail.com
State Changes Requested

Commit Message

Martin Varghese Dec. 7, 2020, 3:32 a.m. UTC
From: Martin Varghese <martin.varghese@nokia.com>

There are various L3 encapsulation standards using UDP being discussed to
leverage the UDP-based load balancing capability of different networks.
MPLSoUDP (https://tools.ietf.org/html/rfc7510) is one of them.

The Bareudp tunnel provides generic L3 encapsulation support for
tunnelling different L3 protocols such as MPLS, IP, and NSH inside a UDP
tunnel.

An example to create a bareudp device to tunnel MPLS traffic is
given below:

$ ovs-vsctl add-port br_mpls udp_port -- set interface udp_port \
             type=bareudp options:remote_ip=2.1.1.3 \
             options:local_ip=2.1.1.2 \
             options:payload_type=0x8847 options:dst_port=6635 \
             options:packet_type="legacy_l3" \
             ofport_request=$bareudp_egress_port

The bareudp device supports special handling for MPLS & IP as
they can have multiple ethertypes. The MPLS protocol can have ethertypes
ETH_P_MPLS_UC (unicast) & ETH_P_MPLS_MC (multicast). The IP protocol can
have ethertypes ETH_P_IP (v4) & ETH_P_IPV6 (v6).

A bareudp device that tunnels L3 traffic with multiple ethertypes
(MPLS & IP) can be created by passing the L3 protocol name as a string in
the payload_type field. An example to create a bareudp device to tunnel
MPLS unicast & multicast traffic is given below:

$ ovs-vsctl add-port br_mpls udp_port -- set interface udp_port \
            type=bareudp options:remote_ip=2.1.1.3 \
            options:local_ip=2.1.1.2 \
            options:payload_type=mpls options:dst_port=6635 \
            options:packet_type="legacy_l3"

Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Greg Rose <gvrose8192@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>

---
Changes in v2:
    - Removed vport-bareudp module.

Changes in v3:
    - Added net-next upstream commit id and message to commit message.

Changes in v4:
    - Removed kernel datapath changes.

Changes in v5:
    - Fixed release notes errors.
    - Fixed coding errors in dpif-netlink-rtnl.c.

Changes in v6:
    - Added code to enable rx metadata collection in the kernel device.
    - Added version history.

Changes in v7:
    - Fixed release notes errors.
    - Added skipping of the tests on older kernels.
    - Changed bareudp ovs_vport_type to 111.
    - Added Acked-by & Tested-by from gvrose8192@gmail.com.

Changes in v8:
    - Removed the code added in v6 to enable rx metadata collection in
      the kernel device. This flag was never added to any kernel
      release; rx metadata collection is always enabled in the kernel
      bareudp module.

 Documentation/automake.mk                     |  1 +
 Documentation/faq/bareudp.rst                 | 62 +++++++++++++++++++
 Documentation/faq/index.rst                   |  1 +
 Documentation/faq/releases.rst                |  1 +
 NEWS                                          |  5 +-
 .../linux/compat/include/linux/openvswitch.h  |  9 +++
 lib/dpif-netlink-rtnl.c                       | 53 ++++++++++++++++
 lib/dpif-netlink.c                            |  5 ++
 lib/netdev-vport.c                            | 27 +++++++-
 lib/netdev.h                                  |  1 +
 ofproto/ofproto-dpif-xlate.c                  |  1 +
 tests/system-layer3-tunnels.at                | 48 ++++++++++++++
 12 files changed, 211 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/faq/bareudp.rst

Comments

Eelco Chaudron Dec. 8, 2020, 1:42 p.m. UTC | #1
Hi Martin,

Did some basic testing, and it all works fine. See some comments inline 
below.

Cheers,

Eelco

On 7 Dec 2020, at 4:32, Martin Varghese wrote:

> diff --git a/Documentation/automake.mk b/Documentation/automake.mk
> index f85c4320e..ea3475f35 100644
> --- a/Documentation/automake.mk
> +++ b/Documentation/automake.mk
> @@ -88,6 +88,7 @@ DOC_SOURCE = \
>  	Documentation/faq/terminology.rst \
>  	Documentation/faq/vlan.rst \
>  	Documentation/faq/vxlan.rst \
> +	Documentation/faq/bareudp.rst \
>  	Documentation/internals/index.rst \
>  	Documentation/internals/authors.rst \
>  	Documentation/internals/bugs.rst \
> diff --git a/Documentation/faq/bareudp.rst b/Documentation/faq/bareudp.rst
> new file mode 100644
> index 000000000..ef437631c
> --- /dev/null
> +++ b/Documentation/faq/bareudp.rst
> @@ -0,0 +1,62 @@
> +..
> +      Licensed under the Apache License, Version 2.0 (the "License"); you may
> +      not use this file except in compliance with the License. You may obtain
> +      a copy of the License at
> +
> +          http://www.apache.org/licenses/LICENSE-2.0
> +
> +      Unless required by applicable law or agreed to in writing, software
> +      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
> +      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
> +      License for the specific language governing permissions and limitations
> +      under the License.
> +
> +      Convention for heading levels in Open vSwitch documentation:
> +
> +      =======  Heading 0 (reserved for the title in a document)
> +      -------  Heading 1
> +      ~~~~~~~  Heading 2
> +      +++++++  Heading 3
> +      '''''''  Heading 4
> +
> +      Avoid deeper levels because they do not render well.
> +
> +=======
> +Bareudp
> +=======
> +
> +Q: What is Bareudp?
> +
> +    A: There are various L3 encapsulation standards using UDP being discussed
> +       to leverage the UDP based load balancing capability of different
> +       networks. MPLSoUDP (__ https://tools.ietf.org/html/rfc7510) is one among
> +       them.
> +
> +       The Bareudp tunnel provides a generic L3 encapsulation support for
> +       tunnelling different L3 protocols like MPLS, IP, NSH etc. inside a UDP
> +       tunnel.
> +
> +       An example to create bareudp device to tunnel MPLS traffic is given
> +       below.::
> +
> +           $ ovs-vsctl add-port br_mpls udp_port -- set interface udp_port \
> +             type=bareudp options:remote_ip=2.1.1.3 options:local_ip=2.1.1.2 \
> +             options:payload_type=0x8847 options:dst_port=6635 \

I think it would be good to explain what the payload_type is used for as 
it's not clear from this text, and I had to read the kernel code to 
understand.
Maybe add an example on how to redirect traffic to this tunnel, as it 
will only accept the specific ethertype.

> +             options:packet_type="legacy_l3" \

Looking at the code, it seems we only support packet_type=legacy_l3 (or 
ptap), so we could remove it in the examples as it will default to L3.

> +             ofport_request=$bareudp_egress_port
> +

Maybe also the ofport_request option can be removed, as it adds no value 
here.

> +       The bareudp device supports special handling for MPLS & IP as they can
> +       have multiple ethertypes.
> +       MPLS procotcol can have ethertypes ETH_P_MPLS_UC (unicast) &
> +       ETH_P_MPLS_MC (multicast). IP protocol can have ethertypes ETH_P_IP (v4)
> +       & ETH_P_IPV6 (v6).
> +
> +       The bareudp device to tunnel L3 traffic with multiple ethertypes
> +       (MPLS & IP) can be created by passing the L3 protocol name as string in
> +       the field payload_type. An example to create bareudp device to tunnel
> +       MPLS unicast & multicast traffic is given below.::
> +
> +           $ ovs-vsctl add-port  br_mpls udp_port -- set interface udp_port \
> +             type=bareudp options:remote_ip=2.1.1.3 options:local_ip=2.1.1.2 \
> +             options:payload_type=mpls options:dst_port=6635 \
> +             options:packet_type="legacy_l3"

Same as above on packet_type.

Maybe also add an example for IP over UDP?
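
To make the role of payload_type concrete, here is a minimal C sketch of
which inner ethertypes a bareudp device would accept, assuming the
behaviour described in the quoted documentation text (mpls covers MPLS
unicast and multicast, ip covers IPv4 and IPv6). The function name and the
SKETCH_ macros are hypothetical and chosen for illustration; this is not
the kernel's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Ethertype values in host byte order (standard IEEE assignments). */
#define SKETCH_ETH_IP       0x0800
#define SKETCH_ETH_IPV6     0x86dd
#define SKETCH_ETH_MPLS_UC  0x8847
#define SKETCH_ETH_MPLS_MC  0x8848

/* Returns true if a packet with ethertype 'pkt' may be carried by a device
 * configured with ethertype 'configured'.  With 'multiproto' set, the
 * sibling ethertype of the same L3 protocol is accepted as well. */
static bool
sketch_bareudp_accepts(uint16_t configured, bool multiproto, uint16_t pkt)
{
    if (pkt == configured) {
        return true;
    }
    if (!multiproto) {
        return false;
    }
    if (configured == SKETCH_ETH_MPLS_UC && pkt == SKETCH_ETH_MPLS_MC) {
        return true;
    }
    if (configured == SKETCH_ETH_IP && pkt == SKETCH_ETH_IPV6) {
        return true;
    }
    return false;
}
```

Under these assumptions, payload_type=0x8847 would carry only MPLS
unicast, payload_type=mpls would carry both 0x8847 and 0x8848, and
payload_type=ip would carry both IPv4 and IPv6.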

> diff --git a/Documentation/faq/index.rst b/Documentation/faq/index.rst
> index 334b828b2..1dd29986a 100644
> --- a/Documentation/faq/index.rst
> +++ b/Documentation/faq/index.rst
> @@ -30,6 +30,7 @@ Open vSwitch FAQ
>  .. toctree::
>     :maxdepth: 2
>
> +   bareudp
>     configuration
>     contributing
>     design
> diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
> index 3623e3f40..68cbf1dbc 100644
> --- a/Documentation/faq/releases.rst
> +++ b/Documentation/faq/releases.rst
> @@ -138,6 +138,7 @@ Q: Are all features available with all datapaths?
>      Tunnel - ERSPAN                 4.18           2.10         2.10     NO
>      Tunnel - ERSPAN-IPv6            4.18           2.10         2.10     NO
>      Tunnel - GTP-U                  NO             NO           2.14     NO
> +    Tunnel - Bareudp                5.7            NO           NO       NO
>      QoS - Policing                  YES            1.1          2.6      NO
>      QoS - Shaping                   YES            1.1          NO       NO
>      sFlow                           YES            1.0          1.0      NO
> diff --git a/NEWS b/NEWS
> index 7e291a180..e3bc34a3f 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -75,7 +75,10 @@ v2.14.0 - 17 Aug 2020
>     - GTP-U Tunnel Protocol
>       * Add two new fields: tun_gtpu_flags, tun_gtpu_msgtype.
>       * Only support for userspace datapath.
> -
> +   - Bareudp Tunnel
> +     * Bareudp device support is present in linux kernel from version 5.7
> +     * Kernel bareudp device is not backported to ovs tree.
> +     * Userspace datapath support is not added

Any plans on adding this?
>
>  static const char *
>  vport_type_to_kind(enum ovs_vport_type type,
> @@ -113,6 +129,8 @@ vport_type_to_kind(enum ovs_vport_type type,
>          }
>      case OVS_VPORT_TYPE_GTPU:
>          return NULL;
> +    case OVS_VPORT_TYPE_BAREUDP:
> +        return "bareudp";
>      case OVS_VPORT_TYPE_NETDEV:
>      case OVS_VPORT_TYPE_INTERNAL:
>      case OVS_VPORT_TYPE_LISP:
> @@ -243,6 +261,24 @@ dpif_netlink_rtnl_geneve_verify(const struct netdev_tunnel_config *tnl_cfg,
>
>      return err;
>  }
> +static int
> +dpif_netlink_rtnl_bareudp_verify(const struct netdev_tunnel_config *tnl_cfg,
> +                                const char *kind, struct ofpbuf *reply)
> +{
> +    struct nlattr *bareudp[ARRAY_SIZE(bareudp_policy)];
> +    int err;
> +
> +    err = rtnl_policy_parse(kind, reply, bareudp_policy, bareudp,
> +                            ARRAY_SIZE(bareudp_policy));
> +    if (!err) {
> +        if ((tnl_cfg->dst_port != nl_attr_get_be16(bareudp[IFLA_BAREUDP_PORT]))
> +            || (tnl_cfg->payload_ethertype
> +                != nl_attr_get_be16(bareudp[IFLA_BAREUDP_ETHERTYPE]))) {
> +            err = EINVAL;
> +        }
> +    }
> +    return err;
> +}
>
>  static int
>  dpif_netlink_rtnl_verify(const struct netdev_tunnel_config *tnl_cfg,
> @@ -275,6 +311,9 @@ dpif_netlink_rtnl_verify(const struct netdev_tunnel_config *tnl_cfg,
>      case OVS_VPORT_TYPE_GENEVE:
>          err = dpif_netlink_rtnl_geneve_verify(tnl_cfg, kind, reply);
>          break;
> +    case OVS_VPORT_TYPE_BAREUDP:
> +        err = dpif_netlink_rtnl_bareudp_verify(tnl_cfg, kind, reply);
> +        break;
>      case OVS_VPORT_TYPE_NETDEV:
>      case OVS_VPORT_TYPE_INTERNAL:
>      case OVS_VPORT_TYPE_LISP:
> @@ -357,6 +396,19 @@ dpif_netlink_rtnl_create(const struct netdev_tunnel_config *tnl_cfg,
>          nl_msg_put_u8(&request, IFLA_GENEVE_UDP_ZERO_CSUM6_RX, 1);
>          nl_msg_put_be16(&request, IFLA_GENEVE_PORT, tnl_cfg->dst_port);
>          break;
> +    case OVS_VPORT_TYPE_BAREUDP:
> +        nl_msg_put_be16(&request, IFLA_BAREUDP_ETHERTYPE,
> +                        tnl_cfg->payload_ethertype);
> +        if ((tnl_cfg->payload_ethertype == htons(ETH_TYPE_MPLS)) ||
> +            (tnl_cfg->payload_ethertype ==  htons(ETH_TYPE_MPLS_MCAST))) {
> +            nl_msg_put_u16(&request, IFLA_BAREUDP_SRCPORT_MIN,
> +                           BAREUDP_MPLS_SRCPORT_MIN);

So why do we set this for MPLS only? All other proposals have the same 
min port guidance:
   - https://tools.ietf.org/html/draft-xu-intarea-ip-in-udp-09
   - https://tools.ietf.org/html/rfc8086
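
For context on why a minimum source port matters: RFC 7510 recommends
that the UDP source port carry flow entropy and be taken from the dynamic
port range 49152-65535. A rough C sketch of mapping a flow hash into such
a range follows. The function name and the SKETCH_ constants are
assumptions for illustration; they are not values taken from this patch
or from the kernel.

```c
#include <stdint.h>

#define SKETCH_SRCPORT_MIN 49152u   /* Assumed lower bound (dynamic range). */
#define SKETCH_SRCPORT_MAX 65535u   /* Highest valid UDP port. */

/* Maps a 32-bit flow hash onto [min, max] so that packets of one flow keep
 * the same source port while different flows spread across the range.
 * The caller must ensure min <= max. */
static uint16_t
sketch_src_port(uint32_t flow_hash, uint16_t min, uint16_t max)
{
    uint32_t span = (uint32_t) max - min + 1;

    return (uint16_t) (min + flow_hash % span);
}
```

A caller would use it as sketch_src_port(hash, SKETCH_SRCPORT_MIN,
SKETCH_SRCPORT_MAX). Nothing in this mapping is specific to MPLS, which
is the point of the question above.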

> +        }
> +        nl_msg_put_be16(&request, IFLA_BAREUDP_PORT, tnl_cfg->dst_port);
> +        if (tnl_cfg->exts & (1 << OVS_BAREUDP_EXT_MULTIPROTO_MODE)) {
> +            nl_msg_put_flag(&request, IFLA_BAREUDP_MULTIPROTO_MODE);
> +        }
> +        break;
>      case OVS_VPORT_TYPE_NETDEV:
>      case OVS_VPORT_TYPE_INTERNAL:
>      case OVS_VPORT_TYPE_LISP:
> @@ -470,6 +522,7 @@ dpif_netlink_rtnl_port_destroy(const char *name, const char *type)
>      case OVS_VPORT_TYPE_ERSPAN:
>      case OVS_VPORT_TYPE_IP6ERSPAN:
>      case OVS_VPORT_TYPE_IP6GRE:
> +    case OVS_VPORT_TYPE_BAREUDP:
>          return dpif_netlink_rtnl_destroy(name);
>      case OVS_VPORT_TYPE_NETDEV:
>      case OVS_VPORT_TYPE_INTERNAL:
> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> index 2f881e4fa..ceb56c685 100644
> --- a/lib/dpif-netlink.c
> +++ b/lib/dpif-netlink.c
> @@ -749,6 +749,9 @@ get_vport_type(const struct dpif_netlink_vport *vport)
>      case OVS_VPORT_TYPE_GTPU:
>          return "gtpu";
>
> +    case OVS_VPORT_TYPE_BAREUDP:
> +        return "bareudp";
> +
>      case OVS_VPORT_TYPE_UNSPEC:
>      case __OVS_VPORT_TYPE_MAX:
>          break;
> @@ -784,6 +787,8 @@ netdev_to_ovs_vport_type(const char *type)
>          return OVS_VPORT_TYPE_GRE;
>      } else if (!strcmp(type, "gtpu")) {
>          return OVS_VPORT_TYPE_GTPU;
> +    } else if (!strcmp(type, "bareudp")) {
> +        return OVS_VPORT_TYPE_BAREUDP;
>      } else {
>          return OVS_VPORT_TYPE_UNSPEC;
>      }
> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
> index 0252b61de..c86d420d7 100644
> --- a/lib/netdev-vport.c
> +++ b/lib/netdev-vport.c
> @@ -112,7 +112,7 @@ netdev_vport_needs_dst_port(const struct netdev *dev)
>      return (class->get_config == get_tunnel_config &&
>              (!strcmp("geneve", type) || !strcmp("vxlan", type) ||
>               !strcmp("lisp", type) || !strcmp("stt", type) ||
> -             !strcmp("gtpu", type)));
> +             !strcmp("gtpu", type) || !strcmp("bareudp",type)));
>  }
>
>  const char *
> @@ -219,6 +219,8 @@ netdev_vport_construct(struct netdev *netdev_)
>          dev->tnl_cfg.dst_port = port ? htons(port) : htons(STT_DST_PORT);
>      } else if (!strcmp(type, "gtpu")) {
>          dev->tnl_cfg.dst_port = port ? htons(port) : htons(GTPU_DST_PORT);
> +    } else if (!strcmp(type, "bareudp")) {
> +        dev->tnl_cfg.dst_port = htons(port);
>      }
>
>      dev->tnl_cfg.dont_fragment = true;
> @@ -438,6 +440,8 @@ tunnel_supported_layers(const char *type,
>          return TNL_L2 | TNL_L3;
>      } else if (!strcmp(type, "gtpu")) {
>          return TNL_L3;
> +    } else if (!strcmp(type, "bareudp")) {
> +        return TNL_L3;
>      } else {
>          return TNL_L2;
>      }
> @@ -745,6 +749,16 @@ set_tunnel_config(struct netdev *dev_, const struct smap *args, char **errp)
>                      goto out;
>                  }
>              }
> +        } else if (!strcmp(node->key, "payload_type")) {
> +            if (strcmp(node->key, "mpls")) {
> +                 tnl_cfg.payload_ethertype = htons(ETH_TYPE_MPLS);
> +                 tnl_cfg.exts |= (1 << OVS_BAREUDP_EXT_MULTIPROTO_MODE);
> +            } else if ((strcmp(node->key, "ip"))) {
> +                 tnl_cfg.payload_ethertype = htons(ETH_TYPE_IP);
> +                 tnl_cfg.exts |= (1 << OVS_BAREUDP_EXT_MULTIPROTO_MODE);
> +            } else {
> +                 tnl_cfg.payload_ethertype = htons(atoi(node->value));

As the kernel only supports IPv4, IPv6, MPLS, and MPLS_MULTI, why not 
return an error here if it's not one of these four?
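
A minimal sketch of the stricter parsing suggested here, written as
standalone C rather than against the OVS tree. It compares the option
value (the quoted hunk tests node->key with strcmp() and no negation, so
its first branch is always taken), parses numeric values with strtoul()
so that a hex string such as 0x8847 is understood (atoi() stops at the
'x' and yields 0), and rejects anything outside the four ethertypes
listed above. The function name and the SKETCH_ macros are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_ETH_IP       0x0800
#define SKETCH_ETH_IPV6     0x86dd
#define SKETCH_ETH_MPLS_UC  0x8847
#define SKETCH_ETH_MPLS_MC  0x8848

/* Parses a payload_type option value.  On success returns 0 and stores the
 * ethertype (host byte order) and whether multiproto mode is requested.
 * Returns EINVAL for malformed or unsupported values. */
static int
sketch_parse_payload_type(const char *value, uint16_t *ethertype,
                          bool *multiproto)
{
    char *end;
    unsigned long n;

    if (!strcmp(value, "mpls")) {
        *ethertype = SKETCH_ETH_MPLS_UC;
        *multiproto = true;
        return 0;
    }
    if (!strcmp(value, "ip")) {
        *ethertype = SKETCH_ETH_IP;
        *multiproto = true;
        return 0;
    }

    /* Base 0 accepts decimal, octal ("0...") and hex ("0x...") input. */
    errno = 0;
    n = strtoul(value, &end, 0);
    if (errno || end == value || *end != '\0') {
        return EINVAL;
    }
    switch (n) {
    case SKETCH_ETH_IP:
    case SKETCH_ETH_IPV6:
    case SKETCH_ETH_MPLS_UC:
    case SKETCH_ETH_MPLS_MC:
        *ethertype = (uint16_t) n;
        *multiproto = false;
        return 0;
    default:
        return EINVAL;
    }
}
```

In the real code the EINVAL path would presumably also append a message
to the errors string, as the existing "unknown argument" branch does.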

> +            }
>          } else {
>              ds_put_format(&errors, "%s: unknown %s argument '%s'\n", name,
>                            type, node->key);
> @@ -917,7 +931,8 @@ get_tunnel_config(const struct netdev *dev, struct smap *args)
>              (!strcmp("vxlan", type) && dst_port != VXLAN_DST_PORT) ||
>              (!strcmp("lisp", type) && dst_port != LISP_DST_PORT) ||
>              (!strcmp("stt", type) && dst_port != STT_DST_PORT) ||
> -            (!strcmp("gtpu", type) && dst_port != GTPU_DST_PORT)) {
> +            (!strcmp("gtpu", type) && dst_port != GTPU_DST_PORT) ||
> +            !strcmp("bareudp", type)) {
>              smap_add_format(args, "dst_port", "%d", dst_port);
>          }
>      }
> @@ -1243,6 +1258,14 @@ netdev_vport_tunnel_register(void)
>            },
>            {{NULL, NULL, 0, 0}}
>          },
> +        { "udp_sys",
> +          {
> +              TUNNEL_FUNCTIONS_COMMON,
> +              .type = "bareudp",
> +              .get_ifindex = NETDEV_VPORT_GET_IFINDEX,
> +          },
> +          {{NULL, NULL, 0, 0}}
> +        },
>
>      };
>      static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
> diff --git a/lib/netdev.h b/lib/netdev.h
> index fb5073056..b705a9e56 100644
> --- a/lib/netdev.h
> +++ b/lib/netdev.h
> @@ -107,6 +107,7 @@ struct netdev_tunnel_config {
>      bool out_key_flow;
>      ovs_be64 out_key;
>
> +    ovs_be16 payload_ethertype;
>      ovs_be16 dst_port;
>
>      bool ip_src_flow;
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index 11aa20754..7eeff14f6 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -3573,6 +3573,7 @@ propagate_tunnel_data_to_flow(struct xlate_ctx *ctx, struct eth_addr dmac,
>      case OVS_VPORT_TYPE_VXLAN:
>      case OVS_VPORT_TYPE_GENEVE:
>      case OVS_VPORT_TYPE_GTPU:
> +    case OVS_VPORT_TYPE_BAREUDP:
>          nw_proto = IPPROTO_UDP;
>          break;
>      case OVS_VPORT_TYPE_LISP:
> diff --git a/tests/system-layer3-tunnels.at b/tests/system-layer3-tunnels.at
> index 1232964bb..8423add2b 100644
> --- a/tests/system-layer3-tunnels.at
> +++ b/tests/system-layer3-tunnels.at

These tests also get executed for the userspace test set, 
system-userspace-testsuite.at, which will fail, so it needs to be 
excluded.

> @@ -152,3 +152,51 @@ AT_CHECK([tail -1 stdout], [0],
>
>  OVS_VSWITCHD_STOP
>  AT_CLEANUP
> +
> +AT_SETUP([layer3 - ping over MPLS Bareudp])
> +OVS_CHECK_MIN_KERNEL(5, 7)
> +OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])])
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", "36:b1:ee:7c:01:01")
> +ADD_VETH(p1, at_ns1, br1, "10.1.1.2/24", "36:b1:ee:7c:01:02")
> +
> +ADD_OVS_TUNNEL([bareudp], [br0], [at_bareudp0], [8.1.1.3], [8.1.1.2/24],
> +               [ options:local_ip=8.1.1.2 options:packet_type="legacy_l3" options:payload_type=mpls options:dst_port=6635])
> +
> +ADD_OVS_TUNNEL([bareudp], [br1], [at_bareudp1], [8.1.1.2], [8.1.1.3/24],
> +               [options:local_ip=8.1.1.3 options:packet_type="legacy_l3" options:payload_type=mpls options:dst_port=6635])
> +
> +AT_DATA([flows0.txt], [dnl
> +table=0,priority=100,dl_type=0x0800 actions=push_mpls:0x8847,set_mpls_label:3,output:at_bareudp0
> +table=0,priority=100,dl_type=0x8847 in_port=at_bareudp0 actions=pop_mpls:0x0800,set_field:36:b1:ee:7c:01:01->dl_dst,set_field:36:b1:ee:7c:01:02->dl_src,output:ovs-p0
> +table=0,priority=10 actions=normal
> +])

Maybe it would be good to also have an IP test case?

> +AT_DATA([flows1.txt], [dnl
> +table=0,priority=100,dl_type=0x0800 actions=push_mpls:0x8847,set_mpls_label:3,output:at_bareudp1
> +table=0,priority=100,dl_type=0x8847 in_port=at_bareudp1 actions=pop_mpls:0x0800,set_field:36:b1:ee:7c:01:02->dl_dst,set_field:36:b1:ee:7c:01:01->dl_src,output:ovs-p1
> +table=0,priority=10 actions=normal
> +])
> +
> +AT_CHECK([ip link add patch0 type veth peer name patch1])
> +on_exit 'ip link del patch0'
> +
> +AT_CHECK([ip link set dev patch0 up])
> +AT_CHECK([ip link set dev patch1 up])
> +AT_CHECK([ovs-vsctl add-port br0 patch0])
> +AT_CHECK([ovs-vsctl add-port br1 patch1])
> +
> +
> +AT_CHECK([ovs-ofctl -O OpenFlow13 add-flows br0 flows0.txt])
> +AT_CHECK([ovs-ofctl -O OpenFlow13 add-flows br1 flows1.txt])
> +
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> -- 
> 2.18.4

Can you also update the vswitchd/ovs-vswitchd.conf.db.5 man page with 
the new tunnel and options?
Martin Varghese Dec. 9, 2020, 1:23 p.m. UTC | #2
On Tue, Dec 08, 2020 at 02:42:42PM +0100, Eelco Chaudron wrote:
> Hi Martin,
> 
> Did some basic testing, and it all works fine. See some comments inline
> below.
> 
> Cheers,
> 
> Eelco
> 
> On 7 Dec 2020, at 4:32, Martin Varghese wrote:
> 
> > diff --git a/Documentation/automake.mk b/Documentation/automake.mk
> > index f85c4320e..ea3475f35 100644
> > --- a/Documentation/automake.mk
> > +++ b/Documentation/automake.mk
> > @@ -88,6 +88,7 @@ DOC_SOURCE = \
> >  	Documentation/faq/terminology.rst \
> >  	Documentation/faq/vlan.rst \
> >  	Documentation/faq/vxlan.rst \
> > +	Documentation/faq/bareudp.rst \
> >  	Documentation/internals/index.rst \
> >  	Documentation/internals/authors.rst \
> >  	Documentation/internals/bugs.rst \
> > diff --git a/Documentation/faq/bareudp.rst b/Documentation/faq/bareudp.rst
> > new file mode 100644
> > index 000000000..ef437631c
> > --- /dev/null
> > +++ b/Documentation/faq/bareudp.rst
> > @@ -0,0 +1,62 @@
> > +..
> > +      Licensed under the Apache License, Version 2.0 (the "License"); you may
> > +      not use this file except in compliance with the License. You may obtain
> > +      a copy of the License at
> > +
> > +          http://www.apache.org/licenses/LICENSE-2.0
> > +
> > +      Unless required by applicable law or agreed to in writing, software
> > +      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
> > +      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
> > +      License for the specific language governing permissions and limitations
> > +      under the License.
> > +
> > +      Convention for heading levels in Open vSwitch documentation:
> > +
> > +      =======  Heading 0 (reserved for the title in a document)
> > +      -------  Heading 1
> > +      ~~~~~~~  Heading 2
> > +      +++++++  Heading 3
> > +      '''''''  Heading 4
> > +
> > +      Avoid deeper levels because they do not render well.
> > +
> > +=======
> > +Bareudp
> > +=======
> > +
> > +Q: What is Bareudp?
> > +
> > +    A: There are various L3 encapsulation standards using UDP being discussed
> > +       to leverage the UDP based load balancing capability of different
> > +       networks. MPLSoUDP (__ https://tools.ietf.org/html/rfc7510) is one among
> > +       them.
> > +
> > +       The Bareudp tunnel provides a generic L3 encapsulation support for
> > +       tunnelling different L3 protocols like MPLS, IP, NSH etc. inside a UDP
> > +       tunnel.
> > +
> > +       An example to create bareudp device to tunnel MPLS traffic is given
> > +       below.::
> > +
> > +           $ ovs-vsctl add-port br_mpls udp_port -- set interface udp_port \
> > +             type=bareudp options:remote_ip=2.1.1.3 options:local_ip=2.1.1.2 \
> > +             options:payload_type=0x8847 options:dst_port=6635 \
> 
> I think it would be good to explain what the payload_type is used for as
> it's not clear from this text, and I had to read the kernel code to
> understand.
> Maybe add an example on how to redirect traffic to this tunnel, as it will
> only accept the specific ethertype.
> 
I will explain the payload_type.
One can refer to the tests to see how traffic is directed towards the
tunnel, so I propose not to mention that here.
But if you insist, we can add a sample rule with an action that pushes an
MPLS label and outputs to a bareudp port. Or do you suggest something
else?
> > +             options:packet_type="legacy_l3" \
> 
> Looking at the code, it seems we only support packet_type=legacy_l3 (or
> ptap), so we could remove it in the examples as it will default to L3.
>
Yes, we could remove it.

> > +             ofport_request=$bareudp_egress_port
> > +
> 
> Maybe also the ofport_request option can be removed, as it adds no value
> here.

Noted
> 
> > +       The bareudp device supports special handling for MPLS & IP as they can
> > +       have multiple ethertypes.
> > +       MPLS procotcol can have ethertypes ETH_P_MPLS_UC (unicast) &
> > +       ETH_P_MPLS_MC (multicast). IP protocol can have ethertypes ETH_P_IP (v4)
> > +       & ETH_P_IPV6 (v6).
> > +
> > +       The bareudp device to tunnel L3 traffic with multiple ethertypes
> > +       (MPLS & IP) can be created by passing the L3 protocol name as string in
> > +       the field payload_type. An example to create bareudp device to tunnel
> > +       MPLS unicast & multicast traffic is given below.::
> > +
> > +           $ ovs-vsctl add-port  br_mpls udp_port -- set interface udp_port \
> > +             type=bareudp options:remote_ip=2.1.1.3 options:local_ip=2.1.1.2 \
> > +             options:payload_type=mpls options:dst_port=6635 \
> > +             options:packet_type="legacy_l3"
> 
> Same as above on packet_type.

Noted.
> 
> Maybe also add an example for IP over UDP?
> 
Yes. I will add one.
> > diff --git a/Documentation/faq/index.rst b/Documentation/faq/index.rst
> > index 334b828b2..1dd29986a 100644
> > --- a/Documentation/faq/index.rst
> > +++ b/Documentation/faq/index.rst
> > @@ -30,6 +30,7 @@ Open vSwitch FAQ
> >  .. toctree::
> >     :maxdepth: 2
> > 
> > +   bareudp
> >     configuration
> >     contributing
> >     design
> > diff --git a/Documentation/faq/releases.rst
> > b/Documentation/faq/releases.rst
> > index 3623e3f40..68cbf1dbc 100644
> > --- a/Documentation/faq/releases.rst
> > +++ b/Documentation/faq/releases.rst
> > @@ -138,6 +138,7 @@ Q: Are all features available with all datapaths?
> >      Tunnel - ERSPAN                 4.18           2.10         2.10
> > NO
> >      Tunnel - ERSPAN-IPv6            4.18           2.10         2.10
> > NO
> >      Tunnel - GTP-U                  NO             NO           2.14
> > NO
> > +    Tunnel - Bareudp                5.7            NO           NO
> > NO
> >      QoS - Policing                  YES            1.1          2.6
> > NO
> >      QoS - Shaping                   YES            1.1          NO
> > NO
> >      sFlow                           YES            1.0          1.0
> > NO
> > diff --git a/NEWS b/NEWS
> > index 7e291a180..e3bc34a3f 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -75,7 +75,10 @@ v2.14.0 - 17 Aug 2020
> >     - GTP-U Tunnel Protocol
> >       * Add two new fields: tun_gtpu_flags, tun_gtpu_msgtype.
> >       * Only support for userspace datapath.
> > -
> > +   - Bareudp Tunnel
> > +     * Bareudp device support is present in linux kernel from version
> > 5.7
> > +     * Kernel bareudp device is not backported to ovs tree.
> > +     * Userspace datapath support is not added
> 
> Any plans on adding this?
It will come as a subsequent patch
> > 
> >  static const char *
> >  vport_type_to_kind(enum ovs_vport_type type,
> > @@ -113,6 +129,8 @@ vport_type_to_kind(enum ovs_vport_type type,
> >          }
> >      case OVS_VPORT_TYPE_GTPU:
> >          return NULL;
> > +    case OVS_VPORT_TYPE_BAREUDP:
> > +        return "bareudp";
> >      case OVS_VPORT_TYPE_NETDEV:
> >      case OVS_VPORT_TYPE_INTERNAL:
> >      case OVS_VPORT_TYPE_LISP:
> > @@ -243,6 +261,24 @@ dpif_netlink_rtnl_geneve_verify(const struct
> > netdev_tunnel_config *tnl_cfg,
> > 
> >      return err;
> >  }
> > +static int
> > +dpif_netlink_rtnl_bareudp_verify(const struct netdev_tunnel_config
> > *tnl_cfg,
> > +                                const char *kind, struct ofpbuf *reply)
> > +{
> > +    struct nlattr *bareudp[ARRAY_SIZE(bareudp_policy)];
> > +    int err;
> > +
> > +    err = rtnl_policy_parse(kind, reply, bareudp_policy, bareudp,
> > +                            ARRAY_SIZE(bareudp_policy));
> > +    if (!err) {
> > +        if ((tnl_cfg->dst_port !=
> > nl_attr_get_be16(bareudp[IFLA_BAREUDP_PORT]))
> > +            || (tnl_cfg->payload_ethertype
> > +                != nl_attr_get_be16(bareudp[IFLA_BAREUDP_ETHERTYPE])))
> > {
> > +            err = EINVAL;
> > +        }
> > +    }
> > +    return err;
> > +}
> > 
> >  static int
> >  dpif_netlink_rtnl_verify(const struct netdev_tunnel_config *tnl_cfg,
> > @@ -275,6 +311,9 @@ dpif_netlink_rtnl_verify(const struct
> > netdev_tunnel_config *tnl_cfg,
> >      case OVS_VPORT_TYPE_GENEVE:
> >          err = dpif_netlink_rtnl_geneve_verify(tnl_cfg, kind, reply);
> >          break;
> > +    case OVS_VPORT_TYPE_BAREUDP:
> > +        err = dpif_netlink_rtnl_bareudp_verify(tnl_cfg, kind, reply);
> > +        break;
> >      case OVS_VPORT_TYPE_NETDEV:
> >      case OVS_VPORT_TYPE_INTERNAL:
> >      case OVS_VPORT_TYPE_LISP:
> > @@ -357,6 +396,19 @@ dpif_netlink_rtnl_create(const struct
> > netdev_tunnel_config *tnl_cfg,
> >          nl_msg_put_u8(&request, IFLA_GENEVE_UDP_ZERO_CSUM6_RX, 1);
> >          nl_msg_put_be16(&request, IFLA_GENEVE_PORT, tnl_cfg->dst_port);
> >          break;
> > +    case OVS_VPORT_TYPE_BAREUDP:
> > +        nl_msg_put_be16(&request, IFLA_BAREUDP_ETHERTYPE,
> > +                        tnl_cfg->payload_ethertype);
> > +        if ((tnl_cfg->payload_ethertype == htons(ETH_TYPE_MPLS)) ||
> > +            (tnl_cfg->payload_ethertype ==
> > htons(ETH_TYPE_MPLS_MCAST))) {
> > +            nl_msg_put_u16(&request, IFLA_BAREUDP_SRCPORT_MIN,
> > +                           BAREUDP_MPLS_SRCPORT_MIN);
> 
> So why do we set this for MPLS only? All other proposals have the same min
> port guidance:
>   - https://tools.ietf.org/html/draft-xu-intarea-ip-in-udp-09
>   - https://tools.ietf.org/html/rfc8086
> 
We could pass the ephemeral starting port for all the payload types.
> > +        }
> > +        nl_msg_put_be16(&request, IFLA_BAREUDP_PORT,
> > tnl_cfg->dst_port);
> > +        if (tnl_cfg->exts & (1 << OVS_BAREUDP_EXT_MULTIPROTO_MODE)) {
> > +            nl_msg_put_flag(&request, IFLA_BAREUDP_MULTIPROTO_MODE);
> > +        }
> > +        break;
> >      case OVS_VPORT_TYPE_NETDEV:
> >      case OVS_VPORT_TYPE_INTERNAL:
> >      case OVS_VPORT_TYPE_LISP:
> > @@ -470,6 +522,7 @@ dpif_netlink_rtnl_port_destroy(const char *name,
> > const char *type)
> >      case OVS_VPORT_TYPE_ERSPAN:
> >      case OVS_VPORT_TYPE_IP6ERSPAN:
> >      case OVS_VPORT_TYPE_IP6GRE:
> > +    case OVS_VPORT_TYPE_BAREUDP:
> >          return dpif_netlink_rtnl_destroy(name);
> >      case OVS_VPORT_TYPE_NETDEV:
> >      case OVS_VPORT_TYPE_INTERNAL:
> > diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> > index 2f881e4fa..ceb56c685 100644
> > --- a/lib/dpif-netlink.c
> > +++ b/lib/dpif-netlink.c
> > @@ -749,6 +749,9 @@ get_vport_type(const struct dpif_netlink_vport
> > *vport)
> >      case OVS_VPORT_TYPE_GTPU:
> >          return "gtpu";
> > 
> > +    case OVS_VPORT_TYPE_BAREUDP:
> > +        return "bareudp";
> > +
> >      case OVS_VPORT_TYPE_UNSPEC:
> >      case __OVS_VPORT_TYPE_MAX:
> >          break;
> > @@ -784,6 +787,8 @@ netdev_to_ovs_vport_type(const char *type)
> >          return OVS_VPORT_TYPE_GRE;
> >      } else if (!strcmp(type, "gtpu")) {
> >          return OVS_VPORT_TYPE_GTPU;
> > +    } else if (!strcmp(type, "bareudp")) {
> > +        return OVS_VPORT_TYPE_BAREUDP;
> >      } else {
> >          return OVS_VPORT_TYPE_UNSPEC;
> >      }
> > diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
> > index 0252b61de..c86d420d7 100644
> > --- a/lib/netdev-vport.c
> > +++ b/lib/netdev-vport.c
> > @@ -112,7 +112,7 @@ netdev_vport_needs_dst_port(const struct netdev
> > *dev)
> >      return (class->get_config == get_tunnel_config &&
> >              (!strcmp("geneve", type) || !strcmp("vxlan", type) ||
> >               !strcmp("lisp", type) || !strcmp("stt", type) ||
> > -             !strcmp("gtpu", type)));
> > +             !strcmp("gtpu", type) || !strcmp("bareudp",type)));
> >  }
> > 
> >  const char *
> > @@ -219,6 +219,8 @@ netdev_vport_construct(struct netdev *netdev_)
> >          dev->tnl_cfg.dst_port = port ? htons(port) :
> > htons(STT_DST_PORT);
> >      } else if (!strcmp(type, "gtpu")) {
> >          dev->tnl_cfg.dst_port = port ? htons(port) :
> > htons(GTPU_DST_PORT);
> > +    } else if (!strcmp(type, "bareudp")) {
> > +        dev->tnl_cfg.dst_port = htons(port);
> >      }
> > 
> >      dev->tnl_cfg.dont_fragment = true;
> > @@ -438,6 +440,8 @@ tunnel_supported_layers(const char *type,
> >          return TNL_L2 | TNL_L3;
> >      } else if (!strcmp(type, "gtpu")) {
> >          return TNL_L3;
> > +    } else if (!strcmp(type, "bareudp")) {
> > +        return TNL_L3;
> >      } else {
> >          return TNL_L2;
> >      }
> > @@ -745,6 +749,16 @@ set_tunnel_config(struct netdev *dev_, const struct
> > smap *args, char **errp)
> >                      goto out;
> >                  }
> >              }
> > +        } else if (!strcmp(node->key, "payload_type")) {
> > +            if (strcmp(node->key, "mpls")) {
> > +                 tnl_cfg.payload_ethertype = htons(ETH_TYPE_MPLS);
> > +                 tnl_cfg.exts |= (1 <<
> > OVS_BAREUDP_EXT_MULTIPROTO_MODE);
> > +            } else if ((strcmp(node->key, "ip"))) {
> > +                 tnl_cfg.payload_ethertype = htons(ETH_TYPE_IP);
> > +                 tnl_cfg.exts |= (1 <<
> > OVS_BAREUDP_EXT_MULTIPROTO_MODE);
> > +            } else {
> > +                 tnl_cfg.payload_ethertype = htons(atoi(node->value));
> 
> As the kernel only supports IPv4, IPv6, MPLS, and MPLS_MULTI, why not return
> an error here if it's not one of these four?
>
The kernel accepts all ethertypes, even custom ones. The bareudp
device can be used to tunnel a proprietary protocol with a custom
ethertype (e.g. 0x123). But I agree that we must return an error if we
are passed an unknown string (other than ip or mpls).
> > +            }
> >          } else {
> >              ds_put_format(&errors, "%s: unknown %s argument '%s'\n",
> > name,
> >                            type, node->key);
> > @@ -917,7 +931,8 @@ get_tunnel_config(const struct netdev *dev, struct
> > smap *args)
> >              (!strcmp("vxlan", type) && dst_port != VXLAN_DST_PORT) ||
> >              (!strcmp("lisp", type) && dst_port != LISP_DST_PORT) ||
> >              (!strcmp("stt", type) && dst_port != STT_DST_PORT) ||
> > -            (!strcmp("gtpu", type) && dst_port != GTPU_DST_PORT)) {
> > +            (!strcmp("gtpu", type) && dst_port != GTPU_DST_PORT) ||
> > +            !strcmp("bareudp", type)) {
> >              smap_add_format(args, "dst_port", "%d", dst_port);
> >          }
> >      }
> > @@ -1243,6 +1258,14 @@ netdev_vport_tunnel_register(void)
> >            },
> >            {{NULL, NULL, 0, 0}}
> >          },
> > +        { "udp_sys",
> > +          {
> > +              TUNNEL_FUNCTIONS_COMMON,
> > +              .type = "bareudp",
> > +              .get_ifindex = NETDEV_VPORT_GET_IFINDEX,
> > +          },
> > +          {{NULL, NULL, 0, 0}}
> > +        },
> > 
> >      };
> >      static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
> > diff --git a/lib/netdev.h b/lib/netdev.h
> > index fb5073056..b705a9e56 100644
> > --- a/lib/netdev.h
> > +++ b/lib/netdev.h
> > @@ -107,6 +107,7 @@ struct netdev_tunnel_config {
> >      bool out_key_flow;
> >      ovs_be64 out_key;
> > 
> > +    ovs_be16 payload_ethertype;
> >      ovs_be16 dst_port;
> > 
> >      bool ip_src_flow;
> > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> > index 11aa20754..7eeff14f6 100644
> > --- a/ofproto/ofproto-dpif-xlate.c
> > +++ b/ofproto/ofproto-dpif-xlate.c
> > @@ -3573,6 +3573,7 @@ propagate_tunnel_data_to_flow(struct xlate_ctx
> > *ctx, struct eth_addr dmac,
> >      case OVS_VPORT_TYPE_VXLAN:
> >      case OVS_VPORT_TYPE_GENEVE:
> >      case OVS_VPORT_TYPE_GTPU:
> > +    case OVS_VPORT_TYPE_BAREUDP:
> >          nw_proto = IPPROTO_UDP;
> >          break;
> >      case OVS_VPORT_TYPE_LISP:
> > diff --git a/tests/system-layer3-tunnels.at
> > b/tests/system-layer3-tunnels.at
> > index 1232964bb..8423add2b 100644
> > --- a/tests/system-layer3-tunnels.at
> > +++ b/tests/system-layer3-tunnels.at
> 
> These tests also get executed for the userspace test set,
> system-userspace-testsuite.at, which will fail, so it needs to be excluded.
> 
Doesn't OVS_CHECK_MIN_KERNEL take care of it? I see these tests being
skipped for the userspace tests.
> > @@ -152,3 +152,51 @@ AT_CHECK([tail -1 stdout], [0],
> > 
> >  OVS_VSWITCHD_STOP
> >  AT_CLEANUP
> > +
> > +AT_SETUP([layer3 - ping over MPLS Bareudp])
> > +OVS_CHECK_MIN_KERNEL(5, 7)
> > +OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])])
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", "36:b1:ee:7c:01:01")
> > +ADD_VETH(p1, at_ns1, br1, "10.1.1.2/24", "36:b1:ee:7c:01:02")
> > +
> > +ADD_OVS_TUNNEL([bareudp], [br0], [at_bareudp0], [8.1.1.3],
> > [8.1.1.2/24],
> > +               [ options:local_ip=8.1.1.2
> > options:packet_type="legacy_l3" options:payload_type=mpls
> > options:dst_port=6635])
> > +
> > +ADD_OVS_TUNNEL([bareudp], [br1], [at_bareudp1], [8.1.1.2],
> > [8.1.1.3/24],
> > +               [options:local_ip=8.1.1.3
> > options:packet_type="legacy_l3" options:payload_type=mpls
> > options:dst_port=6635])
> > +
> > +AT_DATA([flows0.txt], [dnl
> > +table=0,priority=100,dl_type=0x0800
> > actions=push_mpls:0x8847,set_mpls_label:3,output:at_bareudp0
> > +table=0,priority=100,dl_type=0x8847 in_port=at_bareudp0 actions=pop_mpls:0x0800,set_field:36:b1:ee:7c:01:01->dl_dst,set_field:36:b1:ee:7c:01:02->dl_src,output:ovs-p0
> > +table=0,priority=10 actions=normal
> > +])
> 
> Maybe it would be good to also have an IP test case?
> 
I will add one.

> > +AT_DATA([flows1.txt], [dnl
> > +table=0,priority=100,dl_type=0x0800
> > actions=push_mpls:0x8847,set_mpls_label:3,output:at_bareudp1
> > +table=0,priority=100,dl_type=0x8847 in_port=at_bareudp1 actions=pop_mpls:0x0800,set_field:36:b1:ee:7c:01:02->dl_dst,set_field:36:b1:ee:7c:01:01->dl_src,output:ovs-p1
> > +table=0,priority=10 actions=normal
> > +])
> > +
> > +AT_CHECK([ip link add patch0 type veth peer name patch1])
> > +on_exit 'ip link del patch0'
> > +
> > +AT_CHECK([ip link set dev patch0 up])
> > +AT_CHECK([ip link set dev patch1 up])
> > +AT_CHECK([ovs-vsctl add-port br0 patch0])
> > +AT_CHECK([ovs-vsctl add-port br1 patch1])
> > +
> > +
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 add-flows br0 flows0.txt])
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 add-flows br1 flows1.txt])
> > +
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |
> > FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 |
> > FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > -- 
> > 2.18.4
> 
> Can you also update the vswitchd/ovs-vswitchd.conf.db.5 man page with the
> new tunnel and options?
> 

I will add that.
Eelco Chaudron Dec. 10, 2020, 7:51 a.m. UTC | #3
On 9 Dec 2020, at 14:23, Martin Varghese wrote:

> On Tue, Dec 08, 2020 at 02:42:42PM +0100, Eelco Chaudron wrote:
>> Hi Martin,
>>
>> Did some basic testing, and it all works fine. See some comments 
>> inline
>> below.
>>
>> Cheers,
>>
>> Eelco
>>
>> On 7 Dec 2020, at 4:32, Martin Varghese wrote:
>>
>>> From: Martin Varghese <martin.varghese@nokia.com>
>>>
>>> There are various L3 encapsulation standards using UDP being 
>>> discussed
>>> to
>>> leverage the UDP based load balancing capability of different 
>>> networks.
>>> MPLSoUDP (__ https://tools.ietf.org/html/rfc7510) is one among them.
>>>
>>> The Bareudp tunnel provides a generic L3 encapsulation support for
>>> tunnelling different L3 protocols like MPLS, IP, NSH etc. inside a 
>>> UDP
>>> tunnel.
>>>
>>> An example to create bareudp device to tunnel MPLS traffic is
>>> given
>>>
>>> $ ovs-vsctl add-port br_mpls udp_port -- set interface udp_port \
>>>              type=bareudp options:remote_ip=2.1.1.3
>>>              options:local_ip=2.1.1.2 \
>>>              options:payload_type=0x8847 options:dst_port=6635 \
>>>              options:packet_type="legacy_l3" \
>>>              ofport_request=$bareudp_egress_port
>>>
>>> The bareudp device supports special handling for MPLS & IP as
>>> they can have multiple ethertypes. MPLS procotcol can have 
>>> ethertypes
>>> ETH_P_MPLS_UC (unicast) & ETH_P_MPLS_MC (multicast). IP protocol can
>>> have
>>> ethertypes ETH_P_IP (v4) & ETH_P_IPV6 (v6).
>>>
>>> The bareudp device to tunnel L3 traffic with multiple ethertypes
>>> (MPLS & IP) can be created by passing the L3 protocol name as string 
>>> in
>>> the field payload_type. An example to create bareudp device to 
>>> tunnel
>>> MPLS unicast & multicast traffic is given below.::
>>>
>>> $ ovs-vsctl add-port  br_mpls udp_port -- set interface
>>>             udp_port \
>>>             type=bareudp options:remote_ip=2.1.1.3
>>>             options:local_ip=2.1.1.2 \
>>>             options:payload_type=mpls options:dst_port=6635 \
>>>             options:packet_type="legacy_l3"
>>>
>>> Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
>>> Acked-By: Greg Rose <gvrose8192@gmail.com>
>>> Tested-by: Greg Rose <gvrose8192@gmail.com>
>>>
>>> ---
>>> Changes in v2:
>>>     - Removed vport-bareudp module.
>>>
>>> Changes in v3:
>>>     - Added net-next upstream commit id and message to commit 
>>> message.
>>>
>>> Changes in v4:
>>>     - Removed kernel datapath changes.
>>>
>>> Changes in v5:
>>>     - Fixed release notes errors.
>>>     - Fixed coding errors in dpif-nelink-rtnl.c.
>>>
>>> Changes in v6:
>>>     - Added code to enable rx metadata collection in the kernel 
>>> device.
>>>     - Added version history.
>>>
>>> Changes in v7
>>>     - Fixed release notes errors.
>>>     - Added Skip tests for older kernels.
>>>     - Changes bareudp ovs_vport_type to 111.
>>>     - Added Acked-by & tested by from gvrose8192@gmail.com
>>>
>>> Changes in v8
>>>     - The code added in v6 to enable rx metadata collection in
>>>       the kernel device is removed. This flag was never added to any 
>>> of
>>>       the kernel release. The rx metadata collection is always 
>>> enabled
>>> in
>>>       kernel bareudp module.
>>>
>>>
>>>  Documentation/automake.mk                     |  1 +
>>>  Documentation/faq/bareudp.rst                 | 62 
>>> +++++++++++++++++++
>>>  Documentation/faq/index.rst                   |  1 +
>>>  Documentation/faq/releases.rst                |  1 +
>>>  NEWS                                          |  5 +-
>>>  .../linux/compat/include/linux/openvswitch.h  |  9 +++
>>>  lib/dpif-netlink-rtnl.c                       | 53 ++++++++++++++++
>>>  lib/dpif-netlink.c                            |  5 ++
>>>  lib/netdev-vport.c                            | 27 +++++++-
>>>  lib/netdev.h                                  |  1 +
>>>  ofproto/ofproto-dpif-xlate.c                  |  1 +
>>>  tests/system-layer3-tunnels.at                | 48 ++++++++++++++
>>>  12 files changed, 211 insertions(+), 3 deletions(-)
>>>  create mode 100644 Documentation/faq/bareudp.rst
>>>
>>> diff --git a/Documentation/automake.mk b/Documentation/automake.mk
>>> index f85c4320e..ea3475f35 100644
>>> --- a/Documentation/automake.mk
>>> +++ b/Documentation/automake.mk
>>> @@ -88,6 +88,7 @@ DOC_SOURCE = \
>>>  	Documentation/faq/terminology.rst \
>>>  	Documentation/faq/vlan.rst \
>>>  	Documentation/faq/vxlan.rst \
>>> +	Documentation/faq/bareudp.rst \
>>>  	Documentation/internals/index.rst \
>>>  	Documentation/internals/authors.rst \
>>>  	Documentation/internals/bugs.rst \
>>> diff --git a/Documentation/faq/bareudp.rst
>>> b/Documentation/faq/bareudp.rst
>>> new file mode 100644
>>> index 000000000..ef437631c
>>> --- /dev/null
>>> +++ b/Documentation/faq/bareudp.rst
>>> @@ -0,0 +1,62 @@
>>> +..
>>> +      Licensed under the Apache License, Version 2.0 (the 
>>> "License");
>>> you may
>>> +      not use this file except in compliance with the License. You 
>>> may
>>> obtain
>>> +      a copy of the License at
>>> +
>>> +          http://www.apache.org/licenses/LICENSE-2.0
>>> +
>>> +      Unless required by applicable law or agreed to in writing,
>>> software
>>> +      distributed under the License is distributed on an "AS IS" 
>>> BASIS,
>>> WITHOUT
>>> +      WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
>>> implied.
>>> See the
>>> +      License for the specific language governing permissions and
>>> limitations
>>> +      under the License.
>>> +
>>> +      Convention for heading levels in Open vSwitch documentation:
>>> +
>>> +      =======  Heading 0 (reserved for the title in a document)
>>> +      -------  Heading 1
>>> +      ~~~~~~~  Heading 2
>>> +      +++++++  Heading 3
>>> +      '''''''  Heading 4
>>> +
>>> +      Avoid deeper levels because they do not render well.
>>> +
>>> +=======
>>> +Bareudp
>>> +=======
>>> +
>>> +Q: What is Bareudp?
>>> +
>>> +    A: There are various L3 encapsulation standards using UDP being
>>> discussed
>>> +       to leverage the UDP based load balancing capability of 
>>> different
>>> +       networks. MPLSoUDP (__ https://tools.ietf.org/html/rfc7510) 
>>> is
>>> one among
>>> +       them.
>>> +
>>> +       The Bareudp tunnel provides a generic L3 encapsulation 
>>> support
>>> for
>>> +       tunnelling different L3 protocols like MPLS, IP, NSH etc. 
>>> inside
>>> a UDP
>>> +       tunnel.
>>> +
>>> +       An example to create bareudp device to tunnel MPLS traffic 
>>> is
>>> given
>>> +       below.::
>>> +
>>> +           $ ovs-vsctl add-port br_mpls udp_port -- set interface
>>> udp_port \
>>> +             type=bareudp options:remote_ip=2.1.1.3
>>> options:local_ip=2.1.1.2 \
>>> +             options:payload_type=0x8847 options:dst_port=6635 \
>>
>> I think it would be good to explain what the payload_type is used for 
>> as
>> it's not clear from this text, and I had to read the kernel code to
>> understand.
>> Maybe add an example on how to redirect traffic to this tunnel, as it 
>> will
>> only accept the specific ethertype.
>>
> I will explain the payload_type.
> One can refer to the tests to see how traffic is directed towards the
> tunnel, so I propose not to mention that here.
> But if you insist, we can add a sample rule with an action that pushes
> an MPLS label and outputs to a bareudp port. Or do you suggest
> something else?

That's what I suggest. End users do not tend to look at the test cases,
so a good example here would be appreciated, especially as the
payload_type dictates the correct header.

>>> +             options:packet_type="legacy_l3" \
>>
>> Looking at the code, it seems we only support packet_type=legacy_l3 
>> (or
>> ptap), so we could remove it in the examples as it will default to 
>> L3.
>>
> Yes we could remove it
>
>>> +             ofport_request=$bareudp_egress_port
>>> +
>>
>> Maybe also the ofport_request option can be removed, as it adds no 
>> value
>> here.
>
> Noted
>>
>>> +       The bareudp device supports special handling for MPLS & IP 
>>> as
>>> they can
>>> +       have multiple ethertypes.
>>> +       MPLS procotcol can have ethertypes ETH_P_MPLS_UC (unicast) &
>>> +       ETH_P_MPLS_MC (multicast). IP protocol can have ethertypes
>>> ETH_P_IP (v4)
>>> +       & ETH_P_IPV6 (v6).
>>> +
>>> +       The bareudp device to tunnel L3 traffic with multiple 
>>> ethertypes
>>> +       (MPLS & IP) can be created by passing the L3 protocol name 
>>> as
>>> string in
>>> +       the field payload_type. An example to create bareudp device 
>>> to
>>> tunnel
>>> +       MPLS unicast & multicast traffic is given below.::
>>> +
>>> +           $ ovs-vsctl add-port  br_mpls udp_port -- set interface
>>> udp_port \
>>> +             type=bareudp options:remote_ip=2.1.1.3
>>> options:local_ip=2.1.1.2 \
>>> +             options:payload_type=mpls options:dst_port=6635 \
>>> +             options:packet_type="legacy_l3"
>>
>> Same as above on packet_type.
>
> Noted.
>>
>> Maybe also add an example for IP over UDP?
>>
> Yes. I will add one.
>>> diff --git a/Documentation/faq/index.rst 
>>> b/Documentation/faq/index.rst
>>> index 334b828b2..1dd29986a 100644
>>> --- a/Documentation/faq/index.rst
>>> +++ b/Documentation/faq/index.rst
>>> @@ -30,6 +30,7 @@ Open vSwitch FAQ
>>>  .. toctree::
>>>     :maxdepth: 2
>>>
>>> +   bareudp
>>>     configuration
>>>     contributing
>>>     design
>>> diff --git a/Documentation/faq/releases.rst
>>> b/Documentation/faq/releases.rst
>>> index 3623e3f40..68cbf1dbc 100644
>>> --- a/Documentation/faq/releases.rst
>>> +++ b/Documentation/faq/releases.rst
>>> @@ -138,6 +138,7 @@ Q: Are all features available with all 
>>> datapaths?
>>>      Tunnel - ERSPAN                 4.18           2.10         
>>> 2.10
>>> NO
>>>      Tunnel - ERSPAN-IPv6            4.18           2.10         
>>> 2.10
>>> NO
>>>      Tunnel - GTP-U                  NO             NO           
>>> 2.14
>>> NO
>>> +    Tunnel - Bareudp                5.7            NO           NO
>>> NO
>>>      QoS - Policing                  YES            1.1          2.6
>>> NO
>>>      QoS - Shaping                   YES            1.1          NO
>>> NO
>>>      sFlow                           YES            1.0          1.0
>>> NO
>>> diff --git a/NEWS b/NEWS
>>> index 7e291a180..e3bc34a3f 100644
>>> --- a/NEWS
>>> +++ b/NEWS
>>> @@ -75,7 +75,10 @@ v2.14.0 - 17 Aug 2020
>>>     - GTP-U Tunnel Protocol
>>>       * Add two new fields: tun_gtpu_flags, tun_gtpu_msgtype.
>>>       * Only support for userspace datapath.
>>> -
>>> +   - Bareudp Tunnel
>>> +     * Bareudp device support is present in linux kernel from 
>>> version
>>> 5.7
>>> +     * Kernel bareudp device is not backported to ovs tree.
>>> +     * Userspace datapath support is not added
>>
>> Any plans on adding this?
> It will come as a subsequent patch
>>>
>>>  static const char *
>>>  vport_type_to_kind(enum ovs_vport_type type,
>>> @@ -113,6 +129,8 @@ vport_type_to_kind(enum ovs_vport_type type,
>>>          }
>>>      case OVS_VPORT_TYPE_GTPU:
>>>          return NULL;
>>> +    case OVS_VPORT_TYPE_BAREUDP:
>>> +        return "bareudp";
>>>      case OVS_VPORT_TYPE_NETDEV:
>>>      case OVS_VPORT_TYPE_INTERNAL:
>>>      case OVS_VPORT_TYPE_LISP:
>>> @@ -243,6 +261,24 @@ dpif_netlink_rtnl_geneve_verify(const struct
>>> netdev_tunnel_config *tnl_cfg,
>>>
>>>      return err;
>>>  }
>>> +static int
>>> +dpif_netlink_rtnl_bareudp_verify(const struct netdev_tunnel_config
>>> *tnl_cfg,
>>> +                                const char *kind, struct ofpbuf 
>>> *reply)
>>> +{
>>> +    struct nlattr *bareudp[ARRAY_SIZE(bareudp_policy)];
>>> +    int err;
>>> +
>>> +    err = rtnl_policy_parse(kind, reply, bareudp_policy, bareudp,
>>> +                            ARRAY_SIZE(bareudp_policy));
>>> +    if (!err) {
>>> +        if ((tnl_cfg->dst_port !=
>>> nl_attr_get_be16(bareudp[IFLA_BAREUDP_PORT]))
>>> +            || (tnl_cfg->payload_ethertype
>>> +                != 
>>> nl_attr_get_be16(bareudp[IFLA_BAREUDP_ETHERTYPE])))
>>> {
>>> +            err = EINVAL;
>>> +        }
>>> +    }
>>> +    return err;
>>> +}
>>>
>>>  static int
>>>  dpif_netlink_rtnl_verify(const struct netdev_tunnel_config 
>>> *tnl_cfg,
>>> @@ -275,6 +311,9 @@ dpif_netlink_rtnl_verify(const struct
>>> netdev_tunnel_config *tnl_cfg,
>>>      case OVS_VPORT_TYPE_GENEVE:
>>>          err = dpif_netlink_rtnl_geneve_verify(tnl_cfg, kind, 
>>> reply);
>>>          break;
>>> +    case OVS_VPORT_TYPE_BAREUDP:
>>> +        err = dpif_netlink_rtnl_bareudp_verify(tnl_cfg, kind, 
>>> reply);
>>> +        break;
>>>      case OVS_VPORT_TYPE_NETDEV:
>>>      case OVS_VPORT_TYPE_INTERNAL:
>>>      case OVS_VPORT_TYPE_LISP:
>>> @@ -357,6 +396,19 @@ dpif_netlink_rtnl_create(const struct
>>> netdev_tunnel_config *tnl_cfg,
>>>          nl_msg_put_u8(&request, IFLA_GENEVE_UDP_ZERO_CSUM6_RX, 1);
>>>          nl_msg_put_be16(&request, IFLA_GENEVE_PORT, 
>>> tnl_cfg->dst_port);
>>>          break;
>>> +    case OVS_VPORT_TYPE_BAREUDP:
>>> +        nl_msg_put_be16(&request, IFLA_BAREUDP_ETHERTYPE,
>>> +                        tnl_cfg->payload_ethertype);
>>> +        if ((tnl_cfg->payload_ethertype == htons(ETH_TYPE_MPLS)) ||
>>> +            (tnl_cfg->payload_ethertype ==
>>> htons(ETH_TYPE_MPLS_MCAST))) {
>>> +            nl_msg_put_u16(&request, IFLA_BAREUDP_SRCPORT_MIN,
>>> +                           BAREUDP_MPLS_SRCPORT_MIN);
>>
>> So why do we set this for MPLS only? All other proposals have the 
>> same min
>> port guidance:
>>   - https://tools.ietf.org/html/draft-xu-intarea-ip-in-udp-09
>>   - https://tools.ietf.org/html/rfc8086
>>
> We could pass the ephemeral starting port for all the payload types.

I think that would be good, please add it for all, and we can add an 
exception if needed, but I do not see a use case for it now.
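
For illustration, the effect of a source port minimum can be sketched as
below. This is standalone C, not the kernel's implementation;
sketch_entropy_src_port and its parameters are hypothetical names used only
here. It maps a flow hash into [port_min, 65535], the kind of range that
RFC 7510 recommends (49152 to 65535) for the entropy-carrying source port.

```c
#include <stdint.h>

/* Hypothetical helper: picks a UDP source port in [port_min, 65535] from a
 * per-flow hash, so packets of one flow keep the same port while different
 * flows spread across the range. */
static uint16_t
sketch_entropy_src_port(uint32_t flow_hash, uint16_t port_min)
{
    /* Number of ports in [port_min, 65535]; never zero. */
    uint32_t span = 65536u - port_min;

    return (uint16_t) (port_min + flow_hash % span);
}
```

Nothing in this mapping depends on the payload ethertype, which is why the
same minimum can be passed for every payload type.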

>>> +        }
>>> +        nl_msg_put_be16(&request, IFLA_BAREUDP_PORT,
>>> tnl_cfg->dst_port);
>>> +        if (tnl_cfg->exts & (1 << OVS_BAREUDP_EXT_MULTIPROTO_MODE)) 
>>> {
>>> +            nl_msg_put_flag(&request, 
>>> IFLA_BAREUDP_MULTIPROTO_MODE);
>>> +        }
>>> +        break;
>>>      case OVS_VPORT_TYPE_NETDEV:
>>>      case OVS_VPORT_TYPE_INTERNAL:
>>>      case OVS_VPORT_TYPE_LISP:
>>> @@ -470,6 +522,7 @@ dpif_netlink_rtnl_port_destroy(const char *name,
>>> const char *type)
>>>      case OVS_VPORT_TYPE_ERSPAN:
>>>      case OVS_VPORT_TYPE_IP6ERSPAN:
>>>      case OVS_VPORT_TYPE_IP6GRE:
>>> +    case OVS_VPORT_TYPE_BAREUDP:
>>>          return dpif_netlink_rtnl_destroy(name);
>>>      case OVS_VPORT_TYPE_NETDEV:
>>>      case OVS_VPORT_TYPE_INTERNAL:
>>> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
>>> index 2f881e4fa..ceb56c685 100644
>>> --- a/lib/dpif-netlink.c
>>> +++ b/lib/dpif-netlink.c
>>> @@ -749,6 +749,9 @@ get_vport_type(const struct dpif_netlink_vport
>>> *vport)
>>>      case OVS_VPORT_TYPE_GTPU:
>>>          return "gtpu";
>>>
>>> +    case OVS_VPORT_TYPE_BAREUDP:
>>> +        return "bareudp";
>>> +
>>>      case OVS_VPORT_TYPE_UNSPEC:
>>>      case __OVS_VPORT_TYPE_MAX:
>>>          break;
>>> @@ -784,6 +787,8 @@ netdev_to_ovs_vport_type(const char *type)
>>>          return OVS_VPORT_TYPE_GRE;
>>>      } else if (!strcmp(type, "gtpu")) {
>>>          return OVS_VPORT_TYPE_GTPU;
>>> +    } else if (!strcmp(type, "bareudp")) {
>>> +        return OVS_VPORT_TYPE_BAREUDP;
>>>      } else {
>>>          return OVS_VPORT_TYPE_UNSPEC;
>>>      }
>>> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
>>> index 0252b61de..c86d420d7 100644
>>> --- a/lib/netdev-vport.c
>>> +++ b/lib/netdev-vport.c
>>> @@ -112,7 +112,7 @@ netdev_vport_needs_dst_port(const struct netdev
>>> *dev)
>>>      return (class->get_config == get_tunnel_config &&
>>>              (!strcmp("geneve", type) || !strcmp("vxlan", type) ||
>>>               !strcmp("lisp", type) || !strcmp("stt", type) ||
>>> -             !strcmp("gtpu", type)));
>>> +             !strcmp("gtpu", type) || !strcmp("bareudp",type)));
>>>  }
>>>
>>>  const char *
>>> @@ -219,6 +219,8 @@ netdev_vport_construct(struct netdev *netdev_)
>>>          dev->tnl_cfg.dst_port = port ? htons(port) :
>>> htons(STT_DST_PORT);
>>>      } else if (!strcmp(type, "gtpu")) {
>>>          dev->tnl_cfg.dst_port = port ? htons(port) :
>>> htons(GTPU_DST_PORT);
>>> +    } else if (!strcmp(type, "bareudp")) {
>>> +        dev->tnl_cfg.dst_port = htons(port);
>>>      }
>>>
>>>      dev->tnl_cfg.dont_fragment = true;
>>> @@ -438,6 +440,8 @@ tunnel_supported_layers(const char *type,
>>>          return TNL_L2 | TNL_L3;
>>>      } else if (!strcmp(type, "gtpu")) {
>>>          return TNL_L3;
>>> +    } else if (!strcmp(type, "bareudp")) {
>>> +        return TNL_L3;
>>>      } else {
>>>          return TNL_L2;
>>>      }
>>> @@ -745,6 +749,16 @@ set_tunnel_config(struct netdev *dev_, const 
>>> struct
>>> smap *args, char **errp)
>>>                      goto out;
>>>                  }
>>>              }
>>> +        } else if (!strcmp(node->key, "payload_type")) {
>>> +            if (!strcmp(node->value, "mpls")) {
>>> +                 tnl_cfg.payload_ethertype = htons(ETH_TYPE_MPLS);
>>> +                 tnl_cfg.exts |= (1 <<
>>> OVS_BAREUDP_EXT_MULTIPROTO_MODE);
>>> +            } else if (!strcmp(node->value, "ip")) {
>>> +                 tnl_cfg.payload_ethertype = htons(ETH_TYPE_IP);
>>> +                 tnl_cfg.exts |= (1 <<
>>> OVS_BAREUDP_EXT_MULTIPROTO_MODE);
>>> +            } else {
>>> +                 tnl_cfg.payload_ethertype = 
>>> htons(atoi(node->value));
>>
>> As the kernel only supports IPv4, IPv6, MPLS, and MPLS_MULTI, why not 
>> return
>> an error here if it's not one of these four?
>>
> The kernel accepts all ethertypes, even custom ones. The bareudp
> device can be used to tunnel a proprietary protocol with a custom
> ethertype (e.g. 0x123). But I agree that we must return an error if we
> are passed an unknown string (other than ip or mpls).

Good catch, taking any number will make it future proof :) As you 
mentioned, I think it would be good to make sure it’s really a number 
that is passed.
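
To illustrate the kind of check meant here, a standalone sketch follows. It
is not OVS code: the function name parse_payload_type and the two local
ETH_TYPE_* defines are hypothetical stand-ins, and it uses plain strtoul()
instead of any OVS helper. Rejecting 0 is also just an assumption, made
because atoi() returns 0 for non-numeric input.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the OVS ethertype constants. */
#define ETH_TYPE_IP   0x0800
#define ETH_TYPE_MPLS 0x8847

/* Parses a payload_type value.  "mpls" and "ip" select multiprotocol mode.
 * Anything else must be a number in 1..0xffff, decimal or "0x" hex.
 * Returns 0 on success and EINVAL otherwise.  The ethertype is stored in
 * host byte order, so a caller would still apply htons(). */
int
parse_payload_type(const char *value, uint16_t *ethertype, bool *multiproto)
{
    char *end;
    unsigned long n;

    *multiproto = false;
    if (!strcmp(value, "mpls")) {
        *ethertype = ETH_TYPE_MPLS;
        *multiproto = true;
        return 0;
    }
    if (!strcmp(value, "ip")) {
        *ethertype = ETH_TYPE_IP;
        *multiproto = true;
        return 0;
    }

    /* Base 0 lets strtoul() accept both "34887" and "0x8847". */
    errno = 0;
    n = strtoul(value, &end, 0);
    if (errno || end == value || *end != '\0' || n == 0 || n > UINT16_MAX) {
        return EINVAL;
    }
    *ethertype = (uint16_t) n;
    return 0;
}
```

With something like this, "mpls", "ip", "0x8847" and "34887" are accepted,
while "foo", "12abc" and "0x10000" produce EINVAL instead of silently
becoming ethertype 0.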

>>> +            }
>>>          } else {
>>>              ds_put_format(&errors, "%s: unknown %s argument 
>>> '%s'\n",
>>> name,
>>>                            type, node->key);
>>> @@ -917,7 +931,8 @@ get_tunnel_config(const struct netdev *dev, 
>>> struct
>>> smap *args)
>>>              (!strcmp("vxlan", type) && dst_port != VXLAN_DST_PORT) 
>>> ||
>>>              (!strcmp("lisp", type) && dst_port != LISP_DST_PORT) ||
>>>              (!strcmp("stt", type) && dst_port != STT_DST_PORT) ||
>>> -            (!strcmp("gtpu", type) && dst_port != GTPU_DST_PORT)) {
>>> +            (!strcmp("gtpu", type) && dst_port != GTPU_DST_PORT) ||
>>> +            !strcmp("bareudp", type)) {
>>>              smap_add_format(args, "dst_port", "%d", dst_port);
>>>          }
>>>      }
>>> @@ -1243,6 +1258,14 @@ netdev_vport_tunnel_register(void)
>>>            },
>>>            {{NULL, NULL, 0, 0}}
>>>          },
>>> +        { "udp_sys",
>>> +          {
>>> +              TUNNEL_FUNCTIONS_COMMON,
>>> +              .type = "bareudp",
>>> +              .get_ifindex = NETDEV_VPORT_GET_IFINDEX,
>>> +          },
>>> +          {{NULL, NULL, 0, 0}}
>>> +        },
>>>
>>>      };
>>>      static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
>>> diff --git a/lib/netdev.h b/lib/netdev.h
>>> index fb5073056..b705a9e56 100644
>>> --- a/lib/netdev.h
>>> +++ b/lib/netdev.h
>>> @@ -107,6 +107,7 @@ struct netdev_tunnel_config {
>>>      bool out_key_flow;
>>>      ovs_be64 out_key;
>>>
>>> +    ovs_be16 payload_ethertype;
>>>      ovs_be16 dst_port;
>>>
>>>      bool ip_src_flow;
>>> diff --git a/ofproto/ofproto-dpif-xlate.c 
>>> b/ofproto/ofproto-dpif-xlate.c
>>> index 11aa20754..7eeff14f6 100644
>>> --- a/ofproto/ofproto-dpif-xlate.c
>>> +++ b/ofproto/ofproto-dpif-xlate.c
>>> @@ -3573,6 +3573,7 @@ propagate_tunnel_data_to_flow(struct xlate_ctx
>>> *ctx, struct eth_addr dmac,
>>>      case OVS_VPORT_TYPE_VXLAN:
>>>      case OVS_VPORT_TYPE_GENEVE:
>>>      case OVS_VPORT_TYPE_GTPU:
>>> +    case OVS_VPORT_TYPE_BAREUDP:
>>>          nw_proto = IPPROTO_UDP;
>>>          break;
>>>      case OVS_VPORT_TYPE_LISP:
>>> diff --git a/tests/system-layer3-tunnels.at
>>> b/tests/system-layer3-tunnels.at
>>> index 1232964bb..8423add2b 100644
>>> --- a/tests/system-layer3-tunnels.at
>>> +++ b/tests/system-layer3-tunnels.at
>>
>> These tests also get executed for the userspace test set,
>> system-userspace-testsuite.at, which will fail, so it needs to be 
>> excluded.
>>
> The check_min_kernel takes care of it? I see these tests are getting
> skipped for userspace tests

My bad, looked at the wrong macro definition :) It’s all good for now.

>>> @@ -152,3 +152,51 @@ AT_CHECK([tail -1 stdout], [0],
>>>
>>>  OVS_VSWITCHD_STOP
>>>  AT_CLEANUP
>>> +
>>> +AT_SETUP([layer3 - ping over MPLS Bareudp])
>>> +OVS_CHECK_MIN_KERNEL(5, 7)
>>> +OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])])
>>> +ADD_NAMESPACES(at_ns0, at_ns1)
>>> +
>>> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", "36:b1:ee:7c:01:01")
>>> +ADD_VETH(p1, at_ns1, br1, "10.1.1.2/24", "36:b1:ee:7c:01:02")
>>> +
>>> +ADD_OVS_TUNNEL([bareudp], [br0], [at_bareudp0], [8.1.1.3],
>>> [8.1.1.2/24],
>>> +               [ options:local_ip=8.1.1.2
>>> options:packet_type="legacy_l3" options:payload_type=mpls
>>> options:dst_port=6635])
>>> +
>>> +ADD_OVS_TUNNEL([bareudp], [br1], [at_bareudp1], [8.1.1.2],
>>> [8.1.1.3/24],
>>> +               [options:local_ip=8.1.1.3
>>> options:packet_type="legacy_l3" options:payload_type=mpls
>>> options:dst_port=6635])
>>> +
>>> +AT_DATA([flows0.txt], [dnl
>>> +table=0,priority=100,dl_type=0x0800
>>> actions=push_mpls:0x8847,set_mpls_label:3,output:at_bareudp0
>>> +table=0,priority=100,dl_type=0x8847 in_port=at_bareudp0 
>>> actions=pop_mpls:0x0800,set_field:36:b1:ee:7c:01:01->dl_dst,set_field:36:b1:ee:7c:01:02->dl_src,output:ovs-p0
>>> +table=0,priority=10 actions=normal
>>> +])
>>
>> Maybe it would be good to also have an IP test case?
>>
> I will add one.
>
>>> +AT_DATA([flows1.txt], [dnl
>>> +table=0,priority=100,dl_type=0x0800
>>> actions=push_mpls:0x8847,set_mpls_label:3,output:at_bareudp1
>>> +table=0,priority=100,dl_type=0x8847 in_port=at_bareudp1 
>>> actions=pop_mpls:0x0800,set_field:36:b1:ee:7c:01:02->dl_dst,set_field:36:b1:ee:7c:01:01->dl_src,output:ovs-p1
>>> +table=0,priority=10 actions=normal
>>> +])
>>> +
>>> +AT_CHECK([ip link add patch0 type veth peer name patch1])
>>> +on_exit 'ip link del patch0'
>>> +
>>> +AT_CHECK([ip link set dev patch0 up])
>>> +AT_CHECK([ip link set dev patch1 up])
>>> +AT_CHECK([ovs-vsctl add-port br0 patch0])
>>> +AT_CHECK([ovs-vsctl add-port br1 patch1])
>>> +
>>> +
>>> +AT_CHECK([ovs-ofctl -O OpenFlow13 add-flows br0 flows0.txt])
>>> +AT_CHECK([ovs-ofctl -O OpenFlow13 add-flows br1 flows1.txt])
>>> +
>>> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |
>>> FORMAT_PING], [0], [dnl
>>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>>> +])
>>> +
>>> +NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 |
>>> FORMAT_PING], [0], [dnl
>>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>>> +])
>>> +OVS_TRAFFIC_VSWITCHD_STOP
>>> +AT_CLEANUP
>>> -- 
>>> 2.18.4
>>
>> Can you also update the vswitchd/ovs-vswitchd.conf.db.5 man page with 
>> the
>> new tunnel and options?
>>
>
> I will add that.

Thanks, looking forward to your next rev.

//Eelco

Patch

diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index f85c4320e..ea3475f35 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -88,6 +88,7 @@  DOC_SOURCE = \
 	Documentation/faq/terminology.rst \
 	Documentation/faq/vlan.rst \
 	Documentation/faq/vxlan.rst \
+	Documentation/faq/bareudp.rst \
 	Documentation/internals/index.rst \
 	Documentation/internals/authors.rst \
 	Documentation/internals/bugs.rst \
diff --git a/Documentation/faq/bareudp.rst b/Documentation/faq/bareudp.rst
new file mode 100644
index 000000000..ef437631c
--- /dev/null
+++ b/Documentation/faq/bareudp.rst
@@ -0,0 +1,62 @@ 
+..
+      Licensed under the Apache License, Version 2.0 (the "License"); you may
+      not use this file except in compliance with the License. You may obtain
+      a copy of the License at
+
+          http://www.apache.org/licenses/LICENSE-2.0
+
+      Unless required by applicable law or agreed to in writing, software
+      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+      License for the specific language governing permissions and limitations
+      under the License.
+
+      Convention for heading levels in Open vSwitch documentation:
+
+      =======  Heading 0 (reserved for the title in a document)
+      -------  Heading 1
+      ~~~~~~~  Heading 2
+      +++++++  Heading 3
+      '''''''  Heading 4
+
+      Avoid deeper levels because they do not render well.
+
+=======
+Bareudp
+=======
+
+Q: What is Bareudp?
+
+    A: There are various L3 encapsulation standards using UDP being discussed
+       to leverage the UDP based load balancing capability of different
+       networks. MPLSoUDP (__ https://tools.ietf.org/html/rfc7510) is one among
+       them.
+
+       The Bareudp tunnel provides generic L3 encapsulation support for
+       tunnelling different L3 protocols like MPLS, IP, NSH etc. inside a UDP
+       tunnel.
+
+       An example of creating a bareudp device to tunnel MPLS traffic is given
+       below::
+
+           $ ovs-vsctl add-port br_mpls udp_port -- set interface udp_port \
+             type=bareudp options:remote_ip=2.1.1.3 options:local_ip=2.1.1.2 \
+             options:payload_type=0x8847 options:dst_port=6635 \
+             options:packet_type="legacy_l3" \
+             ofport_request=$bareudp_egress_port
+
+       The bareudp device supports special handling for MPLS & IP as they can
+       have multiple ethertypes.
+       The MPLS protocol can have ethertypes ETH_P_MPLS_UC (unicast) &
+       ETH_P_MPLS_MC (multicast). The IP protocol can have ethertypes ETH_P_IP
+       (v4) & ETH_P_IPV6 (v6).
+
+       A bareudp device that tunnels L3 traffic with multiple ethertypes
+       (MPLS & IP) can be created by passing the L3 protocol name as a string
+       in the payload_type field. An example that creates a bareudp device to
+       tunnel MPLS unicast & multicast traffic is given below::
+
+           $ ovs-vsctl add-port br_mpls udp_port -- set interface udp_port \
+             type=bareudp options:remote_ip=2.1.1.3 options:local_ip=2.1.1.2 \
+             options:payload_type=mpls options:dst_port=6635 \
+             options:packet_type="legacy_l3"
diff --git a/Documentation/faq/index.rst b/Documentation/faq/index.rst
index 334b828b2..1dd29986a 100644
--- a/Documentation/faq/index.rst
+++ b/Documentation/faq/index.rst
@@ -30,6 +30,7 @@  Open vSwitch FAQ
 .. toctree::
    :maxdepth: 2
 
+   bareudp
    configuration
    contributing
    design
diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
index 3623e3f40..68cbf1dbc 100644
--- a/Documentation/faq/releases.rst
+++ b/Documentation/faq/releases.rst
@@ -138,6 +138,7 @@  Q: Are all features available with all datapaths?
     Tunnel - ERSPAN                 4.18           2.10         2.10     NO
     Tunnel - ERSPAN-IPv6            4.18           2.10         2.10     NO
     Tunnel - GTP-U                  NO             NO           2.14     NO
+    Tunnel - Bareudp                5.7            NO           NO       NO
     QoS - Policing                  YES            1.1          2.6      NO
     QoS - Shaping                   YES            1.1          NO       NO
     sFlow                           YES            1.0          1.0      NO
diff --git a/NEWS b/NEWS
index 7e291a180..e3bc34a3f 100644
--- a/NEWS
+++ b/NEWS
@@ -75,7 +75,10 @@  v2.14.0 - 17 Aug 2020
    - GTP-U Tunnel Protocol
      * Add two new fields: tun_gtpu_flags, tun_gtpu_msgtype.
      * Only support for userspace datapath.
-
+   - Bareudp Tunnel
+     * Bareudp device support is present in the Linux kernel from version 5.7.
+     * The kernel bareudp device is not backported to the OVS tree.
+     * Userspace datapath support is not added.
 
 v2.13.0 - 14 Feb 2020
 ---------------------
diff --git a/datapath/linux/compat/include/linux/openvswitch.h b/datapath/linux/compat/include/linux/openvswitch.h
index 2d884312f..53d4225ec 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -246,6 +246,7 @@  enum ovs_vport_type {
 	OVS_VPORT_TYPE_IP6ERSPAN = 108, /* ERSPAN tunnel. */
 	OVS_VPORT_TYPE_IP6GRE = 109,
 	OVS_VPORT_TYPE_GTPU = 110,
+	OVS_VPORT_TYPE_BAREUDP = 111,  /* Bareudp tunnel. */
 	__OVS_VPORT_TYPE_MAX
 };
 
@@ -308,6 +309,14 @@  enum {
 
 #define OVS_VXLAN_EXT_MAX (__OVS_VXLAN_EXT_MAX - 1)
 
+enum {
+        OVS_BAREUDP_EXT_UNSPEC,
+        OVS_BAREUDP_EXT_MULTIPROTO_MODE,
+        __OVS_BAREUDP_EXT_MAX,
+};
+
+#define OVS_BAREUDP_EXT_MAX (__OVS_BAREUDP_EXT_MAX - 1)
+
 /* OVS_VPORT_ATTR_OPTIONS attributes for tunnels.
  */
 enum {
diff --git a/lib/dpif-netlink-rtnl.c b/lib/dpif-netlink-rtnl.c
index fd157ce2d..3e308e13e 100644
--- a/lib/dpif-netlink-rtnl.c
+++ b/lib/dpif-netlink-rtnl.c
@@ -58,6 +58,18 @@  VLOG_DEFINE_THIS_MODULE(dpif_netlink_rtnl);
 #define IFLA_GENEVE_UDP_ZERO_CSUM6_RX 10
 #endif
 
+#ifndef IFLA_BAREUDP_MAX
+#define IFLA_BAREUDP_MAX 0
+#endif
+#if IFLA_BAREUDP_MAX < 4
+#define IFLA_BAREUDP_PORT 1
+#define IFLA_BAREUDP_ETHERTYPE 2
+#define IFLA_BAREUDP_SRCPORT_MIN 3
+#define IFLA_BAREUDP_MULTIPROTO_MODE 4
+#endif
+
+#define BAREUDP_MPLS_SRCPORT_MIN 49153
+
 static const struct nl_policy rtlink_policy[] = {
     [IFLA_LINKINFO] = { .type = NL_A_NESTED },
 };
@@ -81,6 +93,10 @@  static const struct nl_policy geneve_policy[] = {
     [IFLA_GENEVE_UDP_ZERO_CSUM6_RX] = { .type = NL_A_U8 },
     [IFLA_GENEVE_PORT] = { .type = NL_A_U16 },
 };
+static const struct nl_policy bareudp_policy[] = {
+    [IFLA_BAREUDP_PORT] = { .type = NL_A_U16 },
+    [IFLA_BAREUDP_ETHERTYPE] = { .type = NL_A_U16 },
+};
 
 static const char *
 vport_type_to_kind(enum ovs_vport_type type,
@@ -113,6 +129,8 @@  vport_type_to_kind(enum ovs_vport_type type,
         }
     case OVS_VPORT_TYPE_GTPU:
         return NULL;
+    case OVS_VPORT_TYPE_BAREUDP:
+        return "bareudp";
     case OVS_VPORT_TYPE_NETDEV:
     case OVS_VPORT_TYPE_INTERNAL:
     case OVS_VPORT_TYPE_LISP:
@@ -243,6 +261,24 @@  dpif_netlink_rtnl_geneve_verify(const struct netdev_tunnel_config *tnl_cfg,
 
     return err;
 }
+static int
+dpif_netlink_rtnl_bareudp_verify(const struct netdev_tunnel_config *tnl_cfg,
+                                const char *kind, struct ofpbuf *reply)
+{
+    struct nlattr *bareudp[ARRAY_SIZE(bareudp_policy)];
+    int err;
+
+    err = rtnl_policy_parse(kind, reply, bareudp_policy, bareudp,
+                            ARRAY_SIZE(bareudp_policy));
+    if (!err) {
+        if ((tnl_cfg->dst_port != nl_attr_get_be16(bareudp[IFLA_BAREUDP_PORT]))
+            || (tnl_cfg->payload_ethertype
+                != nl_attr_get_be16(bareudp[IFLA_BAREUDP_ETHERTYPE]))) {
+            err = EINVAL;
+        }
+    }
+    return err;
+}
 
 static int
 dpif_netlink_rtnl_verify(const struct netdev_tunnel_config *tnl_cfg,
@@ -275,6 +311,9 @@  dpif_netlink_rtnl_verify(const struct netdev_tunnel_config *tnl_cfg,
     case OVS_VPORT_TYPE_GENEVE:
         err = dpif_netlink_rtnl_geneve_verify(tnl_cfg, kind, reply);
         break;
+    case OVS_VPORT_TYPE_BAREUDP:
+        err = dpif_netlink_rtnl_bareudp_verify(tnl_cfg, kind, reply);
+        break;
     case OVS_VPORT_TYPE_NETDEV:
     case OVS_VPORT_TYPE_INTERNAL:
     case OVS_VPORT_TYPE_LISP:
@@ -357,6 +396,19 @@  dpif_netlink_rtnl_create(const struct netdev_tunnel_config *tnl_cfg,
         nl_msg_put_u8(&request, IFLA_GENEVE_UDP_ZERO_CSUM6_RX, 1);
         nl_msg_put_be16(&request, IFLA_GENEVE_PORT, tnl_cfg->dst_port);
         break;
+    case OVS_VPORT_TYPE_BAREUDP:
+        nl_msg_put_be16(&request, IFLA_BAREUDP_ETHERTYPE,
+                        tnl_cfg->payload_ethertype);
+        if ((tnl_cfg->payload_ethertype == htons(ETH_TYPE_MPLS)) ||
+            (tnl_cfg->payload_ethertype == htons(ETH_TYPE_MPLS_MCAST))) {
+            nl_msg_put_u16(&request, IFLA_BAREUDP_SRCPORT_MIN,
+                           BAREUDP_MPLS_SRCPORT_MIN);
+        }
+        nl_msg_put_be16(&request, IFLA_BAREUDP_PORT, tnl_cfg->dst_port);
+        if (tnl_cfg->exts & (1 << OVS_BAREUDP_EXT_MULTIPROTO_MODE)) {
+            nl_msg_put_flag(&request, IFLA_BAREUDP_MULTIPROTO_MODE);
+        }
+        break;
     case OVS_VPORT_TYPE_NETDEV:
     case OVS_VPORT_TYPE_INTERNAL:
     case OVS_VPORT_TYPE_LISP:
@@ -470,6 +522,7 @@  dpif_netlink_rtnl_port_destroy(const char *name, const char *type)
     case OVS_VPORT_TYPE_ERSPAN:
     case OVS_VPORT_TYPE_IP6ERSPAN:
     case OVS_VPORT_TYPE_IP6GRE:
+    case OVS_VPORT_TYPE_BAREUDP:
         return dpif_netlink_rtnl_destroy(name);
     case OVS_VPORT_TYPE_NETDEV:
     case OVS_VPORT_TYPE_INTERNAL:
diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 2f881e4fa..ceb56c685 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -749,6 +749,9 @@  get_vport_type(const struct dpif_netlink_vport *vport)
     case OVS_VPORT_TYPE_GTPU:
         return "gtpu";
 
+    case OVS_VPORT_TYPE_BAREUDP:
+        return "bareudp";
+
     case OVS_VPORT_TYPE_UNSPEC:
     case __OVS_VPORT_TYPE_MAX:
         break;
@@ -784,6 +787,8 @@  netdev_to_ovs_vport_type(const char *type)
         return OVS_VPORT_TYPE_GRE;
     } else if (!strcmp(type, "gtpu")) {
         return OVS_VPORT_TYPE_GTPU;
+    } else if (!strcmp(type, "bareudp")) {
+        return OVS_VPORT_TYPE_BAREUDP;
     } else {
         return OVS_VPORT_TYPE_UNSPEC;
     }
diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index 0252b61de..c86d420d7 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -112,7 +112,7 @@  netdev_vport_needs_dst_port(const struct netdev *dev)
     return (class->get_config == get_tunnel_config &&
             (!strcmp("geneve", type) || !strcmp("vxlan", type) ||
              !strcmp("lisp", type) || !strcmp("stt", type) ||
-             !strcmp("gtpu", type)));
+             !strcmp("gtpu", type) || !strcmp("bareudp", type)));
 }
 
 const char *
@@ -219,6 +219,8 @@  netdev_vport_construct(struct netdev *netdev_)
         dev->tnl_cfg.dst_port = port ? htons(port) : htons(STT_DST_PORT);
     } else if (!strcmp(type, "gtpu")) {
         dev->tnl_cfg.dst_port = port ? htons(port) : htons(GTPU_DST_PORT);
+    } else if (!strcmp(type, "bareudp")) {
+        dev->tnl_cfg.dst_port = htons(port);
     }
 
     dev->tnl_cfg.dont_fragment = true;
@@ -438,6 +440,8 @@  tunnel_supported_layers(const char *type,
         return TNL_L2 | TNL_L3;
     } else if (!strcmp(type, "gtpu")) {
         return TNL_L3;
+    } else if (!strcmp(type, "bareudp")) {
+        return TNL_L3;
     } else {
         return TNL_L2;
     }
@@ -745,6 +749,16 @@  set_tunnel_config(struct netdev *dev_, const struct smap *args, char **errp)
                     goto out;
                 }
             }
+        } else if (!strcmp(node->key, "payload_type")) {
+            if (!strcmp(node->value, "mpls")) {
+                 tnl_cfg.payload_ethertype = htons(ETH_TYPE_MPLS);
+                 tnl_cfg.exts |= (1 << OVS_BAREUDP_EXT_MULTIPROTO_MODE);
+            } else if (!strcmp(node->value, "ip")) {
+                 tnl_cfg.payload_ethertype = htons(ETH_TYPE_IP);
+                 tnl_cfg.exts |= (1 << OVS_BAREUDP_EXT_MULTIPROTO_MODE);
+            } else {
+                 tnl_cfg.payload_ethertype = htons(atoi(node->value));
+            }
         } else {
             ds_put_format(&errors, "%s: unknown %s argument '%s'\n", name,
                           type, node->key);
@@ -917,7 +931,8 @@  get_tunnel_config(const struct netdev *dev, struct smap *args)
             (!strcmp("vxlan", type) && dst_port != VXLAN_DST_PORT) ||
             (!strcmp("lisp", type) && dst_port != LISP_DST_PORT) ||
             (!strcmp("stt", type) && dst_port != STT_DST_PORT) ||
-            (!strcmp("gtpu", type) && dst_port != GTPU_DST_PORT)) {
+            (!strcmp("gtpu", type) && dst_port != GTPU_DST_PORT) ||
+            !strcmp("bareudp", type)) {
             smap_add_format(args, "dst_port", "%d", dst_port);
         }
     }
@@ -1243,6 +1258,14 @@  netdev_vport_tunnel_register(void)
           },
           {{NULL, NULL, 0, 0}}
         },
+        { "udp_sys",
+          {
+              TUNNEL_FUNCTIONS_COMMON,
+              .type = "bareudp",
+              .get_ifindex = NETDEV_VPORT_GET_IFINDEX,
+          },
+          {{NULL, NULL, 0, 0}}
+        },
 
     };
     static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
diff --git a/lib/netdev.h b/lib/netdev.h
index fb5073056..b705a9e56 100644
--- a/lib/netdev.h
+++ b/lib/netdev.h
@@ -107,6 +107,7 @@  struct netdev_tunnel_config {
     bool out_key_flow;
     ovs_be64 out_key;
 
+    ovs_be16 payload_ethertype;
     ovs_be16 dst_port;
 
     bool ip_src_flow;
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 11aa20754..7eeff14f6 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -3573,6 +3573,7 @@  propagate_tunnel_data_to_flow(struct xlate_ctx *ctx, struct eth_addr dmac,
     case OVS_VPORT_TYPE_VXLAN:
     case OVS_VPORT_TYPE_GENEVE:
     case OVS_VPORT_TYPE_GTPU:
+    case OVS_VPORT_TYPE_BAREUDP:
         nw_proto = IPPROTO_UDP;
         break;
     case OVS_VPORT_TYPE_LISP:
diff --git a/tests/system-layer3-tunnels.at b/tests/system-layer3-tunnels.at
index 1232964bb..8423add2b 100644
--- a/tests/system-layer3-tunnels.at
+++ b/tests/system-layer3-tunnels.at
@@ -152,3 +152,51 @@  AT_CHECK([tail -1 stdout], [0],
 
 OVS_VSWITCHD_STOP
 AT_CLEANUP
+
+AT_SETUP([layer3 - ping over MPLS Bareudp])
+OVS_CHECK_MIN_KERNEL(5, 7)
+OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])])
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", "36:b1:ee:7c:01:01")
+ADD_VETH(p1, at_ns1, br1, "10.1.1.2/24", "36:b1:ee:7c:01:02")
+
+ADD_OVS_TUNNEL([bareudp], [br0], [at_bareudp0], [8.1.1.3], [8.1.1.2/24],
+               [ options:local_ip=8.1.1.2 options:packet_type="legacy_l3" options:payload_type=mpls options:dst_port=6635])
+
+ADD_OVS_TUNNEL([bareudp], [br1], [at_bareudp1], [8.1.1.2], [8.1.1.3/24],
+               [options:local_ip=8.1.1.3 options:packet_type="legacy_l3" options:payload_type=mpls options:dst_port=6635])
+
+AT_DATA([flows0.txt], [dnl
+table=0,priority=100,dl_type=0x0800 actions=push_mpls:0x8847,set_mpls_label:3,output:at_bareudp0
+table=0,priority=100,dl_type=0x8847 in_port=at_bareudp0 actions=pop_mpls:0x0800,set_field:36:b1:ee:7c:01:01->dl_dst,set_field:36:b1:ee:7c:01:02->dl_src,output:ovs-p0
+table=0,priority=10 actions=normal
+])
+
+AT_DATA([flows1.txt], [dnl
+table=0,priority=100,dl_type=0x0800 actions=push_mpls:0x8847,set_mpls_label:3,output:at_bareudp1
+table=0,priority=100,dl_type=0x8847 in_port=at_bareudp1 actions=pop_mpls:0x0800,set_field:36:b1:ee:7c:01:02->dl_dst,set_field:36:b1:ee:7c:01:01->dl_src,output:ovs-p1
+table=0,priority=10 actions=normal
+])
+
+AT_CHECK([ip link add patch0 type veth peer name patch1])
+on_exit 'ip link del patch0'
+
+AT_CHECK([ip link set dev patch0 up])
+AT_CHECK([ip link set dev patch1 up])
+AT_CHECK([ovs-vsctl add-port br0 patch0])
+AT_CHECK([ovs-vsctl add-port br1 patch1])
+
+
+AT_CHECK([ovs-ofctl -O OpenFlow13 add-flows br0 flows0.txt])
+AT_CHECK([ovs-ofctl -O OpenFlow13 add-flows br1 flows1.txt])
+
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP