From patchwork Thu May 14 16:49:41 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Martin Varghese
X-Patchwork-Id: 1290489
From: Martin Varghese
To: dev@openvswitch.org
Cc: Martin Varghese
Date: Thu, 14 May 2020 22:19:41 +0530
Message-Id: <1589474981-3542-1-git-send-email-martinvarghesenokia@gmail.com>
X-Mailer: git-send-email 1.9.1
Subject: [ovs-dev] [PATCH v3] Bareudp Tunnel Support

From: Martin Varghese

UDP encapsulation support for tunnelling different protocols like MPLS, IP, NSH etc.

Upstream commit:
commit 571912c69f0ed731bd1e071ade9dc7ca4aa52065
Author: Martin Varghese
Date: Mon Feb 24 10:57:50 2020 +0530

net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.

The Bareudp tunnel module provides a generic L3 encapsulation tunnelling module for tunnelling different protocols like MPLS, IP,NSH etc inside a UDP tunnel.

Signed-off-by: Martin Varghese
Acked-by: Willem de Bruijn
Signed-off-by: David S.
Miller Signed-off-by: Martin Varghese --- Documentation/automake.mk | 1 + Documentation/faq/bareudp.rst | 62 ++ Documentation/faq/index.rst | 1 + Documentation/faq/releases.rst | 1 + NEWS | 3 +- datapath/linux/Modules.mk | 2 + datapath/linux/compat/bareudp.c | 978 ++++++++++++++++++++++ datapath/linux/compat/include/linux/if_link.h | 11 + datapath/linux/compat/include/linux/openvswitch.h | 11 + datapath/linux/compat/include/net/bareudp.h | 59 ++ datapath/linux/compat/include/net/ip6_tunnel.h | 9 + datapath/linux/compat/include/net/ip_tunnels.h | 7 + datapath/linux/compat/ip6_tunnel.c | 60 ++ datapath/linux/compat/ip_tunnel.c | 47 ++ datapath/vport.c | 11 +- lib/dpif-netlink-rtnl.c | 53 ++ lib/dpif-netlink.c | 10 + lib/netdev-vport.c | 25 +- lib/netdev.h | 1 + ofproto/ofproto-dpif-xlate.c | 1 + tests/system-layer3-tunnels.at | 47 ++ 21 files changed, 1396 insertions(+), 4 deletions(-) create mode 100644 Documentation/faq/bareudp.rst create mode 100644 datapath/linux/compat/bareudp.c create mode 100644 datapath/linux/compat/include/net/bareudp.h diff --git a/Documentation/automake.mk b/Documentation/automake.mk index f85c432..ea3475f 100644 --- a/Documentation/automake.mk +++ b/Documentation/automake.mk @@ -88,6 +88,7 @@ DOC_SOURCE = \ Documentation/faq/terminology.rst \ Documentation/faq/vlan.rst \ Documentation/faq/vxlan.rst \ + Documentation/faq/bareudp.rst \ Documentation/internals/index.rst \ Documentation/internals/authors.rst \ Documentation/internals/bugs.rst \ diff --git a/Documentation/faq/bareudp.rst b/Documentation/faq/bareudp.rst new file mode 100644 index 0000000..7fdf05d --- /dev/null +++ b/Documentation/faq/bareudp.rst @@ -0,0 +1,62 @@ +.. + Licensed under the Apache License, Version 2.0 (the "License"); you may + not use this file except in compliance with the License. 
You may obtain + a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + License for the specific language governing permissions and limitations + under the License. + + Convention for heading levels in Open vSwitch documentation: + + ======= Heading 0 (reserved for the title in a document) + ------- Heading 1 + ~~~~~~~ Heading 2 + +++++++ Heading 3 + ''''''' Heading 4 + + Avoid deeper levels because they do not render well. + +======= +Bareudp +======= + +Q: What is Bareudp? + + A: There are various L3 encapsulation standards using UDP being discussed + to leverage the UDP based load balancing capability of different + networks. MPLSoUDP (__ https://tools.ietf.org/html/rfc7510) is one among + them. + + The Bareudp tunnel provides a generic L3 encapsulation tunnelling + support for tunnelling different L3 protocols like MPLS, IP, NSH etc. + inside a UDP tunnel. + + The bareudp device supports special handling for MPLS & IP as they can + have multiple ethertypes. + MPLS protocol can have ethertypes ETH_P_MPLS_UC (unicast) & + ETH_P_MPLS_MC (multicast). IP protocol can have ethertypes ETH_P_IP (v4) + & ETH_P_IPV6 (v6). + + An example to create bareudp device to tunnel MPLS traffic is given + below.:: + + $ ovs-vsctl add-port br_mpls udp_port -- set interface udp_port \ + type=bareudp options:remote_ip=2.1.1.3 options:local_ip=2.1.1.2 \ + options:payload_type=0x8847 options:dst_port=6635 \ + options:packet_type="legacy_l3" \ + ofport_request=$bareudp_egress_port + + The bareudp device to tunnel L3 traffic with multiple ethertypes + (MPLS & IP) can be created by passing the L3 protocol name as string in + the field payload_type. 
An example to create bareudp device to tunnel + MPLS unicast & multicast traffic is given below.:: + + $ ovs-vsctl add-port br_mpls udp_port -- set interface udp_port \ + type=bareudp options:remote_ip=2.1.1.3 options:local_ip=2.1.1.2 \ + options:payload_type=mpls options:dst_port=6635 \ + options:packet_type="legacy_l3" diff --git a/Documentation/faq/index.rst b/Documentation/faq/index.rst index 334b828..1dd2998 100644 --- a/Documentation/faq/index.rst +++ b/Documentation/faq/index.rst @@ -30,6 +30,7 @@ Open vSwitch FAQ .. toctree:: :maxdepth: 2 + bareudp configuration contributing design diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst index 3903e59..4abc824 100644 --- a/Documentation/faq/releases.rst +++ b/Documentation/faq/releases.rst @@ -132,6 +132,7 @@ Q: Are all features available with all datapaths? Tunnel - ERSPAN 4.18 2.10 2.10 NO Tunnel - ERSPAN-IPv6 4.18 2.10 2.10 NO Tunnel - GTP-U NO NO 2.14 NO + Tunnel - Bareudp 5.6 2.14 2.14 NO QoS - Policing YES 1.1 2.6 NO QoS - Shaping YES 1.1 NO NO sFlow YES 1.0 1.0 NO diff --git a/NEWS b/NEWS index 3dbd8ec..0d5bc25 100644 --- a/NEWS +++ b/NEWS @@ -16,7 +16,8 @@ Post-v2.13.0 by enabling interrupt mode. - Userspace datapath: * Add support for conntrack zone-based timeout policy. - + - Bareudp Tunnel + * Userspace datapath support is not added. 
v2.13.0 - 14 Feb 2020 --------------------- diff --git a/datapath/linux/Modules.mk b/datapath/linux/Modules.mk index 63a5cba..2028afc 100644 --- a/datapath/linux/Modules.mk +++ b/datapath/linux/Modules.mk @@ -1,4 +1,5 @@ openvswitch_sources += \ + linux/compat/bareudp.c \ linux/compat/dev-openvswitch.c \ linux/compat/dst_cache.c \ linux/compat/exthdrs_core.c \ @@ -77,6 +78,7 @@ openvswitch_headers += \ linux/compat/include/net/dst_metadata.h \ linux/compat/include/net/genetlink.h \ linux/compat/include/net/geneve.h \ + linux/compat/include/net/bareudp.h \ linux/compat/include/net/gre.h \ linux/compat/include/net/inet_ecn.h \ linux/compat/include/net/inet_frag.h \ diff --git a/datapath/linux/compat/bareudp.c b/datapath/linux/compat/bareudp.c new file mode 100644 index 0000000..c432d79 --- /dev/null +++ b/datapath/linux/compat/bareudp.c @@ -0,0 +1,978 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Bareudp: UDP tunnel encapsulation for different Payload types like + * MPLS, NSH, IP, etc. + * Copyright (c) 2019 Nokia, Inc. 
+ * Authors: Martin Varghese, + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "compat.h" +#include "vport-netdev.h" + +#ifndef USE_UPSTREAM_TUNNEL + +#define BAREUDP_BASE_HLEN sizeof(struct udphdr) +#define BAREUDP_IPV4_HLEN (sizeof(struct iphdr) + \ + sizeof(struct udphdr)) +#define BAREUDP_IPV6_HLEN (sizeof(struct ipv6hdr) + \ + sizeof(struct udphdr)) + +static bool log_ecn_error = true; +module_param(log_ecn_error, bool, 0644); +MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN"); + +/* per-network namespace private data for this module */ + +static unsigned int bareudp_net_id; + +struct bareudp_net { + struct list_head bareudp_list; +}; + +/* Pseudo network device */ +struct bareudp_dev { + struct net *net; /* netns for packet i/o */ + struct net_device *dev; /* netdev for bareudp tunnel */ + __be16 ethertype; + __be16 port; + u16 sport_min; + bool multi_proto_mode; + struct socket __rcu *sock; + struct list_head next; /* bareudp node on namespace list */ +}; + +static int bareudp_udp_encap_recv(struct sock *sk, struct sk_buff *skb) +{ + struct metadata_dst *tun_dst = NULL; + struct pcpu_sw_netstats *stats; + struct bareudp_dev *bareudp; + unsigned short family; + unsigned int len; + __be16 proto; + void *oiph; + int err; + union { + struct metadata_dst dst; + char buf[sizeof(struct metadata_dst) + 256]; + } buf; + + bareudp = rcu_dereference_sk_user_data(sk); + if (!bareudp) + goto drop; + + if (skb->protocol == htons(ETH_P_IP)) + family = AF_INET; + else + family = AF_INET6; + + if (bareudp->ethertype == htons(ETH_P_IP)) { + struct iphdr *iphdr; + + iphdr = (struct iphdr *)(skb->data + BAREUDP_BASE_HLEN); + if (iphdr->version == 4) { + proto = bareudp->ethertype; + } else if (bareudp->multi_proto_mode && (iphdr->version == 6)) { + proto = htons(ETH_P_IPV6); + } else { + 
bareudp->dev->stats.rx_dropped++; + goto drop; + } + } else if (bareudp->ethertype == htons(ETH_P_MPLS_UC)) { + struct iphdr *tunnel_hdr; + + tunnel_hdr = (struct iphdr *)skb_network_header(skb); + if (tunnel_hdr->version == 4) { + if (!ipv4_is_multicast(tunnel_hdr->daddr)) { + proto = bareudp->ethertype; + } else if (bareudp->multi_proto_mode && + ipv4_is_multicast(tunnel_hdr->daddr)) { + proto = htons(ETH_P_MPLS_MC); + } else { + bareudp->dev->stats.rx_dropped++; + goto drop; + } + } else { + int addr_type; + struct ipv6hdr *tunnel_hdr_v6; + + tunnel_hdr_v6 = (struct ipv6hdr *)skb_network_header(skb); + addr_type = + ipv6_addr_type((struct in6_addr *)&tunnel_hdr_v6->daddr); + if (!(addr_type & IPV6_ADDR_MULTICAST)) { + proto = bareudp->ethertype; + } else if (bareudp->multi_proto_mode && + (addr_type & IPV6_ADDR_MULTICAST)) { + proto = htons(ETH_P_MPLS_MC); + } else { + bareudp->dev->stats.rx_dropped++; + goto drop; + } + } + } else { + proto = bareudp->ethertype; + } + + if (iptunnel_pull_header(skb, BAREUDP_BASE_HLEN, + proto, + !net_eq(bareudp->net, + dev_net(bareudp->dev)))) { + bareudp->dev->stats.rx_dropped++; + goto drop; + } + tun_dst = &buf.dst; + ovs_udp_tun_rx_dst(tun_dst, skb, family, TUNNEL_KEY, 0, 0); + if (!tun_dst) { + bareudp->dev->stats.rx_dropped++; + goto drop; + } + ovs_skb_dst_set(skb, &tun_dst->dst); + + skb->dev = bareudp->dev; + oiph = skb_network_header(skb); + skb_reset_network_header(skb); + + if (family == AF_INET) + err = IP_ECN_decapsulate(oiph, skb); +#if IS_ENABLED(CONFIG_IPV6) + else + err = IP6_ECN_decapsulate(oiph, skb); +#endif + + if (unlikely(err)) { + if (log_ecn_error) { + if (family == AF_INET) + net_info_ratelimited("non-ECT from %pI4 " + "with TOS=%#x\n", + &((struct iphdr *)oiph)->saddr, + ((struct iphdr *)oiph)->tos); +#if IS_ENABLED(CONFIG_IPV6) + else + net_info_ratelimited("non-ECT from %pI6\n", + &((struct ipv6hdr *)oiph)->saddr); +#endif + } + if (err > 1) { + ++bareudp->dev->stats.rx_frame_errors; + 
++bareudp->dev->stats.rx_errors; + goto drop; + } + } + + len = skb->len; + netdev_port_receive(skb, skb_tunnel_info(skb)); + if (likely(err == NET_RX_SUCCESS)) { + stats = this_cpu_ptr(bareudp->dev->tstats); + u64_stats_update_begin(&stats->syncp); + stats->rx_packets++; + stats->rx_bytes += len; + u64_stats_update_end(&stats->syncp); + } + return 0; +drop: + /* Consume bad packet */ + kfree_skb(skb); + + return 0; +} + +static int bareudp_init(struct net_device *dev) +{ + dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats); + if (!dev->tstats) + return -ENOMEM; + + return 0; +} + +static void bareudp_uninit(struct net_device *dev) +{ + free_percpu(dev->tstats); +} + +static struct socket *bareudp_create_sock(struct net *net, __be16 port) +{ + struct udp_port_cfg udp_conf; + struct socket *sock; + int err; + + memset(&udp_conf, 0, sizeof(udp_conf)); +#if IS_ENABLED(CONFIG_IPV6) + udp_conf.family = AF_INET6; +#else + udp_conf.family = AF_INET; +#endif + udp_conf.local_udp_port = port; + /* Open UDP socket */ + err = udp_sock_create(net, &udp_conf, &sock); + if (err < 0) + return ERR_PTR(err); + + return sock; +} + +/* Create new listen socket if needed */ +static int bareudp_socket_create(struct bareudp_dev *bareudp, __be16 port) +{ + struct udp_tunnel_sock_cfg tunnel_cfg; + struct socket *sock; + + sock = bareudp_create_sock(bareudp->net, port); + if (IS_ERR(sock)) + return PTR_ERR(sock); + + /* Mark socket as an encapsulation socket */ + memset(&tunnel_cfg, 0, sizeof(tunnel_cfg)); + tunnel_cfg.sk_user_data = bareudp; + tunnel_cfg.encap_type = 1; + tunnel_cfg.encap_rcv = bareudp_udp_encap_recv; + tunnel_cfg.encap_destroy = NULL; + setup_udp_tunnel_sock(bareudp->net, sock, &tunnel_cfg); + + /* As the setup_udp_tunnel_sock does not call udp_encap_enable if the + * socket type is v6 an explicit call to udp_encap_enable is needed. 
+ */ + if (sock->sk->sk_family == AF_INET6) + udp_encap_enable(); + + rcu_assign_pointer(bareudp->sock, sock); + return 0; +} + +static int bareudp_open(struct net_device *dev) +{ + struct bareudp_dev *bareudp = netdev_priv(dev); + int ret = 0; + + ret = bareudp_socket_create(bareudp, bareudp->port); + return ret; +} + +static void bareudp_sock_release(struct bareudp_dev *bareudp) +{ + struct socket *sock; + + sock = bareudp->sock; + rcu_assign_pointer(bareudp->sock, NULL); + synchronize_net(); + udp_tunnel_sock_release(sock); +} + +static int bareudp_stop(struct net_device *dev) +{ + struct bareudp_dev *bareudp = netdev_priv(dev); + + bareudp_sock_release(bareudp); + return 0; +} + +static int bareudp_xmit_skb(struct sk_buff *skb, struct net_device *dev, + struct bareudp_dev *bareudp, + const struct ip_tunnel_info *info) +{ + bool xnet = !net_eq(bareudp->net, dev_net(bareudp->dev)); + bool use_cache = ip_tunnel_dst_cache_usable(skb, info); + struct socket *sock = rcu_dereference(bareudp->sock); + bool udp_sum = !!(info->key.tun_flags & TUNNEL_CSUM); + const struct ip_tunnel_key *key = &info->key; + struct rtable *rt; + __be16 sport, df; + int min_headroom; + __u8 tos, ttl; + __be32 saddr; + int err; + + if (!sock) + return -ESHUTDOWN; + + rt = ip_route_output_tunnel(skb, dev, bareudp->net, &saddr, info, + IPPROTO_UDP, use_cache); + + if (IS_ERR(rt)) + return PTR_ERR(rt); + + sport = udp_flow_src_port(bareudp->net, skb, + bareudp->sport_min, USHRT_MAX, + true); + tos = ip_tunnel_ecn_encap(key->tos, ip_hdr(skb), skb); + ttl = key->ttl; + df = key->tun_flags & TUNNEL_DONT_FRAGMENT ? 
htons(IP_DF) : 0; + skb_scrub_packet(skb, xnet); + + err = -ENOSPC; + if (!skb_pull(skb, skb_network_offset(skb))) + goto free_dst; + + min_headroom = LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len + + BAREUDP_BASE_HLEN + info->options_len + sizeof(struct iphdr); + + err = skb_cow_head(skb, min_headroom); + if (unlikely(err)) + goto free_dst; + + err = udp_tunnel_handle_offloads(skb, udp_sum); + if (err) + goto free_dst; + + skb_set_inner_protocol(skb, bareudp->ethertype); + udp_tunnel_xmit_skb(rt, sock->sk, skb, saddr, info->key.u.ipv4.dst, + tos, ttl, df, sport, bareudp->port, + !net_eq(bareudp->net, dev_net(bareudp->dev)), + !(info->key.tun_flags & TUNNEL_CSUM)); + return 0; + +free_dst: + dst_release(&rt->dst); + return err; +} + +#if IS_ENABLED(CONFIG_IPV6) +static int bareudp6_xmit_skb(struct sk_buff *skb, struct net_device *dev, + struct bareudp_dev *bareudp, + const struct ip_tunnel_info *info) +{ + bool xnet = !net_eq(bareudp->net, dev_net(bareudp->dev)); + bool use_cache = ip_tunnel_dst_cache_usable(skb, info); + struct socket *sock = rcu_dereference(bareudp->sock); + bool udp_sum = !!(info->key.tun_flags & TUNNEL_CSUM); + const struct ip_tunnel_key *key = &info->key; + struct dst_entry *dst = NULL; + struct in6_addr saddr, daddr; + int min_headroom; + __u8 prio, ttl; + __be16 sport; + int err; + + if (!sock) + return -ESHUTDOWN; + + dst = ip6_dst_lookup_tunnel(skb, dev, bareudp->net, sock, &saddr, info, + IPPROTO_UDP, use_cache); + if (IS_ERR(dst)) + return PTR_ERR(dst); + + sport = udp_flow_src_port(bareudp->net, skb, + bareudp->sport_min, USHRT_MAX, + true); + prio = ip_tunnel_ecn_encap(key->tos, ip_hdr(skb), skb); + ttl = key->ttl; + + skb_scrub_packet(skb, xnet); + + err = -ENOSPC; + if (!skb_pull(skb, skb_network_offset(skb))) + goto free_dst; + + min_headroom = LL_RESERVED_SPACE(dst->dev) + dst->header_len + + BAREUDP_BASE_HLEN + info->options_len + sizeof(struct ipv6hdr); + + err = skb_cow_head(skb, min_headroom); + if (unlikely(err)) + goto 
free_dst; + + err = udp_tunnel_handle_offloads(skb, udp_sum); + if (err) + goto free_dst; + + daddr = info->key.u.ipv6.dst; + udp_tunnel6_xmit_skb(dst, sock->sk, skb, dev, + &saddr, &daddr, prio, ttl, + info->key.label, sport, bareudp->port, + !(info->key.tun_flags & TUNNEL_CSUM)); + return 0; + +free_dst: + dst_release(dst); + return err; +} +#endif + +netdev_tx_t rpl_bareudp_xmit(struct sk_buff *skb) +{ + struct net_device *dev = skb->dev; + struct bareudp_dev *bareudp = netdev_priv(dev); + struct ip_tunnel_info *info = NULL; + int err; + + if (skb->protocol != bareudp->ethertype) { + if (!bareudp->multi_proto_mode || + (skb->protocol != htons(ETH_P_MPLS_MC) && + skb->protocol != htons(ETH_P_IPV6))) { + err = -EINVAL; + goto tx_error; + } + } + + info = skb_tunnel_info(skb); + if (unlikely(!info || !(info->mode & IP_TUNNEL_INFO_TX))) { + err = -EINVAL; + goto tx_error; + } + + rcu_read_lock(); +#if IS_ENABLED(CONFIG_IPV6) + if (info->mode & IP_TUNNEL_INFO_IPV6) + err = bareudp6_xmit_skb(skb, dev, bareudp, info); + else +#endif + err = bareudp_xmit_skb(skb, dev, bareudp, info); + + rcu_read_unlock(); + + if (likely(!err)) + return NETDEV_TX_OK; +tx_error: + dev_kfree_skb(skb); + + if (err == -ELOOP) + dev->stats.collisions++; + else if (err == -ENETUNREACH) + dev->stats.tx_carrier_errors++; + + dev->stats.tx_errors++; + return NETDEV_TX_OK; +} +EXPORT_SYMBOL_GPL(rpl_bareudp_xmit); + +static netdev_tx_t bareudp_dev_xmit(struct sk_buff *skb, struct net_device *dev) +{ + /* Drop All packets coming from networking stack. OVS-CB is + * not initialized for these packets. 
+ */ + dev_kfree_skb(skb); + dev->stats.tx_dropped++; + return NETDEV_TX_OK; +} + + +int ovs_bareudp_fill_metadata_dst(struct net_device *dev, + struct sk_buff *skb) +{ + struct ip_tunnel_info *info = skb_tunnel_info(skb); + struct bareudp_dev *bareudp = netdev_priv(dev); + bool use_cache; + + use_cache = ip_tunnel_dst_cache_usable(skb, info); + + if (ip_tunnel_info_af(info) == AF_INET) { + struct rtable *rt; + __be32 saddr; + + rt = ip_route_output_tunnel(skb, dev, bareudp->net, &saddr, + info, IPPROTO_UDP, use_cache); + if (IS_ERR(rt)) + return PTR_ERR(rt); + + ip_rt_put(rt); + info->key.u.ipv4.src = saddr; +#if IS_ENABLED(CONFIG_IPV6) + } else if (ip_tunnel_info_af(info) == AF_INET6) { + struct dst_entry *dst; + struct in6_addr saddr; + struct socket *sock = rcu_dereference(bareudp->sock); + + dst = ip6_dst_lookup_tunnel(skb, dev, bareudp->net, sock, + &saddr, info, IPPROTO_UDP, + use_cache); + if (IS_ERR(dst)) + return PTR_ERR(dst); + + dst_release(dst); + info->key.u.ipv6.src = saddr; +#endif + } else { + return -EINVAL; + } + + info->key.tp_src = udp_flow_src_port(bareudp->net, skb, + bareudp->sport_min, + USHRT_MAX, true); + info->key.tp_dst = bareudp->port; + return 0; +} +EXPORT_SYMBOL_GPL(ovs_bareudp_fill_metadata_dst); + +static const struct net_device_ops bareudp_netdev_ops = { + .ndo_init = bareudp_init, + .ndo_uninit = bareudp_uninit, + .ndo_open = bareudp_open, + .ndo_stop = bareudp_stop, + .ndo_start_xmit = bareudp_dev_xmit, + .ndo_get_stats64 = ip_tunnel_get_stats64, + .ndo_fill_metadata_dst = bareudp_fill_metadata_dst, +}; + +static const struct nla_policy bareudp_policy[IFLA_BAREUDP_MAX + 1] = { + [IFLA_BAREUDP_PORT] = { .type = NLA_U16 }, + [IFLA_BAREUDP_ETHERTYPE] = { .type = NLA_U16 }, + [IFLA_BAREUDP_SRCPORT_MIN] = { .type = NLA_U16 }, + [IFLA_BAREUDP_MULTIPROTO_MODE] = { .type = NLA_FLAG }, +}; + +/* Info for udev, that this is a virtual tunnel endpoint */ +static struct device_type bareudp_type = { + .name = "bareudp", +}; + +/* Initialize 
the device structure. */ +static void bareudp_setup(struct net_device *dev) +{ + dev->netdev_ops = &bareudp_netdev_ops; + SET_NETDEV_DEVTYPE(dev, &bareudp_type); + dev->features |= NETIF_F_SG | NETIF_F_HW_CSUM; + dev->features |= NETIF_F_RXCSUM; + dev->features |= NETIF_F_GSO_SOFTWARE; + dev->hw_features |= NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_RXCSUM; + dev->hw_features |= NETIF_F_GSO_SOFTWARE; + dev->hard_header_len = 0; + dev->addr_len = 0; + dev->mtu = IP_MAX_MTU - BAREUDP_BASE_HLEN; + dev->type = ARPHRD_NONE; + netif_keep_dst(dev); + dev->priv_flags |= IFF_NO_QUEUE; + dev->flags = IFF_POINTOPOINT | IFF_NOARP | IFF_MULTICAST; +} +#ifdef HAVE_EXT_ACK_IN_RTNL_LINKOPS +static int bareudp_validate(struct nlattr *tb[], struct nlattr *data[], + struct netlink_ext_ack *extack) +#else +static int bareudp_validate(struct nlattr *tb[], struct nlattr *data[]) +#endif +{ + if (!data) { + return -EINVAL; + } + return 0; +} +#ifdef HAVE_EXT_ACK_IN_RTNL_LINKOPS +static int bareudp2info(struct nlattr *data[], struct bareudp_conf *conf, + struct netlink_ext_ack *extack) +#else +static int bareudp2info(struct nlattr *data[], struct bareudp_conf *conf) +#endif +{ + if (!data[IFLA_BAREUDP_PORT]) { + return -EINVAL; + } + if (!data[IFLA_BAREUDP_ETHERTYPE]) { + return -EINVAL; + } + + if (data[IFLA_BAREUDP_PORT]) + conf->port = nla_get_u16(data[IFLA_BAREUDP_PORT]); + + if (data[IFLA_BAREUDP_ETHERTYPE]) + conf->ethertype = nla_get_u16(data[IFLA_BAREUDP_ETHERTYPE]); + + if (data[IFLA_BAREUDP_SRCPORT_MIN]) + conf->sport_min = nla_get_u16(data[IFLA_BAREUDP_SRCPORT_MIN]); + + return 0; +} + +static struct bareudp_dev *bareudp_find_dev(struct bareudp_net *bn, + const struct bareudp_conf *conf) +{ + struct bareudp_dev *bareudp, *t = NULL; + + list_for_each_entry(bareudp, &bn->bareudp_list, next) { + if (conf->port == bareudp->port) + t = bareudp; + } + return t; +} + +static int bareudp_configure(struct net *net, struct net_device *dev, + struct bareudp_conf *conf) +{ + struct bareudp_net 
*bn = net_generic(net, bareudp_net_id); + struct bareudp_dev *t, *bareudp = netdev_priv(dev); + int err; + + bareudp->net = net; + bareudp->dev = dev; + t = bareudp_find_dev(bn, conf); + if (t) + return -EBUSY; + + if (conf->multi_proto_mode && + (conf->ethertype != htons(ETH_P_MPLS_UC) && + conf->ethertype != htons(ETH_P_IP))) + return -EINVAL; + + bareudp->port = conf->port; + bareudp->ethertype = conf->ethertype; + bareudp->sport_min = conf->sport_min; + bareudp->multi_proto_mode = conf->multi_proto_mode; + err = register_netdevice(dev); + if (err) + return err; + + list_add(&bareudp->next, &bn->bareudp_list); + return 0; +} + +static int bareudp_link_config(struct net_device *dev, + struct nlattr *tb[]) +{ + int err; + + if (tb[IFLA_MTU]) { + err = dev_set_mtu(dev, nla_get_u32(tb[IFLA_MTU])); + if (err) + return err; + } + return 0; +} +#ifdef HAVE_EXT_ACK_IN_RTNL_LINKOPS +static int bareudp_newlink(struct net *net, struct net_device *dev, + struct nlattr *tb[], struct nlattr *data[], + struct netlink_ext_ack *extack) +#else +static int bareudp_newlink(struct net *net, struct net_device *dev, + struct nlattr *tb[], struct nlattr *data[]) +#endif + +{ + struct bareudp_conf conf; + int err; +#ifdef HAVE_EXT_ACK_IN_RTNL_LINKOPS + err = bareudp2info(data, &conf, extack); +#else + err = bareudp2info(data, &conf); +#endif + if (err) + return err; + + err = bareudp_configure(net, dev, &conf); + if (err) + return err; + + err = bareudp_link_config(dev, tb); + if (err) + return err; + + return 0; +} + +static void bareudp_dellink(struct net_device *dev, struct list_head *head) +{ + struct bareudp_dev *bareudp = netdev_priv(dev); + + list_del(&bareudp->next); + unregister_netdevice_queue(dev, head); +} + +static size_t bareudp_get_size(const struct net_device *dev) +{ + return nla_total_size(sizeof(__be16)) + /* IFLA_BAREUDP_PORT */ + nla_total_size(sizeof(__be16)) + /* IFLA_BAREUDP_ETHERTYPE */ + nla_total_size(sizeof(__u16)) + /* IFLA_BAREUDP_SRCPORT_MIN */ + 
nla_total_size(0) + /* IFLA_BAREUDP_MULTIPROTO_MODE */ + 0; +} + +static int bareudp_fill_info(struct sk_buff *skb, const struct net_device *dev) +{ + struct bareudp_dev *bareudp = netdev_priv(dev); + + if (nla_put_be16(skb, IFLA_BAREUDP_PORT, bareudp->port)) + goto nla_put_failure; + if (nla_put_be16(skb, IFLA_BAREUDP_ETHERTYPE, bareudp->ethertype)) + goto nla_put_failure; + if (nla_put_u16(skb, IFLA_BAREUDP_SRCPORT_MIN, bareudp->sport_min)) + goto nla_put_failure; + if (bareudp->multi_proto_mode && + nla_put_flag(skb, IFLA_BAREUDP_MULTIPROTO_MODE)) + goto nla_put_failure; + + return 0; + +nla_put_failure: + return -EMSGSIZE; +} + +static struct rtnl_link_ops bareudp_link_ops __read_mostly = { + .kind = "ovs_bareudp", + .maxtype = IFLA_BAREUDP_MAX, + .policy = bareudp_policy, + .priv_size = sizeof(struct bareudp_dev), + .setup = bareudp_setup, + .validate = bareudp_validate, + .newlink = bareudp_newlink, + .dellink = bareudp_dellink, + .get_size = bareudp_get_size, + .fill_info = bareudp_fill_info, +}; + +struct net_device *rpl_bareudp_dev_create(struct net *net, const char *name, + u8 name_assign_type, + struct bareudp_conf *conf) +{ + struct nlattr *tb[IFLA_MAX + 1]; + struct net_device *dev; + LIST_HEAD(list_kill); + int err; + + memset(tb, 0, sizeof(tb)); + dev = rtnl_create_link(net, name, name_assign_type, + &bareudp_link_ops, tb); + if (IS_ERR(dev)) + return dev; + + err = bareudp_configure(net, dev, conf); + if (err) { + free_netdev(dev); + return ERR_PTR(err); + } + err = dev_set_mtu(dev, IP_MAX_MTU - BAREUDP_BASE_HLEN); + if (err) + goto err; + + err = rtnl_configure_link(dev, NULL); + if (err < 0) + goto err; + + return dev; +err: + bareudp_dellink(dev, &list_kill); + unregister_netdevice_many(&list_kill); + return ERR_PTR(err); +} +EXPORT_SYMBOL_GPL(rpl_bareudp_dev_create); + +static __net_init int bareudp_init_net(struct net *net) +{ + struct bareudp_net *bn = net_generic(net, bareudp_net_id); + + INIT_LIST_HEAD(&bn->bareudp_list); + return 0; +} + 
+static void bareudp_destroy_tunnels(struct net *net, struct list_head *head) +{ + struct bareudp_net *bn = net_generic(net, bareudp_net_id); + struct bareudp_dev *bareudp, *next; + + list_for_each_entry_safe(bareudp, next, &bn->bareudp_list, next) + unregister_netdevice_queue(bareudp->dev, head); +} + +static void __net_exit bareudp_exit_batch_net(struct list_head *net_list) +{ + struct net *net; + LIST_HEAD(list); + + rtnl_lock(); + list_for_each_entry(net, net_list, exit_list) + bareudp_destroy_tunnels(net, &list); + + /* unregister the devices gathered above */ + unregister_netdevice_many(&list); + rtnl_unlock(); +} + +static struct pernet_operations bareudp_net_ops = { + .init = bareudp_init_net, + .exit_batch = bareudp_exit_batch_net, + .id = &bareudp_net_id, + .size = sizeof(struct bareudp_net), +}; + +static struct vport_ops ovs_bareudp_vport_ops; + +/** + * struct bareudp_port - Keeps track of open UDP ports + * @dst_port: destination port. + * @payload_ethertype: ethertype of the l3 traffic tunnelled + */ +struct bareudp_port { + u16 dst_port; + u16 payload_ethertype; +}; + +static inline struct bareudp_port *bareudp_vport(const struct vport *vport) +{ + return vport_priv(vport); +} + +static int bareudp_get_options(const struct vport *vport, + struct sk_buff *skb) +{ + struct bareudp_port *bareudp_port = bareudp_vport(vport); + + if (nla_put_u16(skb, OVS_TUNNEL_ATTR_DST_PORT, bareudp_port->dst_port)) + return -EMSGSIZE; + + if (nla_put_u16(skb, OVS_TUNNEL_ATTR_PAYLOAD_ETHERTYPE, bareudp_port->payload_ethertype)) + return -EMSGSIZE; + + return 0; +} + +static const struct nla_policy exts_policy[OVS_BAREUDP_EXT_MAX + 1] = { + [OVS_BAREUDP_EXT_MULTIPROTO_MODE] = { .type = NLA_FLAG, }, +}; + +static int bareudp_configure_exts(struct vport *vport, struct nlattr *attr, + struct bareudp_conf *conf) +{ + struct nlattr *exts[OVS_BAREUDP_EXT_MAX + 1]; + int err; + + if (nla_len(attr) < sizeof(struct nlattr)) + return -EINVAL; + + err = nla_parse_nested_deprecated(exts, 
OVS_BAREUDP_EXT_MAX, attr, + exts_policy, NULL); + if (err < 0) + return err; + + if (exts[OVS_BAREUDP_EXT_MULTIPROTO_MODE]) + conf->multi_proto_mode = true; + + return 0; +} + +static struct vport *bareudp_tnl_create(const struct vport_parms *parms) +{ + struct net *net = ovs_dp_get_net(parms->dp); + struct nlattr *options = parms->options; + struct bareudp_port *bareudp_port; + struct net_device *dev; + struct vport *vport; + struct bareudp_conf conf = {}; + struct nlattr *a; + u16 ethertype; + u16 dst_port; + int err; + + if (!options) { + err = -EINVAL; + goto error; + } + + a = nla_find_nested(options, OVS_TUNNEL_ATTR_DST_PORT); + if (a && nla_len(a) == sizeof(u16)) { + dst_port = nla_get_u16(a); + } else { + /* Require destination port from userspace. */ + err = -EINVAL; + goto error; + } + + a = nla_find_nested(options, OVS_TUNNEL_ATTR_PAYLOAD_ETHERTYPE); + if (a && nla_len(a) == sizeof(u16)) { + ethertype = nla_get_u16(a); + } else { + /* Require payload ethertype from userspace. */ + err = -EINVAL; + goto error; + } + + vport = ovs_vport_alloc(sizeof(struct bareudp_port), + &ovs_bareudp_vport_ops, parms); + if (IS_ERR(vport)) + return vport; + + a = nla_find_nested(options, OVS_TUNNEL_ATTR_EXTENSION); + if (a) { + err = bareudp_configure_exts(vport, a, &conf); + if (err) { + ovs_vport_free(vport); + goto error; + } + } + + bareudp_port = bareudp_vport(vport); + bareudp_port->dst_port = dst_port; + bareudp_port->payload_ethertype = ethertype; + + conf.ethertype = htons(ethertype); + conf.port = htons(dst_port); + + rtnl_lock(); + dev = bareudp_dev_create(net, parms->name, NET_NAME_USER, &conf); + if (IS_ERR(dev)) { + rtnl_unlock(); + ovs_vport_free(vport); + return ERR_CAST(dev); + } + + err = dev_change_flags(dev, dev->flags | IFF_UP, NULL); + if (err < 0) { + rtnl_delete_link(dev); + rtnl_unlock(); + ovs_vport_free(vport); + goto error; + } + + rtnl_unlock(); + return vport; +error: + return ERR_PTR(err); +} + +static struct vport *bareudp_create(const struct 
vport_parms *parms) +{ + struct vport *vport; + + vport = bareudp_tnl_create(parms); + if (IS_ERR(vport)) + return vport; + + return ovs_netdev_link(vport, parms->name); +} + +static struct vport_ops ovs_bareudp_vport_ops = { + .type = OVS_VPORT_TYPE_BAREUDP, + .create = bareudp_create, + .destroy = ovs_netdev_tunnel_destroy, + .get_options = bareudp_get_options, +#ifndef USE_UPSTREAM_TUNNEL + .fill_metadata_dst = bareudp_fill_metadata_dst, +#endif + .send = bareudp_xmit, +}; + +int rpl_bareudp_init_module(void) +{ + int rc; + + rc = register_pernet_subsys(&bareudp_net_ops); + if (rc) + goto out1; + + rc = rtnl_link_register(&bareudp_link_ops); + if (rc) + goto out2; + + pr_info("Bareudp tunneling driver\n"); + ovs_vport_ops_register(&ovs_bareudp_vport_ops); + return 0; +out2: + unregister_pernet_subsys(&bareudp_net_ops); +out1: + return rc; +} + +void rpl_bareudp_cleanup_module(void) +{ + ovs_vport_ops_unregister(&ovs_bareudp_vport_ops); + rtnl_link_unregister(&bareudp_link_ops); + unregister_pernet_subsys(&bareudp_net_ops); +} +#endif diff --git a/datapath/linux/compat/include/linux/if_link.h b/datapath/linux/compat/include/linux/if_link.h index bd77e33..d180085 100644 --- a/datapath/linux/compat/include/linux/if_link.h +++ b/datapath/linux/compat/include/linux/if_link.h @@ -61,6 +61,17 @@ enum { }; #define IFLA_LISP_MAX (__IFLA_LISP_MAX - 1) +enum { + IFLA_BAREUDP_UNSPEC, + IFLA_BAREUDP_PORT, + IFLA_BAREUDP_ETHERTYPE, + IFLA_BAREUDP_SRCPORT_MIN, + IFLA_BAREUDP_MULTIPROTO_MODE, + __IFLA_BAREUDP_MAX +}; + +#define IFLA_BAREUDP_MAX (__IFLA_BAREUDP_MAX - 1) + /* VXLAN section */ enum { #define IFLA_VXLAN_UNSPEC rpl_IFLA_VXLAN_UNSPEC diff --git a/datapath/linux/compat/include/linux/openvswitch.h b/datapath/linux/compat/include/linux/openvswitch.h index f7c3b2e..6b5b4d0 100644 --- a/datapath/linux/compat/include/linux/openvswitch.h +++ b/datapath/linux/compat/include/linux/openvswitch.h @@ -240,6 +240,7 @@ enum ovs_vport_type { OVS_VPORT_TYPE_GRE, /* GRE tunnel. 
*/ OVS_VPORT_TYPE_VXLAN, /* VXLAN tunnel. */ OVS_VPORT_TYPE_GENEVE, /* Geneve tunnel. */ + OVS_VPORT_TYPE_BAREUDP, /* Bareudp tunnel. */ OVS_VPORT_TYPE_LISP = 105, /* LISP tunnel */ OVS_VPORT_TYPE_STT = 106, /* STT tunnel */ OVS_VPORT_TYPE_ERSPAN = 107, /* ERSPAN tunnel. */ @@ -308,12 +309,22 @@ enum { #define OVS_VXLAN_EXT_MAX (__OVS_VXLAN_EXT_MAX - 1) +enum { + OVS_BAREUDP_EXT_UNSPEC, + OVS_BAREUDP_EXT_MULTIPROTO_MODE, + /* place new values here to fill gap. */ + __OVS_BAREUDP_EXT_MAX, +}; + +#define OVS_BAREUDP_EXT_MAX (__OVS_BAREUDP_EXT_MAX - 1) + /* OVS_VPORT_ATTR_OPTIONS attributes for tunnels. */ enum { OVS_TUNNEL_ATTR_UNSPEC, OVS_TUNNEL_ATTR_DST_PORT, /* 16-bit UDP port, used by L4 tunnels. */ OVS_TUNNEL_ATTR_EXTENSION, + OVS_TUNNEL_ATTR_PAYLOAD_ETHERTYPE, /*Ethertype of l3 packet tunnelled */ __OVS_TUNNEL_ATTR_MAX }; diff --git a/datapath/linux/compat/include/net/bareudp.h b/datapath/linux/compat/include/net/bareudp.h new file mode 100644 index 0000000..888194f --- /dev/null +++ b/datapath/linux/compat/include/net/bareudp.h @@ -0,0 +1,59 @@ +#ifndef __NET_BAREUDP_WRAPPER_H +#define __NET_BAREUDP_WRAPPER_H 1 + +#ifdef CONFIG_INET +#include +#endif + + +#ifdef USE_UPSTREAM_TUNNEL +#include_next + +static inline int rpl_bareudp_init_module(void) +{ + return 0; +} +static inline void rpl_bareudp_cleanup_module(void) +{} + +#define bareudp_xmit dev_queue_xmit + +#ifdef CONFIG_INET +#ifdef HAVE_NAME_ASSIGN_TYPE +static inline struct net_device *rpl_bareudp_dev_create( + struct net *net, const char *name, u8 name_assign_type, struct bareudp_conf *conf) { + return bareudp_dev_create(net, name,name_assign_type, conf); +} +#define bareudp_dev_create rpl_bareudp_dev_create +#endif +#endif + +#else + +struct bareudp_conf { + __be16 ethertype; + __be16 port; + u16 sport_min; + bool multi_proto_mode; +}; + +#ifdef CONFIG_INET +#define bareudp_dev_create rpl_bareudp_dev_create +struct net_device *rpl_bareudp_dev_create(struct net *net, const char *name, + u8 
name_assign_type, struct bareudp_conf *conf); +#endif /*ifdef CONFIG_INET */ + +int rpl_bareudp_init_module(void); +void rpl_bareudp_cleanup_module(void); + +#define bareudp_xmit rpl_bareudp_xmit +netdev_tx_t rpl_bareudp_xmit(struct sk_buff *skb); + +#endif +#define bareudp_init_module rpl_bareudp_init_module +#define bareudp_cleanup_module rpl_bareudp_cleanup_module + +#define bareudp_fill_metadata_dst ovs_bareudp_fill_metadata_dst +int ovs_bareudp_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb); + +#endif /*ifdef__NET_BAREUDP_H */ diff --git a/datapath/linux/compat/include/net/ip6_tunnel.h b/datapath/linux/compat/include/net/ip6_tunnel.h index e0a33a6..02e5713 100644 --- a/datapath/linux/compat/include/net/ip6_tunnel.h +++ b/datapath/linux/compat/include/net/ip6_tunnel.h @@ -188,6 +188,15 @@ int rpl_ip6_tnl_get_iflink(const struct net_device *dev); #define ip6_tnl_get_iflink rpl_ip6_tnl_get_iflink int rpl_ip6_tnl_change_mtu(struct net_device *dev, int new_mtu); #define ip6_tnl_change_mtu rpl_ip6_tnl_change_mtu +struct dst_entry *rpl_ip6_dst_lookup_tunnel(struct sk_buff *skb, + struct net_device *dev, + struct net *net, + struct socket *sock, + struct in6_addr *saddr, + const struct ip_tunnel_info *info, + u8 protocol, + bool use_cache); +#define ip6_dst_lookup_tunnel rpl_ip6_dst_lookup_tunnel static inline void ip6tunnel_xmit(struct sock *sk, struct sk_buff *skb, struct net_device *dev) diff --git a/datapath/linux/compat/include/net/ip_tunnels.h b/datapath/linux/compat/include/net/ip_tunnels.h index 617a753..94db865 100644 --- a/datapath/linux/compat/include/net/ip_tunnels.h +++ b/datapath/linux/compat/include/net/ip_tunnels.h @@ -490,6 +490,13 @@ struct ip_tunnel *rpl_ip_tunnel_lookup(struct ip_tunnel_net *itn, __be32 remote, __be32 local, __be32 key); +#define ip_route_output_tunnel rpl_ip_route_output_tunnel +struct rtable *rpl_ip_route_output_tunnel(struct sk_buff *skb, + struct net_device *dev, + struct net *net, __be32 *saddr, + const struct 
ip_tunnel_info *info, + u8 protocol, bool use_cache); + static inline int iptunnel_pull_offloads(struct sk_buff *skb) { if (skb_is_gso(skb)) { diff --git a/datapath/linux/compat/ip6_tunnel.c b/datapath/linux/compat/ip6_tunnel.c index 984a51b..3b60505 100644 --- a/datapath/linux/compat/ip6_tunnel.c +++ b/datapath/linux/compat/ip6_tunnel.c @@ -175,6 +175,66 @@ static struct net_device_stats *ip6_get_stats(struct net_device *dev) return &dev->stats; } +struct dst_entry *rpl_ip6_dst_lookup_tunnel(struct sk_buff *skb, + struct net_device *dev, + struct net *net, + struct socket *sock, + struct in6_addr *saddr, + const struct ip_tunnel_info *info, + u8 protocol, + bool use_cache) +{ + struct dst_entry *dst = NULL; +#ifdef CONFIG_DST_CACHE + struct dst_cache *dst_cache; +#endif + struct flowi6 fl6; + __u8 prio; + +#ifdef CONFIG_DST_CACHE + dst_cache = (struct dst_cache *)&info->dst_cache; + if (use_cache) { + dst = dst_cache_get_ip6(dst_cache, saddr); + if (dst) + return dst; + } +#endif + memset(&fl6, 0, sizeof(fl6)); + fl6.flowi6_mark = skb->mark; + fl6.flowi6_proto = protocol; + fl6.daddr = info->key.u.ipv6.dst; + fl6.saddr = info->key.u.ipv6.src; + prio = info->key.tos; + fl6.flowlabel = ip6_make_flowinfo(RT_TOS(prio), + info->key.label); + +#ifdef HAVE_IPV6_DST_LOOKUP_NET + if (ipv6_stub->ipv6_dst_lookup(net, sock->sk, &dst, &fl6)) { +#else +#ifdef HAVE_IPV6_STUB + if (ipv6_stub->ipv6_dst_lookup(sock->sk, &dst, &fl6)) { +#else + if (ip6_dst_lookup(sock->sk, &dst, &fl6)) { +#endif +#endif + netdev_dbg(dev, "no route to %pI6\n", &fl6.daddr); + return ERR_PTR(-ENETUNREACH); + } + + if (dst->dev == dev) { /* is this necessary? 
*/ + netdev_dbg(dev, "circular route to %pI6\n", &fl6.daddr); + dst_release(dst); + return ERR_PTR(-ELOOP); + } +#ifdef CONFIG_DST_CACHE + if (use_cache) + dst_cache_set_ip6(dst_cache, dst, &fl6.saddr); +#endif + *saddr = fl6.saddr; + return dst; +} +EXPORT_SYMBOL_GPL(rpl_ip6_dst_lookup_tunnel); + /** * ip6_tnl_lookup - fetch tunnel matching the end-point addresses * @remote: the address of the tunnel exit-point diff --git a/datapath/linux/compat/ip_tunnel.c b/datapath/linux/compat/ip_tunnel.c index e7a0393..85b9812 100644 --- a/datapath/linux/compat/ip_tunnel.c +++ b/datapath/linux/compat/ip_tunnel.c @@ -773,4 +773,51 @@ skip_key_lookup: } EXPORT_SYMBOL_GPL(rpl_ip_tunnel_lookup); +struct rtable *rpl_ip_route_output_tunnel(struct sk_buff *skb, + struct net_device *dev, + struct net *net, __be32 *saddr, + const struct ip_tunnel_info *info, + u8 protocol, bool use_cache) +{ +#ifdef CONFIG_DST_CACHE + struct dst_cache *dst_cache; +#endif + struct rtable *rt = NULL; + struct flowi4 fl4; + __u8 tos; + +#ifdef CONFIG_DST_CACHE + dst_cache = (struct dst_cache *)&info->dst_cache; + if (use_cache) { + rt = dst_cache_get_ip4(dst_cache, saddr); + if (rt) + return rt; + } +#endif + memset(&fl4, 0, sizeof(fl4)); + fl4.flowi4_mark = skb->mark; + fl4.flowi4_proto = protocol; + fl4.daddr = info->key.u.ipv4.dst; + fl4.saddr = info->key.u.ipv4.src; + tos = info->key.tos; + fl4.flowi4_tos = RT_TOS(tos); + + rt = ip_route_output_key(net, &fl4); + if (IS_ERR(rt)) { + netdev_dbg(dev, "no route to %pI4\n", &fl4.daddr); + return ERR_PTR(-ENETUNREACH); + } + if (rt->dst.dev == dev) { /* is this necessary? 
*/ + netdev_dbg(dev, "circular route to %pI4\n", &fl4.daddr); + ip_rt_put(rt); + return ERR_PTR(-ELOOP); + } +#ifdef CONFIG_DST_CACHE + if (use_cache) + dst_cache_set_ip4(dst_cache, &rt->dst, fl4.saddr); +#endif + *saddr = fl4.saddr; + return rt; +} +EXPORT_SYMBOL_GPL(rpl_ip_route_output_tunnel); #endif diff --git a/datapath/vport.c b/datapath/vport.c index f929282..84c95d3 100644 --- a/datapath/vport.c +++ b/datapath/vport.c @@ -35,6 +35,7 @@ #include #include #include +#include #include "datapath.h" #include "gso.h" @@ -77,7 +78,7 @@ int ovs_vport_init(void) } err = ipgre_init(); - if (err && err != -EEXIST) + if (err && err != -EEXIST) goto err_ipgre; compat_gre_loaded = true; } @@ -108,7 +109,14 @@ skip_ip6_tunnel_init: if (err) goto err_stt; + err = bareudp_init_module(); + if (err) + goto err_bareudp; + return 0; + bareudp_cleanup_module(); + +err_bareudp: ovs_stt_cleanup_module(); err_stt: vxlan_cleanup_module(); @@ -140,6 +148,7 @@ void ovs_vport_exit(void) gre_exit(); ipgre_fini(); } + bareudp_cleanup_module(); ovs_stt_cleanup_module(); vxlan_cleanup_module(); geneve_cleanup_module(); diff --git a/lib/dpif-netlink-rtnl.c b/lib/dpif-netlink-rtnl.c index fd157ce..283f32a 100644 --- a/lib/dpif-netlink-rtnl.c +++ b/lib/dpif-netlink-rtnl.c @@ -58,6 +58,18 @@ VLOG_DEFINE_THIS_MODULE(dpif_netlink_rtnl); #define IFLA_GENEVE_UDP_ZERO_CSUM6_RX 10 #endif +#ifndef __IFLA_BAREUDP_MAX +#define IFLA_BAREUDP_MAX 0 +#endif +#if IFLA_BAREUDP_MAX < 4 +#define IFLA_BAREUDP_PORT 1 +#define IFLA_BAREUDP_ETHERTYPE 2 +#define IFLA_BAREUDP_SRCPORT_MIN 3 +#define IFLA_BAREUDP_MULTIPROTO_MODE 4 +#endif + +#define BAREUDP_MPLS_SRCPORT_MIN 49153 + static const struct nl_policy rtlink_policy[] = { [IFLA_LINKINFO] = { .type = NL_A_NESTED }, }; @@ -81,6 +93,10 @@ static const struct nl_policy geneve_policy[] = { [IFLA_GENEVE_UDP_ZERO_CSUM6_RX] = { .type = NL_A_U8 }, [IFLA_GENEVE_PORT] = { .type = NL_A_U16 }, }; +static const struct nl_policy bareudp_policy[] = { + [IFLA_BAREUDP_PORT] = { 
.type = NL_A_U16 }, + [IFLA_BAREUDP_ETHERTYPE] = { .type = NL_A_U16 }, +}; static const char * vport_type_to_kind(enum ovs_vport_type type, @@ -113,6 +129,8 @@ vport_type_to_kind(enum ovs_vport_type type, } case OVS_VPORT_TYPE_GTPU: return NULL; + case OVS_VPORT_TYPE_BAREUDP: + return "bareudp"; case OVS_VPORT_TYPE_NETDEV: case OVS_VPORT_TYPE_INTERNAL: case OVS_VPORT_TYPE_LISP: @@ -243,6 +261,24 @@ dpif_netlink_rtnl_geneve_verify(const struct netdev_tunnel_config *tnl_cfg, return err; } +static int +dpif_netlink_rtnl_bareudp_verify(const struct netdev_tunnel_config *tnl_cfg, + const char *kind, struct ofpbuf *reply) +{ + struct nlattr *bareudp[ARRAY_SIZE(bareudp_policy)]; + int err; + + err = rtnl_policy_parse(kind, reply, bareudp_policy, bareudp, + ARRAY_SIZE(bareudp_policy)); + if (!err) { + if ((tnl_cfg->dst_port != nl_attr_get_be16(bareudp[IFLA_BAREUDP_PORT])) + || (tnl_cfg->payload_ethertype + != nl_attr_get_be16(bareudp[IFLA_BAREUDP_ETHERTYPE]))) { + err = EINVAL; + } + } + return err; +} static int dpif_netlink_rtnl_verify(const struct netdev_tunnel_config *tnl_cfg, @@ -275,6 +311,9 @@ dpif_netlink_rtnl_verify(const struct netdev_tunnel_config *tnl_cfg, case OVS_VPORT_TYPE_GENEVE: err = dpif_netlink_rtnl_geneve_verify(tnl_cfg, kind, reply); break; + case OVS_VPORT_TYPE_BAREUDP: + err = dpif_netlink_rtnl_bareudp_verify(tnl_cfg, kind, reply); + break; case OVS_VPORT_TYPE_NETDEV: case OVS_VPORT_TYPE_INTERNAL: case OVS_VPORT_TYPE_LISP: @@ -362,6 +401,19 @@ dpif_netlink_rtnl_create(const struct netdev_tunnel_config *tnl_cfg, case OVS_VPORT_TYPE_LISP: case OVS_VPORT_TYPE_STT: case OVS_VPORT_TYPE_GTPU: + case OVS_VPORT_TYPE_BAREUDP: + nl_msg_put_be16(&request, IFLA_BAREUDP_ETHERTYPE, + tnl_cfg->payload_ethertype); + if ((tnl_cfg->payload_ethertype == htons(ETH_TYPE_MPLS)) || + (tnl_cfg->payload_ethertype == htons(ETH_TYPE_MPLS_MCAST))) { + nl_msg_put_be16(&request, IFLA_BAREUDP_SRCPORT_MIN, + BAREUDP_MPLS_SRCPORT_MIN); + } + nl_msg_put_be16(&request, 
IFLA_BAREUDP_PORT, tnl_cfg->dst_port); + if (tnl_cfg->exts & (1 << OVS_BAREUDP_EXT_MULTIPROTO_MODE)) { + nl_msg_put_flag(&request, IFLA_BAREUDP_MULTIPROTO_MODE); + } + break; case OVS_VPORT_TYPE_UNSPEC: case __OVS_VPORT_TYPE_MAX: default: @@ -470,6 +522,7 @@ dpif_netlink_rtnl_port_destroy(const char *name, const char *type) case OVS_VPORT_TYPE_ERSPAN: case OVS_VPORT_TYPE_IP6ERSPAN: case OVS_VPORT_TYPE_IP6GRE: + case OVS_VPORT_TYPE_BAREUDP: return dpif_netlink_rtnl_destroy(name); case OVS_VPORT_TYPE_NETDEV: case OVS_VPORT_TYPE_INTERNAL: diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c index dc64210..6822bf5 100644 --- a/lib/dpif-netlink.c +++ b/lib/dpif-netlink.c @@ -748,6 +748,9 @@ get_vport_type(const struct dpif_netlink_vport *vport) case OVS_VPORT_TYPE_GTPU: return "gtpu"; + case OVS_VPORT_TYPE_BAREUDP: + return "bareudp"; + case OVS_VPORT_TYPE_UNSPEC: case __OVS_VPORT_TYPE_MAX: break; @@ -783,6 +786,8 @@ netdev_to_ovs_vport_type(const char *type) return OVS_VPORT_TYPE_GRE; } else if (!strcmp(type, "gtpu")) { return OVS_VPORT_TYPE_GTPU; + } else if (!strcmp(type, "bareudp")) { + return OVS_VPORT_TYPE_BAREUDP; } else { return OVS_VPORT_TYPE_UNSPEC; } @@ -907,6 +912,11 @@ dpif_netlink_port_add_compat(struct dpif_netlink *dpif, struct netdev *netdev, nl_msg_put_u16(&options, OVS_TUNNEL_ATTR_DST_PORT, ntohs(tnl_cfg->dst_port)); } + if (tnl_cfg->payload_ethertype) { + nl_msg_put_u16(&options, OVS_TUNNEL_ATTR_PAYLOAD_ETHERTYPE, + ntohs(tnl_cfg->payload_ethertype)); + } + if (tnl_cfg->exts) { size_t ext_ofs; int i; diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c index 8efd1ee..1e40cfa 100644 --- a/lib/netdev-vport.c +++ b/lib/netdev-vport.c @@ -112,7 +112,7 @@ netdev_vport_needs_dst_port(const struct netdev *dev) return (class->get_config == get_tunnel_config && (!strcmp("geneve", type) || !strcmp("vxlan", type) || !strcmp("lisp", type) || !strcmp("stt", type) || - !strcmp("gtpu", type))); + !strcmp("gtpu", type) || !strcmp("bareudp",type))); } const char * @@ 
-219,6 +219,8 @@ netdev_vport_construct(struct netdev *netdev_) dev->tnl_cfg.dst_port = port ? htons(port) : htons(STT_DST_PORT); } else if (!strcmp(type, "gtpu")) { dev->tnl_cfg.dst_port = port ? htons(port) : htons(GTPU_DST_PORT); + } else if (!strcmp(type, "bareudp")) { + dev->tnl_cfg.dst_port = htons(port); } dev->tnl_cfg.dont_fragment = true; @@ -438,6 +440,8 @@ tunnel_supported_layers(const char *type, return TNL_L2 | TNL_L3; } else if (!strcmp(type, "gtpu")) { return TNL_L3; + } else if (!strcmp(type, "bareudp")) { + return TNL_L3; } else { return TNL_L2; } @@ -745,6 +749,16 @@ set_tunnel_config(struct netdev *dev_, const struct smap *args, char **errp) goto out; } } + } else if (!strcmp(node->key, "payload_type")) { + if (!strcmp(node->value, "mpls")) { + tnl_cfg.payload_ethertype = htons(ETH_TYPE_MPLS); + tnl_cfg.exts |= (1 << OVS_BAREUDP_EXT_MULTIPROTO_MODE); + } else if (!strcmp(node->value, "ip")) { + tnl_cfg.payload_ethertype = htons(ETH_TYPE_IP); + tnl_cfg.exts |= (1 << OVS_BAREUDP_EXT_MULTIPROTO_MODE); + } else { + tnl_cfg.payload_ethertype = htons(atoi(node->value)); + } } else { ds_put_format(&errors, "%s: unknown %s argument '%s'\n", name, type, node->key); @@ -1243,7 +1257,14 @@ netdev_vport_tunnel_register(void) }, {{NULL, NULL, 0, 0}} }, - + { "udp_sys", + { + TUNNEL_FUNCTIONS_COMMON, + .type = "bareudp", + .get_ifindex = NETDEV_VPORT_GET_IFINDEX, + }, + {{NULL, NULL, 0, 0}} + }, }; static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER; diff --git a/lib/netdev.h b/lib/netdev.h index fdbe0e1..f15bca5 100644 --- a/lib/netdev.h +++ b/lib/netdev.h @@ -107,6 +107,7 @@ struct netdev_tunnel_config { bool out_key_flow; ovs_be64 out_key; + ovs_be16 payload_ethertype; ovs_be16 dst_port; bool ip_src_flow; diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c index 80fba84..ea88342 100644 --- a/ofproto/ofproto-dpif-xlate.c +++ b/ofproto/ofproto-dpif-xlate.c @@ -3573,6 +3573,7 @@ propagate_tunnel_data_to_flow(struct xlate_ctx *ctx, 
struct eth_addr dmac, case OVS_VPORT_TYPE_VXLAN: case OVS_VPORT_TYPE_GENEVE: case OVS_VPORT_TYPE_GTPU: + case OVS_VPORT_TYPE_BAREUDP: nw_proto = IPPROTO_UDP; break; case OVS_VPORT_TYPE_LISP: diff --git a/tests/system-layer3-tunnels.at b/tests/system-layer3-tunnels.at index 1232964..5d9ea93 100644 --- a/tests/system-layer3-tunnels.at +++ b/tests/system-layer3-tunnels.at @@ -152,3 +152,50 @@ AT_CHECK([tail -1 stdout], [0], OVS_VSWITCHD_STOP AT_CLEANUP + +AT_SETUP([layer3 - ping over MPLS Bareudp]) +OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])]) +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", "36:b1:ee:7c:01:01") +ADD_VETH(p1, at_ns1, br1, "10.1.1.2/24", "36:b1:ee:7c:01:02") + +ADD_OVS_TUNNEL([bareudp], [br0], [at_bareudp0], [8.1.1.3], [8.1.1.2/24], + [ options:local_ip=8.1.1.2 options:packet_type="legacy_l3" options:payload_type=mpls options:dst_port=6635]) + +ADD_OVS_TUNNEL([bareudp], [br1], [at_bareudp1], [8.1.1.2], [8.1.1.3/24], + [options:local_ip=8.1.1.3 options:packet_type="legacy_l3" options:payload_type=mpls options:dst_port=6635]) + +AT_DATA([flows0.txt], [dnl +table=0,priority=100,dl_type=0x0800 actions=push_mpls:0x8847,set_mpls_label:3,output:at_bareudp0 +table=0,priority=100,dl_type=0x8847 in_port=at_bareudp0 actions=pop_mpls:0x0800,set_field:36:b1:ee:7c:01:01->dl_dst,set_field:36:b1:ee:7c:01:02->dl_src,output:ovs-p0 +table=0,priority=10 actions=normal +]) + +AT_DATA([flows1.txt], [dnl +table=0,priority=100,dl_type=0x0800 actions=push_mpls:0x8847,set_mpls_label:3,output:at_bareudp1 +table=0,priority=100,dl_type=0x8847 in_port=at_bareudp1 actions=pop_mpls:0x0800,set_field:36:b1:ee:7c:01:02->dl_dst,set_field:36:b1:ee:7c:01:01->dl_src,output:ovs-p1 +table=0,priority=10 actions=normal +]) + +AT_CHECK([ip link add patch0 type veth peer name patch1]) +on_exit 'ip link del patch0' + +AT_CHECK([ip link set dev patch0 up]) +AT_CHECK([ip link set dev patch1 up]) +AT_CHECK([ovs-vsctl add-port br0 patch0]) +AT_CHECK([ovs-vsctl add-port br1 
patch1]) + + +AT_CHECK([ovs-ofctl -O OpenFlow13 add-flows br0 flows0.txt]) +AT_CHECK([ovs-ofctl -O OpenFlow13 add-flows br1 flows1.txt]) + +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP
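
Note on the payload_type option handled in set_tunnel_config(): the value string is mapped to an Ethertype, and the named values ("mpls", "ip") also turn on multiproto mode. The standalone sketch below illustrates that mapping outside of OVS. It is not part of the patch. The names parse_payload_type() and struct payload_cfg are hypothetical, the Ethertype is kept in host byte order for simplicity, and the use of strtoul() with range checking for the numeric case is an assumption (the patch itself calls atoi() and does not validate the result).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ETH_TYPE_IP   0x0800
#define ETH_TYPE_MPLS 0x8847

/* Hypothetical container for the two values the option controls. */
struct payload_cfg {
    uint16_t ethertype;   /* Host byte order in this sketch. */
    bool multi_proto;     /* Mirrors OVS_BAREUDP_EXT_MULTIPROTO_MODE. */
};

/* Maps a payload_type value to an Ethertype.
 * Returns 0 on success, -1 if the value is not recognised. */
static int
parse_payload_type(const char *value, struct payload_cfg *cfg)
{
    unsigned long n;
    char *end;

    /* Compare the option value, not the option key. */
    if (!strcmp(value, "mpls")) {
        cfg->ethertype = ETH_TYPE_MPLS;
        cfg->multi_proto = true;
        return 0;
    }
    if (!strcmp(value, "ip")) {
        cfg->ethertype = ETH_TYPE_IP;
        cfg->multi_proto = true;
        return 0;
    }

    /* Assumption: accept decimal or 0x-prefixed hex, reject anything
     * that is empty, has trailing characters, is zero, or exceeds
     * 16 bits. */
    errno = 0;
    n = strtoul(value, &end, 0);
    if (errno || end == value || *end != '\0' || n == 0 || n > 0xffff) {
        return -1;
    }
    cfg->ethertype = (uint16_t) n;
    cfg->multi_proto = false;
    return 0;
}
```

With this sketch, parse_payload_type("mpls", &cfg) selects 0x8847 with multiproto mode on, while a numeric value such as "0x86dd" selects that Ethertype with multiproto mode off. An unparsable value is rejected rather than silently becoming Ethertype 0.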