From patchwork Thu Mar 22 22:07:41 2018
X-Patchwork-Submitter: Gregory Rose
X-Patchwork-Id: 889693
From: Greg Rose
To: dev@openvswitch.org
Date: Thu, 22 Mar 2018 15:07:41 -0700
Message-Id: <1521756461-3870-26-git-send-email-gvrose8192@gmail.com>
In-Reply-To: <1521756461-3870-1-git-send-email-gvrose8192@gmail.com>
References: <1521756461-3870-1-git-send-email-gvrose8192@gmail.com>
Subject: [ovs-dev] [ERSPAN RFC 25/25] datapath: More ipgre fixes

After the initial backport of the gre/ipgre/erspan code in the first
patch of this series, continued development and debugging produced a
number of fixes and pulled in more code from upstream. This is a big,
ugly patch with all of that stuff...

For now, let's pretend this patch doesn't really exist. All of this
code will be folded into the previous patches and the series will be
re-ordered.

Signed-off-by: Greg Rose
---
 acinclude.m4                                     |   6 +-
 datapath/linux/compat/gre.c                      |  27 +-
 datapath/linux/compat/include/net/dst_metadata.h |  23 +-
 datapath/linux/compat/include/net/ip_tunnels.h   |   3 +-
 datapath/linux/compat/ip6_gre.c                  |  81 ++-
 datapath/linux/compat/ip_gre.c                   | 692 +++++++++++++++++++----
 datapath/linux/compat/ip_tunnel.c                |  93 ++-
 datapath/vport.c                                 |   2 +
 8 files changed, 764 insertions(+), 163 deletions(-)

diff --git a/acinclude.m4 b/acinclude.m4
index e6de83f..9cde520 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -814,9 +814,6 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [
   OVS_GREP_IFELSE([$KSRC/include/net/inet_frag.h],
                   frag_percpu_counter_batch[],
                   [OVS_DEFINE([HAVE_FRAG_PERCPU_COUNTER_BATCH])])
-  OVS_FIND_FIELD_IFELSE([$KSRC/include/net/ip_tunnels.h], [tnl_ptk_info],
-                        [hdr_len],
-                        [OVS_DEFINE([HAVE_TNL_PTK_INFO_HDR_LEN])])
   OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h],
                   [null_compute_pseudo],
                   [OVS_DEFINE([HAVE_NULL_COMPUTE_PSEUDO])])
@@ -862,6 +859,9 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [
   OVS_GREP_IFELSE([$KSRC/include/uapi/linux/if_tunnel.h],
                   [IFLA_IPTUN_COLLECT_METADATA],
                   [OVS_DEFINE([HAVE_IFLA_IPTUN_COLLECT_METADATA])])
+  OVS_GREP_IFELSE([$KSRC/net/ipv4/gre_demux.c],
+                  [parse_gre_header],
+                  [OVS_DEFINE([HAVE_DEMUX_PARSE_GRE_HEADER])])
 
   if cmp -s datapath/linux/kcompat.h.new \
             datapath/linux/kcompat.h >/dev/null 2>&1; then
diff --git a/datapath/linux/compat/gre.c b/datapath/linux/compat/gre.c
index 6a6dc91..03ce91d 100644
--- a/datapath/linux/compat/gre.c
+++ b/datapath/linux/compat/gre.c
@@ -57,9 +57,9 @@ static int rpl_ip_gre_calc_hlen(__be16 o_flags)
 }
 
 #ifndef HAVE_GRE_HANDLE_OFFLOADS
-
 #ifndef HAVE_GRE_CISCO_REGISTER
 
+#ifdef HAVE_DEMUX_PARSE_GRE_HEADER
 static __sum16 check_checksum(struct sk_buff *skb)
 {
 	__sum16 csum = 0;
@@ -99,13 +99,12 @@ static int parse_gre_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
 
 	tpi->flags = gre_flags_to_tnl_flags(greh->flags);
 	hdr_len = ip_gre_calc_hlen(tpi->flags);
+	tpi->hdr_len = hdr_len;
+	tpi->proto = greh->protocol;
 
 	if (!pskb_may_pull(skb, hdr_len))
 		return -EINVAL;
 
-	greh = (struct gre_base_hdr *)(skb_network_header(skb) + ip_hlen);
-	tpi->proto = greh->protocol;
-
 	options = (__be32 *)(greh + 1);
 	if (greh->flags & GRE_CSUM) {
 		if (check_checksum(skb)) {
@@ -143,20 +142,25 @@ static int parse_gre_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
 
 	return iptunnel_pull_header(skb, hdr_len, tpi->proto, false);
 }
+#endif /* HAVE_DEMUX_PARSE_GRE_HEADER */
+
 static struct gre_cisco_protocol __rcu *gre_cisco_proto;
 static int gre_cisco_rcv(struct sk_buff *skb)
 {
 	struct tnl_ptk_info tpi;
-	bool csum_err = false;
 	struct gre_cisco_protocol *proto;
 
 	rcu_read_lock();
 	proto = rcu_dereference(gre_cisco_proto);
 	if (!proto)
 		goto drop;
-
-	if (parse_gre_header(skb, &tpi, &csum_err) < 0)
-		goto drop;
+#ifdef HAVE_DEMUX_PARSE_GRE_HEADER
+	{
+		bool csum_err = false;
+		if (parse_gre_header(skb, &tpi, &csum_err) < 0)
+			goto drop;
+	}
+#endif
 	proto->handler(skb, &tpi);
 	rcu_read_unlock();
 	return 0;
@@ -288,10 +292,11 @@ int rpl_gre_parse_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
 	} else {
 		tpi->seq = 0;
 	}
+
 	/* WCCP version 1 and 2 protocol decoding.
-	 * * - Change protocol to IPv4/IPv6
-	 * * - When dealing with WCCPv2, Skip extra 4 bytes in GRE header
-	 * */
+	 * - Change protocol to IPv4/IPv6
+	 * - When dealing with WCCPv2, Skip extra 4 bytes in GRE header
+	 */
 	if (greh->flags == 0 && tpi->proto == htons(ETH_P_WCCP)) {
 		tpi->proto = proto;
 		if ((*(u8 *)options & 0xF0) != 0x40)
diff --git a/datapath/linux/compat/include/net/dst_metadata.h b/datapath/linux/compat/include/net/dst_metadata.h
index e401eb4..93ea954 100644
--- a/datapath/linux/compat/include/net/dst_metadata.h
+++ b/datapath/linux/compat/include/net/dst_metadata.h
@@ -1,6 +1,11 @@
 #ifndef __NET_DST_METADATA_WRAPPER_H
 #define __NET_DST_METADATA_WRAPPER_H 1
 
+enum metadata_type {
+	METADATA_IP_TUNNEL,
+	METADATA_HW_PORT_MUX,
+};
+
 #ifdef USE_UPSTREAM_TUNNEL
 #include_next
 #else
@@ -11,19 +16,26 @@
 #include
 #include
 
+struct hw_port_info {
+	struct net_device *lower_dev;
+	u32 port_id;
+};
+
 struct metadata_dst {
-	unsigned long dst;
+	struct dst_entry dst;
+	enum metadata_type type;
 	union {
 		struct ip_tunnel_info tun_info;
+		struct hw_port_info port_info;
 	} u;
 };
 
 static void __metadata_dst_init(struct metadata_dst *md_dst, u8 optslen)
 {
-	unsigned long *dst;
+	struct dst_entry *dst;
 
 	dst = &md_dst->dst;
-	*dst = 0;
+
 #if 0
 	dst_init(dst, &md_dst_ops, NULL, 1, DST_OBSOLETE_NONE,
 		 DST_METADATA | DST_NOCACHE | DST_NOCOUNT);
@@ -105,11 +117,6 @@ void ovs_ip_tunnel_rcv(struct net_device *dev, struct sk_buff *skb,
 		       struct metadata_dst *tun_dst);
 
 #ifndef HAVE_METADATA_DST_ALLOC_WITH_METADATA_TYPE
-enum metadata_type {
-	METADATA_IP_TUNNEL,
-	METADATA_HW_PORT_MUX,
-};
-
 static inline struct metadata_dst *
 rpl_metadata_dst_alloc(u8 optslen, enum metadata_type type, gfp_t flags)
 {
diff --git a/datapath/linux/compat/include/net/ip_tunnels.h b/datapath/linux/compat/include/net/ip_tunnels.h
index fb70f9d..9b2621e 100644
--- a/datapath/linux/compat/include/net/ip_tunnels.h
+++ b/datapath/linux/compat/include/net/ip_tunnels.h
@@ -99,13 +99,12 @@ struct tnl_ptk_info {
 	__be16 proto;
 	__be32 key;
 	__be32 seq;
-#ifndef HAVE_TNL_PTK_INFO_HDR_LEN
 	int hdr_len;
-#endif
 };
 
 #define PACKET_RCVD 0
 #define PACKET_REJECT 1
+#define PACKET_NEXT 2
 #endif
 
 #define IP_TNL_HASH_BITS 7
diff --git a/datapath/linux/compat/ip6_gre.c b/datapath/linux/compat/ip6_gre.c
index 2832901..1057539 100644
--- a/datapath/linux/compat/ip6_gre.c
+++ b/datapath/linux/compat/ip6_gre.c
@@ -59,6 +59,8 @@
 #include
 #include
 
+#include "vport-netdev.h"
+
 #define IP6_GRE_HASH_SIZE_SHIFT 5
 #define IP6_GRE_HASH_SIZE (1 << IP6_GRE_HASH_SIZE_SHIFT)
 
@@ -583,6 +585,7 @@ static int ip6erspan_rcv(struct sk_buff *skb, int gre_hdr_len,
 	return PACKET_REJECT;
 }
 
+
 static int gre_rcv(struct sk_buff *skb)
 {
 	struct tnl_ptk_info tpi;
@@ -1073,6 +1076,11 @@ tx_err:
 	return NETDEV_TX_OK;
 }
 
+static netdev_tx_t __ip6erspan_tunnel_xmit(struct sk_buff *skb)
+{
+	return ip6erspan_tunnel_xmit(skb, skb->dev);
+}
+
 static void ip6gre_tnl_link_config(struct ip6_tnl *t, int set_mtu)
 {
 	struct net_device *dev = t->dev;
@@ -2252,6 +2260,60 @@ static struct rtnl_link_ops ip6erspan_tap_ops __read_mostly = {
 #endif
 };
 
+static struct vport_ops ovs_erspan6_vport_ops;
+
+static struct vport *erspan6_tnl_create(const struct vport_parms *parms)
+{
+	struct net *net = ovs_dp_get_net(parms->dp);
+	struct net_device *dev;
+	struct vport *vport;
+	int err;
+
+	vport = ovs_vport_alloc(0, &ovs_erspan6_vport_ops, parms);
+	if (IS_ERR(vport))
+		return vport;
+
+	rtnl_lock();
+	dev = gretap_fb_dev_create(net, parms->name, NET_NAME_USER);
+	if (IS_ERR(dev)) {
+		rtnl_unlock();
+		ovs_vport_free(vport);
+		return ERR_CAST(dev);
+	}
+
+	err = dev_change_flags(dev, dev->flags | IFF_UP);
+	if (err < 0) {
+		rtnl_delete_link(dev);
+		rtnl_unlock();
+		ovs_vport_free(vport);
+		return ERR_PTR(err);
+	}
+
+	rtnl_unlock();
+	return vport;
+}
+
+static struct vport *erspan6_create(const struct vport_parms *parms)
+{
+	struct vport *vport;
+
+	vport = erspan6_tnl_create(parms);
+	if (IS_ERR(vport))
+		return vport;
+
+	return ovs_netdev_link(vport, parms->name);
+}
+
+static struct vport_ops ovs_erspan6_vport_ops = {
+	.type = OVS_VPORT_TYPE_IP6ERSPAN,
+	.create = erspan6_create,
+	.send = __ip6erspan_tunnel_xmit,
+#ifndef USE_UPSTREAM_TUNNEL
+	.fill_metadata_dst = gre_fill_metadata_dst,
+#endif
+	.destroy = ovs_netdev_tunnel_destroy,
+};
+
 /*
  * And now the modules code and kernel interface.
  */
@@ -2260,8 +2322,6 @@ int rpl_ip6gre_init(void)
 {
 	int err;
 
-	pr_info("GRE over IPv6 tunneling driver\n");
-
 	err = register_pernet_device(&ip6gre_net_ops);
 	if (err < 0)
 		return err;
@@ -2272,26 +2332,16 @@ int rpl_ip6gre_init(void)
 		goto add_proto_failed;
 	}
 
-	err = rtnl_link_register(&ip6gre_link_ops);
-	if (err < 0)
-		goto rtnl_link_failed;
-
 	err = rtnl_link_register(&ip6gre_tap_ops);
 	if (err < 0)
 		goto tap_ops_failed;
 
-	err = rtnl_link_register(&ip6erspan_tap_ops);
-	if (err < 0)
-		goto erspan_link_failed;
-
+	pr_info("GRE over IPv6 tunneling driver\n");
+	return ovs_vport_ops_register(&ovs_erspan6_vport_ops);
 out:
 	return err;
 
-erspan_link_failed:
-	rtnl_link_unregister(&ip6gre_tap_ops);
 tap_ops_failed:
-	rtnl_link_unregister(&ip6gre_link_ops);
-rtnl_link_failed:
 	inet6_del_protocol(&ip6gre_protocol, IPPROTO_GRE);
 add_proto_failed:
 	unregister_pernet_device(&ip6gre_net_ops);
@@ -2300,9 +2350,8 @@ add_proto_failed:
 
 void rpl_ip6gre_fini(void)
 {
+	ovs_vport_ops_unregister(&ovs_erspan6_vport_ops);
 	rtnl_link_unregister(&ip6gre_tap_ops);
-	rtnl_link_unregister(&ip6gre_link_ops);
-	rtnl_link_unregister(&ip6erspan_tap_ops);
 	inet6_del_protocol(&ip6gre_protocol, IPPROTO_GRE);
 	unregister_pernet_device(&ip6gre_net_ops);
 }
diff --git a/datapath/linux/compat/ip_gre.c b/datapath/linux/compat/ip_gre.c
index 08d3351..59a9fb3 100644
--- a/datapath/linux/compat/ip_gre.c
+++ b/datapath/linux/compat/ip_gre.c
@@ -64,11 +64,14 @@
 #include "vport-netdev.h"
 
 static int gre_tap_net_id __read_mostly;
+static int ipgre_net_id __read_mostly;
 static unsigned int erspan_net_id __read_mostly;
 
 static void erspan_build_header(struct sk_buff *skb, __be32 id,
 				u32 index, bool truncate, bool is_ipv4);
 
+static struct rtnl_link_ops ipgre_link_ops __read_mostly;
+
 #define ip_gre_calc_hlen rpl_ip_gre_calc_hlen
 static int ip_gre_calc_hlen(__be16 o_flags)
 {
@@ -83,15 +86,6 @@ static int ip_gre_calc_hlen(__be16 o_flags)
 	return addend;
 }
 
-static __be64 key_to_tunnel_id(__be32 key)
-{
-#ifdef __BIG_ENDIAN
-	return (__force __be64)((__force u32)key);
-#else
-	return (__force __be64)((__force u64)key << 32);
-#endif
-}
-
 /* Returns the least-significant 32 bits of a __be64. */
 static __be32 tunnel_id_to_key(__be64 x)
 {
@@ -109,6 +103,89 @@ static int gre_err(struct sk_buff *skb, u32 info,
 	return PACKET_REJECT;
 }
 
+static struct dst_ops md_dst_ops = {
+	.family = AF_UNSPEC,
+};
+
+#ifndef DST_METADATA
+#define DST_METADATA 0x0080
+#endif
+
+static void rpl__metadata_dst_init(struct metadata_dst *md_dst,
+				   enum metadata_type type, u8 optslen)
+
+{
+	struct dst_entry *dst;
+
+	dst = &md_dst->dst;
+	dst_init(dst, &md_dst_ops, NULL, 1, DST_OBSOLETE_NONE,
+		 DST_METADATA | DST_NOCOUNT);
+
+#if 0
+	/* unused in OVS */
+	dst->input = dst_md_discard;
+	dst->output = dst_md_discard_out;
+#endif
+	memset(dst + 1, 0, sizeof(*md_dst) + optslen - sizeof(*dst));
+	md_dst->type = type;
+}
+
+struct metadata_dst *erspan_rpl_metadata_dst_alloc(u8 optslen, enum metadata_type type,
+						   gfp_t flags)
+{
+	struct metadata_dst *md_dst;
+
+	md_dst = kmalloc(sizeof(*md_dst) + optslen, flags);
+	if (!md_dst)
+		return NULL;
+
+	rpl__metadata_dst_init(md_dst, type, optslen);
+
+	return md_dst;
+}
+static inline struct metadata_dst *rpl_tun_rx_dst(int md_size)
+{
+	struct metadata_dst *tun_dst;
+
+	tun_dst = erspan_rpl_metadata_dst_alloc(md_size, METADATA_IP_TUNNEL, GFP_ATOMIC);
+	if (!tun_dst)
+		return NULL;
+
+	tun_dst->u.tun_info.options_len = 0;
+	tun_dst->u.tun_info.mode = 0;
+	return tun_dst;
+}
+static inline struct metadata_dst *rpl__ip_tun_set_dst(__be32 saddr,
+						       __be32 daddr,
+						       __u8 tos, __u8 ttl,
+						       __be16 tp_dst,
+						       __be16 flags,
+						       __be64 tunnel_id,
+						       int md_size)
+{
+	struct metadata_dst *tun_dst;
+
+	tun_dst = rpl_tun_rx_dst(md_size);
+	if (!tun_dst)
+		return NULL;
+
+	ip_tunnel_key_init(&tun_dst->u.tun_info.key,
+			   saddr, daddr, tos, ttl,
+			   0, 0, tp_dst, tunnel_id, flags);
+	return tun_dst;
+}
+
+static inline struct metadata_dst *rpl_ip_tun_rx_dst(struct sk_buff *skb,
+						     __be16 flags,
+						     __be64 tunnel_id,
+						     int md_size)
+{
+	const struct iphdr *iph = ip_hdr(skb);
+
+	return rpl__ip_tun_set_dst(iph->saddr, iph->daddr, iph->tos, iph->ttl,
+				   0, flags, tunnel_id, md_size);
+}
+
 static int erspan_rcv(struct sk_buff *skb, struct tnl_ptk_info *tpi,
 		      int gre_hdr_len)
 {
@@ -138,9 +215,16 @@ static int erspan_rcv(struct sk_buff *skb, struct tnl_ptk_info *tpi,
 	 * Use ERSPAN 10-bit session ID as key.
 	 */
 	tpi->key = cpu_to_be32(get_session_id(ershdr));
+	/* OVS doesn't set tunnel key - so don't bother with it */
+#if 0
 	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex,
 				  tpi->flags | TUNNEL_KEY,
 				  iph->saddr, iph->daddr, tpi->key);
+#else
+	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex,
+				  tpi->flags,
+				  iph->saddr, iph->daddr, 0);
+#endif
 
 	if (tunnel) {
 		len = gre_hdr_len + erspan_hdr_len(ver);
@@ -166,8 +250,9 @@ static int erspan_rcv(struct sk_buff *skb, struct tnl_ptk_info *tpi,
 			flags = tpi->flags;
 			tun_id = key32_to_tunnel_id(tpi->key);
 
-			ovs_ip_tun_rx_dst(tun_dst, skb, flags,
-					  tun_id, sizeof(*md));
+			tun_dst = rpl_ip_tun_rx_dst(skb, flags, tun_id, sizeof(*md));
+			if (!tun_dst)
+				return PACKET_REJECT;
 
 			md = ip_tunnel_info_opts(&tun_dst->u.tun_info);
 			md->version = ver;
@@ -182,6 +267,7 @@ static int erspan_rcv(struct sk_buff *skb, struct tnl_ptk_info *tpi,
 		skb_reset_mac_header(skb);
 
 		ovs_ip_tunnel_rcv(tunnel->dev, skb, tun_dst);
+		kfree(tun_dst);
 		return PACKET_RCVD;
 	}
 drop:
@@ -189,48 +275,68 @@ drop:
 	return PACKET_RCVD;
 }
 
-static int ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi)
+
+static int __ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi,
+		       struct ip_tunnel_net *itn, int hdr_len, bool raw_proto)
 {
-	struct net *net = dev_net(skb->dev);
 	struct metadata_dst tun_dst;
-	struct ip_tunnel_net *itn;
 	const struct iphdr *iph;
 	struct ip_tunnel *tunnel;
 
-	if (tpi->proto != htons(ETH_P_TEB))
-		return PACKET_REJECT;
-
-	itn = net_generic(net, gre_tap_net_id);
-
 	iph = ip_hdr(skb);
-	tunnel = rcu_dereference(itn->collect_md_tun);
+	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, tpi->flags,
+				  iph->saddr, iph->daddr, tpi->key);
+
 	if (tunnel) {
-		__be16 flags;
-		__be64 tun_id;
-		int err;
+		if (__iptunnel_pull_header(skb, hdr_len, tpi->proto,
+					   raw_proto, false) < 0)
+			goto drop;
 
-		if (iptunnel_pull_offloads(skb))
-			return PACKET_REJECT;
+		if (tunnel->dev->type != ARPHRD_NONE)
+			skb_pop_mac_header(skb);
+		else
+			skb_reset_mac_header(skb);
+		if (tunnel->collect_md) {
+			__be16 flags;
+			__be64 tun_id;
 
-		skb_pop_mac_header(skb);
-		flags = tpi->flags & (TUNNEL_CSUM | TUNNEL_KEY);
-		tun_id = key_to_tunnel_id(tpi->key);
-		ovs_ip_tun_rx_dst(&tun_dst, skb, flags, tun_id, 0);
-
-		skb_reset_network_header(skb);
-		err = IP_ECN_decapsulate(iph, skb);
-		if (unlikely(err)) {
-			if (err > 1) {
-				++tunnel->dev->stats.rx_frame_errors;
-				++tunnel->dev->stats.rx_errors;
-				return PACKET_REJECT;
-			}
+			flags = tpi->flags & (TUNNEL_CSUM | TUNNEL_KEY);
+			tun_id = key32_to_tunnel_id(tpi->key);
+			ovs_ip_tun_rx_dst(&tun_dst, skb, flags, tun_id, 0);
 		}
 
 		ovs_ip_tunnel_rcv(tunnel->dev, skb, &tun_dst);
 		return PACKET_RCVD;
 	}
-	return PACKET_REJECT;
+	return PACKET_NEXT;
+
+drop:
+	kfree_skb(skb);
+	return PACKET_RCVD;
+}
+
+
+static int ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi,
+		     int hdr_len)
+{
+	struct net *net = dev_net(skb->dev);
+	struct ip_tunnel_net *itn;
+	int res;
+
+	if (tpi->proto == htons(ETH_P_TEB))
+		itn = net_generic(net, gre_tap_net_id);
+	else
+		itn = net_generic(net, ipgre_net_id);
+
+	res = __ipgre_rcv(skb, tpi, itn, hdr_len, false);
+	if (res == PACKET_NEXT && tpi->proto == htons(ETH_P_TEB)) {
+		/* ipgre tunnels in collect metadata mode should receive
+		 * also ETH_P_TEB traffic.
+		 */
+		itn = net_generic(net, ipgre_net_id);
+		res = __ipgre_rcv(skb, tpi, itn, hdr_len, true);
+	}
+	return res;
 }
 
 static void __gre_xmit(struct sk_buff *skb, struct net_device *dev,
@@ -253,6 +359,7 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev,
 	ip_tunnel_xmit(skb, dev, tnl_params, tnl_params->protocol);
 }
 
+#ifndef HAVE_DEMUX_PARSE_GRE_HEADER
 static int gre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *unused_tpi)
 {
 	struct tnl_ptk_info tpi;
@@ -270,13 +377,33 @@ static int gre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *unused_tpi)
 		goto drop;
 	}
 
-	if (ipgre_rcv(skb, &tpi) == PACKET_RCVD)
+	if (ipgre_rcv(skb, &tpi, hdr_len) == PACKET_RCVD)
+		return 0;
+drop:
+
+	kfree_skb(skb);
+	return 0;
+}
+#else
+static int gre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *__tpi)
+{
+	struct tnl_ptk_info tpi = *__tpi;
+
+	if (unlikely(tpi.proto == htons(ETH_P_ERSPAN) ||
+		     tpi.proto == htons(ETH_P_ERSPAN2))) {
+		if (erspan_rcv(skb, &tpi, 0) == PACKET_RCVD)
+			return 0;
+		goto drop;
+	}
+
+	if (ipgre_rcv(skb, &tpi, 0) == PACKET_RCVD)
 		return 0;
 
 drop:
 	kfree_skb(skb);
 	return 0;
 }
+#endif
 #if LINUX_VERSION_CODE < KERNEL_VERSION(4,7,0)
 /* gre_handle_offloads() has different return type on older kernsl.
  */
@@ -542,24 +669,25 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev,
 	if (version == 1) {
 		erspan_build_header(skb, ntohl(tunnel_id_to_key32(key->tun_id)),
 				    ntohl(md->u.index), truncate, true);
+		tpi.hdr_len = ERSPAN_V1_MDSIZE;
+		tpi.proto = htons(ETH_P_ERSPAN);
 	} else if (version == 2) {
 		erspan_build_header_v2(skb,
 				       ntohl(tunnel_id_to_key32(key->tun_id)),
 				       md->u.md2.dir,
 				       get_hwid(&md->u.md2),
 				       truncate, true);
-
+		tpi.hdr_len = ERSPAN_V2_MDSIZE;
+		tpi.proto = htons(ETH_P_ERSPAN2);
 	} else {
 		goto err_free_rt;
 	}
 
 	tpi.flags = (TUNNEL_KEY | TUNNEL_CSUM | TUNNEL_SEQ);
-	tpi.proto = htons(ETH_P_ERSPAN);
 	tpi.key = tunnel_id_to_key32(key->tun_id);
 	tpi.seq = htonl(tunnel->o_seqno++);
-	tpi.hdr_len = 8;
 
-	gre_build_header(skb, &tpi, 8);
+	gre_build_header(skb, &tpi, tunnel_hlen);
 
 	df = key->tun_flags & TUNNEL_DONT_FRAGMENT ? htons(IP_DF) : 0;
 
@@ -586,23 +714,27 @@ static void __gre_tunnel_init(struct net_device *dev)
 	int t_hlen;
 
 	tunnel = netdev_priv(dev);
-	tunnel->parms.iph.protocol = IPPROTO_GRE;
 	tunnel->tun_hlen = ip_gre_calc_hlen(tunnel->parms.o_flags);
+	tunnel->parms.iph.protocol = IPPROTO_GRE;
 
 	tunnel->hlen = tunnel->tun_hlen + tunnel->encap_hlen;
 
 	t_hlen = tunnel->hlen + sizeof(struct iphdr);
 
-	dev->needed_headroom = LL_MAX_HEADER + t_hlen + 4;
-	dev->mtu = ETH_DATA_LEN - t_hlen - 4;
-
 	dev->features |= GRE_FEATURES;
 	dev->hw_features |= GRE_FEATURES;
 
 	if (!(tunnel->parms.o_flags & TUNNEL_SEQ)) {
-		/* TCP offload with GRE SEQ is not supported. */
-		dev->features |= NETIF_F_GSO_SOFTWARE;
-		dev->hw_features |= NETIF_F_GSO_SOFTWARE;
+		/* TCP offload with GRE SEQ is not supported, nor
+		 * can we support 2 levels of outer headers requiring
+		 * an update.
+		 */
+		if (!(tunnel->parms.o_flags & TUNNEL_CSUM) ||
+		    (tunnel->encap.type == TUNNEL_ENCAP_NONE)) {
+			dev->features |= NETIF_F_GSO_SOFTWARE;
+			dev->hw_features |= NETIF_F_GSO_SOFTWARE;
+		}
+
 		/* Can use a lockless transmit, unless we generate
 		 * output sequences
 		 */
@@ -616,6 +748,25 @@ static struct gre_cisco_protocol ipgre_protocol = {
 	.priority = 1,
 };
 
+static int __net_init ipgre_init_net(struct net *net)
+{
+	return ip_tunnel_init_net(net, ipgre_net_id, &ipgre_link_ops, NULL);
+}
+
+static void __net_exit ipgre_exit_net(struct net *net)
+{
+	struct ip_tunnel_net *itn = net_generic(net, ipgre_net_id);
+
+	ip_tunnel_delete_net(itn, &ipgre_link_ops);
+}
+
+static struct pernet_operations ipgre_net_ops = {
+	.init = ipgre_init_net,
+	.exit = ipgre_exit_net,
+	.id   = &ipgre_net_id,
+	.size = sizeof(struct ip_tunnel_net),
+};
+
 static int ipgre_tunnel_validate(struct nlattr *tb[], struct nlattr *data[])
 {
 	__be16 flags;
@@ -717,14 +868,74 @@ static int erspan_validate(struct nlattr *tb[], struct nlattr *data[])
 	return 0;
 }
 
-static void ipgre_netlink_parms(struct net_device *dev,
-				struct nlattr *data[],
-				struct nlattr *tb[],
-				struct ip_tunnel_parm *parms)
+static int ipgre_netlink_parms(struct net_device *dev,
+			       struct nlattr *data[],
+			       struct nlattr *tb[],
+			       struct ip_tunnel_parm *parms)
 {
+	struct ip_tunnel *t = netdev_priv(dev);
+
 	memset(parms, 0, sizeof(*parms));
 
 	parms->iph.protocol = IPPROTO_GRE;
+
+	if (!data)
+		return 0;
+
+	if (data[IFLA_GRE_LINK])
+		parms->link = nla_get_u32(data[IFLA_GRE_LINK]);
+
+	if (data[IFLA_GRE_IFLAGS])
+		parms->i_flags = gre_flags_to_tnl_flags(nla_get_be16(data[IFLA_GRE_IFLAGS]));
+
+	if (data[IFLA_GRE_OFLAGS])
+		parms->o_flags = gre_flags_to_tnl_flags(nla_get_be16(data[IFLA_GRE_OFLAGS]));
+
+	if (data[IFLA_GRE_IKEY])
+		parms->i_key = nla_get_be32(data[IFLA_GRE_IKEY]);
+
+	if (data[IFLA_GRE_OKEY])
+		parms->o_key = nla_get_be32(data[IFLA_GRE_OKEY]);
+
+	if (data[IFLA_GRE_LOCAL])
+		parms->iph.saddr = nla_get_in_addr(data[IFLA_GRE_LOCAL]);
+
+	if (data[IFLA_GRE_REMOTE])
+		parms->iph.daddr = nla_get_in_addr(data[IFLA_GRE_REMOTE]);
+
+	if (data[IFLA_GRE_TTL])
+		parms->iph.ttl = nla_get_u8(data[IFLA_GRE_TTL]);
+
+	if (data[IFLA_GRE_TOS])
+		parms->iph.tos = nla_get_u8(data[IFLA_GRE_TOS]);
+
+	if (!data[IFLA_GRE_PMTUDISC] || nla_get_u8(data[IFLA_GRE_PMTUDISC])) {
+		if (t->ignore_df)
+			return -EINVAL;
+		parms->iph.frag_off = htons(IP_DF);
+	}
+
+	if (data[IFLA_GRE_COLLECT_METADATA]) {
+		t->collect_md = true;
+		if (dev->type == ARPHRD_IPGRE)
+			dev->type = ARPHRD_NONE;
+	}
+
+	if (data[IFLA_GRE_IGNORE_DF]) {
+		if (nla_get_u8(data[IFLA_GRE_IGNORE_DF])
+		  && (parms->iph.frag_off & htons(IP_DF)))
+			return -EINVAL;
+		t->ignore_df = !!nla_get_u8(data[IFLA_GRE_IGNORE_DF]);
+	}
+
+	if (data[IFLA_GRE_ERSPAN_INDEX]) {
+		t->index = nla_get_u32(data[IFLA_GRE_ERSPAN_INDEX]);
+
+		if (t->index & ~INDEX_MASK)
+			return -EINVAL;
+	}
+
+	return 0;
 }
 
 static int gre_tap_init(struct net_device *dev)
@@ -788,6 +999,12 @@ free_skb:
 	return NETDEV_TX_OK;
 }
 
+static netdev_tx_t __erspan_fb_xmit(struct sk_buff *skb)
+{
+	erspan_fb_xmit(skb, skb->dev, skb->protocol);
+	return NETDEV_TX_OK;
+}
+
 int ovs_gre_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb)
 {
 	struct ip_tunnel_info *info = skb_tunnel_info(skb);
@@ -807,22 +1024,6 @@ int ovs_gre_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb)
 }
 EXPORT_SYMBOL_GPL(ovs_gre_fill_metadata_dst);
 
-static const struct net_device_ops gre_tap_netdev_ops = {
-	.ndo_init = gre_tap_init,
-	.ndo_uninit = ip_tunnel_uninit,
-	.ndo_start_xmit = gre_dev_xmit,
-	.ndo_set_mac_address = eth_mac_addr,
-	.ndo_validate_addr = eth_validate_addr,
-	.ndo_change_mtu = ip_tunnel_change_mtu,
-	.ndo_get_stats64 = rpl_ip_tunnel_get_stats64,
-#ifdef HAVE_NDO_GET_IFLINK
-	.ndo_get_iflink = rpl_ip_tunnel_get_iflink,
-#endif
-#ifdef HAVE_NDO_FILL_METADATA_DST
-	.ndo_fill_metadata_dst = gre_fill_metadata_dst,
-#endif
-};
-
 static int erspan_tunnel_init(struct net_device *dev)
 {
 	struct ip_tunnel *tunnel = netdev_priv(dev);
@@ -844,6 +1045,135 @@ static int erspan_tunnel_init(struct net_device *dev)
 	return ip_tunnel_init(dev);
 }
 
+static int ipgre_header(struct sk_buff *skb, struct net_device *dev,
+			unsigned short type,
+			const void *daddr, const void *saddr, unsigned int len)
+{
+	struct ip_tunnel *t = netdev_priv(dev);
+	struct iphdr *iph;
+	struct gre_base_hdr *greh;
+
+	iph = (struct iphdr *)__skb_push(skb, t->hlen + sizeof(*iph));
+	greh = (struct gre_base_hdr *)(iph+1);
+	greh->flags = gre_tnl_flags_to_gre_flags(t->parms.o_flags);
+	greh->protocol = htons(type);
+
+	memcpy(iph, &t->parms.iph, sizeof(struct iphdr));
+
+	/* Set the source hardware address. */
+	if (saddr)
+		memcpy(&iph->saddr, saddr, 4);
+	if (daddr)
+		memcpy(&iph->daddr, daddr, 4);
+	if (iph->daddr)
+		return t->hlen + sizeof(*iph);
+
+	return -(t->hlen + sizeof(*iph));
+}
+
+static int ipgre_header_parse(const struct sk_buff *skb, unsigned char *haddr)
+{
+	const struct iphdr *iph = (const struct iphdr *) skb_mac_header(skb);
+	memcpy(haddr, &iph->saddr, 4);
+	return 4;
+}
+
+static const struct header_ops ipgre_header_ops = {
+	.create = ipgre_header,
+	.parse = ipgre_header_parse,
+};
+
+static int ipgre_tunnel_init(struct net_device *dev)
+{
+	struct ip_tunnel *tunnel = netdev_priv(dev);
+	struct iphdr *iph = &tunnel->parms.iph;
+
+	__gre_tunnel_init(dev);
+
+	memcpy(dev->dev_addr, &iph->saddr, 4);
+	memcpy(dev->broadcast, &iph->daddr, 4);
+
+	dev->flags = IFF_NOARP;
+	netif_keep_dst(dev);
+	dev->addr_len = 4;
+
+	if (!tunnel->collect_md) {
+		dev->header_ops = &ipgre_header_ops;
+	}
+
+	return ip_tunnel_init(dev);
+}
+
+static netdev_tx_t ipgre_xmit(struct sk_buff *skb,
+			      struct net_device *dev)
+{
+	struct ip_tunnel *tunnel = netdev_priv(dev);
+	const struct iphdr *tnl_params;
+
+	if (tunnel->collect_md) {
+		gre_fb_xmit(skb);
+		return NETDEV_TX_OK;
+	}
+
+	if (dev->header_ops) {
+		/* Need space for new headers */
+		if (skb_cow_head(skb, dev->needed_headroom -
+				      (tunnel->hlen + sizeof(struct iphdr))))
+			goto free_skb;
+
+		tnl_params = (const struct iphdr *)skb->data;
+
+		/* Pull skb since ip_tunnel_xmit() needs skb->data pointing
+		 * to gre header.
+		 */
+		skb_pull(skb, tunnel->hlen + sizeof(struct iphdr));
+		skb_reset_mac_header(skb);
+	} else {
+		if (skb_cow_head(skb, dev->needed_headroom))
+			goto free_skb;
+
+		tnl_params = &tunnel->parms.iph;
+	}
+
+	if (gre_handle_offloads(skb, !!(tunnel->parms.o_flags & TUNNEL_CSUM)))
+		goto free_skb;
+
+	__gre_xmit(skb, dev, tnl_params, skb->protocol);
+	return NETDEV_TX_OK;
+
+free_skb:
+	kfree_skb(skb);
+	dev->stats.tx_dropped++;
+	return NETDEV_TX_OK;
+}
+
+static const struct net_device_ops ipgre_netdev_ops = {
+	.ndo_init = ipgre_tunnel_init,
+	.ndo_uninit = rpl_ip_tunnel_uninit,
+	.ndo_start_xmit = ipgre_xmit,
+	.ndo_change_mtu = ip_tunnel_change_mtu,
+	.ndo_get_stats64 = ip_tunnel_get_stats64,
+#ifdef HAVE_GET_LINK_NET
+	.ndo_get_iflink = ip_tunnel_get_iflink,
+#endif
+};
+
+static const struct net_device_ops gre_tap_netdev_ops = {
+	.ndo_init = gre_tap_init,
+	.ndo_uninit = rpl_ip_tunnel_uninit,
+	.ndo_start_xmit = gre_dev_xmit,
+	.ndo_set_mac_address = eth_mac_addr,
+	.ndo_validate_addr = eth_validate_addr,
+	.ndo_change_mtu = ip_tunnel_change_mtu,
+	.ndo_get_stats64 = rpl_ip_tunnel_get_stats64,
+#ifdef HAVE_NDO_GET_IFLINK
+	.ndo_get_iflink = rpl_ip_tunnel_get_iflink,
+#endif
+#ifdef HAVE_NDO_FILL_METADATA_DST
+	.ndo_fill_metadata_dst = gre_fill_metadata_dst,
+#endif
+};
+
 static const struct net_device_ops erspan_netdev_ops = {
 	.ndo_init = erspan_tunnel_init,
 	.ndo_uninit = rpl_ip_tunnel_uninit,
@@ -860,6 +1190,13 @@ static const struct net_device_ops erspan_netdev_ops = {
 #endif
 };
 
+static void ipgre_tunnel_setup(struct net_device *dev)
+{
+	dev->netdev_ops = &ipgre_netdev_ops;
+	dev->type = ARPHRD_IPGRE;
+	ip_tunnel_setup(dev, ipgre_net_id);
+}
+
 static void ipgre_tap_setup(struct net_device *dev)
 {
 	ether_setup(dev);
@@ -871,6 +1208,16 @@ static void ipgre_tap_setup(struct net_device *dev)
 	ip_tunnel_setup(dev, gre_tap_net_id);
 }
 
+static void erspan_setup(struct net_device *dev)
+{
+	eth_hw_addr_random(dev);
+	ether_setup(dev);
+	dev->netdev_ops = &erspan_netdev_ops;
+	dev->priv_flags &= ~IFF_TX_SKB_SHARING;
+	dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
+	ip_tunnel_setup(dev, erspan_net_id);
+}
+
 static int ipgre_newlink(struct net *src_net, struct net_device *dev,
 			 struct nlattr *tb[], struct nlattr *data[])
 {
@@ -964,15 +1311,6 @@ nla_put_failure:
 	return -EMSGSIZE;
 }
 
-static void erspan_setup(struct net_device *dev)
-{
-	ether_setup(dev);
-	dev->netdev_ops = &erspan_netdev_ops;
-	dev->priv_flags &= ~IFF_TX_SKB_SHARING;
-	dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
-	ip_tunnel_setup(dev, erspan_net_id);
-}
-
 static const struct nla_policy ipgre_policy[RPL_IFLA_GRE_MAX + 1] = {
 	[IFLA_GRE_LINK] = { .type = NLA_U32 },
 	[IFLA_GRE_IFLAGS] = { .type = NLA_U16 },
@@ -990,6 +1328,22 @@ static const struct nla_policy ipgre_policy[RPL_IFLA_GRE_MAX + 1] = {
 	[IFLA_GRE_ERSPAN_HWID] = { .type = NLA_U16 },
 };
 
+static struct rtnl_link_ops ipgre_link_ops __read_mostly = {
+	.kind = "gre",
+	.maxtype = RPL_IFLA_GRE_MAX,
+	.policy = ipgre_policy,
+	.priv_size = sizeof(struct ip_tunnel),
+	.setup = ipgre_tunnel_setup,
+	.validate = ipgre_tunnel_validate,
+	.newlink = ipgre_newlink,
+	.dellink = ip_tunnel_dellink,
+	.get_size = ipgre_get_size,
+	.fill_info = ipgre_fill_info,
+#ifdef HAVE_GET_LINK_NET
+	.get_link_net = ip_tunnel_get_link_net,
+#endif
+};
+
 static struct rtnl_link_ops ipgre_tap_ops __read_mostly = {
 	.kind = "ovs_gretap",
 	.maxtype = RPL_IFLA_GRE_MAX,
@@ -1065,7 +1419,7 @@ EXPORT_SYMBOL_GPL(rpl_gretap_fb_dev_create);
 static int __net_init erspan_init_net(struct net *net)
 {
 	return ip_tunnel_init_net(net, erspan_net_id,
-				  &erspan_link_ops, "erspan0");
+				  &erspan_link_ops, NULL);
 }
 
 static void __net_exit erspan_exit_net(struct net *net)
@@ -1101,6 +1455,154 @@ static struct pernet_operations ipgre_tap_net_ops = {
 	.size = sizeof(struct ip_tunnel_net),
 };
 
+static struct net_device *erspan_fb_dev_create(struct net *net,
+					       const char *name,
+					       u8 name_assign_type)
+{
+	struct nlattr *tb[IFLA_MAX + 1];
+	struct net_device *dev;
+	LIST_HEAD(list_kill);
+	struct ip_tunnel *t;
+	int err;
+
+	memset(&tb, 0, sizeof(tb));
+
+	dev = rtnl_create_link(net, (char *)name, name_assign_type,
+			       &erspan_link_ops, tb);
+	if (IS_ERR(dev))
+		return dev;
+
+	t = netdev_priv(dev);
+	t->collect_md = true;
+	/* Configure flow based GRE device. */
+	err = ipgre_newlink(net, dev, tb, NULL);
+	if (err < 0) {
+		free_netdev(dev);
+		return ERR_PTR(err);
+	}
+
+	/* openvswitch users expect packet sizes to be unrestricted,
+	 * so set the largest MTU we can.
+	 */
+	err = __ip_tunnel_change_mtu(dev, IP_MAX_MTU, false);
+	if (err)
+		goto out;
+
+	return dev;
+out:
+	ip_tunnel_dellink(dev, &list_kill);
+	unregister_netdevice_many(&list_kill);
+	return ERR_PTR(err);
+}
+
+static struct vport_ops ovs_erspan_vport_ops;
+
+static struct vport *erspan_tnl_create(const struct vport_parms *parms)
+{
+	struct net *net = ovs_dp_get_net(parms->dp);
+	struct net_device *dev;
+	struct vport *vport;
+	int err;
+
+	vport = ovs_vport_alloc(0, &ovs_erspan_vport_ops, parms);
+	if (IS_ERR(vport))
+		return vport;
+
+	rtnl_lock();
+	dev = erspan_fb_dev_create(net, parms->name, NET_NAME_USER);
+	if (IS_ERR(dev)) {
+		rtnl_unlock();
+		ovs_vport_free(vport);
+		return ERR_CAST(dev);
+	}
+
+	err = dev_change_flags(dev, dev->flags | IFF_UP);
+	if (err < 0) {
+		rtnl_delete_link(dev);
+		rtnl_unlock();
+		ovs_vport_free(vport);
+		return ERR_PTR(err);
+	}
+
+	rtnl_unlock();
+	return vport;
+}
+
+static struct vport *erspan_create(const struct vport_parms *parms)
+{
+	struct vport *vport;
+
+	vport = erspan_tnl_create(parms);
+	if (IS_ERR(vport))
+		return vport;
+
+	return ovs_netdev_link(vport, parms->name);
+}
+
+static struct vport_ops ovs_erspan_vport_ops = {
+	.type = OVS_VPORT_TYPE_ERSPAN,
+	.create = erspan_create,
+	.send = __erspan_fb_xmit,
+#ifndef USE_UPSTREAM_TUNNEL
+	.fill_metadata_dst = gre_fill_metadata_dst,
+#endif
+	.destroy	= ovs_netdev_tunnel_destroy,
+};
+
+static struct vport_ops ovs_ipgre_vport_ops;
+
+static struct vport *ipgre_tnl_create(const struct vport_parms *parms)
+{
+	struct net *net = ovs_dp_get_net(parms->dp);
+	struct net_device *dev;
+	struct vport *vport;
+	int err;
+
+	vport = ovs_vport_alloc(0, &ovs_ipgre_vport_ops, parms);
+	if (IS_ERR(vport))
+		return vport;
+
+	rtnl_lock();
+	dev = gretap_fb_dev_create(net, parms->name, NET_NAME_USER);
+	if (IS_ERR(dev)) {
+		rtnl_unlock();
+		ovs_vport_free(vport);
+		return ERR_CAST(dev);
+	}
+
+	err = dev_change_flags(dev, dev->flags | IFF_UP);
+	if (err < 0) {
+		rtnl_delete_link(dev);
+		rtnl_unlock();
+		ovs_vport_free(vport);
+		return ERR_PTR(err);
+	}
+
+	rtnl_unlock();
+	return vport;
+}
+
+static struct vport *ipgre_create(const struct vport_parms *parms)
+{
+	struct vport *vport;
+
+	vport = ipgre_tnl_create(parms);
+	if (IS_ERR(vport))
+		return vport;
+
+	return ovs_netdev_link(vport, parms->name);
+}
+
+static struct vport_ops ovs_ipgre_vport_ops = {
+	.type		= OVS_VPORT_TYPE_GRE,
+	.create		= ipgre_create,
+	.send		= gre_fb_xmit,
+#ifndef USE_UPSTREAM_TUNNEL
+	.fill_metadata_dst = gre_fill_metadata_dst,
+#endif
+	.destroy	= ovs_netdev_tunnel_destroy,
+};
+
 int rpl_ipgre_init(void)
 {
 	int err;
@@ -1113,28 +1615,25 @@ int rpl_ipgre_init(void)
 	if (err < 0)
 		goto pnet_erspan_failed;
 
+	err = register_pernet_device(&ipgre_net_ops);
+	if (err < 0)
+		goto pnet_ipgre_failed;
+
 	err = gre_cisco_register(&ipgre_protocol);
 	if (err < 0) {
 		pr_info("%s: can't add protocol\n", __func__);
 		goto add_proto_failed;
 	}
 
-	err = rtnl_link_register(&ipgre_tap_ops);
-	if (err < 0)
-		goto tap_ops_failed;
-
-	err = rtnl_link_register(&erspan_link_ops);
-	if (err < 0)
-		goto erspan_link_failed;
-
 	pr_info("GRE over IPv4 tunneling driver\n");
+
+	ovs_vport_ops_register(&ovs_ipgre_vport_ops);
+	ovs_vport_ops_register(&ovs_erspan_vport_ops);
 	return 0;
 
-erspan_link_failed:
-	rtnl_link_unregister(&ipgre_tap_ops);
-tap_ops_failed:
-	gre_cisco_unregister(&ipgre_protocol);
 add_proto_failed:
+	unregister_pernet_device(&ipgre_net_ops);
+pnet_ipgre_failed:
 	unregister_pernet_device(&erspan_net_ops);
 pnet_erspan_failed:
 	unregister_pernet_device(&ipgre_tap_net_ops);
@@ -1145,11 +1644,12 @@ pnet_tap_failed:
 
 void rpl_ipgre_fini(void)
 {
-	rtnl_link_unregister(&ipgre_tap_ops);
-	rtnl_link_unregister(&erspan_link_ops);
+	ovs_vport_ops_unregister(&ovs_erspan_vport_ops);
+	ovs_vport_ops_unregister(&ovs_ipgre_vport_ops);
 	gre_cisco_unregister(&ipgre_protocol);
-	unregister_pernet_device(&ipgre_tap_net_ops);
+	unregister_pernet_device(&ipgre_net_ops);
 	unregister_pernet_device(&erspan_net_ops);
+	unregister_pernet_device(&ipgre_tap_net_ops);
 }
 
 #endif
diff --git a/datapath/linux/compat/ip_tunnel.c b/datapath/linux/compat/ip_tunnel.c
index d8cd798..68855c7 100644
--- a/datapath/linux/compat/ip_tunnel.c
+++ b/datapath/linux/compat/ip_tunnel.c
@@ -64,18 +64,58 @@ const struct ip_tunnel_encap_ops __rcu *
 		rpl_iptun_encaps[MAX_IPTUN_ENCAP_OPS] __read_mostly;
 
 
+static unsigned int rpl_ip_tunnel_hash(__be32 key, __be32 remote)
+{
+	return hash_32((__force u32)key ^ (__force u32)remote,
+			 IP_TNL_HASH_BITS);
+}
+
+static bool rpl_ip_tunnel_key_match(const struct ip_tunnel_parm *p,
+				    __be16 flags, __be32 key)
+{
+	if (p->i_flags & TUNNEL_KEY) {
+		if (flags & TUNNEL_KEY)
+			return key == p->i_key;
+		else
+			/* key expected, none present */
+			return false;
+	} else
+		return !(flags & TUNNEL_KEY);
+}
+
+static struct hlist_head *ip_bucket(struct ip_tunnel_net *itn,
+				    struct ip_tunnel_parm *parms)
+{
+	unsigned int h;
+	__be32 remote;
+	__be32 i_key = parms->i_key;
+
+	if (parms->iph.daddr && !ipv4_is_multicast(parms->iph.daddr))
+		remote = parms->iph.daddr;
+	else
+		remote = 0;
+
+	if (!(parms->i_flags & TUNNEL_KEY) && (parms->i_flags & VTI_ISVTI))
+		i_key = 0;
+
+	h = rpl_ip_tunnel_hash(i_key, remote);
+	return &itn->tunnels[h];
+}
+
 static void ip_tunnel_add(struct ip_tunnel_net *itn, struct ip_tunnel *t)
 {
+	struct hlist_head *head = ip_bucket(itn, &t->parms);
+
 	if (t->collect_md)
 		rcu_assign_pointer(itn->collect_md_tun, t);
-	else
-		WARN_ONCE(1, "%s: collect md not set\n", t->dev->name);
+	hlist_add_head_rcu(&t->hash_node, head);
 }
 
 static void ip_tunnel_del(struct ip_tunnel_net *itn, struct ip_tunnel *t)
 {
 	if (t->collect_md)
 		rcu_assign_pointer(itn->collect_md_tun, NULL);
+	hlist_del_init_rcu(&t->hash_node);
 }
 
 static struct net_device *__ip_tunnel_create(struct net *net,
@@ -165,6 +205,8 @@ static int ip_tunnel_bind_dev(struct net_device *dev)
 		}
 		if (dev->type != ARPHRD_ETHER)
 			dev->flags |= IFF_POINTOPOINT;
+
+		dst_cache_reset(&tunnel->dst_cache);
 	}
 
 	if (!tdev && tunnel->parms.link)
@@ -479,15 +521,31 @@ int rpl_ip_tunnel_init_net(struct net *net, int ip_tnl_net_id,
 static void ip_tunnel_destroy(struct ip_tunnel_net *itn, struct list_head *head,
 			      struct rtnl_link_ops *ops)
 {
-	struct ip_tunnel *t;
+	struct net *net = dev_net(itn->fb_tunnel_dev);
+	struct net_device *dev, *aux;
+	int h;
+
+	for_each_netdev_safe(net, dev, aux)
+		if (dev->rtnl_link_ops == ops)
+			unregister_netdevice_queue(dev, head);
+
+	for (h = 0; h < IP_TNL_HASH_SIZE; h++) {
+		struct ip_tunnel *t;
+		struct hlist_node *n;
+		struct hlist_head *thead = &itn->tunnels[h];
+
+		hlist_for_each_entry_safe(t, n, thead, hash_node)
+			/* If dev is in the same netns, it has already
+			 * been added to the list by the previous loop.
+			 */
+			if (!net_eq(dev_net(t->dev), net))
+				unregister_netdevice_queue(t->dev, head);
+	}
 
-	t = rtnl_dereference(itn->collect_md_tun);
-	if (!t)
-		return;
-	unregister_netdevice_queue(t->dev, head);
 }
 
-void rpl_ip_tunnel_delete_net(struct ip_tunnel_net *itn, struct rtnl_link_ops *ops)
+void rpl_ip_tunnel_delete_net(struct ip_tunnel_net *itn,
+			      struct rtnl_link_ops *ops)
 {
 	LIST_HEAD(list);
 
@@ -608,25 +666,6 @@ struct net *rpl_ip_tunnel_get_link_net(const struct net_device *dev)
 	return tunnel->net;
 }
 
-static unsigned int rpl_ip_tunnel_hash(__be32 key, __be32 remote)
-{
-	return hash_32((__force u32)key ^ (__force u32)remote,
-			 IP_TNL_HASH_BITS);
-}
-
-static bool rpl_ip_tunnel_key_match(const struct ip_tunnel_parm *p,
-				    __be16 flags, __be32 key)
-{
-	if (p->i_flags & TUNNEL_KEY) {
-		if (flags & TUNNEL_KEY)
-			return key == p->i_key;
-		else
-			/* key expected, none present */
-			return false;
-	} else
-		return !(flags & TUNNEL_KEY);
-}
-
 struct ip_tunnel *rpl_ip_tunnel_lookup(struct ip_tunnel_net *itn,
 				   int link, __be16 flags,
 				   __be32 remote, __be32 local,
diff --git a/datapath/vport.c b/datapath/vport.c
index e5221e9..f4131be 100644
--- a/datapath/vport.c
+++ b/datapath/vport.c
@@ -112,6 +112,8 @@ void ovs_vport_exit(void)
 	ovs_stt_cleanup_module();
 	vxlan_cleanup_module();
 	geneve_cleanup_module();
+	ip6_tunnel_cleanup();
+	ip6gre_fini();
 	ipgre_fini();
 	lisp_cleanup_module();
 	kfree(dev_table);