From patchwork Wed May 1 12:34:57 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrea Righi X-Patchwork-Id: 1930159 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com (client-ip=185.125.189.65; helo=lists.ubuntu.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=patchwork.ozlabs.org) Received: from lists.ubuntu.com (lists.ubuntu.com [185.125.189.65]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4VTxW62VSrz1ydT for ; Wed, 1 May 2024 22:40:49 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=lists.ubuntu.com) by lists.ubuntu.com with esmtp (Exim 4.86_2) (envelope-from ) id 1s29GD-0004YF-BF; Wed, 01 May 2024 12:40:33 +0000 Received: from smtp-relay-internal-1.internal ([10.131.114.114] helo=smtp-relay-internal-1.canonical.com) by lists.ubuntu.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1s29GC-0004XU-38 for kernel-team@lists.ubuntu.com; Wed, 01 May 2024 12:40:32 +0000 Received: from mail-ej1-f72.google.com (mail-ej1-f72.google.com [209.85.218.72]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-internal-1.canonical.com (Postfix) with ESMTPS id 896ED3F129 for ; Wed, 1 May 2024 12:40:29 +0000 (UTC) Received: by mail-ej1-f72.google.com with SMTP id a640c23a62f3a-a58bbfd44f7so343959766b.1 for ; Wed, 01 May 2024 05:40:29 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714567229; x=1715172029; h=content-transfer-encoding:mime-version:references:in-reply-to 
:message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=k59RvLaDXVLLbbodcH1dyYNJUy9xRHQsGSl3p5ULfu8=; b=WkPMZPmKxWAHwWTTGkj4DMlsoJI1r+bFy4VkuCitlE6Vr5KqPvuEWxsn7vY92Ig/5G z/esLBXHXZwJvVv3Bmw6r76uGPKGB++96HM22cwHCVWuHTe7araOVZhhd0/vZ/CR7A2n QpgGRJoptE8GHd2DBoQLrY9EO1hC14GBxJ8QSkCRkjY0g1ShbN6ahCnhEZwp3/tnxCT6 jwzOipFQDOFm3E7xwL4wii/QWpq8ddtBqLC2K9JA4WjayqHAdcUH4IJha+/edtka3CiP JqaQSSnoQ3Li3jMi9QHWd2sC6fd52BRWLgK4OaiPbhBYuVKcHHrWU2uE+WYx00G8I7JR iJuQ== X-Gm-Message-State: AOJu0YwmrByXzZQWejklVuZLT3Ee0l1Peqq12pIWkc8h/z94GL+DjK1Q bCR00PneyyQHaN9uzSTBaTEBqJ2GxowSKRa0FS/yg7dkql9u6n/EAusYOlEkFmsC7MIszinRmzD ZDCV+QGiAbihs4EUoMdv7p8v7K+Q0slYHbR1hU7uoDcFOrVsFmL0IvMnjTqyGc53DEkIedoWFWG IS4j0b8h9twA== X-Received: by 2002:a17:906:2dc1:b0:a55:5ee3:3c80 with SMTP id h1-20020a1709062dc100b00a555ee33c80mr1566186eji.29.1714567228481; Wed, 01 May 2024 05:40:28 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHwcVjR7FOxzDzUmFOjda/78j6/14vO3pMAke+neBBOgM+SkgpRBA4hfXRIkmeSzinCa+ZjHA== X-Received: by 2002:a17:906:2dc1:b0:a55:5ee3:3c80 with SMTP id h1-20020a1709062dc100b00a555ee33c80mr1566163eji.29.1714567227614; Wed, 01 May 2024 05:40:27 -0700 (PDT) Received: from gpd.homenet.telecomitalia.it (host-82-49-69-7.retail.telecomitalia.it. 
[82.49.69.7]) by smtp.gmail.com with ESMTPSA id i7-20020a1709061e4700b00a52244ab819sm16602108ejj.170.2024.05.01.05.40.26 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 01 May 2024 05:40:27 -0700 (PDT) From: Andrea Righi To: kernel-team@lists.ubuntu.com Subject: [SRU][N][PATCH 1/4] UBUNTU: SAUCE: fan: tunnel multiple mapping mode (v3) Date: Wed, 1 May 2024 14:34:57 +0200 Message-ID: <20240501124023.683940-2-andrea.righi@canonical.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240501124023.683940-1-andrea.righi@canonical.com> References: <20240501124023.683940-1-andrea.righi@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Jay Vosburgh BugLink: https://bugs.launchpad.net/bugs/2064508 Switch to a single tunnel for all mappings. This removes the limit on how many mappings each tunnel can handle, and therefore on how many Fan slices each local address may hold. NOTE: This introduces a new kernel netlink interface which needs updated iproute2 support. 
BugLink: http://bugs.launchpad.net/bugs/1470091 Signed-off-by: Jay Vosburgh Signed-off-by: Andy Whitcroft Signed-off-by: Tim Gardner [saf: Fix conflicts during rebase to 4.12] Signed-off-by: Seth Forshee [arighi: support v6.8 ABI] Signed-off-by: Andrea Righi --- include/net/ip_tunnels.h | 14 +++ include/uapi/linux/if_tunnel.h | 21 ++++ net/ipv4/ip_tunnel.c | 7 +- net/ipv4/ipip.c | 179 ++++++++++++++++++++++++++++++++- 4 files changed, 218 insertions(+), 3 deletions(-) diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h index 2d746f4c9a0a..98a3d8ad9415 100644 --- a/include/net/ip_tunnels.h +++ b/include/net/ip_tunnels.h @@ -109,6 +109,19 @@ struct ip_tunnel_prl_entry { }; struct metadata_dst; +/* A fan overlay /8 (250.0.0.0/8, for example) maps to exactly one /16 + * underlay (10.88.0.0/16, for example). Multiple local addresses within + * the /16 may be used, but a particular overlay may not span + * multiple underlay subnets. + * + * We store one underlay, indexed by the overlay's high order octet. + */ +#define FAN_OVERLAY_CNT 256 + +struct ip_tunnel_fan { +/* u32 __rcu *map;*/ + u32 map[FAN_OVERLAY_CNT]; +}; struct ip_tunnel { struct ip_tunnel __rcu *next; @@ -149,6 +162,7 @@ struct ip_tunnel { #endif struct ip_tunnel_prl_entry __rcu *prl; /* potential router list */ unsigned int prl_count; /* # of entries in PRL */ + struct ip_tunnel_fan fan; unsigned int ip_tnl_net_id; struct gro_cells gro_cells; __u32 fwmark; diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h index 102119628ff5..a862f0c483b7 100644 --- a/include/uapi/linux/if_tunnel.h +++ b/include/uapi/linux/if_tunnel.h @@ -77,6 +77,10 @@ enum { IFLA_IPTUN_ENCAP_DPORT, IFLA_IPTUN_COLLECT_METADATA, IFLA_IPTUN_FWMARK, + + __IFLA_IPTUN_VENDOR_BREAK, /* Ensure new entries do not hit the below. 
*/ + IFLA_IPTUN_FAN_MAP = 33, + __IFLA_IPTUN_MAX, }; #define IFLA_IPTUN_MAX (__IFLA_IPTUN_MAX - 1) @@ -182,4 +186,21 @@ enum { (TUNNEL_GENEVE_OPT | TUNNEL_VXLAN_OPT | TUNNEL_ERSPAN_OPT | \ TUNNEL_GTP_OPT) +#define TUNNEL_FAN __cpu_to_be16(0x8000) + +enum { + IFLA_FAN_UNSPEC, + IFLA_FAN_MAPPING, + __IFLA_FAN_MAX, +}; + +#define IFLA_FAN_MAX (__IFLA_FAN_MAX - 1) + +struct ip_tunnel_fan_map { + __be32 underlay; + __be32 overlay; + __u16 underlay_prefix; + __u16 overlay_prefix; +}; + #endif /* _UAPI_IF_TUNNEL_H_ */ diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c index 7af36e4f1647..5faebe94d071 100644 --- a/net/ipv4/ip_tunnel.c +++ b/net/ipv4/ip_tunnel.c @@ -1227,6 +1227,11 @@ int ip_tunnel_newlink(struct net_device *dev, struct nlattr *tb[], } EXPORT_SYMBOL_GPL(ip_tunnel_newlink); +static int ip_tunnel_is_fan(struct ip_tunnel *tunnel) +{ + return tunnel->parms.i_flags & TUNNEL_FAN; +} + int ip_tunnel_changelink(struct net_device *dev, struct nlattr *tb[], struct ip_tunnel_parm *p, __u32 fwmark) { @@ -1236,7 +1241,7 @@ int ip_tunnel_changelink(struct net_device *dev, struct nlattr *tb[], struct ip_tunnel_net *itn = net_generic(net, tunnel->ip_tnl_net_id); if (dev == itn->fb_tunnel_dev) - return -EINVAL; + return ip_tunnel_is_fan(tunnel) ? 0 : -EINVAL; t = ip_tunnel_find(itn, p, dev->type); diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c index 03afa3871efc..2fbe16a33bc1 100644 --- a/net/ipv4/ipip.c +++ b/net/ipv4/ipip.c @@ -101,6 +101,7 @@ #include #include #include +#include #include #include @@ -267,6 +268,40 @@ static int mplsip_rcv(struct sk_buff *skb) } #endif +static int ipip_tunnel_is_fan(struct ip_tunnel *tunnel) +{ + return tunnel->parms.i_flags & TUNNEL_FAN; +} + +/* + * Determine fan tunnel endpoint to send packet to, based on the inner IP + * address. 
For an overlay (inner) address Y.A.B.C, the transformation is + * F.G.A.B, where "F" and "G" are the first two octets of the underlay + * network (the network portion of a /16), "A" and "B" are the low order + * two octets of the underlay network host (the host portion of a /16), + * and "Y" is a configured first octet of the overlay network. + * + * E.g., underlay host 10.88.3.4 with an overlay of 99 would host overlay + * subnet 99.3.4.0/24. An overlay network datagram from 99.3.4.5 to + * 99.6.7.8, would be directed to underlay host 10.88.6.7, which hosts + * overlay network 99.6.7.0/24. + */ +static int ipip_build_fan_iphdr(struct ip_tunnel *tunnel, struct sk_buff *skb, struct iphdr *iph) +{ + unsigned int overlay; + u32 daddr, underlay; + + daddr = ntohl(ip_hdr(skb)->daddr); + overlay = daddr >> 24; + underlay = tunnel->fan.map[overlay]; + if (!underlay) + return -EINVAL; + + *iph = tunnel->parms.iph; + iph->daddr = htonl(underlay | ((daddr >> 8) & 0x0000ffff)); + return 0; +} + /* * This function assumes it is being called from dev_queue_xmit() * and that skb is filled properly by that function. 
@@ -277,6 +312,7 @@ static netdev_tx_t ipip_tunnel_xmit(struct sk_buff *skb, struct ip_tunnel *tunnel = netdev_priv(dev); const struct iphdr *tiph = &tunnel->parms.iph; u8 ipproto; + struct iphdr fiph; if (!pskb_inet_may_pull(skb)) goto tx_error; @@ -300,6 +336,14 @@ static netdev_tx_t ipip_tunnel_xmit(struct sk_buff *skb, if (iptunnel_handle_offloads(skb, SKB_GSO_IPXIP4)) goto tx_error; + if (ipip_tunnel_is_fan(tunnel)) { + if (ipip_build_fan_iphdr(tunnel, skb, &fiph)) + goto tx_error; + tiph = &fiph; + } else { + tiph = &tunnel->parms.iph; + } + skb_set_inner_ipproto(skb, ipproto); if (tunnel->collect_md) @@ -427,6 +471,68 @@ static void ipip_netlink_parms(struct nlattr *data[], *fwmark = nla_get_u32(data[IFLA_IPTUN_FWMARK]); } +static void ipip_fan_free_map(struct ip_tunnel *t) +{ + memset(&t->fan.map, 0, sizeof(t->fan.map)); +} + +static int ipip_fan_set_map(struct ip_tunnel *t, struct ip_tunnel_fan_map *map) +{ + u32 overlay, overlay_mask, underlay, underlay_mask; + + if ((map->underlay_prefix && map->underlay_prefix != 16) || + (map->overlay_prefix && map->overlay_prefix != 8)) + return -EINVAL; + + overlay = ntohl(map->overlay); + overlay_mask = ntohl(inet_make_mask(map->overlay_prefix)); + + underlay = ntohl(map->underlay); + underlay_mask = ntohl(inet_make_mask(map->underlay_prefix)); + + if ((overlay & ~overlay_mask) || (underlay & ~underlay_mask)) + return -EINVAL; + + if (!(overlay & overlay_mask) && (underlay & underlay_mask)) + return -EINVAL; + + t->parms.i_flags |= TUNNEL_FAN; + + /* Special case: overlay 0 and underlay 0 clears all mappings */ + if (!overlay && !underlay) { + ipip_fan_free_map(t); + return 0; + } + + overlay >>= (32 - map->overlay_prefix); + t->fan.map[overlay] = underlay; + + return 0; +} + +static int ipip_netlink_fan(struct nlattr *data[], struct ip_tunnel *t, + struct ip_tunnel_parm *parms) +{ + struct ip_tunnel_fan_map *map; + struct nlattr *attr; + int rem, rv; + + if (!data[IFLA_IPTUN_FAN_MAP]) + return 0; + + if 
(parms->iph.daddr) + return -EINVAL; + + nla_for_each_nested(attr, data[IFLA_IPTUN_FAN_MAP], rem) { + map = nla_data(attr); + rv = ipip_fan_set_map(t, map); + if (rv) + return rv; + } + + return 0; +} + static int ipip_newlink(struct net *src_net, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) @@ -435,15 +541,19 @@ static int ipip_newlink(struct net *src_net, struct net_device *dev, struct ip_tunnel_parm p; struct ip_tunnel_encap ipencap; __u32 fwmark = 0; + int err; if (ip_tunnel_netlink_encap_parms(data, &ipencap)) { - int err = ip_tunnel_encap_setup(t, &ipencap); + err = ip_tunnel_encap_setup(t, &ipencap); if (err < 0) return err; } ipip_netlink_parms(data, &p, &t->collect_md, &fwmark); + err = ipip_netlink_fan(data, t, &p); + if (err < 0) + return err; return ip_tunnel_newlink(dev, tb, &p, fwmark); } @@ -456,9 +566,10 @@ static int ipip_changelink(struct net_device *dev, struct nlattr *tb[], struct ip_tunnel_encap ipencap; bool collect_md; __u32 fwmark = t->fwmark; + int err; if (ip_tunnel_netlink_encap_parms(data, &ipencap)) { - int err = ip_tunnel_encap_setup(t, &ipencap); + err = ip_tunnel_encap_setup(t, &ipencap); if (err < 0) return err; @@ -467,6 +578,9 @@ static int ipip_changelink(struct net_device *dev, struct nlattr *tb[], ipip_netlink_parms(data, &p, &collect_md, &fwmark); if (collect_md) return -EINVAL; + err = ipip_netlink_fan(data, t, &p); + if (err < 0) + return err; if (((dev->flags & IFF_POINTOPOINT) && !p.iph.daddr) || (!(dev->flags & IFF_POINTOPOINT) && p.iph.daddr)) @@ -504,6 +618,8 @@ static size_t ipip_get_size(const struct net_device *dev) nla_total_size(0) + /* IFLA_IPTUN_FWMARK */ nla_total_size(4) + + /* IFLA_IPTUN_FAN_MAP */ + nla_total_size(sizeof(struct ip_tunnel_fan_map)) * 256 + 0; } @@ -536,6 +652,29 @@ static int ipip_fill_info(struct sk_buff *skb, const struct net_device *dev) if (tunnel->collect_md) if (nla_put_flag(skb, IFLA_IPTUN_COLLECT_METADATA)) goto nla_put_failure; + 
if (tunnel->parms.i_flags & TUNNEL_FAN) { + struct nlattr *fan_nest; + int i; + + fan_nest = nla_nest_start(skb, IFLA_IPTUN_FAN_MAP); + if (!fan_nest) + goto nla_put_failure; + for (i = 0; i < 256; i++) { + if (tunnel->fan.map[i]) { + struct ip_tunnel_fan_map map; + + map.underlay = htonl(tunnel->fan.map[i]); + map.underlay_prefix = 16; + map.overlay = htonl(i << 24); + map.overlay_prefix = 8; + if (nla_put(skb, IFLA_FAN_MAPPING, + sizeof(map), &map)) + goto nla_put_failure; + } + } + nla_nest_end(skb, fan_nest); + } + return 0; nla_put_failure: @@ -556,6 +695,9 @@ static const struct nla_policy ipip_policy[IFLA_IPTUN_MAX + 1] = { [IFLA_IPTUN_ENCAP_DPORT] = { .type = NLA_U16 }, [IFLA_IPTUN_COLLECT_METADATA] = { .type = NLA_FLAG }, [IFLA_IPTUN_FWMARK] = { .type = NLA_U32 }, + + [__IFLA_IPTUN_VENDOR_BREAK ... IFLA_IPTUN_MAX] = { .type = NLA_BINARY }, + [IFLA_IPTUN_FAN_MAP] = { .type = NLA_NESTED }, }; static struct rtnl_link_ops ipip_link_ops __read_mostly = { @@ -604,6 +746,23 @@ static struct pernet_operations ipip_net_ops = { .size = sizeof(struct ip_tunnel_net), }; +#ifdef CONFIG_SYSCTL +static struct ctl_table_header *ipip_fan_header; +static unsigned int ipip_fan_version = 3; + +static struct ctl_table ipip_fan_sysctls[] = { + { + .procname = "version", + .data = &ipip_fan_version, + .maxlen = sizeof(ipip_fan_version), + .mode = 0444, + .proc_handler = proc_dointvec, + }, + {}, +}; + +#endif /* CONFIG_SYSCTL */ + static int __init ipip_init(void) { int err; @@ -629,9 +788,22 @@ static int __init ipip_init(void) if (err < 0) goto rtnl_link_failed; +#ifdef CONFIG_SYSCTL + ipip_fan_header = register_net_sysctl(&init_net, "net/fan", + ipip_fan_sysctls); + if (!ipip_fan_header) { + err = -ENOMEM; + goto sysctl_failed; + } +#endif /* CONFIG_SYSCTL */ + out: return err; +#ifdef CONFIG_SYSCTL +sysctl_failed: + rtnl_link_unregister(&ipip_link_ops); +#endif /* CONFIG_SYSCTL */ rtnl_link_failed: #if IS_ENABLED(CONFIG_MPLS) xfrm4_tunnel_deregister(&mplsip_handler, 
AF_MPLS); @@ -646,6 +818,9 @@ static int __init ipip_init(void) static void __exit ipip_fini(void) { +#ifdef CONFIG_SYSCTL + unregister_net_sysctl_table(ipip_fan_header); +#endif /* CONFIG_SYSCTL */ rtnl_link_unregister(&ipip_link_ops); if (xfrm4_tunnel_deregister(&ipip_handler, AF_INET)) pr_info("%s: can't deregister tunnel\n", __func__); From patchwork Wed May 1 12:34:58 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrea Righi X-Patchwork-Id: 1930161 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com (client-ip=185.125.189.65; helo=lists.ubuntu.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=patchwork.ozlabs.org) Received: from lists.ubuntu.com (lists.ubuntu.com [185.125.189.65]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4VTxW714vzz23tZ for ; Wed, 1 May 2024 22:40:51 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=lists.ubuntu.com) by lists.ubuntu.com with esmtp (Exim 4.86_2) (envelope-from ) id 1s29GF-0004Z2-Lv; Wed, 01 May 2024 12:40:35 +0000 Received: from smtp-relay-internal-1.internal ([10.131.114.114] helo=smtp-relay-internal-1.canonical.com) by lists.ubuntu.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1s29GB-0004XT-Qk for kernel-team@lists.ubuntu.com; Wed, 01 May 2024 12:40:31 +0000 Received: from mail-ej1-f69.google.com (mail-ej1-f69.google.com [209.85.218.69]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-internal-1.canonical.com (Postfix) with ESMTPS id D95AE3FE0C for ; Wed, 1 May 2024 
12:40:30 +0000 (UTC) Received: by mail-ej1-f69.google.com with SMTP id a640c23a62f3a-a55709e5254so332484566b.3 for ; Wed, 01 May 2024 05:40:30 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714567229; x=1715172029; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=SnTXo/ASiV2LGBk3M8Xuik6l5tOdZofpfxgFl7YDSTk=; b=K+Qx09XtbHnZWak4THWlni8c9BCnY3vXVKFkfD8CVgGVzvAbOiCqYHMMlMxB5cFL8U 3B0lLybXLbPZfrIHfTJigXqK/XtuLkpdZaf+WkKP7pmyt9KihvAzEtM9jjWhNXwxneNL CflGu3Gp6XPGjVQbKyuNBVOnWV0FC8H4lSVrIgUKSCZrd6+2XN5PDEqGGCFdnrHuvg+z rsPhzne3nj/pPDpcGeXyUz7d2I4S4Ced7LPMhjN8XD63An7TRSGFeAS5DwR3QOEejvG9 h53pJktR9hktLW00wLSkRK+3YH0ErRtYUNP1Wgu9ZeVmY3CB0+kZP8TSJ2BT57B66m06 Wiew== X-Gm-Message-State: AOJu0Yzh8wotHjQ1vaaz4AMzYttjfL9182ynk2xXA1BkdlHSpA4O0lY0 SLq0nQtpAzG83JPHR1msjqTxopBwZ7rlCkjBRiIVkgj+gJmstzF51/5llvUhkBsd6picYJteN8E D86/kQstvAPHUxf6xL3mnVjmJGH58wwjq165uo7/7LQCLneDT55CHIdvzkNarUkHehZkDDm9JFH UJMlwVjoMpVQ== X-Received: by 2002:a17:907:7286:b0:a59:43e5:2505 with SMTP id dt6-20020a170907728600b00a5943e52505mr2137155ejc.13.1714567229218; Wed, 01 May 2024 05:40:29 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFRn7X6cMdn8LOpj20gF1poLSyZFJvag8LgBurQSJO7zvDdCKkcJsBGvx0BvWhn3q4swa2diQ== X-Received: by 2002:a17:907:7286:b0:a59:43e5:2505 with SMTP id dt6-20020a170907728600b00a5943e52505mr2137133ejc.13.1714567228519; Wed, 01 May 2024 05:40:28 -0700 (PDT) Received: from gpd.homenet.telecomitalia.it (host-82-49-69-7.retail.telecomitalia.it. 
[82.49.69.7]) by smtp.gmail.com with ESMTPSA id i7-20020a1709061e4700b00a52244ab819sm16602108ejj.170.2024.05.01.05.40.27 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 01 May 2024 05:40:28 -0700 (PDT) From: Andrea Righi To: kernel-team@lists.ubuntu.com Subject: [SRU][N][PATCH 2/4] UBUNTU: SAUCE: fan: add VXLAN implementation Date: Wed, 1 May 2024 14:34:58 +0200 Message-ID: <20240501124023.683940-3-andrea.righi@canonical.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240501124023.683940-1-andrea.righi@canonical.com> References: <20240501124023.683940-1-andrea.righi@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Jay Vosburgh BugLink: https://bugs.launchpad.net/bugs/2064508 Generalize the fan mapping support and use it to implement fan mappings over the VXLAN transport. Expose the existence of this functionality (when the module is loaded) via an additional sysctl marker. Signed-off-by: Jay Vosburgh [apw@canonical.com: added feature marker for fan over vxlan.] 
Signed-off-by: Andy Whitcroft Signed-off-by: Seth Forshee [ arighi: adjust conflicts in vxlan_xmit() and vxlan_xmit_one() for 6.4-rc1 ] [ arighi: support v6.8 ABI ] Signed-off-by: Andrea Righi --- drivers/net/vxlan/vxlan_core.c | 245 +++++++++++++++++++++++++++++++++ include/net/ip_tunnels.h | 18 ++- include/net/vxlan.h | 2 + include/uapi/linux/if_link.h | 1 + include/uapi/linux/if_tunnel.h | 4 +- net/ipv4/ip_tunnel.c | 7 +- net/ipv4/ipip.c | 242 ++++++++++++++++++++++++-------- 7 files changed, 453 insertions(+), 66 deletions(-) diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c index 16106e088c63..f16a4679e5ee 100644 --- a/drivers/net/vxlan/vxlan_core.c +++ b/drivers/net/vxlan/vxlan_core.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -71,6 +72,167 @@ static inline bool vxlan_collect_metadata(struct vxlan_sock *vs) ip_tunnel_collect_metadata(); } +static struct ip_fan_map *vxlan_fan_find_map(struct vxlan_dev *vxlan, __be32 daddr) +{ + struct ip_fan_map *fan_map; + + rcu_read_lock(); + list_for_each_entry_rcu(fan_map, &vxlan->fan.fan_maps, list) { + if (fan_map->overlay == + (daddr & inet_make_mask(fan_map->overlay_prefix))) { + rcu_read_unlock(); + return fan_map; + } + } + rcu_read_unlock(); + + return NULL; +} + +static void vxlan_fan_flush_map(struct vxlan_dev *vxlan) +{ + struct ip_fan_map *fan_map; + + list_for_each_entry_rcu(fan_map, &vxlan->fan.fan_maps, list) { + list_del_rcu(&fan_map->list); + kfree_rcu(fan_map, rcu); + } +} + +static int vxlan_fan_del_map(struct vxlan_dev *vxlan, __be32 overlay) +{ + struct ip_fan_map *fan_map; + + fan_map = vxlan_fan_find_map(vxlan, overlay); + if (!fan_map) + return -ENOENT; + + list_del_rcu(&fan_map->list); + kfree_rcu(fan_map, rcu); + + return 0; +} + +static int vxlan_fan_add_map(struct vxlan_dev *vxlan, struct ifla_fan_map *map) +{ + __be32 overlay_mask, underlay_mask; + struct ip_fan_map *fan_map; + + overlay_mask = 
inet_make_mask(map->overlay_prefix); + underlay_mask = inet_make_mask(map->underlay_prefix); + + netdev_dbg(vxlan->dev, "vfam: map: o %x/%d u %x/%d om %x um %x\n", + map->overlay, map->overlay_prefix, + map->underlay, map->underlay_prefix, + overlay_mask, underlay_mask); + + if ((map->overlay & ~overlay_mask) || (map->underlay & ~underlay_mask)) + return -EINVAL; + + if (!(map->overlay & overlay_mask) && (map->underlay & underlay_mask)) + return -EINVAL; + + /* Special case: overlay 0 and underlay 0: flush all mappings */ + if (!map->overlay && !map->underlay) { + vxlan_fan_flush_map(vxlan); + return 0; + } + + /* Special case: overlay set and underlay 0: clear map for overlay */ + if (!map->underlay) + return vxlan_fan_del_map(vxlan, map->overlay); + + if (vxlan_fan_find_map(vxlan, map->overlay)) + return -EEXIST; + + fan_map = kmalloc(sizeof(*fan_map), GFP_KERNEL); + fan_map->underlay = map->underlay; + fan_map->overlay = map->overlay; + fan_map->underlay_prefix = map->underlay_prefix; + fan_map->overlay_mask = ntohl(overlay_mask); + fan_map->overlay_prefix = map->overlay_prefix; + + list_add_tail_rcu(&fan_map->list, &vxlan->fan.fan_maps); + + return 0; +} + +static int vxlan_parse_fan_map(struct nlattr *data[], struct vxlan_dev *vxlan) +{ + struct ifla_fan_map *map; + struct nlattr *attr; + int rem, rv; + + nla_for_each_nested(attr, data[IFLA_IPTUN_FAN_MAP], rem) { + map = nla_data(attr); + rv = vxlan_fan_add_map(vxlan, map); + if (rv) + return rv; + } + + return 0; +} + +static int vxlan_fan_build_rdst(struct vxlan_dev *vxlan, struct sk_buff *skb, + struct vxlan_rdst *fan_rdst) +{ + struct ip_fan_map *f_map; + union vxlan_addr *va; + u32 daddr, underlay; + struct arphdr *arp; + void *arp_ptr; + struct ethhdr *eth; + struct iphdr *iph; + + eth = eth_hdr(skb); + switch (eth->h_proto) { + case htons(ETH_P_IP): + iph = ip_hdr(skb); + if (!iph) + return -EINVAL; + daddr = iph->daddr; + break; + case htons(ETH_P_ARP): + arp = arp_hdr(skb); + if (!arp) + return 
-EINVAL; + arp_ptr = arp + 1; + netdev_dbg(vxlan->dev, + "vfbr: arp sha %pM sip %pI4 tha %pM tip %pI4\n", + arp_ptr, arp_ptr + skb->dev->addr_len, + arp_ptr + skb->dev->addr_len + 4, + arp_ptr + (skb->dev->addr_len * 2) + 4); + arp_ptr += (skb->dev->addr_len * 2) + 4; + memcpy(&daddr, arp_ptr, 4); + break; + default: + netdev_dbg(vxlan->dev, "vfbr: unknown eth p %x\n", eth->h_proto); + return -EINVAL; + } + + f_map = vxlan_fan_find_map(vxlan, daddr); + if (!f_map) + return -EINVAL; + + daddr = ntohl(daddr); + underlay = ntohl(f_map->underlay); + if (!underlay) + return -EINVAL; + + memset(fan_rdst, 0, sizeof(*fan_rdst)); + va = &fan_rdst->remote_ip; + va->sa.sa_family = AF_INET; + fan_rdst->remote_vni = vxlan->default_dst.remote_vni; + va->sin.sin_addr.s_addr = htonl(underlay | + ((daddr & ~f_map->overlay_mask) >> + (32 - f_map->overlay_prefix - + (32 - f_map->underlay_prefix)))); + netdev_dbg(vxlan->dev, "vfbr: daddr %x ul %x dst %x\n", + daddr, underlay, va->sin.sin_addr.s_addr); + + return 0; +} + /* Find VXLAN socket based on network namespace, address family, UDP port, * enabled unshareable flags and socket device binding (see l3mdev with * non-default VRF). 
@@ -2433,6 +2595,13 @@ void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, goto tx_error; } + if (fan_has_map(&vxlan->fan) && rt->rt_flags & RTCF_LOCAL) { + netdev_dbg(dev, "discard fan to localhost %pI4\n", + &rdst->remote_ip.sin.sin_addr.s_addr); + ip_rt_put(rt); + goto tx_free; + } + if (!info) { /* Bypass encapsulation if the destination is local */ err = encap_bypass_if_local(skb, dev, vxlan, AF_INET, @@ -2569,6 +2738,7 @@ void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, dst_release(ndst); dev->stats.tx_errors++; vxlan_vnifilter_count(vxlan, vni, NULL, VXLAN_VNI_STATS_TX_ERRORS, 0); +tx_free: kfree_skb(skb); } @@ -2716,6 +2886,20 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev) rcu_read_unlock(); } + if (fan_has_map(&vxlan->fan)) { + struct vxlan_rdst fan_rdst; + + netdev_dbg(vxlan->dev, "vxlan_xmit p %x d %pM\n", + eth->h_proto, eth->h_dest); + if (vxlan_fan_build_rdst(vxlan, skb, &fan_rdst)) { + dev->stats.tx_dropped++; + kfree_skb(skb); + return NETDEV_TX_OK; + } + vxlan_xmit_one(skb, dev, vni, &fan_rdst, 0); + return NETDEV_TX_OK; + } + eth = eth_hdr(skb); f = vxlan_find_mac(vxlan, eth->h_dest, vni); did_rsc = false; @@ -3325,6 +3509,8 @@ static void vxlan_setup(struct net_device *dev) spin_lock_init(&vxlan->hash_lock[h]); INIT_HLIST_HEAD(&vxlan->fdb_head[h]); } + + INIT_LIST_HEAD(&vxlan->fan.fan_maps); } static void vxlan_ether_setup(struct net_device *dev) @@ -4055,6 +4241,12 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct nlattr *data[], conf->remote_ip.sa.sa_family = AF_INET6; } + if (data[IFLA_VXLAN_FAN_MAP]) { + err = vxlan_parse_fan_map(data, vxlan); + if (err) + return err; + } + if (data[IFLA_VXLAN_LOCAL]) { if (changelink && (conf->saddr.sa.sa_family != AF_INET)) { NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_VXLAN_LOCAL], "New local address family does not match old"); @@ -4440,6 +4632,7 @@ static size_t vxlan_get_size(const struct net_device *dev) nla_total_size(0) + /* IFLA_VXLAN_GPE */ 
nla_total_size(0) + /* IFLA_VXLAN_REMCSUM_NOPARTIAL */ nla_total_size(sizeof(__u8)) + /* IFLA_VXLAN_VNIFILTER */ + nla_total_size(sizeof(struct ip_fan_map) * 256) + 0; } @@ -4486,6 +4679,26 @@ static int vxlan_fill_info(struct sk_buff *skb, const struct net_device *dev) } } + if (fan_has_map(&vxlan->fan)) { + struct nlattr *fan_nest; + struct ip_fan_map *fan_map; + + fan_nest = nla_nest_start(skb, IFLA_VXLAN_FAN_MAP); + if (!fan_nest) + goto nla_put_failure; + list_for_each_entry_rcu(fan_map, &vxlan->fan.fan_maps, list) { + struct ifla_fan_map map; + + map.underlay = fan_map->underlay; + map.underlay_prefix = fan_map->underlay_prefix; + map.overlay = fan_map->overlay; + map.overlay_prefix = fan_map->overlay_prefix; + if (nla_put(skb, IFLA_FAN_MAPPING, sizeof(map), &map)) + goto nla_put_failure; + } + nla_nest_end(skb, fan_nest); + } + if (nla_put_u8(skb, IFLA_VXLAN_TTL, vxlan->cfg.ttl) || nla_put_u8(skb, IFLA_VXLAN_TTL_INHERIT, !!(vxlan->cfg.flags & VXLAN_F_TTL_INHERIT)) || @@ -4826,6 +5039,22 @@ static __net_init int vxlan_init_net(struct net *net) NULL); } +#ifdef CONFIG_SYSCTL +static struct ctl_table_header *vxlan_fan_header; +static unsigned int vxlan_fan_version = 4; + +static struct ctl_table vxlan_fan_sysctls[] = { + { + .procname = "vxlan", + .data = &vxlan_fan_version, + .maxlen = sizeof(vxlan_fan_version), + .mode = 0444, + .proc_handler = proc_dointvec, + }, + {}, +}; +#endif /* CONFIG_SYSCTL */ + static void vxlan_destroy_tunnels(struct net *net, struct list_head *head) { struct vxlan_net *vn = net_generic(net, vxlan_net_id); @@ -4903,7 +5132,20 @@ static int __init vxlan_init_module(void) vxlan_vnifilter_init(); +#ifdef CONFIG_SYSCTL + vxlan_fan_header = register_net_sysctl(&init_net, "net/fan", + vxlan_fan_sysctls); + if (!vxlan_fan_header) { + rc = -ENOMEM; + goto sysctl_failed; + } +#endif /* CONFIG_SYSCTL */ + return 0; +#ifdef CONFIG_SYSCTL +sysctl_failed: + rtnl_link_unregister(&vxlan_link_ops); +#endif /* CONFIG_SYSCTL */ out4: 
unregister_switchdev_notifier(&vxlan_switchdev_notifier_block); out3: @@ -4917,6 +5159,9 @@ late_initcall(vxlan_init_module); static void __exit vxlan_cleanup_module(void) { +#ifdef CONFIG_SYSCTL + unregister_net_sysctl_table(vxlan_fan_header); +#endif /* CONFIG_SYSCTL */ vxlan_vnifilter_uninit(); rtnl_link_unregister(&vxlan_link_ops); unregister_switchdev_notifier(&vxlan_switchdev_notifier_block); diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h index 98a3d8ad9415..8c53e4179854 100644 --- a/include/net/ip_tunnels.h +++ b/include/net/ip_tunnels.h @@ -118,9 +118,18 @@ struct metadata_dst; */ #define FAN_OVERLAY_CNT 256 +struct ip_fan_map { + __be32 underlay; + __be32 overlay; + u16 underlay_prefix; + u16 overlay_prefix; + u32 overlay_mask; + struct list_head list; + struct rcu_head rcu; +}; + struct ip_tunnel_fan { -/* u32 __rcu *map;*/ - u32 map[FAN_OVERLAY_CNT]; + struct list_head fan_maps; }; struct ip_tunnel { @@ -170,6 +179,11 @@ struct ip_tunnel { bool ignore_df; }; +static inline int fan_has_map(const struct ip_tunnel_fan *fan) +{ + return !list_empty(&fan->fan_maps); +} + struct tnl_ptk_info { __be16 flags; __be16 proto; diff --git a/include/net/vxlan.h b/include/net/vxlan.h index 33ba6fc151cf..e55d5b1483db 100644 --- a/include/net/vxlan.h +++ b/include/net/vxlan.h @@ -294,6 +294,8 @@ struct vxlan_dev { struct net *net; /* netns for packet i/o */ struct vxlan_rdst default_dst; /* default destination */ + struct ip_tunnel_fan fan; + struct timer_list age_timer; spinlock_t hash_lock[FDB_HASH_SIZE]; unsigned int addrcnt; diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index ab9bcff96e4d..4345ceae5d99 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -1378,6 +1378,7 @@ enum { IFLA_VXLAN_VNIFILTER, /* only applicable with COLLECT_METADATA mode */ IFLA_VXLAN_LOCALBYPASS, IFLA_VXLAN_LABEL_POLICY, /* IPv6 flow label policy; ifla_vxlan_label_policy */ + IFLA_VXLAN_FAN_MAP = 33, __IFLA_VXLAN_MAX }; 
#define IFLA_VXLAN_MAX (__IFLA_VXLAN_MAX - 1) diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h index a862f0c483b7..f1401060d5b5 100644 --- a/include/uapi/linux/if_tunnel.h +++ b/include/uapi/linux/if_tunnel.h @@ -186,8 +186,6 @@ enum { (TUNNEL_GENEVE_OPT | TUNNEL_VXLAN_OPT | TUNNEL_ERSPAN_OPT | \ TUNNEL_GTP_OPT) -#define TUNNEL_FAN __cpu_to_be16(0x8000) - enum { IFLA_FAN_UNSPEC, IFLA_FAN_MAPPING, @@ -196,7 +194,7 @@ enum { #define IFLA_FAN_MAX (__IFLA_FAN_MAX - 1) -struct ip_tunnel_fan_map { +struct ifla_fan_map { __be32 underlay; __be32 overlay; __u16 underlay_prefix; diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c index 5faebe94d071..b6ca5742fc46 100644 --- a/net/ipv4/ip_tunnel.c +++ b/net/ipv4/ip_tunnel.c @@ -1227,11 +1227,6 @@ int ip_tunnel_newlink(struct net_device *dev, struct nlattr *tb[], } EXPORT_SYMBOL_GPL(ip_tunnel_newlink); -static int ip_tunnel_is_fan(struct ip_tunnel *tunnel) -{ - return tunnel->parms.i_flags & TUNNEL_FAN; -} - int ip_tunnel_changelink(struct net_device *dev, struct nlattr *tb[], struct ip_tunnel_parm *p, __u32 fwmark) { @@ -1241,7 +1236,7 @@ int ip_tunnel_changelink(struct net_device *dev, struct nlattr *tb[], struct ip_tunnel_net *itn = net_generic(net, tunnel->ip_tnl_net_id); if (dev == itn->fb_tunnel_dev) - return ip_tunnel_is_fan(tunnel) ? 0 : -EINVAL; + return fan_has_map(&tunnel->fan) ? 
0 : -EINVAL; t = ip_tunnel_find(itn, p, dev->type); diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c index 2fbe16a33bc1..a044da845559 100644 --- a/net/ipv4/ipip.c +++ b/net/ipv4/ipip.c @@ -102,6 +102,7 @@ #include #include #include +#include #include #include @@ -268,37 +269,144 @@ static int mplsip_rcv(struct sk_buff *skb) } #endif -static int ipip_tunnel_is_fan(struct ip_tunnel *tunnel) +static struct ip_fan_map *ipip_fan_find_map(struct ip_tunnel *t, __be32 daddr) { - return tunnel->parms.i_flags & TUNNEL_FAN; + struct ip_fan_map *fan_map; + + rcu_read_lock(); + list_for_each_entry_rcu(fan_map, &t->fan.fan_maps, list) { + if (fan_map->overlay == + (daddr & inet_make_mask(fan_map->overlay_prefix))) { + rcu_read_unlock(); + return fan_map; + } + } + rcu_read_unlock(); + + return NULL; } -/* - * Determine fan tunnel endpoint to send packet to, based on the inner IP - * address. For an overlay (inner) address Y.A.B.C, the transformation is - * F.G.A.B, where "F" and "G" are the first two octets of the underlay - * network (the network portion of a /16), "A" and "B" are the low order - * two octets of the underlay network host (the host portion of a /16), - * and "Y" is a configured first octet of the overlay network. +/* Determine fan tunnel endpoint to send packet to, based on the inner IP + * address. + * + * Given a /8 overlay and /16 underlay, for an overlay (inner) address + * Y.A.B.C, the transformation is F.G.A.B, where "F" and "G" are the first + * two octets of the underlay network (the network portion of a /16), "A" + * and "B" are the low order two octets of the underlay network host (the + * host portion of a /16), and "Y" is a configured first octet of the + * overlay network. + * + * E.g., underlay host 10.88.3.4/16 with an overlay of 99.0.0.0/8 would + * host overlay subnet 99.3.4.0/24. An overlay network datagram from + * 99.3.4.5 to 99.6.7.8, would be directed to underlay host 10.88.6.7, + * which hosts overlay network subnet 99.6.7.0/24. 
This transformation is + * described in detail further below. + * + * Using netmasks for the overlay and underlay other than /8 and /16, as + * shown above, can yield larger (or smaller) overlay subnets, with the + * trade-off of allowing fewer (or more) underlay hosts to participate. + * + * The size of each overlay network subnet is defined by the total of the + * network mask of the overlay plus the size of host portion of the + * underlay network. In the above example, /8 + /16 = /24. + * + * E.g., consider underlay host 10.99.238.5/20 and overlay 99.0.0.0/8. In + * this case, the network portion of the underlay is 10.99.224.0/20, and + * the host portion is 0.0.14.5 (12 bits). To determine the overlay + * network subnet, the 12 bits of host portion are left shifted 12 bits + * (/20 - /8) and ORed with the overlay subnet prefix. This yields an + * overlay subnet of 99.224.80.0/20, composed of 8 bits overlay, followed by + * 12 bits underlay. This yields 12 bits in the overlay network portion, + * allowing for 4094 addresses in each overlay network subnet. The + * trade-off is that fewer hosts may participate in the underlay network, + * as its host address size has shrunk from 16 bits (65534 addresses) in + * the first example to 12 bits (4094 addresses) here. + * + * For fewer hosts per overlay subnet (permitting a larger number of + * underlay hosts to participate), the underlay netmask may be made + * smaller. + * + * E.g., underlay host 10.111.1.2/12 (network 10.96.0.0/12, host portion + * is 0.15.1.2, 20 bits) with an overlay of 33.0.0.0/8 would left shift + * the 20 bits of host by 4 (so that its highest order bit is adjacent to + * the lowest order bit of the /8 overlay). This yields an overlay subnet + * of 33.240.16.32/28 (8 bits overlay, 20 bits from the host portion of + * the underlay).
This provides more addresses for the underlay network + * (approximately 2^20), but each host's segment of the overlay provides + * only 4 bits of addresses (14 usable). + * + * It is also possible to adjust the overlay subnet. + * + * For an overlay of 240.0.0.0/5 and underlay of 10.88.0.0/20, consider + * underlay host 10.88.129.2; the 12 bits of host, 0.0.1.2, are left + * shifted 15 bits (/20 - /5), yielding an overlay network of + * 240.129.0.0/17. An underlay host of 10.88.244.215 would yield an + * overlay network of 242.107.128.0/17. + * + * For an overlay of 100.64.0.0/10 and underlay of 10.224.220.0/24, for + * underlay host 10.224.220.10, the underlay host portion (.10) is left + * shifted 14 bits, yielding an overlay network subnet of 100.66.128.0/18. + * This would permit 254 addresses on the underlay, with each overlay + * segment providing approximately 2^14 - 2 addresses (16382). + * + * For packets being encapsulated, the overlay network destination IP + * address is deconstructed into its overlay and underlay-derived + * portions. The underlay portion (determined by the overlay mask and + * overlay subnet mask) is right shifted according to the size of the + * underlay network mask. This value is then ORed with the network + * portion of the underlay network to produce the underlay network + * destination for the encapsulated datagram. + * + * For example, using the initial example of underlay 10.88.3.4/16 and + * overlay 99.0.0.0/8, with underlay host 10.88.3.4/16 providing overlay + * subnet 99.3.4.0/24 with specific host 99.3.4.5. A datagram from + * 99.3.4.5 to 99.6.7.8 would first have the underlay host derived portion + * of the address extracted. This is a number of bits equal to underlay + * network host portion. In the destination address, the highest order of + * these bits is one bit lower than the lowest order bit from the overlay + * network mask.
* - * E.g., underlay host 10.88.3.4 with an overlay of 99 would host overlay - * subnet 99.3.4.0/24. An overlay network datagram from 99.3.4.5 to - * 99.6.7.8, would be directed to underlay host 10.88.6.7, which hosts - * overlay network 99.6.7.0/24. + * Using the sample value, 99.6.7.8, the overlay mask is /8, and the + * underlay mask is /16 (leaving 16 bits for the host portion). The bits + * to be shifted are the middle two octets, 0.6.7.0, as this is 99.6.7.8 + * ANDed with the mask 0x00ffff00 (which is 16 bits, the highest order of + * which is 1 bit lower than the lowest order overlay address bit). + * + * These octets, 0.6.7.0, are then right shifted 8 bits, yielding 0.0.6.7. + * This value is then ORed with the underlay network portion, + * 10.88.0.0/16, providing 10.88.6.7 as the final underlay destination for + * the encapsulated datagram. + * + * Another transform using the final example: overlay 100.64.0.0/10 and + * underlay 10.224.220.0/24. Consider overlay address 100.66.128.1 + * sending a datagram to 100.66.200.5. In this case, 8 bits (the host + * portion size of 10.224.220.0/24) beginning after the 100.64/10 overlay + * prefix are masked off, yielding 0.2.192.0. This is right shifted 14 + * (32 - 10 - (32 - 24), i.e., the number of bits between the overlay + * network portion and the underlay host portion) bits, yielding 0.0.0.11. + * This is ORed with the underlay network portion, 10.224.220.0/24, giving + * the underlay destination of 10.224.220.11 for overlay destination + * 100.66.200.5.
*/ static int ipip_build_fan_iphdr(struct ip_tunnel *tunnel, struct sk_buff *skb, struct iphdr *iph) { - unsigned int overlay; + struct ip_fan_map *f_map; u32 daddr, underlay; + f_map = ipip_fan_find_map(tunnel, ip_hdr(skb)->daddr); + if (!f_map) + return -ENOENT; + daddr = ntohl(ip_hdr(skb)->daddr); - overlay = daddr >> 24; - underlay = tunnel->fan.map[overlay]; + underlay = ntohl(f_map->underlay); if (!underlay) return -EINVAL; *iph = tunnel->parms.iph; - iph->daddr = htonl(underlay | ((daddr >> 8) & 0x0000ffff)); + iph->daddr = htonl(underlay | + ((daddr & ~f_map->overlay_mask) >> + (32 - f_map->overlay_prefix - + (32 - f_map->underlay_prefix)))); return 0; } @@ -336,7 +444,7 @@ static netdev_tx_t ipip_tunnel_xmit(struct sk_buff *skb, if (iptunnel_handle_offloads(skb, SKB_GSO_IPXIP4)) goto tx_error; - if (ipip_tunnel_is_fan(tunnel)) { + if (fan_has_map(&tunnel->fan)) { if (ipip_build_fan_iphdr(tunnel, skb, &fiph)) goto tx_error; tiph = &fiph; @@ -407,6 +515,8 @@ static const struct net_device_ops ipip_netdev_ops = { static void ipip_tunnel_setup(struct net_device *dev) { + struct ip_tunnel *t = netdev_priv(dev); + dev->netdev_ops = &ipip_netdev_ops; dev->header_ops = &ip_tunnel_header_ops; @@ -419,6 +529,7 @@ static void ipip_tunnel_setup(struct net_device *dev) dev->features |= IPIP_FEATURES; dev->hw_features |= IPIP_FEATURES; ip_tunnel_setup(dev, ipip_net_id); + INIT_LIST_HEAD(&t->fan.fan_maps); } static int ipip_tunnel_init(struct net_device *dev) @@ -471,41 +582,65 @@ static void ipip_netlink_parms(struct nlattr *data[], *fwmark = nla_get_u32(data[IFLA_IPTUN_FWMARK]); } -static void ipip_fan_free_map(struct ip_tunnel *t) +static void ipip_fan_flush_map(struct ip_tunnel *t) { - memset(&t->fan.map, 0, sizeof(t->fan.map)); + struct ip_fan_map *fan_map; + + list_for_each_entry_rcu(fan_map, &t->fan.fan_maps, list) { + list_del_rcu(&fan_map->list); + kfree_rcu(fan_map, rcu); + } } -static int ipip_fan_set_map(struct ip_tunnel *t, struct ip_tunnel_fan_map *map) 
+static int ipip_fan_del_map(struct ip_tunnel *t, __be32 overlay) { - u32 overlay, overlay_mask, underlay, underlay_mask; + struct ip_fan_map *fan_map; - if ((map->underlay_prefix && map->underlay_prefix != 16) || - (map->overlay_prefix && map->overlay_prefix != 8)) - return -EINVAL; + fan_map = ipip_fan_find_map(t, overlay); + if (!fan_map) + return -ENOENT; - overlay = ntohl(map->overlay); - overlay_mask = ntohl(inet_make_mask(map->overlay_prefix)); + list_del_rcu(&fan_map->list); + kfree_rcu(fan_map, rcu); - underlay = ntohl(map->underlay); - underlay_mask = ntohl(inet_make_mask(map->underlay_prefix)); + return 0; +} - if ((overlay & ~overlay_mask) || (underlay & ~underlay_mask)) - return -EINVAL; +static int ipip_fan_add_map(struct ip_tunnel *t, struct ifla_fan_map *map) +{ + __be32 overlay_mask, underlay_mask; + struct ip_fan_map *fan_map; - if (!(overlay & overlay_mask) && (underlay & underlay_mask)) + overlay_mask = inet_make_mask(map->overlay_prefix); + underlay_mask = inet_make_mask(map->underlay_prefix); + + if ((map->overlay & ~overlay_mask) || (map->underlay & ~underlay_mask)) return -EINVAL; - t->parms.i_flags |= TUNNEL_FAN; + if (!(map->overlay & overlay_mask) && (map->underlay & underlay_mask)) + return -EINVAL; - /* Special case: overlay 0 and underlay 0 clears all mappings */ - if (!overlay && !underlay) { - ipip_fan_free_map(t); + /* Special case: overlay 0 and underlay 0: flush all mappings */ + if (!map->overlay && !map->underlay) { + ipip_fan_flush_map(t); return 0; } + + /* Special case: overlay set and underlay 0: clear map for overlay */ + if (!map->underlay) + return ipip_fan_del_map(t, map->overlay); + + if (ipip_fan_find_map(t, map->overlay)) + return -EEXIST; + + fan_map = kmalloc(sizeof(*fan_map), GFP_KERNEL); + fan_map->underlay = map->underlay; + fan_map->overlay = map->overlay; + fan_map->underlay_prefix = map->underlay_prefix; + fan_map->overlay_mask = ntohl(overlay_mask); + fan_map->overlay_prefix = map->overlay_prefix; - overlay 
>>= (32 - map->overlay_prefix); - t->fan.map[overlay] = underlay; + list_add_tail_rcu(&fan_map->list, &t->fan.fan_maps); return 0; } @@ -513,7 +648,7 @@ static int ipip_fan_set_map(struct ip_tunnel *t, struct ip_tunnel_fan_map *map) static int ipip_netlink_fan(struct nlattr *data[], struct ip_tunnel *t, struct ip_tunnel_parm *parms) { - struct ip_tunnel_fan_map *map; + struct ifla_fan_map *map; struct nlattr *attr; int rem, rv; @@ -525,7 +660,7 @@ static int ipip_netlink_fan(struct nlattr *data[], struct ip_tunnel *t, nla_for_each_nested(attr, data[IFLA_IPTUN_FAN_MAP], rem) { map = nla_data(attr); - rv = ipip_fan_set_map(t, map); + rv = ipip_fan_add_map(t, map); if (rv) return rv; } @@ -619,7 +754,7 @@ static size_t ipip_get_size(const struct net_device *dev) /* IFLA_IPTUN_FWMARK */ nla_total_size(4) + /* IFLA_IPTUN_FAN_MAP */ - nla_total_size(sizeof(struct ip_tunnel_fan_map)) * 256 + + nla_total_size(sizeof(struct ifla_fan_map)) * 256 + 0; } @@ -652,25 +787,22 @@ static int ipip_fill_info(struct sk_buff *skb, const struct net_device *dev) if (tunnel->collect_md) if (nla_put_flag(skb, IFLA_IPTUN_COLLECT_METADATA)) goto nla_put_failure; - if (tunnel->parms.i_flags & TUNNEL_FAN) { + if (fan_has_map(&tunnel->fan)) { struct nlattr *fan_nest; - int i; + struct ip_fan_map *fan_map; fan_nest = nla_nest_start(skb, IFLA_IPTUN_FAN_MAP); if (!fan_nest) goto nla_put_failure; - for (i = 0; i < 256; i++) { - if (tunnel->fan.map[i]) { - struct ip_tunnel_fan_map map; - - map.underlay = htonl(tunnel->fan.map[i]); - map.underlay_prefix = 16; - map.overlay = htonl(i << 24); - map.overlay_prefix = 8; - if (nla_put(skb, IFLA_FAN_MAPPING, - sizeof(map), &map)) - goto nla_put_failure; - } + list_for_each_entry_rcu(fan_map, &tunnel->fan.fan_maps, list) { + struct ifla_fan_map map; + + map.underlay = fan_map->underlay; + map.underlay_prefix = fan_map->underlay_prefix; + map.overlay = fan_map->overlay; + map.overlay_prefix = fan_map->overlay_prefix; + if (nla_put(skb, IFLA_FAN_MAPPING, 
sizeof(map), &map)) + goto nla_put_failure; } nla_nest_end(skb, fan_nest); } From patchwork Wed May 1 12:34:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrea Righi X-Patchwork-Id: 1930160 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com (client-ip=185.125.189.65; helo=lists.ubuntu.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=patchwork.ozlabs.org) Received: from lists.ubuntu.com (lists.ubuntu.com [185.125.189.65]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4VTxW63rDwz23rw for ; Wed, 1 May 2024 22:40:50 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=lists.ubuntu.com) by lists.ubuntu.com with esmtp (Exim 4.86_2) (envelope-from ) id 1s29GH-0004Zg-1K; Wed, 01 May 2024 12:40:37 +0000 Received: from smtp-relay-internal-0.internal ([10.131.114.225] helo=smtp-relay-internal-0.canonical.com) by lists.ubuntu.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1s29GB-0004XS-QO for kernel-team@lists.ubuntu.com; Wed, 01 May 2024 12:40:31 +0000 Received: from mail-ej1-f72.google.com (mail-ej1-f72.google.com [209.85.218.72]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-internal-0.canonical.com (Postfix) with ESMTPS id 628083F15B for ; Wed, 1 May 2024 12:40:30 +0000 (UTC) Received: by mail-ej1-f72.google.com with SMTP id a640c23a62f3a-a524b774e39so72200166b.1 for ; Wed, 01 May 2024 05:40:30 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714567230; x=1715172030; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=8PNENLAH/ZXUz9xAexOzQfIbSeZjCOwpoDfcEv9v6UA=; b=BCDC88EGpt9G/IfSNuO00VY+h7HN/WKFWU4xCj1YCSPNJLgsQuVkCa1ITylZXYG0ii bzAxGplKuue1ARZlxZYJUw2UFiRKE783BMzDxe96YF4pGEkuDbXsyCUEleSKgbTqp3Gm jhtD8tmgX/knJjYclF5yVdNhxgZtCLKai7H7Q6UxfvGLsKdQTGP5xXb5TkDbiNBCg4Fs Z+0k2We/SfvYddSfFUy5+ti6V5mQdxhxxJUFQ4BHxwytOIXeoiAyQOBgDEmNF3d7etL4 saN0TLbQamG3ynb70wBif5xRtTdp6JmKgPSNv+vuUb6+d+UB7AZRbiErsCvbxIQiwoXg lGzQ== X-Gm-Message-State: AOJu0YzdiaAK41e6LowiIKDf8AC4W+j18/1mRqaA8pCr6G8jUvRuvGku 0Zz/97MiwCbRYaLqZCHSsMXcqxD6vEzf6UFEEEKNOYfCYOMALW1YIawe8CbJ/6XOKrJ7sgSc0eG T1SDiJ7Hjrhu35BArsRoSpBGcHo5ASMn1MAP29lpbcEs5J6J1tuE7NmZ0ax8IlOqKraZFEzN3Xe lPPtAEd7IK7g== X-Received: by 2002:a17:907:7245:b0:a59:2141:5b0d with SMTP id ds5-20020a170907724500b00a5921415b0dmr5224714ejc.35.1714567229811; Wed, 01 May 2024 05:40:29 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFNgVuJXlihfl187sRkMrf1TgRB7eKr/nH7TLFe5dSXFv242hT7nZ3wzqwPDR0xMXjVyqHSCQ== X-Received: by 2002:a17:907:7245:b0:a59:2141:5b0d with SMTP id ds5-20020a170907724500b00a5921415b0dmr5224697ejc.35.1714567229341; Wed, 01 May 2024 05:40:29 -0700 (PDT) Received: from gpd.homenet.telecomitalia.it (host-82-49-69-7.retail.telecomitalia.it. 
[82.49.69.7]) by smtp.gmail.com with ESMTPSA id i7-20020a1709061e4700b00a52244ab819sm16602108ejj.170.2024.05.01.05.40.28 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 01 May 2024 05:40:29 -0700 (PDT) From: Andrea Righi To: kernel-team@lists.ubuntu.com Subject: [SRU][N][PATCH 3/4] UBUNTU: SAUCE: fan: Fix NULL pointer dereference Date: Wed, 1 May 2024 14:34:59 +0200 Message-ID: <20240501124023.683940-4-andrea.righi@canonical.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240501124023.683940-1-andrea.righi@canonical.com> References: <20240501124023.683940-1-andrea.righi@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Juerg Haefliger BugLink: https://bugs.launchpad.net/bugs/2064508 BugLink: https://bugs.launchpad.net/bugs/1811803 Fix a NULL pointer dereference in fan code that can easily be triggered by running: $ sudo ip link add foo type ipip Which leads to: [ 1.330067] BUG: unable to handle kernel NULL pointer dereference at 0000000000000108 [ 1.330792] IP: [] ipip_netlink_fan.isra.7+0x12/0x280 [ 1.331399] PGD 800000003fb94067 PUD 3fb93067 PMD 0 [ 1.331882] Oops: 0000 [#1] SMP [ 1.332200] Modules linked in: [ 1.332492] CPU: 0 PID: 137 Comm: ip Not tainted 4.4.167+ #5 [ 1.333001] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1ubuntu1 04/01/2014 [ 1.333740] task: ffff88003c38a640 ti: ffff88003fb5c000 task.ti: ffff88003fb5c000 [ 1.334375] RIP: 0010:[] [] ipip_netlink_fan.isra.7+0x12/0x280 [ 1.335193] RSP: 0018:ffff88003fb5f778 EFLAGS: 00010246 [ 1.335671] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 1.336305] RDX: ffff88003fb5f7f0 RSI: ffff88003fa3f840 RDI: 0000000000000000 [ 1.336940] RBP: ffff88003fb5f7a0 R08: 000000000000000a R09: 
0000000000000092 [ 1.337587] R10: 0000000000000000 R11: 00000000000001ad R12: ffff88003fa3f000 [ 1.338267] R13: ffff88003fb5f9d0 R14: ffff88003fa3f840 R15: ffffffff81f4b240 [ 1.338904] FS: 00007f535979b700(0000) GS:ffff88003e400000(0000) knlGS:0000000000000000 [ 1.339590] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.340066] CR2: 0000000000000108 CR3: 000000003fb60000 CR4: 0000000000000670 [ 1.340750] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1.341341] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1.341909] Stack: [ 1.342080] 0000000000000000 ffff88003fa3f000 ffff88003fb5f9d0 ffff88003fa3f840 [ 1.342725] ffffffff81f4b240 ffff88003fb5f828 ffffffff817e8515 0000000381356f0e [ 1.343334] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 1.343943] Call Trace: [ 1.344141] [] ipip_newlink+0xa5/0xc0 [ 1.344553] [] ? __netlink_ns_capable+0x3b/0x40 [ 1.345029] [] rtnl_newlink+0x6fd/0x8b0 [ 1.345699] [] ? kmem_cache_alloc+0x1a1/0x1f0 [ 1.346165] [] ? mempool_alloc_slab+0x15/0x20 [ 1.346630] [] ? validate_nla+0x93/0x1a0 [ 1.347060] [] ? nla_parse+0xa0/0x100 [ 1.347474] [] ? nla_strlcpy+0x52/0x60 [ 1.347891] [] ? rtnl_link_ops_get+0x39/0x50 [ 1.348347] [] ? rtnl_newlink+0x176/0x8b0 [ 1.348784] [] rtnetlink_rcv_msg+0xec/0x230 [ 1.349237] [] ? __kmalloc_node_track_caller+0x24b/0x310 [ 1.349774] [] ? __alloc_skb+0x87/0x1d0 [ 1.350198] [] ? rtnetlink_rcv+0x30/0x30 [ 1.350628] [] netlink_rcv_skb+0xa6/0xc0 [ 1.351059] [] rtnetlink_rcv+0x28/0x30 [ 1.351476] [] netlink_unicast+0x190/0x240 [ 1.351919] [] netlink_sendmsg+0x33a/0x3b0 [ 1.352363] [] ? aa_sock_msg_perm+0x61/0x150 [ 1.352820] [] sock_sendmsg+0x3e/0x50 [ 1.353235] [] ___sys_sendmsg+0x287/0x2a0 [ 1.353672] [] ? mem_cgroup_try_charge+0x6b/0x1e0 [ 1.354162] [] ? handle_mm_fault+0xecd/0x1b80 [ 1.354625] [] ? 
__alloc_fd+0xc7/0x190 [ 1.355044] [] __sys_sendmsg+0x51/0x90 [ 1.355525] [] SyS_sendmsg+0x12/0x20 [ 1.355933] [] entry_SYSCALL_64_fastpath+0x22/0xcb [ 1.356426] Code: 50 01 00 00 01 eb d3 49 8d 94 24 b8 08 00 00 eb ac e8 83 cf 89 ff 0f 1f 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 <48> 8b 9f 08 01 00 00 48 85 db 74 1e 8b 02 85 c0 75 25 44 0f b7 [ 1.358557] RIP [] ipip_netlink_fan.isra.7+0x12/0x280 [ 1.359086] RSP [ 1.359359] CR2: 0000000000000108 [ 1.359637] ---[ end trace 7820fbc7ced5dd6e ]--- Signed-off-by: Juerg Haefliger Acked-by: Colin Ian King Signed-off-by: Seth Forshee --- net/ipv4/ipip.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c index a044da845559..6025822763bf 100644 --- a/net/ipv4/ipip.c +++ b/net/ipv4/ipip.c @@ -652,7 +652,7 @@ static int ipip_netlink_fan(struct nlattr *data[], struct ip_tunnel *t, struct nlattr *attr; int rem, rv; - if (!data[IFLA_IPTUN_FAN_MAP]) + if (data == NULL || !data[IFLA_IPTUN_FAN_MAP]) return 0; if (parms->iph.daddr) From patchwork Wed May 1 12:35:00 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrea Righi X-Patchwork-Id: 1930162 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com (client-ip=185.125.189.65; helo=lists.ubuntu.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=patchwork.ozlabs.org) Received: from lists.ubuntu.com (lists.ubuntu.com [185.125.189.65]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4VTxW7525yz1ydT for ; Wed, 1 May 2024 22:40:51 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=lists.ubuntu.com) by lists.ubuntu.com with esmtp (Exim 4.86_2) (envelope-from ) id 
1s29GI-0004aI-71; Wed, 01 May 2024 12:40:38 +0000 Received: from smtp-relay-internal-0.internal ([10.131.114.225] helo=smtp-relay-internal-0.canonical.com) by lists.ubuntu.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1s29GC-0004XW-1d for kernel-team@lists.ubuntu.com; Wed, 01 May 2024 12:40:32 +0000 Received: from mail-lf1-f71.google.com (mail-lf1-f71.google.com [209.85.167.71]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-internal-0.canonical.com (Postfix) with ESMTPS id 9D6353FA4A for ; Wed, 1 May 2024 12:40:31 +0000 (UTC) Received: by mail-lf1-f71.google.com with SMTP id 2adb3069b0e04-51bc35e78a2so5961144e87.2 for ; Wed, 01 May 2024 05:40:31 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714567231; x=1715172031; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=TmbGpxDt2kpB8IKPmFYpscV6xssVc4+q0ljZp5pdcp8=; b=ZK21aci0/xrb99nQtMdA1LR7WKA8XMwCktoV9wvbk+lNawVvKV7BRBPi/XDlCRRJyC nzmGKND7xBeV4B7gDES1hTz+DzGA45pXXcYh7x2Xl0TohuPrXcMoYVgwdZ2BknvcR0ZT /wM6wnGWqDfHNGfAwbmOsuO8OTQGzEBlEa973N1EUBK6r7Wr4nu3eo4ezHCzwnyd04yz pBGGSRRC4jwU9GDXBhjsmC9uEieLSSpWGuDvkpdszhJ3hvYGKoJgah/24yFP58Nu0Pzx x/3+kPohki2/B4Ef7uk+Cichxhw66Q1NL2g7HCn/dGR/wGbdd+a6f2T94yUzyCQCqHRy 5PgQ== X-Gm-Message-State: AOJu0YwF/qfJ4FaOa1KjcfH81P3vgvAnSNJr4cTQKtPuDeoUQ3BqgyOV ti5RiV++lHWv3ufeAN9oR/xztllpYAFgMPIdvamu5IdFRCnw9lAAtECshBVU1flaehVxMXrsl/C 8Qt2Cl1n6X3Ku2flEj3sBQU3bRWD764a9LWh61JUBR7nXR+4LOg86O7XSV9I5vv2SG1Wci3x9sJ UOz9GBAbwARA== X-Received: by 2002:a19:7419:0:b0:51c:eeee:8679 with SMTP id v25-20020a197419000000b0051ceeee8679mr1752627lfe.56.1714567230780; Wed, 01 May 2024 05:40:30 -0700 (PDT) X-Google-Smtp-Source: 
AGHT+IHoH49F0KwdylYrMTJ95fSSO+OgsVq+r+sEVD+xOR2UolSKtKQab4T723mGqLxBMILBjWwOjA== X-Received: by 2002:a19:7419:0:b0:51c:eeee:8679 with SMTP id v25-20020a197419000000b0051ceeee8679mr1752607lfe.56.1714567230158; Wed, 01 May 2024 05:40:30 -0700 (PDT) Received: from gpd.homenet.telecomitalia.it (host-82-49-69-7.retail.telecomitalia.it. [82.49.69.7]) by smtp.gmail.com with ESMTPSA id i7-20020a1709061e4700b00a52244ab819sm16602108ejj.170.2024.05.01.05.40.29 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 01 May 2024 05:40:29 -0700 (PDT) From: Andrea Righi To: kernel-team@lists.ubuntu.com Subject: [SRU][N][PATCH 4/4] UBUNTU: SAUCE: fan: support vxlan strict length validation Date: Wed, 1 May 2024 14:35:00 +0200 Message-ID: <20240501124023.683940-5-andrea.righi@canonical.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240501124023.683940-1-andrea.righi@canonical.com> References: <20240501124023.683940-1-andrea.righi@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" BugLink: https://bugs.launchpad.net/bugs/2064508 Make IFLA_VXLAN_FAN_MAP compatible with the strict length check validation enforced by vxlan_policy for attribute types >= IFLA_VXLAN_LOCALBYPASS. This allows new vxlan attribute types to be supported with kernels >= 6.8, without breaking the existing user-space tools relying on Ubuntu FAN.
Signed-off-by: Andrea Righi --- drivers/net/vxlan/vxlan_core.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c index f16a4679e5ee..d43d2221ebf0 100644 --- a/drivers/net/vxlan/vxlan_core.c +++ b/drivers/net/vxlan/vxlan_core.c @@ -3530,6 +3530,22 @@ static void vxlan_raw_setup(struct net_device *dev) dev->netdev_ops = &vxlan_netdev_raw_ops; } +/* Validate Ubuntu FAN payload. + * + * This is required to bypass the strict length validation enforced for the + * attribute types >= IFLA_VXLAN_LOCALBYPASS in vxlan_policy. + * + * In this way we can continue to use the same allocated ID for + * IFLA_VXLAN_FAN_MAP, without breaking the existing user-space and also + * future kernel ABIs that may add new attribute types to vxlan_policy. + */ +static int fan_map_validate_entry(const struct nlattr *attr, + struct netlink_ext_ack *extack) +{ + /* Accept any payload for Ubuntu FAN */ + return 0; +} + static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = { [IFLA_VXLAN_UNSPEC] = { .strict_start_type = IFLA_VXLAN_LOCALBYPASS }, [IFLA_VXLAN_ID] = { .type = NLA_U32 }, @@ -3564,6 +3580,9 @@ static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = { [IFLA_VXLAN_VNIFILTER] = { .type = NLA_U8 }, [IFLA_VXLAN_LOCALBYPASS] = NLA_POLICY_MAX(NLA_U8, 1), [IFLA_VXLAN_LABEL_POLICY] = NLA_POLICY_MAX(NLA_U32, VXLAN_LABEL_MAX), + [IFLA_VXLAN_FAN_MAP] = NLA_POLICY_VALIDATE_FN(NLA_BINARY, + fan_map_validate_entry, + sizeof(struct ifla_fan_map) * 256), }; static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[],
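For anyone checking the address arithmetic described in the ipip.c comment of this series, the following is a minimal userspace sketch of the two transforms (overlay destination to underlay destination, and underlay host to overlay subnet). It is not the kernel code: struct fan_map_sketch and the helpers ipv4(), prefix_mask(), fan_underlay_dst() and fan_overlay_subnet() are hypothetical names invented for illustration, and every value is kept in host byte order, whereas struct ip_fan_map stores the addresses as __be32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace mirror of one fan map entry (illustration only).
 * All fields are host byte order, unlike the kernel's struct ip_fan_map. */
struct fan_map_sketch {
	uint32_t underlay;        /* underlay network, e.g. 10.88.0.0 */
	uint32_t overlay;         /* overlay network, e.g. 99.0.0.0 */
	unsigned underlay_prefix; /* e.g. 16 */
	unsigned overlay_prefix;  /* e.g. 8 */
};

/* Build a host-order IPv4 address from dotted-quad octets. */
static uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}

/* Netmask for a prefix length; prefix 0 is special-cased because a
 * 32-bit shift by 32 is undefined in C. */
static uint32_t prefix_mask(unsigned prefix)
{
	return prefix ? (uint32_t)(UINT32_MAX << (32 - prefix)) : 0;
}

/* Number of bits between the overlay prefix and the underlay host part. */
static unsigned fan_shift(const struct fan_map_sketch *m)
{
	assert(m->overlay_prefix >= 1 && m->underlay_prefix <= 32);
	assert(m->overlay_prefix + (32 - m->underlay_prefix) <= 32);
	return 32 - m->overlay_prefix - (32 - m->underlay_prefix);
}

/* Encapsulation: overlay destination -> underlay destination.
 * Strip the overlay prefix, shift the remainder down so the
 * underlay-derived bits land in the underlay host portion, then OR in
 * the underlay network. */
static uint32_t fan_underlay_dst(const struct fan_map_sketch *m,
				 uint32_t daddr)
{
	uint32_t rest = daddr & ~prefix_mask(m->overlay_prefix);

	return m->underlay | (rest >> fan_shift(m));
}

/* Reverse direction: underlay host -> overlay subnet hosted there.
 * The resulting subnet prefix length is
 * overlay_prefix + (32 - underlay_prefix). */
static uint32_t fan_overlay_subnet(const struct fan_map_sketch *m,
				   uint32_t host)
{
	uint32_t host_part = host & ~prefix_mask(m->underlay_prefix);

	return m->overlay | (host_part << fan_shift(m));
}
```

With the /8 overlay and /16 underlay from the comment, fan_underlay_dst() maps 99.6.7.8 to 10.88.6.7, and with overlay 100.64.0.0/10 and underlay 10.224.220.0/24 it maps 100.66.200.5 to 10.224.220.11. The sketch relies on the right shift to discard the low-order overlay host bits instead of masking them off first, which is also what the expression in ipip_build_fan_iphdr() appears to do.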