From patchwork Fri Jan 20 06:10:26 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Ahern
X-Patchwork-Id: 717482
X-Patchwork-Delegate: davem@davemloft.net
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
 by ozlabs.org (Postfix) with ESMTP id 3v4Vgn3vLQz9s5g
 for ; Fri, 20 Jan 2017 17:10:45 +1100 (AEDT)
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected)
 header.d=cumulusnetworks.com header.i=@cumulusnetworks.com
 header.b="RHGLAdXG"; dkim-atps=neutral
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1751345AbdATGKo (ORCPT );
 Fri, 20 Jan 2017 01:10:44 -0500
Received: from mail-pg0-f43.google.com ([74.125.83.43]:34517 "EHLO
 mail-pg0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1751267AbdATGKi (ORCPT );
 Fri, 20 Jan 2017 01:10:38 -0500
Received: by mail-pg0-f43.google.com with SMTP id 14so20838976pgg.1
 for ; Thu, 19 Jan 2017 22:10:38 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=cumulusnetworks.com; s=google;
 h=from:to:cc:subject:date:message-id:in-reply-to:references;
 bh=MbTsvTeh55DgeBMPb5Kd3pTRKK/DF8Rn1qrw3rJ0pa8=;
 b=RHGLAdXGxXPun47uYceclviP1+jpAbYGYwq2XV29mF/ThCcrN7Lzw3smT8FOsrN3SU
 5WMvmPJXuZ4/kwKdzc8giGV5E5219R2alG5jVz+/kh9mHeGBXbcrqVo41ZjSzyFylVDM
 jIIOOKQHRpwQ/gyCkmarcdUInE1IfUHGLROk4=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
 :references;
 bh=MbTsvTeh55DgeBMPb5Kd3pTRKK/DF8Rn1qrw3rJ0pa8=;
 b=hsP6Hfnt9CAh+353TBFY7Gzg8iqjioxpbQ7gMqTL4Ol6BOtsXbxGVcdyu+bMUjyJCw
 wTHDvxq3Krrwc77Sq+SuHib1rAgNygH28D3cXLzr/DhTjH2Vu56VE10SRNcg3PO85liW
 RdhXuTBD+m4ODHH5hb+EAUB/+1xVw21rEzWLiYvnMCCIouen4YZcf4udEDl4+35sghFH
 TMDgrySrlUzQhmf/Fiube+h+sY0fliQEP3kMz7MKLh1vdzEuhxSRW7kzwrruCv6EKksY
 eNafr2LOQ4mQ9C7ja5wduZFVJJ3iYczqQ/kn5YjF4GsrarNfwumd5+eyYL0NHinNay4l
 qqnA==
X-Gm-Message-State: AIkVDXIbxckjAZ/IwZjQgVpaJgjX9tz2LPw3XN+Nd7Ah790E1ysDdg1ei+nS4GGehWlSAIHo
X-Received: by 10.99.120.65 with SMTP id t62mr14860120pgc.149.1484892637425;
 Thu, 19 Jan 2017 22:10:37 -0800 (PST)
Received: from kenny.cumulusnetworks.com. ([216.129.126.126])
 by smtp.googlemail.com with ESMTPSA id n8sm13316897pgc.0.2017.01.19.22.10.36
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128);
 Thu, 19 Jan 2017 22:10:36 -0800 (PST)
From: David Ahern
To: netdev@vger.kernel.org
Cc: David Ahern
Subject: [PATCH net-next v2 2/2] net: ipv6: Add option to dump multipath
 routes via RTA_MULTIPATH attribute
Date: Thu, 19 Jan 2017 22:10:26 -0800
Message-Id: <1484892626-14257-3-git-send-email-dsa@cumulusnetworks.com>
X-Mailer: git-send-email 2.1.4
In-Reply-To: <1484892626-14257-1-git-send-email-dsa@cumulusnetworks.com>
References: <1484892626-14257-1-git-send-email-dsa@cumulusnetworks.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

IPv6 returns multipath routes as a series of individual routes making
their display and handling by userspace different and more complicated
than IPv4, putting the burden on the user to see that a route is part of
a multipath route and internally creating a multipath route if desired
(e.g., libnl does this as of commit 29b71371e764). This patch addresses
this difference, allowing users to request multipath routes to be returned
using the RTA_MULTIPATH attribute in dump requests.

To maintain backwards compatibility, a user has to opt in to the change
in behavior. Currently, the ancillary header in the netlink request
message is supposed to be an rtmsg, but it could be just an rtgenmsg since
only the family is actually referenced in the code to this point. But, ip
for example sends an ifinfomsg (plus a filter in newer versions) which
is completely wrong, but an established code base at this point.

This patch requires a struct rtmsg for the ancillary header, something it
is able to uniquely detect given the different struct sizes. Within rtmsg,
rtm_flags must have the new (from patch 1) RTM_F_ALL_NEXTHOPS flag set.

The end result is that IPv6 multipath routes can be treated and displayed
in a format similar to IPv4:

    $ ip -6 ro ls vrf red
    2001:db8::/120 metric 1024
            nexthop via 2001:db8:1::62 dev eth1 weight 1
            nexthop via 2001:db8:1::61 dev eth1 weight 1
            nexthop via 2001:db8:1::60 dev eth1 weight 1
            nexthop via 2001:db8:1::59 dev eth1 weight 1
    2001:db8:1::/120 dev eth1 proto kernel metric 256 pref medium
    ...

v2
- changed user api to opt in to new behavior from attribute appended
  to the request to requiring an rtmsg struct with the RTM_F_ALL_NEXTHOPS
  set

Suggested-by: Dinesh Dutt
Signed-off-by: David Ahern
---
 include/net/ip6_route.h        |   1 +
 include/uapi/linux/rtnetlink.h |   5 +-
 net/ipv6/ip6_fib.c             |  28 ++++++++++-
 net/ipv6/route.c               | 106 +++++++++++++++++++++++++++++++++--------
 4 files changed, 118 insertions(+), 22 deletions(-)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 9dc2c182a263..7620974826a5 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -154,6 +154,7 @@ struct rt6_rtnl_dump_arg {
 	struct sk_buff *skb;
 	struct netlink_callback *cb;
 	struct net *net;
+	unsigned int rtm_flags;
 };
 
 int rt6_dump_route(struct rt6_info *rt, void *p_arg);
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 7fb206bc42f9..5a070f24f111 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -276,7 +276,10 @@ enum rt_scope_t {
 #define RTM_F_EQUALIZE		0x400	/* Multipath equalizer: NI	*/
 #define RTM_F_PREFIX		0x800	/* Prefix addresses		*/
 #define RTM_F_LOOKUP_TABLE	0x1000	/* set rtm_table to FIB lookup result */
-#define RTM_F_ALL_NEXTHOPS	0x2000	/* delete all nexthops (IPv6) */
+#define RTM_F_ALL_NEXTHOPS	0x2000	/* IPv6 flag:
+					 *   delete: remove all nexthops
+					 *   dump: nexthops can use RTA_MULTIPATH
+					 */
 
 /* Reserved table identifiers */
 
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index ef5485204522..ef2bf33e34e8 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -308,16 +308,27 @@ static void __net_init fib6_tables_init(struct net *net)
 
 static int fib6_dump_node(struct fib6_walker *w)
 {
+	struct rt6_rtnl_dump_arg *arg = (struct rt6_rtnl_dump_arg *)w->args;
 	int res;
 	struct rt6_info *rt;
 
 	for (rt = w->leaf; rt; rt = rt->dst.rt6_next) {
-		res = rt6_dump_route(rt, w->args);
+		res = rt6_dump_route(rt, arg);
 		if (res < 0) {
 			/* Frame is full, suspend walking */
 			w->leaf = rt;
 			return 1;
 		}
+
+		/* if multipath routes are dumped in one route with
+		 * the RTA_MULTIPATH attribute, then jump rt to point
+		 * to the last sibling of this route (no need to dump
+		 * the sibling routes again)
+		 */
+		if ((arg->rtm_flags & RTM_F_ALL_NEXTHOPS) && rt->rt6i_nsiblings)
+			rt = list_last_entry(&rt->rt6i_siblings,
+					     struct rt6_info,
+					     rt6i_siblings);
 	}
 	w->leaf = NULL;
 	return 0;
@@ -398,6 +409,7 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 	struct fib6_walker *w;
 	struct fib6_table *tb;
 	struct hlist_head *head;
+	unsigned int flags = 0;
 	int res = 0;
 
 	s_h = cb->args[0];
@@ -422,9 +434,23 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 		cb->args[2] = (long)w;
 	}
 
+	/* sadly, the size of the ancillary header can vary: the correct
+	 * header is struct rtmsg as passed by libnl. iproute2 sends
+	 * ifinfomsg [+ filter]. Technically, passing only rtgenmsg is
+	 * sufficient since the header has not been parsed before. Luckily,
+	 * the struct sizes sufficiently vary to detect the permutations.
+	 * For the dump request to contain flags, require rtmsg header.
+	 */
+	if (nlmsg_len(cb->nlh) == sizeof(struct rtmsg)) {
+		struct rtmsg *rtm = nlmsg_data(cb->nlh);
+
+		flags = rtm->rtm_flags;
+	}
+
 	arg.skb = skb;
 	arg.cb = cb;
 	arg.net = net;
+	arg.rtm_flags = flags;
 	w->args = &arg;
 
 	rcu_read_lock();
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f207d4d0a782..ee57adf1a765 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3195,11 +3195,65 @@ static inline size_t rt6_nlmsg_size(struct rt6_info *rt)
 	       + lwtunnel_get_encap_size(rt->dst.lwtstate);
 }
 
+static int rt6_nexthop_info(struct sk_buff *skb, struct rt6_info *rt,
+			    unsigned int *flags)
+{
+	if (!netif_carrier_ok(rt->dst.dev)) {
+		*flags |= RTNH_F_LINKDOWN;
+		if (rt->rt6i_idev->cnf.ignore_routes_with_linkdown)
+			*flags |= RTNH_F_DEAD;
+	}
+
+	if (rt->rt6i_flags & RTF_GATEWAY) {
+		if (nla_put_in6_addr(skb, RTA_GATEWAY, &rt->rt6i_gateway) < 0)
+			goto nla_put_failure;
+	}
+
+	if (rt->dst.dev &&
+	    nla_put_u32(skb, RTA_OIF, rt->dst.dev->ifindex))
+		goto nla_put_failure;
+
+	if (rt->dst.lwtstate &&
+	    lwtunnel_fill_encap(skb, rt->dst.lwtstate) < 0)
+		goto nla_put_failure;
+
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
+static int rt6_add_nexthop(struct sk_buff *skb, struct rt6_info *rt)
+{
+	struct rtnexthop *rtnh;
+	unsigned int flags = 0;
+
+	rtnh = nla_reserve_nohdr(skb, sizeof(*rtnh));
+	if (!rtnh)
+		goto nla_put_failure;
+
+	rtnh->rtnh_hops = 0;
+	rtnh->rtnh_ifindex = rt->dst.dev ? rt->dst.dev->ifindex : 0;
+
+	if (rt6_nexthop_info(skb, rt, &flags) < 0)
+		goto nla_put_failure;
+
+	rtnh->rtnh_flags = flags;
+
+	/* length of rtnetlink header + attributes */
+	rtnh->rtnh_len = nlmsg_get_pos(skb) - (void *)rtnh;
+
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
 static int rt6_fill_node(struct net *net,
 			 struct sk_buff *skb, struct rt6_info *rt,
 			 struct in6_addr *dst, struct in6_addr *src,
 			 int iif, int type, u32 portid, u32 seq,
-			 unsigned int flags)
+			 unsigned int nlm_flags, unsigned int rtm_flags)
 {
 	u32 metrics[RTAX_MAX];
 	struct rtmsg *rtm;
@@ -3207,7 +3261,7 @@ static int rt6_fill_node(struct net *net,
 	long expires;
 	u32 table;
 
-	nlh = nlmsg_put(skb, portid, seq, type, sizeof(*rtm), flags);
+	nlh = nlmsg_put(skb, portid, seq, type, sizeof(*rtm), nlm_flags);
 	if (!nlh)
 		return -EMSGSIZE;
 
@@ -3246,11 +3300,6 @@ static int rt6_fill_node(struct net *net,
 	else
 		rtm->rtm_type = RTN_UNICAST;
 	rtm->rtm_flags = 0;
-	if (!netif_carrier_ok(rt->dst.dev)) {
-		rtm->rtm_flags |= RTNH_F_LINKDOWN;
-		if (rt->rt6i_idev->cnf.ignore_routes_with_linkdown)
-			rtm->rtm_flags |= RTNH_F_DEAD;
-	}
 	rtm->rtm_scope = RT_SCOPE_UNIVERSE;
 	rtm->rtm_protocol = rt->rt6i_protocol;
 	if (rt->rt6i_flags & RTF_DYNAMIC)
@@ -3314,17 +3363,36 @@ static int rt6_fill_node(struct net *net,
 	if (rtnetlink_put_metrics(skb, metrics) < 0)
 		goto nla_put_failure;
 
-	if (rt->rt6i_flags & RTF_GATEWAY) {
-		if (nla_put_in6_addr(skb, RTA_GATEWAY, &rt->rt6i_gateway) < 0)
-			goto nla_put_failure;
-	}
-
-	if (rt->dst.dev &&
-	    nla_put_u32(skb, RTA_OIF, rt->dst.dev->ifindex))
-		goto nla_put_failure;
 	if (nla_put_u32(skb, RTA_PRIORITY, rt->rt6i_metric))
 		goto nla_put_failure;
 
+	/* if user wants nexthops included via the RTA_MULTIPATH
+	 * attribute, then walk the siblings list and add each
+	 * as a nexthop
+	 */
+	if ((rtm_flags & RTM_F_ALL_NEXTHOPS) && rt->rt6i_nsiblings) {
+		struct rt6_info *sibling, *next_sibling;
+		struct nlattr *mp;
+
+		mp = nla_nest_start(skb, RTA_MULTIPATH);
+		if (!mp)
+			goto nla_put_failure;
+
+		if (rt6_add_nexthop(skb, rt) < 0)
+			goto nla_put_failure;
+
+		list_for_each_entry_safe(sibling, next_sibling,
+					 &rt->rt6i_siblings, rt6i_siblings) {
+			if (rt6_add_nexthop(skb, sibling) < 0)
+				goto nla_put_failure;
+		}
+
+		nla_nest_end(skb, mp);
+	} else {
+		if (rt6_nexthop_info(skb, rt, &rtm->rtm_flags) < 0)
+			goto nla_put_failure;
+	}
+
 	expires = (rt->rt6i_flags & RTF_EXPIRES) ? rt->dst.expires - jiffies : 0;
 
 	if (rtnl_put_cacheinfo(skb, &rt->dst, 0, expires, rt->dst.error) < 0)
@@ -3333,8 +3401,6 @@ static int rt6_fill_node(struct net *net,
 	if (nla_put_u8(skb, RTA_PREF, IPV6_EXTRACT_PREF(rt->rt6i_flags)))
 		goto nla_put_failure;
 
-	if (lwtunnel_fill_encap(skb, rt->dst.lwtstate) < 0)
-		goto nla_put_failure;
 
 	nlmsg_end(skb, nlh);
 	return 0;
@@ -3362,7 +3428,7 @@ int rt6_dump_route(struct rt6_info *rt, void *p_arg)
 	return rt6_fill_node(arg->net,
 		     arg->skb, rt, NULL, NULL, 0, RTM_NEWROUTE,
 		     NETLINK_CB(arg->cb->skb).portid, arg->cb->nlh->nlmsg_seq,
-		     NLM_F_MULTI);
+		     NLM_F_MULTI, arg->rtm_flags);
 }
 
 static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh)
@@ -3453,7 +3519,7 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh)
 
 	err = rt6_fill_node(net, skb, rt, &fl6.daddr, &fl6.saddr, iif,
 			    RTM_NEWROUTE, NETLINK_CB(in_skb).portid,
-			    nlh->nlmsg_seq, 0);
+			    nlh->nlmsg_seq, 0, 0);
 	if (err < 0) {
 		kfree_skb(skb);
 		goto errout;
@@ -3480,7 +3546,7 @@ void inet6_rt_notify(int event, struct rt6_info *rt, struct nl_info *info,
 		goto errout;
 
 	err = rt6_fill_node(net, skb, rt, NULL, NULL, 0,
-				event, info->portid, seq, nlm_flags);
+				event, info->portid, seq, nlm_flags, 0);
 	if (err < 0) {
 		/* -EMSGSIZE implies BUG in rt6_nlmsg_size() */
 		WARN_ON(err == -EMSGSIZE);
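
For reference, the opt-in described in the commit message could look roughly like this from userspace. This is only a sketch, not part of the patch: `struct route_dump_req`, `build_route_dump_req()` and `dump_req_rtm_flags()` are hypothetical names made up for illustration, and the fallback define assumes the 0x2000 value from the rtnetlink.h hunk for headers that predate this series. The second helper mirrors, in userspace terms, the size check the patch adds to inet6_dump_fib().

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

#ifndef RTM_F_ALL_NEXTHOPS
#define RTM_F_ALL_NEXTHOPS 0x2000	/* value from this series; older headers lack it */
#endif

/* Hypothetical request layout: netlink header followed by exactly one rtmsg. */
struct route_dump_req {
	struct nlmsghdr nlh;
	struct rtmsg rtm;
};

/* Build an IPv6 RTM_GETROUTE dump request that opts in to RTA_MULTIPATH. */
static void build_route_dump_req(struct route_dump_req *req, unsigned int seq)
{
	/* the payload must be exactly sizeof(struct rtmsg), so no padding allowed */
	assert(sizeof(*req) == NLMSG_LENGTH(sizeof(struct rtmsg)));

	memset(req, 0, sizeof(*req));
	req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg));
	req->nlh.nlmsg_type = RTM_GETROUTE;
	req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	req->nlh.nlmsg_seq = seq;
	req->rtm.rtm_family = AF_INET6;
	req->rtm.rtm_flags = RTM_F_ALL_NEXTHOPS;
}

/*
 * Userspace mirror of the kernel-side check in the patch: flags are only
 * read when the ancillary header is exactly a struct rtmsg. Any other
 * payload size (rtgenmsg, ifinfomsg, ...) yields no flags.
 */
static unsigned int dump_req_rtm_flags(const struct nlmsghdr *nlh)
{
	if (nlh->nlmsg_len - NLMSG_HDRLEN != sizeof(struct rtmsg))
		return 0;

	return ((const struct rtmsg *)NLMSG_DATA(nlh))->rtm_flags;
}
```

A caller would fill the request with build_route_dump_req(&req, seq) and send req.nlh.nlmsg_len bytes on an AF_NETLINK/NETLINK_ROUTE socket. A request carrying an ifinfomsg, as older iproute2 sends, fails the size check and keeps the existing one-route-per-nexthop output.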