From patchwork Fri Jun 21 15:45:20 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefano Brivio X-Patchwork-Id: 1120397 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 45VjjX5br1z9s00 for ; Sat, 22 Jun 2019 01:46:56 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726417AbfFUPqz (ORCPT ); Fri, 21 Jun 2019 11:46:55 -0400 Received: from mx1.redhat.com ([209.132.183.28]:39886 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726253AbfFUPqz (ORCPT ); Fri, 21 Jun 2019 11:46:55 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id A4E47C049598; Fri, 21 Jun 2019 15:46:48 +0000 (UTC) Received: from epycfail.redhat.com (unknown [10.36.112.13]) by smtp.corp.redhat.com (Postfix) with ESMTP id 51B4919C4F; Fri, 21 Jun 2019 15:46:45 +0000 (UTC) From: Stefano Brivio To: David Miller , David Ahern Cc: Jianlin Shi , Wei Wang , Martin KaFai Lau , Eric Dumazet , Matti Vaittinen , netdev@vger.kernel.org Subject: [PATCH net-next v7 01/11] fib_frontend, ip6_fib: Select routes or exceptions dump from RTM_F_CLONED Date: Fri, 21 Jun 2019 17:45:20 +0200 Message-Id: In-Reply-To: References: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Fri, 21 Jun 2019 15:46:54 +0000 (UTC) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org The following patches add back the ability to dump IPv4 and IPv6 exception routes, and we need to allow selection of regular routes or exceptions. Use RTM_F_CLONED as filter to decide whether to dump routes or exceptions: iproute2 passes it in dump requests (except for IPv6 cache flush requests, this will be fixed in iproute2) and this used to work as long as exceptions were stored directly in the FIB, for both IPv4 and IPv6. Caveat: if strict checking is not requested (that is, if the dump request doesn't go through ip_valid_fib_dump_req()), we can't filter on protocol, tables or route types. In this case, filtering on RTM_F_CLONED would be inconsistent: we would fix 'ip route list cache' by returning exception routes and at the same time introduce another bug in case another selector is present, e.g. on 'ip route list cache table main' we would return all exception routes, without filtering on tables. Keep this consistent by applying no filters at all, and dumping both routes and exceptions, if strict checking is not requested. iproute2 currently filters results anyway, and no unwanted results will be presented to the user. The kernel will just dump more data than needed. v7: No changes v6: Rebase onto net-next, no changes v5: New patch: add dump_routes and dump_exceptions flags in filter and simply clear the unwanted one if strict checking is enabled, don't ignore NLM_F_MATCH and don't set filter_set if NLM_F_MATCH is set. Skip filtering altogether if no strict checking is requested: selecting routes or exceptions only would be inconsistent with the fact we can't filter on tables. Signed-off-by: Stefano Brivio Reviewed-by: David Ahern --- include/net/ip_fib.h | 2 ++ net/ipv4/fib_frontend.c | 8 +++++++- net/ipv6/ip6_fib.c | 3 ++- 3 files changed, 11 insertions(+), 2 deletions(-) diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h index 7e1e621a56df..4c81846ccce8 100644 --- a/include/net/ip_fib.h +++ b/include/net/ip_fib.h @@ -245,6 +245,8 @@ struct fib_dump_filter { /* filter_set is an optimization that an entry is set */ bool filter_set; bool dump_all_families; + bool dump_routes; + bool dump_exceptions; unsigned char protocol; unsigned char rt_type; unsigned int flags; diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index 108191667531..ed7fb5fd885c 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -912,10 +912,15 @@ int ip_valid_fib_dump_req(struct net *net, const struct nlmsghdr *nlh, NL_SET_ERR_MSG(extack, "Invalid values in header for FIB dump request"); return -EINVAL; } + if (rtm->rtm_flags & ~(RTM_F_CLONED | RTM_F_PREFIX)) { NL_SET_ERR_MSG(extack, "Invalid flags for FIB dump request"); return -EINVAL; } + if (rtm->rtm_flags & RTM_F_CLONED) + filter->dump_routes = false; + else + filter->dump_exceptions = false; filter->dump_all_families = (rtm->rtm_family == AF_UNSPEC); filter->flags = rtm->rtm_flags; @@ -962,9 +967,10 @@ EXPORT_SYMBOL_GPL(ip_valid_fib_dump_req); static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb) { + struct fib_dump_filter filter = { .dump_routes = true, + .dump_exceptions = true }; const struct nlmsghdr *nlh = cb->nlh; struct net *net = sock_net(skb->sk); - struct fib_dump_filter filter = {}; unsigned int h, s_h; unsigned int e = 0, s_e; struct fib_table *tb; diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 1d16a01eccf5..f2a3cc4c8f12 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -552,9 +552,10 @@ static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb, static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb) { + struct rt6_rtnl_dump_arg arg = { .filter.dump_exceptions = true, + .filter.dump_routes = true }; const struct nlmsghdr *nlh = cb->nlh; struct net *net = sock_net(skb->sk); - struct rt6_rtnl_dump_arg arg = {}; unsigned int h, s_h; unsigned int e = 0, s_e; struct fib6_walker *w; From patchwork Fri Jun 21 15:45:21 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefano Brivio X-Patchwork-Id: 1120399 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 45Vjjh3yspz9s00 for ; Sat, 22 Jun 2019 01:47:04 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726451AbfFUPrD (ORCPT ); Fri, 21 Jun 2019 11:47:03 -0400 Received: from mx1.redhat.com ([209.132.183.28]:39954 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726196AbfFUPrD (ORCPT ); Fri, 21 Jun 2019 11:47:03 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 39A57C0528CB; Fri, 21 Jun 2019 15:46:53 +0000 (UTC) Received: from epycfail.redhat.com (unknown [10.36.112.13]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1C72D19C6A; Fri, 21 Jun 2019 15:46:48 +0000 (UTC) From: Stefano Brivio To: David Miller , David Ahern Cc: Jianlin Shi , Wei Wang , Martin KaFai Lau , Eric Dumazet , Matti Vaittinen , netdev@vger.kernel.org Subject: [PATCH net-next v7 02/11] ipv4/fib_frontend: Allow RTM_F_CLONED flag to be used for filtering Date: Fri, 21 Jun 2019 17:45:21 +0200 Message-Id: In-Reply-To: References: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Fri, 21 Jun 2019 15:47:03 +0000 (UTC) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This functionally reverts the check introduced by commit e8ba330ac0c5 ("rtnetlink: Update fib dumps for strict data checking") as modified by commit e4e92fb160d7 ("net/ipv4: Bail early if user only wants prefix entries"). As we are preparing to fix listing of IPv4 cached routes, we need to give userspace a way to request them. Signed-off-by: Stefano Brivio Reviewed-by: David Ahern --- v7: No changes v6: Rebase onto net-next, no changes v5: No changes v4: New patch net/ipv4/fib_frontend.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index ed7fb5fd885c..317339cd7f03 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -987,8 +987,8 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb) filter.flags = rtm->rtm_flags & (RTM_F_PREFIX | RTM_F_CLONED); } - /* fib entries are never clones and ipv4 does not use prefix flag */ - if (filter.flags & (RTM_F_PREFIX | RTM_F_CLONED)) + /* ipv4 does not use prefix flag */ + if (filter.flags & RTM_F_PREFIX) return skb->len; if (filter.table_id) { From patchwork Fri Jun 21 15:45:22 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefano Brivio X-Patchwork-Id: 1120398 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 45Vjjc0Trwz9s00 for ; Sat, 22 Jun 2019 01:47:00 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726437AbfFUPq7 (ORCPT ); Fri, 21 Jun 2019 11:46:59 -0400 Received: from mx1.redhat.com ([209.132.183.28]:54966 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726253AbfFUPq6 (ORCPT ); Fri, 21 Jun 2019 11:46:58 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 724BF3086262; Fri, 21 Jun 2019 15:46:57 +0000 (UTC) Received: from epycfail.redhat.com (unknown [10.36.112.13]) by smtp.corp.redhat.com (Postfix) with ESMTP id A2B2D19C4F; Fri, 21 Jun 2019 15:46:53 +0000 (UTC) From: Stefano Brivio To: David Miller , David Ahern Cc: Jianlin Shi , Wei Wang , Martin KaFai Lau , Eric Dumazet , Matti Vaittinen , netdev@vger.kernel.org Subject: [PATCH net-next v7 03/11] ipv4/route: Allow NULL flowinfo in rt_fill_info() Date: Fri, 21 Jun 2019 17:45:22 +0200 Message-Id: <01f48f723d7d46c417a502d0548407658d19aaab.1561131177.git.sbrivio@redhat.com> In-Reply-To: References: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.49]); Fri, 21 Jun 2019 15:46:58 +0000 (UTC) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org In the next patch, we're going to use rt_fill_info() to dump exception routes upon RTM_GETROUTE with NLM_F_ROOT, meaning userspace is requesting a dump and not a specific route selection, which in turn implies the input interface is not relevant. Update rt_fill_info() to handle a NULL flowinfo. v7: If fl4 is NULL, explicitly set r->rtm_tos to 0: it's not initialised otherwise (spotted by David Ahern) v6: New patch Suggested-by: David Ahern Signed-off-by: Stefano Brivio Reviewed-by: David Ahern --- net/ipv4/route.c | 56 ++++++++++++++++++++++++++---------------------- 1 file changed, 30 insertions(+), 26 deletions(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 66cbe8a7a168..b1628d25e828 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2699,7 +2699,7 @@ static int rt_fill_info(struct net *net, __be32 dst, __be32 src, r->rtm_family = AF_INET; r->rtm_dst_len = 32; r->rtm_src_len = 0; - r->rtm_tos = fl4->flowi4_tos; + r->rtm_tos = fl4 ? fl4->flowi4_tos : 0; r->rtm_table = table_id < 256 ? table_id : RT_TABLE_COMPAT; if (nla_put_u32(skb, RTA_TABLE, table_id)) goto nla_put_failure; @@ -2727,7 +2727,7 @@ static int rt_fill_info(struct net *net, __be32 dst, __be32 src, nla_put_u32(skb, RTA_FLOW, rt->dst.tclassid)) goto nla_put_failure; #endif - if (!rt_is_input_route(rt) && + if (fl4 && !rt_is_input_route(rt) && fl4->saddr != src) { if (nla_put_in_addr(skb, RTA_PREFSRC, fl4->saddr)) goto nla_put_failure; @@ -2767,36 +2767,40 @@ static int rt_fill_info(struct net *net, __be32 dst, __be32 src, if (rtnetlink_put_metrics(skb, metrics) < 0) goto nla_put_failure; - if (fl4->flowi4_mark && - nla_put_u32(skb, RTA_MARK, fl4->flowi4_mark)) - goto nla_put_failure; - - if (!uid_eq(fl4->flowi4_uid, INVALID_UID) && - nla_put_u32(skb, RTA_UID, - from_kuid_munged(current_user_ns(), fl4->flowi4_uid))) - goto nla_put_failure; + if (fl4) { + if (fl4->flowi4_mark && + nla_put_u32(skb, RTA_MARK, fl4->flowi4_mark)) + goto nla_put_failure; - error = rt->dst.error; + if (!uid_eq(fl4->flowi4_uid, INVALID_UID) && + nla_put_u32(skb, RTA_UID, + from_kuid_munged(current_user_ns(), + fl4->flowi4_uid))) + goto nla_put_failure; - if (rt_is_input_route(rt)) { + if (rt_is_input_route(rt)) { #ifdef CONFIG_IP_MROUTE - if (ipv4_is_multicast(dst) && !ipv4_is_local_multicast(dst) && - IPV4_DEVCONF_ALL(net, MC_FORWARDING)) { - int err = ipmr_get_route(net, skb, - fl4->saddr, fl4->daddr, - r, portid); - - if (err <= 0) { - if (err == 0) - return 0; - goto nla_put_failure; - } - } else + if (ipv4_is_multicast(dst) && + !ipv4_is_local_multicast(dst) && + IPV4_DEVCONF_ALL(net, MC_FORWARDING)) { + int err = ipmr_get_route(net, skb, + fl4->saddr, fl4->daddr, + r, portid); + + if (err <= 0) { + if (err == 0) + return 0; + goto nla_put_failure; + } + } else #endif - if (nla_put_u32(skb, RTA_IIF, fl4->flowi4_iif)) - goto nla_put_failure; + if (nla_put_u32(skb, RTA_IIF, fl4->flowi4_iif)) + goto nla_put_failure; + } } + error = rt->dst.error; + if (rtnl_put_cacheinfo(skb, &rt->dst, 0, expires, error) < 0) goto nla_put_failure; From patchwork Fri Jun 21 15:45:23 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefano Brivio X-Patchwork-Id: 1120400 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 45Vjjl27YXz9s00 for ; Sat, 22 Jun 2019 01:47:07 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726466AbfFUPrG (ORCPT ); Fri, 21 Jun 2019 11:47:06 -0400 Received: from mx1.redhat.com ([209.132.183.28]:58904 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726196AbfFUPrF (ORCPT ); Fri, 21 Jun 2019 11:47:05 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 62E26356EC; Fri, 21 Jun 2019 15:47:00 +0000 (UTC) Received: from epycfail.redhat.com (unknown [10.36.112.13]) by smtp.corp.redhat.com (Postfix) with ESMTP id DBAF819C68; Fri, 21 Jun 2019 15:46:57 +0000 (UTC) From: Stefano Brivio To: David Miller , David Ahern Cc: Jianlin Shi , Wei Wang , Martin KaFai Lau , Eric Dumazet , Matti Vaittinen , netdev@vger.kernel.org Subject: [PATCH net-next v7 04/11] ipv4: Dump route exceptions if requested Date: Fri, 21 Jun 2019 17:45:23 +0200 Message-Id: <8d3b68cd37fb5fddc470904cdd6793fcf480c6c1.1561131177.git.sbrivio@redhat.com> In-Reply-To: References: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Fri, 21 Jun 2019 15:47:05 +0000 (UTC) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Since commit 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions."), cached exception routes are stored as a separate entity, so they are not dumped on a FIB dump, even if the RTM_F_CLONED flag is passed. This implies that the command 'ip route list cache' doesn't return any result anymore. If the RTM_F_CLONED is passed, and strict checking requested, retrieve nexthop exception routes and dump them. If no strict checking is requested, filtering can't be performed consistently: dump everything in that case. With this, we need to add an argument to the netlink callback in order to track how many entries were already dumped for the last leaf included in a partial netlink dump. A single additional argument is sufficient, even if we traverse logically nested structures (nexthop objects, hash table buckets, bucket chains): it doesn't matter if we stop in the middle of any of those, because they are always traversed the same way. As an example, s_i values in [], s_fa values in (): node (fa) #1 [1] nexthop #1 bucket #1 -> #0 in chain (1) bucket #2 -> #0 in chain (2) -> #1 in chain (3) -> #2 in chain (4) bucket #3 -> #0 in chain (5) -> #1 in chain (6) nexthop #2 bucket #1 -> #0 in chain (7) -> #1 in chain (8) bucket #2 -> #0 in chain (9) -- node (fa) #2 [2] nexthop #1 bucket #1 -> #0 in chain (1) -> #1 in chain (2) bucket #2 -> #0 in chain (3) it doesn't matter if we stop at (3), (4), (7) for "node #1", or at (2) for "node #2": walking flattens all that. It would even be possible to drop the distinction between the in-tree (s_i) and in-node (s_fa) counter, but a further improvement might advise against this. This is only as accurate as the existing tracking mechanism for leaves: if a partial dump is restarted after exceptions are removed or expired, we might skip some non-dumped entries. To improve this, we could attach a 'sernum' attribute (similar to the one used for IPv6) to nexthop entities, and bump this counter whenever exceptions change: having a distinction between the two counters would make this more convenient. Listing of exception routes (modified routes pre-3.5) was tested against these versions of kernel and iproute2: iproute2 kernel 4.14.0 4.15.0 4.19.0 5.0.0 5.1.0 3.5-rc4 + + + + + 4.4 4.9 4.14 4.15 4.19 5.0 5.1 fixed + + + + + v7: - Move loop over nexthop objects to route.c, and pass struct fib_info and table ID to it, not a struct fib_alias (suggested by David Ahern) - While at it, note that the NULL check on fa->fa_info is redundant, and the check on RTNH_F_DEAD is also not consistent with what's done with regular route listing: just keep it for nhc_flags - Rename entry point function for dumping exceptions to fib_dump_info_fnhe(), and rearrange arguments for consistency with fib_dump_info() - Rename fnhe_dump_buckets() to fnhe_dump_bucket() and make it handle one bucket at a time - Expand commit message to describe why we can have a single "skip" counter for all exceptions stored in bucket chains in nexthop objects (suggested by David Ahern) v6: - Rebased onto net-next - Loop over nexthop paths too. Move loop over fnhe buckets to route.c, avoids need to export rt_fill_info() and to touch exceptions from fib_trie.c. Pass NULL as flow to rt_fill_info(), it now allows that (suggested by David Ahern) Fixes: 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions.") Signed-off-by: Stefano Brivio Reviewed-by: David Ahern --- include/net/route.h | 4 +++ net/ipv4/fib_trie.c | 44 +++++++++++++++++++-------- net/ipv4/route.c | 73 +++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 108 insertions(+), 13 deletions(-) diff --git a/include/net/route.h b/include/net/route.h index 065b47754f05..cfcd0f5980f9 100644 --- a/include/net/route.h +++ b/include/net/route.h @@ -230,6 +230,10 @@ void fib_modify_prefix_metric(struct in_ifaddr *ifa, u32 new_metric); void rt_add_uncached_list(struct rtable *rt); void rt_del_uncached_list(struct rtable *rt); +int fib_dump_info_fnhe(struct sk_buff *skb, struct netlink_callback *cb, + u32 table_id, struct fib_info *fi, + int *fa_index, int fa_start); + static inline void ip_rt_put(struct rtable *rt) { /* dst_release() accepts a NULL parameter. diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c index 90f0fc8c87bd..4400f5051977 100644 --- a/net/ipv4/fib_trie.c +++ b/net/ipv4/fib_trie.c @@ -2090,22 +2090,26 @@ static int fn_trie_dump_leaf(struct key_vector *l, struct fib_table *tb, { unsigned int flags = NLM_F_MULTI; __be32 xkey = htonl(l->key); + int i, s_i, i_fa, s_fa, err; struct fib_alias *fa; - int i, s_i; - if (filter->filter_set) + if (filter->filter_set || + !filter->dump_exceptions || !filter->dump_routes) flags |= NLM_F_DUMP_FILTERED; s_i = cb->args[4]; + s_fa = cb->args[5]; i = 0; /* rcu_read_lock is hold by caller */ hlist_for_each_entry_rcu(fa, &l->leaf, fa_list) { - int err; + struct fib_info *fi = fa->fa_info; if (i < s_i) goto next; + i_fa = 0; + if (tb->tb_id != fa->tb_id) goto next; @@ -2114,29 +2118,43 @@ static int fn_trie_dump_leaf(struct key_vector *l, struct fib_table *tb, goto next; if ((filter->protocol && - fa->fa_info->fib_protocol != filter->protocol)) + fi->fib_protocol != filter->protocol)) goto next; if (filter->dev && - !fib_info_nh_uses_dev(fa->fa_info, filter->dev)) + !fib_info_nh_uses_dev(fi, filter->dev)) goto next; } - err = fib_dump_info(skb, NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, RTM_NEWROUTE, - tb->tb_id, fa->fa_type, - xkey, KEYLENGTH - fa->fa_slen, - fa->fa_tos, fa->fa_info, flags); - if (err < 0) { - cb->args[4] = i; - return err; + if (filter->dump_routes && !s_fa) { + err = fib_dump_info(skb, NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, RTM_NEWROUTE, + tb->tb_id, fa->fa_type, + xkey, KEYLENGTH - fa->fa_slen, + fa->fa_tos, fi, flags); + if (err < 0) + goto stop; + i_fa++; } + + if (filter->dump_exceptions) { + err = fib_dump_info_fnhe(skb, cb, tb->tb_id, fi, + &i_fa, s_fa); + if (err < 0) + goto stop; + } + next: i++; } cb->args[4] = i; return skb->len; + +stop: + cb->args[4] = i; + cb->args[5] = i_fa; + return err; } /* rcu_read_lock needs to be hold by caller from readside */ diff --git a/net/ipv4/route.c b/net/ipv4/route.c index b1628d25e828..6aee412a68bd 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2812,6 +2812,79 @@ static int rt_fill_info(struct net *net, __be32 dst, __be32 src, return -EMSGSIZE; } +static int fnhe_dump_bucket(struct net *net, struct sk_buff *skb, + struct netlink_callback *cb, u32 table_id, + struct fnhe_hash_bucket *bucket, int genid, + int *fa_index, int fa_start) +{ + int i; + + for (i = 0; i < FNHE_HASH_SIZE; i++) { + struct fib_nh_exception *fnhe; + + for (fnhe = rcu_dereference(bucket[i].chain); fnhe; + fnhe = rcu_dereference(fnhe->fnhe_next)) { + struct rtable *rt; + int err; + + if (*fa_index < fa_start) + goto next; + + if (fnhe->fnhe_genid != genid) + goto next; + + if (fnhe->fnhe_expires && + time_after(jiffies, fnhe->fnhe_expires)) + goto next; + + rt = rcu_dereference(fnhe->fnhe_rth_input); + if (!rt) + rt = rcu_dereference(fnhe->fnhe_rth_output); + if (!rt) + goto next; + + err = rt_fill_info(net, fnhe->fnhe_daddr, 0, rt, + table_id, NULL, skb, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq); + if (err) + return err; +next: + (*fa_index)++; + } + } + + return 0; +} + +int fib_dump_info_fnhe(struct sk_buff *skb, struct netlink_callback *cb, + u32 table_id, struct fib_info *fi, + int *fa_index, int fa_start) +{ + struct net *net = sock_net(cb->skb->sk); + int nhsel, genid = fnhe_genid(net); + + for (nhsel = 0; nhsel < fib_info_num_path(fi); nhsel++) { + struct fib_nh_common *nhc = fib_info_nhc(fi, nhsel); + struct fnhe_hash_bucket *bucket; + int err; + + if (nhc->nhc_flags & RTNH_F_DEAD) + continue; + + bucket = rcu_dereference(nhc->nhc_exceptions); + if (!bucket) + continue; + + err = fnhe_dump_bucket(net, skb, cb, table_id, bucket, genid, + fa_index, fa_start); + if (err) + return err; + } + + return 0; +} + static struct sk_buff *inet_rtm_getroute_build_skb(__be32 src, __be32 dst, u8 ip_proto, __be16 sport, __be16 dport) From patchwork Fri Jun 21 15:45:24 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefano Brivio X-Patchwork-Id: 1120401 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 45Vjjp0Wsfz9s00 for ; Sat, 22 Jun 2019 01:47:10 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726483AbfFUPrJ (ORCPT ); Fri, 21 Jun 2019 11:47:09 -0400 Received: from mx1.redhat.com ([209.132.183.28]:58592 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726196AbfFUPrI (ORCPT ); Fri, 21 Jun 2019 11:47:08 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 20FF730C31A2; Fri, 21 Jun 2019 15:47:05 +0000 (UTC) Received: from epycfail.redhat.com (unknown [10.36.112.13]) by smtp.corp.redhat.com (Postfix) with ESMTP id C96CA19C7F; Fri, 21 Jun 2019 15:47:00 +0000 (UTC) From: Stefano Brivio To: David Miller , David Ahern Cc: Jianlin Shi , Wei Wang , Martin KaFai Lau , Eric Dumazet , Matti Vaittinen , netdev@vger.kernel.org Subject: [PATCH net-next v7 05/11] Revert "net/ipv6: Bail early if user only wants cloned entries" Date: Fri, 21 Jun 2019 17:45:24 +0200 Message-Id: In-Reply-To: References: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.40]); Fri, 21 Jun 2019 15:47:08 +0000 (UTC) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This reverts commit 08e814c9e8eb5a982cbd1e8f6bd255d97c51026f: as we are preparing to fix listing and dumping of IPv6 cached routes, we need to allow RTM_F_CLONED as a flag to match routes against while dumping them. Signed-off-by: Stefano Brivio Reviewed-by: David Ahern --- v7: No changes v6: No changes v5: No changes v4: New patch net/ipv6/ip6_fib.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index f2a3cc4c8f12..2baca6ef9e01 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -572,13 +572,10 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb) } else if (nlmsg_len(nlh) >= sizeof(struct rtmsg)) { struct rtmsg *rtm = nlmsg_data(nlh); - arg.filter.flags = rtm->rtm_flags & (RTM_F_PREFIX|RTM_F_CLONED); + if (rtm->rtm_flags & RTM_F_PREFIX) + arg.filter.flags = RTM_F_PREFIX; } - /* fib entries are never clones */ - if (arg.filter.flags & RTM_F_CLONED) - goto out; - w = (void *)cb->args[2]; if (!w) { /* New dump: From patchwork Fri Jun 21 15:45:25 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefano Brivio X-Patchwork-Id: 1120402 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 45Vjjq4DvCz9s4V for ; Sat, 22 Jun 2019 01:47:11 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726489AbfFUPrK (ORCPT ); Fri, 21 Jun 2019 11:47:10 -0400 Received: from mx1.redhat.com ([209.132.183.28]:58604 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726469AbfFUPrJ (ORCPT ); Fri, 21 Jun 2019 11:47:09 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id EB5CD30C1AE7; Fri, 21 Jun 2019 15:47:08 +0000 (UTC) Received: from epycfail.redhat.com (unknown [10.36.112.13]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8FD2B19C4F; Fri, 21 Jun 2019 15:47:05 +0000 (UTC) From: Stefano Brivio To: David Miller , David Ahern Cc: Jianlin Shi , Wei Wang , Martin KaFai Lau , Eric Dumazet , Matti Vaittinen , netdev@vger.kernel.org Subject: [PATCH net-next v7 06/11] ipv6/route: Don't match on fc_nh_id if not set in ip6_route_del() Date: Fri, 21 Jun 2019 17:45:25 +0200 Message-Id: <3450805b83561af0a7de249da6bf860aee4d2524.1561131177.git.sbrivio@redhat.com> In-Reply-To: References: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.40]); Fri, 21 Jun 2019 15:47:09 +0000 (UTC) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org If fc_nh_id isn't set, we shouldn't try to match against it. This actually matters just for the RTF_CACHE below (where this case is already handled): if iproute2 gets a route exception and tries to delete it, it won't reference it by fc_nh_id, even if a nexthop object might be associated to the originating route. Fixes: 5b98324ebe29 ("ipv6: Allow routes to use nexthop objects") Signed-off-by: Stefano Brivio Reviewed-by: David Ahern --- v7: No changes v6: New patch net/ipv6/route.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index c4d285fe0adc..86859023cd01 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -3827,7 +3827,8 @@ static int ip6_route_del(struct fib6_config *cfg, for_each_fib6_node_rt_rcu(fn) { struct fib6_nh *nh; - if (rt->nh && rt->nh->id != cfg->fc_nh_id) + if (rt->nh && cfg->fc_nh_id && + rt->nh->id != cfg->fc_nh_id) continue; if (cfg->fc_flags & RTF_CACHE) { From patchwork Fri Jun 21 15:45:26 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefano Brivio X-Patchwork-Id: 1120403 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 45Vjjv254Xz9s00 for ; Sat, 22 Jun 2019 01:47:15 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726498AbfFUPrO (ORCPT ); Fri, 21 Jun 2019 11:47:14 -0400 Received: from mx1.redhat.com ([209.132.183.28]:12915 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726469AbfFUPrO (ORCPT ); Fri, 21 Jun 2019 11:47:14 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id A868A87620; Fri, 21 Jun 2019 15:47:13 +0000 (UTC) Received: from epycfail.redhat.com (unknown [10.36.112.13]) by smtp.corp.redhat.com (Postfix) with ESMTP id 59DDF19C68; Fri, 21 Jun 2019 15:47:09 +0000 (UTC) From: Stefano Brivio To: David Miller , David Ahern Cc: Jianlin Shi , Wei Wang , Martin KaFai Lau , Eric Dumazet , Matti Vaittinen , netdev@vger.kernel.org Subject: [PATCH net-next v7 07/11] ipv6/route: Change return code of rt6_dump_route() for partial node dumps Date: Fri, 21 Jun 2019 17:45:26 +0200 Message-Id: In-Reply-To: References: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.26]); Fri, 21 Jun 2019 15:47:13 +0000 (UTC) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org In the next patch, we are going to add optional dump of exceptions to rt6_dump_route(). Change the return code of rt6_dump_route() to accomodate partial node dumps: we might dump multiple routes per node, and might be able to dump only a given number of them, so fib6_dump_node() will need to know how many routes have been dumped on partial dump, to restart the dump from the point where it was interrupted. Note that fib6_dump_node() is the only caller and already handles all non-negative return codes as success: those become -1 to signal that we're done with the node. If we fail, return 0, as we were unable to dump the single route in the node, but we're not done with it. Signed-off-by: Stefano Brivio Reviewed-by: David Ahern --- v7: No changes v6: New patch net/ipv6/ip6_fib.c | 2 +- net/ipv6/route.c | 16 ++++++++++------ 2 files changed, 11 insertions(+), 7 deletions(-) diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 2baca6ef9e01..0e9a2ec69336 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -464,7 +464,7 @@ static int fib6_dump_node(struct fib6_walker *w) for_each_fib6_walker_rt(w) { res = rt6_dump_route(rt, w->args); - if (res < 0) { + if (res >= 0) { /* Frame is full, suspend walking */ w->leaf = rt; return 1; diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 86859023cd01..1282f5a55b08 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -5503,6 +5503,7 @@ static bool fib6_info_uses_dev(const struct fib6_info *f6i, return false; } +/* Return -1 if done with node, number of handled routes on partial dump */ int rt6_dump_route(struct fib6_info *rt, void *p_arg) { struct rt6_rtnl_dump_arg *arg = (struct rt6_rtnl_dump_arg *) p_arg; @@ -5511,25 +5512,28 @@ int rt6_dump_route(struct fib6_info *rt, void *p_arg) struct net *net = arg->net; if (rt == net->ipv6.fib6_null_entry) - return 0; + return -1; if ((filter->flags & RTM_F_PREFIX) && !(rt->fib6_flags & RTF_PREFIX_RT)) { /* success since this is not a prefix route */ - return 1; + return -1; } if (filter->filter_set) { if ((filter->rt_type && rt->fib6_type != filter->rt_type) || (filter->dev && !fib6_info_uses_dev(rt, filter->dev)) || (filter->protocol && rt->fib6_protocol != filter->protocol)) { - return 1; + return -1; } flags |= NLM_F_DUMP_FILTERED; } - return rt6_fill_node(net, arg->skb, rt, NULL, NULL, NULL, 0, - RTM_NEWROUTE, NETLINK_CB(arg->cb->skb).portid, - arg->cb->nlh->nlmsg_seq, flags); + if (rt6_fill_node(net, arg->skb, rt, NULL, NULL, NULL, 0, RTM_NEWROUTE, + NETLINK_CB(arg->cb->skb).portid, + arg->cb->nlh->nlmsg_seq, flags)) + return 0; + + return -1; } static int inet6_rtm_valid_getroute_req(struct sk_buff *skb, From patchwork Fri Jun 21 15:45:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefano Brivio X-Patchwork-Id: 1120404 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 45Vjjy46Jlz9s3C for ; Sat, 22 Jun 2019 01:47:18 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726331AbfFUPrR (ORCPT ); Fri, 21 Jun 2019 11:47:17 -0400 Received: from mx1.redhat.com ([209.132.183.28]:58674 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726469AbfFUPrR (ORCPT ); Fri, 21 Jun 2019 11:47:17 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 8320130C31A2; Fri, 21 Jun 2019 15:47:16 +0000 (UTC) Received: from epycfail.redhat.com (unknown [10.36.112.13]) by smtp.corp.redhat.com (Postfix) with ESMTP id 19BBB19C4F; Fri, 21 Jun 2019 15:47:13 +0000 (UTC) From: Stefano Brivio To: David Miller , David Ahern Cc: Jianlin Shi , Wei Wang , Martin KaFai Lau , Eric Dumazet , Matti Vaittinen , netdev@vger.kernel.org Subject: [PATCH net-next v7 08/11] ipv6: Dump route exceptions if requested Date: Fri, 21 Jun 2019 17:45:27 +0200 Message-Id: <4af136ad187cdd167bac5effcf7e7b2932db87b9.1561131177.git.sbrivio@redhat.com> In-Reply-To: References: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.40]); Fri, 21 Jun 2019 15:47:16 +0000 (UTC) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Since commit 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache"), route exceptions reside in a separate hash table, and won't be found by walking the FIB, so they won't be dumped to userspace on a RTM_GETROUTE message. This causes 'ip -6 route list cache' and 'ip -6 route flush cache' to have no function anymore: # ip -6 route get fc00:3::1 fc00:3::1 via fc00:1::2 dev veth_A-R1 src fc00:1::1 metric 1024 expires 539sec mtu 1400 pref medium # ip -6 route get fc00:4::1 fc00:4::1 via fc00:2::2 dev veth_A-R2 src fc00:2::1 metric 1024 expires 536sec mtu 1500 pref medium # ip -6 route list cache # ip -6 route flush cache # ip -6 route get fc00:3::1 fc00:3::1 via fc00:1::2 dev veth_A-R1 src fc00:1::1 metric 1024 expires 520sec mtu 1400 pref medium # ip -6 route get fc00:4::1 fc00:4::1 via fc00:2::2 dev veth_A-R2 src fc00:2::1 metric 1024 expires 519sec mtu 1500 pref medium because iproute2 lists cached routes using RTM_GETROUTE, and flushes them by listing all the routes, and deleting them with RTM_DELROUTE one by one. If cached routes are requested using the RTM_F_CLONED flag together with strict checking, or if no strict checking is requested (and hence we can't consistently apply filters), look up exceptions in the hash table associated with the current fib6_info in rt6_dump_route(), and, if present and not expired, add them to the dump. We might be unable to dump all the entries for a given node in a single message, so keep track of how many entries were handled for the current node in fib6_walker, and skip that amount in case we start from the same partially dumped node. When a partial dump restarts, as the starting node might change when 'sernum' changes, we have no guarantee that we need to skip the same amount of in-node entries. Therefore, we need two counters, and we need to zero the in-node counter if the node from which the dump is resumed differs. Note that, with the current version of iproute2, this only fixes the 'ip -6 route list cache': on a flush command, iproute2 doesn't pass RTM_F_CLONED and, due to this inconsistency, 'ip -6 route flush cache' is still unable to fetch the routes to be flushed. This will be addressed in a patch for iproute2. To flush cached routes, a procfs entry could be introduced instead: that's how it works for IPv4. We already have a rt6_flush_exception() function ready to be wired to it. However, this would not solve the issue for listing. Versions of iproute2 and kernel tested: iproute2 kernel 4.14.0 4.15.0 4.19.0 5.0.0 5.1.0 5.1.0, patched 3.18 list + + + + + + flush + + + + + + 4.4 list + + + + + + flush + + + + + + 4.9 list + + + + + + flush + + + + + + 4.14 list + + + + + + flush + + + + + + 4.15 list flush 4.19 list flush 5.0 list flush 5.1 list flush with list + + + + + + fix flush + + + + v7: - Explain usage of "skip" counters in commit message (suggested by David Ahern) v6: - Rebase onto net-next, use recently introduced nexthop walker - Make rt6_nh_dump_exceptions() a separate function (suggested by David Ahern) v5: - Use dump_routes and dump_exceptions from filter, ignore NLM_F_MATCH, update test results (flushing works with iproute2 < 5.0.0 now) v4: - Split NLM_F_MATCH and strict check handling in separate patches - Filter routes using RTM_F_CLONED: if it's not set, only return non-cached routes, and if it's set, only return cached routes: change requested by David Ahern and Martin Lau. This implies that iproute2 needs a separate patch to be able to flush IPv6 cached routes. This is not ideal because we can't fix the breakage caused by 2b760fcf5cfb entirely in kernel. However, two years have passed since then, and this makes it more tolerable v3: - More descriptive comment about expired exceptions in rt6_dump_route() - Swap return values of rt6_dump_route() (suggested by Martin Lau) - Don't zero skip_in_node in case we don't dump anything in a given pass (also suggested by Martin Lau) - Remove check on RTM_F_CLONED altogether: in the current UAPI semantic, it's just a flag to indicate the route was cloned, not to filter on routes v2: Add tracking of number of entries to be skipped in current node after a partial dump. As we restart from the same node, if not all the exceptions for a given node fit in a single message, the dump will not terminate, as suggested by Martin Lau. This is a concrete possibility, setting up a big number of exceptions for the same route actually causes the issue, suggested by David Ahern. Reported-by: Jianlin Shi Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache") Signed-off-by: Stefano Brivio Reviewed-by: David Ahern --- include/net/ip6_fib.h | 1 + include/net/ip6_route.h | 2 +- net/ipv6/ip6_fib.c | 12 ++++- net/ipv6/route.c | 114 ++++++++++++++++++++++++++++++++++++---- 4 files changed, 116 insertions(+), 13 deletions(-) diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h index 87331f2c4af0..4b5656c71abc 100644 --- a/include/net/ip6_fib.h +++ b/include/net/ip6_fib.h @@ -316,6 +316,7 @@ struct fib6_walker { enum fib6_walk_state state; unsigned int skip; unsigned int count; + unsigned int skip_in_node; int (*func)(struct fib6_walker *); void *args; }; diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h index 7375a165fd98..a849a379f426 100644 --- a/include/net/ip6_route.h +++ b/include/net/ip6_route.h @@ -182,7 +182,7 @@ struct rt6_rtnl_dump_arg { struct fib_dump_filter filter; }; -int rt6_dump_route(struct fib6_info *f6i, void *p_arg); +int rt6_dump_route(struct fib6_info *f6i, void *p_arg, unsigned int skip); void rt6_mtu_change(struct net_device *dev, unsigned int mtu); void rt6_remove_prefsrc(struct inet6_ifaddr *ifp); void rt6_clean_tohost(struct net *net, struct in6_addr *gateway); diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 0e9a2ec69336..92a6bde57594 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -463,12 +463,19 @@ static int fib6_dump_node(struct fib6_walker *w) struct fib6_info *rt; for_each_fib6_walker_rt(w) { - res = rt6_dump_route(rt, w->args); + res = rt6_dump_route(rt, w->args, w->skip_in_node); if (res >= 0) { /* Frame is full, suspend walking */ w->leaf = rt; + + /* We'll restart from this node, so if some routes were + * already dumped, skip them next time. + */ + w->skip_in_node += res; + return 1; } + w->skip_in_node = 0; /* Multipath routes are dumped in one route with the * RTA_MULTIPATH attribute. Jump 'rt' to point to the @@ -520,6 +527,7 @@ static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb, if (cb->args[4] == 0) { w->count = 0; w->skip = 0; + w->skip_in_node = 0; spin_lock_bh(&table->tb6_lock); res = fib6_walk(net, w); @@ -535,6 +543,7 @@ static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb, w->state = FWS_INIT; w->node = w->root; w->skip = w->count; + w->skip_in_node = 0; } else w->skip = 0; @@ -2093,6 +2102,7 @@ static void fib6_clean_tree(struct net *net, struct fib6_node *root, c.w.func = fib6_clean_node; c.w.count = 0; c.w.skip = 0; + c.w.skip_in_node = 0; c.func = func; c.sernum = sernum; c.arg = arg; diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 1282f5a55b08..e2900cd0410c 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -5503,13 +5503,73 @@ static bool fib6_info_uses_dev(const struct fib6_info *f6i, return false; } +struct fib6_nh_exception_dump_walker { + struct rt6_rtnl_dump_arg *dump; + struct fib6_info *rt; + unsigned int flags; + unsigned int skip; + unsigned int count; +}; + +static int rt6_nh_dump_exceptions(struct fib6_nh *nh, void *arg) +{ + struct fib6_nh_exception_dump_walker *w = arg; + struct rt6_rtnl_dump_arg *dump = w->dump; + struct rt6_exception_bucket *bucket; + struct rt6_exception *rt6_ex; + int i, err; + + bucket = fib6_nh_get_excptn_bucket(nh, NULL); + if (!bucket) + return 0; + + for (i = 0; i < FIB6_EXCEPTION_BUCKET_SIZE; i++) { + hlist_for_each_entry(rt6_ex, &bucket->chain, hlist) { + if (w->skip) { + w->skip--; + continue; + } + + /* Expiration of entries doesn't bump sernum, insertion + * does. Removal is triggered by insertion, so we can + * rely on the fact that if entries change between two + * partial dumps, this node is scanned again completely, + * see rt6_insert_exception() and fib6_dump_table(). + * + * Count expired entries we go through as handled + * entries that we'll skip next time, in case of partial + * node dump. Otherwise, if entries expire meanwhile, + * we'll skip the wrong amount. + */ + if (rt6_check_expired(rt6_ex->rt6i)) { + w->count++; + continue; + } + + err = rt6_fill_node(dump->net, dump->skb, w->rt, + &rt6_ex->rt6i->dst, NULL, NULL, 0, + RTM_NEWROUTE, + NETLINK_CB(dump->cb->skb).portid, + dump->cb->nlh->nlmsg_seq, w->flags); + if (err) + return err; + + w->count++; + } + bucket++; + } + + return 0; +} + /* Return -1 if done with node, number of handled routes on partial dump */ -int rt6_dump_route(struct fib6_info *rt, void *p_arg) +int rt6_dump_route(struct fib6_info *rt, void *p_arg, unsigned int skip) { struct rt6_rtnl_dump_arg *arg = (struct rt6_rtnl_dump_arg *) p_arg; struct fib_dump_filter *filter = &arg->filter; unsigned int flags = NLM_F_MULTI; struct net *net = arg->net; + int count = 0; if (rt == net->ipv6.fib6_null_entry) return -1; @@ -5519,19 +5579,51 @@ int rt6_dump_route(struct fib6_info *rt, void *p_arg) /* success since this is not a prefix route */ return -1; } - if (filter->filter_set) { - if ((filter->rt_type && rt->fib6_type != filter->rt_type) || - (filter->dev && !fib6_info_uses_dev(rt, filter->dev)) || - (filter->protocol && rt->fib6_protocol != filter->protocol)) { - return -1; - } + if (filter->filter_set && + ((filter->rt_type && rt->fib6_type != filter->rt_type) || + (filter->dev && !fib6_info_uses_dev(rt, filter->dev)) || + (filter->protocol && rt->fib6_protocol != filter->protocol))) { + return -1; + } + + if (filter->filter_set || + !filter->dump_routes || !filter->dump_exceptions) { flags |= NLM_F_DUMP_FILTERED; } - if (rt6_fill_node(net, arg->skb, rt, NULL, NULL, NULL, 0, RTM_NEWROUTE, - NETLINK_CB(arg->cb->skb).portid, - arg->cb->nlh->nlmsg_seq, flags)) - return 0; + if (filter->dump_routes) { + if (skip) { + skip--; + } else { + if (rt6_fill_node(net, arg->skb, rt, NULL, NULL, NULL, + 0, RTM_NEWROUTE, + NETLINK_CB(arg->cb->skb).portid, + arg->cb->nlh->nlmsg_seq, flags)) { + return 0; + } + count++; + } + } + + if (filter->dump_exceptions) { + struct fib6_nh_exception_dump_walker w = { .dump = arg, + .rt = rt, + .flags = flags, + .skip = skip, + .count = 0 }; + int err; + + if (rt->nh) { + err = nexthop_for_each_fib6_nh(rt->nh, + rt6_nh_dump_exceptions, + &w); + } else { + err = rt6_nh_dump_exceptions(rt->fib6_nh, &w); + } + + if (err) + return count += w.count; + } return -1; } From patchwork Fri Jun 21 15:45:28 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefano Brivio X-Patchwork-Id: 1120405 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 45Vjk21LCDz9s00 for ; Sat, 22 Jun 2019 01:47:22 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726521AbfFUPrV (ORCPT ); Fri, 21 Jun 2019 11:47:21 -0400 Received: from mx1.redhat.com ([209.132.183.28]:59050 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726469AbfFUPrU (ORCPT ); Fri, 21 Jun 2019 11:47:20 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 7CAD4356F8; Fri, 21 Jun 2019 15:47:20 +0000 (UTC) Received: from epycfail.redhat.com (unknown [10.36.112.13]) by smtp.corp.redhat.com (Postfix) with ESMTP id EC86119C4F; Fri, 21 Jun 2019 15:47:16 +0000 (UTC) From: Stefano Brivio To: David Miller , David Ahern Cc: Jianlin Shi , Wei Wang , Martin KaFai Lau , Eric Dumazet , Matti Vaittinen , netdev@vger.kernel.org Subject: [PATCH net-next v7 09/11] ip6_fib: Don't discard nodes with valid routing information in fib6_locate_1() Date: Fri, 21 Jun 2019 17:45:28 +0200 Message-Id: <25dbd936ec0f3266aceccdf51056eb8aa093e05b.1561131177.git.sbrivio@redhat.com> In-Reply-To: References: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Fri, 21 Jun 2019 15:47:20 +0000 (UTC) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org When we perform an inexact match on FIB nodes via fib6_locate_1(), longer prefixes will be preferred to shorter ones. However, it might happen that a node, with higher fn_bit value than some other, has no valid routing information. In this case, we'll pick that node, but it will be discarded by the check on RTN_RTINFO in fib6_locate(), and we might miss nodes with valid routing information but with lower fn_bit value. This is apparent when a routing exception is created for a default route: # ip -6 route list fc00:1::/64 dev veth_A-R1 proto kernel metric 256 pref medium fc00:2::/64 dev veth_A-R2 proto kernel metric 256 pref medium fc00:4::1 via fc00:2::2 dev veth_A-R2 metric 1024 pref medium fe80::/64 dev veth_A-R1 proto kernel metric 256 pref medium fe80::/64 dev veth_A-R2 proto kernel metric 256 pref medium default via fc00:1::2 dev veth_A-R1 metric 1024 pref medium # ip -6 route list cache fc00:4::1 via fc00:2::2 dev veth_A-R2 metric 1024 expires 593sec mtu 1500 pref medium fc00:3::1 via fc00:1::2 dev veth_A-R1 metric 1024 expires 593sec mtu 1500 pref medium # ip -6 route flush cache # node for default route is discarded Failed to send flush request: No such process # ip -6 route list cache fc00:3::1 via fc00:1::2 dev veth_A-R1 metric 1024 expires 586sec mtu 1500 pref medium Check right away if the node has a RTN_RTINFO flag, before replacing the 'prev' pointer, that indicates the longest matching prefix found so far. Fixes: 38fbeeeeccdb ("ipv6: prepare fib6_locate() for exception table") Signed-off-by: Stefano Brivio --- v7: No changes v6: Rebased onto net-next, no changes v5: No changes v4: No changes v3: No changes v2: No changes net/ipv6/ip6_fib.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 92a6bde57594..8f36e6742c36 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -1595,7 +1595,8 @@ static struct fib6_node *fib6_locate_1(struct fib6_node *root, if (plen == fn->fn_bit) return fn; - prev = fn; + if (fn->fn_flags & RTN_RTINFO) + prev = fn; next: /* From patchwork Fri Jun 21 15:45:29 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefano Brivio X-Patchwork-Id: 1120406 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 45Vjk46m6Hz9s3C for ; Sat, 22 Jun 2019 01:47:24 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726534AbfFUPrY (ORCPT ); Fri, 21 Jun 2019 11:47:24 -0400 Received: from mx1.redhat.com ([209.132.183.28]:35366 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726469AbfFUPrX (ORCPT ); Fri, 21 Jun 2019 11:47:23 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 619AA75734; Fri, 21 Jun 2019 15:47:23 +0000 (UTC) Received: from epycfail.redhat.com (unknown [10.36.112.13]) by smtp.corp.redhat.com (Postfix) with ESMTP id E649319C6A; Fri, 21 Jun 2019 15:47:20 +0000 (UTC) From: Stefano Brivio To: David Miller , David Ahern Cc: Jianlin Shi , Wei Wang , Martin KaFai Lau , Eric Dumazet , Matti Vaittinen , netdev@vger.kernel.org Subject: [PATCH net-next v7 10/11] selftests: pmtu: Introduce list_flush_ipv4_exception test case Date: Fri, 21 Jun 2019 17:45:29 +0200 Message-Id: <9f59f090409b67e4854c820f07f7dcfdf57e2e84.1561131177.git.sbrivio@redhat.com> In-Reply-To: References: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Fri, 21 Jun 2019 15:47:23 +0000 (UTC) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This test checks that route exceptions can be successfully listed and flushed using ip -6 route {list,flush} cache. v7: No changes v6: - Merge this patch into series including fix, as it's also targeted for net-next - Drop left-over print of 'ip route list cache | wc -l' Signed-off-by: Stefano Brivio --- tools/testing/selftests/net/pmtu.sh | 60 +++++++++++++++++++++++++++++ 1 file changed, 60 insertions(+) diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh index 269e839b747e..f5004a9df229 100755 --- a/tools/testing/selftests/net/pmtu.sh +++ b/tools/testing/selftests/net/pmtu.sh @@ -112,6 +112,10 @@ # - cleanup_ipv6_exception # Same as above, but use IPv6 transport from A to B # +# - list_flush_ipv4_exception +# Using the same topology as in pmtu_ipv4, create exceptions, and check +# they are shown when listing exception caches, gone after flushing them +# # - list_flush_ipv6_exception # Using the same topology as in pmtu_ipv6, create exceptions, and check # they are shown when listing exception caches, gone after flushing them @@ -156,6 +160,7 @@ tests=" pmtu_vti6_link_change_mtu vti6: MTU changes on link changes 0 cleanup_ipv4_exception ipv4: cleanup of cached exceptions 1 cleanup_ipv6_exception ipv6: cleanup of cached exceptions 1 + list_flush_ipv4_exception ipv4: list and flush cached exceptions 1 list_flush_ipv6_exception ipv6: list and flush cached exceptions 1" NS_A="ns-A" @@ -1207,6 +1212,61 @@ run_test_nh() { USE_NH=no } +test_list_flush_ipv4_exception() { + setup namespaces routing || return 2 + trace "${ns_a}" veth_A-R1 "${ns_r1}" veth_R1-A \ + "${ns_r1}" veth_R1-B "${ns_b}" veth_B-R1 \ + "${ns_a}" veth_A-R2 "${ns_r2}" veth_R2-A \ + "${ns_r2}" veth_R2-B "${ns_b}" veth_B-R2 + + dst_prefix1="${prefix4}.${b_r1}." + dst2="${prefix4}.${b_r2}.1" + + # Set up initial MTU values + mtu "${ns_a}" veth_A-R1 2000 + mtu "${ns_r1}" veth_R1-A 2000 + mtu "${ns_r1}" veth_R1-B 1500 + mtu "${ns_b}" veth_B-R1 1500 + + mtu "${ns_a}" veth_A-R2 2000 + mtu "${ns_r2}" veth_R2-A 2000 + mtu "${ns_r2}" veth_R2-B 1500 + mtu "${ns_b}" veth_B-R2 1500 + + fail=0 + + # Add 100 addresses for veth endpoint on B reached by default A route + for i in $(seq 100 199); do + run_cmd ${ns_b} ip addr add "${dst_prefix1}${i}" dev veth_B-R1 + done + + # Create 100 cached route exceptions for path via R1, one via R2. Note + # that with IPv4 we need to actually cause a route lookup that matches + # the exception caused by ICMP, in order to actually have a cached + # route, so we need to ping each destination twice + for i in $(seq 100 199); do + run_cmd ${ns_a} ping -q -M want -i 0.1 -c 2 -s 1800 "${dst_prefix1}${i}" + done + run_cmd ${ns_a} ping -q -M want -i 0.1 -c 2 -s 1800 "${dst2}" + + # Each exception is printed as two lines + if [ "$(${ns_a} ip route list cache | wc -l)" -ne 202 ]; then + err " can't list cached exceptions" + fail=1 + fi + + run_cmd ${ns_a} ip route flush cache + pmtu1="$(route_get_dst_pmtu_from_exception "${ns_a}" ${dst_prefix}1)" + pmtu2="$(route_get_dst_pmtu_from_exception "${ns_a}" ${dst_prefix}2)" + if [ -n "${pmtu1}" ] || [ -n "${pmtu2}" ] || \ + [ -n "$(${ns_a} ip route list cache)" ]; then + err " can't flush cached exceptions" + fail=1 + fi + + return ${fail} +} + test_list_flush_ipv6_exception() { setup namespaces routing || return 2 trace "${ns_a}" veth_A-R1 "${ns_r1}" veth_R1-A \ From patchwork Fri Jun 21 15:45:30 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefano Brivio X-Patchwork-Id: 1120407 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 45VjkC28DMz9s00 for ; Sat, 22 Jun 2019 01:47:30 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726548AbfFUPr3 (ORCPT ); Fri, 21 Jun 2019 11:47:29 -0400 Received: from mx1.redhat.com ([209.132.183.28]:40198 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726469AbfFUPr3 (ORCPT ); Fri, 21 Jun 2019 11:47:29 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 802D3C18B2C4; Fri, 21 Jun 2019 15:47:29 +0000 (UTC) Received: from epycfail.redhat.com (unknown [10.36.112.13]) by smtp.corp.redhat.com (Postfix) with ESMTP id C9B2719C6A; Fri, 21 Jun 2019 15:47:23 +0000 (UTC) From: Stefano Brivio To: David Miller , David Ahern Cc: Jianlin Shi , Wei Wang , Martin KaFai Lau , Eric Dumazet , Matti Vaittinen , netdev@vger.kernel.org Subject: [PATCH net-next v7 11/11] selftests: pmtu: Make list_flush_ipv6_exception test more demanding Date: Fri, 21 Jun 2019 17:45:30 +0200 Message-Id: In-Reply-To: References: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Fri, 21 Jun 2019 15:47:29 +0000 (UTC) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Instead of just listing and flushing two cached exceptions, create a relatively big number of them, and count how many are listed. Single netlink dump messages contain approximately 25 entries each, and this way we can make sure the partial dump tracking mechanism is working properly. While at it, also ensure that no cached routes can be listed after flush, and remove 'sleep 1' calls, they are not actually needed. v7: No changes v6: - Merge this patch into series including fix, as it's also targeted for net-next. No actual changes Signed-off-by: Stefano Brivio --- tools/testing/selftests/net/pmtu.sh | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh index f5004a9df229..ab367e75f095 100755 --- a/tools/testing/selftests/net/pmtu.sh +++ b/tools/testing/selftests/net/pmtu.sh @@ -1274,7 +1274,7 @@ test_list_flush_ipv6_exception() { "${ns_a}" veth_A-R2 "${ns_r2}" veth_R2-A \ "${ns_r2}" veth_R2-B "${ns_b}" veth_B-R2 - dst1="${prefix6}:${b_r1}::1" + dst_prefix1="${prefix6}:${b_r1}::" dst2="${prefix6}:${b_r2}::1" # Set up initial MTU values @@ -1290,20 +1290,26 @@ test_list_flush_ipv6_exception() { fail=0 - # Create route exceptions - run_cmd ${ns_a} ${ping6} -q -M want -i 0.1 -w 1 -s 1800 ${dst1} - run_cmd ${ns_a} ${ping6} -q -M want -i 0.1 -w 1 -s 1800 ${dst2} + # Add 100 addresses for veth endpoint on B reached by default A route + for i in $(seq 100 199); do + run_cmd ${ns_b} ip addr add "${dst_prefix1}${i}" dev veth_B-R1 + done - if [ "$(${ns_a} ip -6 route list cache | wc -l)" -ne 2 ]; then + # Create 100 cached route exceptions for path via R1, one via R2 + for i in $(seq 100 199); do + run_cmd ${ns_a} ping -q -M want -i 0.1 -w 1 -s 1800 "${dst_prefix1}${i}" + done + run_cmd ${ns_a} ping -q -M want -i 0.1 -w 1 -s 1800 "${dst2}" + if [ "$(${ns_a} ip -6 route list cache | wc -l)" -ne 101 ]; then err " can't list cached exceptions" fail=1 fi run_cmd ${ns_a} ip -6 route flush cache - sleep 1 - pmtu1="$(route_get_dst_pmtu_from_exception "${ns_a}" ${dst1})" + pmtu1="$(route_get_dst_pmtu_from_exception "${ns_a}" "${dst_prefix1}100")" pmtu2="$(route_get_dst_pmtu_from_exception "${ns_a}" ${dst2})" - if [ -n "${pmtu1}" ] || [ -n "${pmtu2}" ]; then + if [ -n "${pmtu1}" ] || [ -n "${pmtu2}" ] || \ + [ -n "$(${ns_a} ip -6 route list cache)" ]; then err " can't flush cached exceptions" fail=1 fi