[iproute2] ip route: broken logic when using default word and family not specified

Message ID CABXn0zdkK4fgHKuDn2a27BoPFDas-vcS9eTRiZsmxadsqaRHxA@mail.gmail.com
State Under Review
Delegated to: stephen hemminger
Headers show
Series
  • [iproute2] ip route: broken logic when using default word and family not specified
Related show

Commit Message

Alexander Zubkov Nov. 18, 2017, 1:12 p.m.
Hello,

I have found odd behaviour when using "ip route list" (and other bound
commands) with prefix "default".

When family not specified, its value is completely ignored and "ip
route list default" shows all inet4 prefixes. Same do "ip route list
exact default" and "ip route list match default". Examples are at the
end of the message.

When family is specified, the behaviour changes and default works as
expected (=0.0.0.0/0 for inet4 and =::/0 for inet6). The above
commands all shows only default prefix in the output and only "root
default" shows all prefixes.

I tried to dig into the code and found that when default is using with
unspecified family - the resulting structures filter.[mr]dst will
actually become all-zeroes as in the case when nothing is specified.

I propose to change this in such a way (see attached patch). When
default prefix is parsed, the flag PREFIXLEN_SPECIFIED is attached to
it too, like for prefixes with "/<masklen>". It seems logical to me,
because "/0" is really implied by "default" and even directly set up
in the code:

dst->bitlen = 0;

Then during filtering there is additional logic for unspecified family
and specified prefix.

With this patch ip route list commands shown above are working as
expected. And it also works with unspecified table when routes are
printed from different families.

Examples after applying the patch:

# ./ip route list
default via 192.168.0.2 dev eth0
192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.1
192.168.1.0/24 via 192.168.0.3 dev eth0
# ./ip -6 route list
fe80::/64 dev eth0 proto kernel metric 256 pref medium
fe80:1::/64 via fe80::3 dev eth0 metric 1024 pref medium
default via fe80::2 dev eth0 metric 1024 pref medium
# ./ip route list default
default via 192.168.0.2 dev eth0
# ./ip route list exact default
default via 192.168.0.2 dev eth0
# ./ip route list match default
default via 192.168.0.2 dev eth0
# ./ip route list root default
default via 192.168.0.2 dev eth0
192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.1
192.168.1.0/24 via 192.168.0.3 dev eth0
# ./ip route list default table all
default via 192.168.0.2 dev eth0
unreachable default dev lo proto kernel metric 4294967295 error -101 pref medium
default via fe80::2 dev eth0 metric 1024 pref medium
unreachable default dev lo proto kernel metric 4294967295 error -101 pref medium
unreachable default dev lo proto kernel metric 4294967295 error -101 pref medium
# ./ip route list root default table all
default via 192.168.0.2 dev eth0
192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.1
...
ff00::/8 dev eth0 table local metric 256 pref medium
unreachable default dev lo proto kernel metric 4294967295 error -101 pref medium

And before patch:

# ip route list default
default via 192.168.0.2 dev eth0
192.168.0.0/24 dev eth0  proto kernel  scope link  src 192.168.0.1
192.168.1.0/24 via 192.168.0.3 dev eth0
# ip route list match default
default via 192.168.0.2 dev eth0
192.168.0.0/24 dev eth0  proto kernel  scope link  src 192.168.0.1
192.168.1.0/24 via 192.168.0.3 dev eth0
# ip route list exact default
default via 192.168.0.2 dev eth0
192.168.0.0/24 dev eth0  proto kernel  scope link  src 192.168.0.1
192.168.1.0/24 via 192.168.0.3 dev eth0
# ip -4 route list exact default
default via 192.168.0.2 dev eth0
# ip -6 route list exact default
default via fe80::2 dev eth0  metric 1024
# ip route list exact default table all
default via 192.168.0.2 dev eth0
192.168.0.0/24 dev eth0  proto kernel  scope link  src 192.168.0.1
...
local fe80::3c09:19ff:feee:9866 dev lo  table local  proto none  metric 0
ff00::/8 dev eth0  table local  metric 256
unreachable default dev lo  table unspec  proto kernel  metric
4294967295  error -101

Comments

Alexander Zubkov Nov. 18, 2017, 4:56 p.m. | #1
I also opened earlier a ticket in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=197899
And Stephen Hemminger had couple of comments there which I want to argue:

> $ ip route list default
> Means list all routes in any address family (ie same as any)
> but
>
> $ ip route list 0/0
> Means list all routes for IPv4 default route.

This is not correct, because first command do not show routes in any
address family. Now it do so only when table 0 is specified, otherwise
only IPv4 routes are showed. Here is the code from iproute.c:

        if (do_ipv6 == AF_UNSPEC && filter.tb)
                do_ipv6 = AF_INET;

> It probably is worth a man page warning, but changing semantics
> that have existed for many years is more likely to break some existing user.

Yes, backward compatibility is a reason. But as I remember, that
sematics already have changed earlier. Probably it was showing IPv4
and IPv6 routes together without family specified - I do not remember
exactly. And I have doubts that such feature could be lied on
reliably.
I as a end user would prefer to make the behaviour more consistent and
without such excetptions. But of course there may be other opinions.

On Sat, Nov 18, 2017 at 2:12 PM, Alexander Zubkov <zubkov318@gmail.com> wrote:
> Hello,
>
> I have found odd behaviour when using "ip route list" (and other bound
> commands) with prefix "default".
>
> When family not specified, its value is completely ignored and "ip
> route list default" shows all inet4 prefixes. Same do "ip route list
> exact default" and "ip route list match default". Examples are at the
> end of the message.
>
> When family is specified, the behaviour changes and default works as
> expected (=0.0.0.0/0 for inet4 and =::/0 for inet6). The above
> commands all shows only default prefix in the output and only "root
> default" shows all prefixes.
>
> I tried to dig into the code and found that when default is using with
> unspecified family - the resulting structures filter.[mr]dst will
> actually become all-zeroes as in the case when nothing is specified.
>
> I propose to change this in such a way (see attached patch). When
> default prefix is parsed, the flag PREFIXLEN_SPECIFIED is attached to
> it too, like for prefixes with "/<masklen>". It seems logical to me,
> because "/0" is really implied by "default" and even directly set up
> in the code:
>
> dst->bitlen = 0;
>
> Then during filtering there is additional logic for unspecified family
> and specified prefix.
>
> With this patch ip route list commands shown above are working as
> expected. And it also works with unspecified table when routes are
> printed from different families.
>
> Examples after applying the patch:
>
> # ./ip route list
> default via 192.168.0.2 dev eth0
> 192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.1
> 192.168.1.0/24 via 192.168.0.3 dev eth0
> # ./ip -6 route list
> fe80::/64 dev eth0 proto kernel metric 256 pref medium
> fe80:1::/64 via fe80::3 dev eth0 metric 1024 pref medium
> default via fe80::2 dev eth0 metric 1024 pref medium
> # ./ip route list default
> default via 192.168.0.2 dev eth0
> # ./ip route list exact default
> default via 192.168.0.2 dev eth0
> # ./ip route list match default
> default via 192.168.0.2 dev eth0
> # ./ip route list root default
> default via 192.168.0.2 dev eth0
> 192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.1
> 192.168.1.0/24 via 192.168.0.3 dev eth0
> # ./ip route list default table all
> default via 192.168.0.2 dev eth0
> unreachable default dev lo proto kernel metric 4294967295 error -101 pref medium
> default via fe80::2 dev eth0 metric 1024 pref medium
> unreachable default dev lo proto kernel metric 4294967295 error -101 pref medium
> unreachable default dev lo proto kernel metric 4294967295 error -101 pref medium
> # ./ip route list root default table all
> default via 192.168.0.2 dev eth0
> 192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.1
> ...
> ff00::/8 dev eth0 table local metric 256 pref medium
> unreachable default dev lo proto kernel metric 4294967295 error -101 pref medium
>
> And before patch:
>
> # ip route list default
> default via 192.168.0.2 dev eth0
> 192.168.0.0/24 dev eth0  proto kernel  scope link  src 192.168.0.1
> 192.168.1.0/24 via 192.168.0.3 dev eth0
> # ip route list match default
> default via 192.168.0.2 dev eth0
> 192.168.0.0/24 dev eth0  proto kernel  scope link  src 192.168.0.1
> 192.168.1.0/24 via 192.168.0.3 dev eth0
> # ip route list exact default
> default via 192.168.0.2 dev eth0
> 192.168.0.0/24 dev eth0  proto kernel  scope link  src 192.168.0.1
> 192.168.1.0/24 via 192.168.0.3 dev eth0
> # ip -4 route list exact default
> default via 192.168.0.2 dev eth0
> # ip -6 route list exact default
> default via fe80::2 dev eth0  metric 1024
> # ip route list exact default table all
> default via 192.168.0.2 dev eth0
> 192.168.0.0/24 dev eth0  proto kernel  scope link  src 192.168.0.1
> ...
> local fe80::3c09:19ff:feee:9866 dev lo  table local  proto none  metric 0
> ff00::/8 dev eth0  table local  metric 256
> unreachable default dev lo  table unspec  proto kernel  metric
> 4294967295  error -101
Alexander Zubkov Dec. 4, 2017, 7:41 p.m. | #2
Hello everybody,

I will be glad to hear a piece of feedback on this proposal.

On Sat, Nov 18, 2017 at 5:56 PM, Alexander Zubkov <zubkov318@gmail.com> wrote:
> I also opened earlier a ticket in bugzilla:
> https://bugzilla.kernel.org/show_bug.cgi?id=197899
> And Stephen Hemminger had couple of comments there which I want to argue:
>
>> $ ip route list default
>> Means list all routes in any address family (ie same as any)
>> but
>>
>> $ ip route list 0/0
>> Means list all routes for IPv4 default route.
>
> This is not correct, because first command do not show routes in any
> address family. Now it do so only when table 0 is specified, otherwise
> only IPv4 routes are showed. Here is the code from iproute.c:
>
>         if (do_ipv6 == AF_UNSPEC && filter.tb)
>                 do_ipv6 = AF_INET;
>
>> It probably is worth a man page warning, but changing semantics
>> that have existed for many years is more likely to break some existing user.
>
> Yes, backward compatibility is a reason. But as I remember, that
> sematics already have changed earlier. Probably it was showing IPv4
> and IPv6 routes together without family specified - I do not remember
> exactly. And I have doubts that such feature could be lied on
> reliably.
> I as a end user would prefer to make the behaviour more consistent and
> without such excetptions. But of course there may be other opinions.
>
> On Sat, Nov 18, 2017 at 2:12 PM, Alexander Zubkov <zubkov318@gmail.com> wrote:
>> Hello,
>>
>> I have found odd behaviour when using "ip route list" (and other bound
>> commands) with prefix "default".
>>
>> When family not specified, its value is completely ignored and "ip
>> route list default" shows all inet4 prefixes. Same do "ip route list
>> exact default" and "ip route list match default". Examples are at the
>> end of the message.
>>
>> When family is specified, the behaviour changes and default works as
>> expected (=0.0.0.0/0 for inet4 and =::/0 for inet6). The above
>> commands all shows only default prefix in the output and only "root
>> default" shows all prefixes.
>>
>> I tried to dig into the code and found that when default is using with
>> unspecified family - the resulting structures filter.[mr]dst will
>> actually become all-zeroes as in the case when nothing is specified.
>>
>> I propose to change this in such a way (see attached patch). When
>> default prefix is parsed, the flag PREFIXLEN_SPECIFIED is attached to
>> it too, like for prefixes with "/<masklen>". It seems logical to me,
>> because "/0" is really implied by "default" and even directly set up
>> in the code:
>>
>> dst->bitlen = 0;
>>
>> Then during filtering there is additional logic for unspecified family
>> and specified prefix.
>>
>> With this patch ip route list commands shown above are working as
>> expected. And it also works with unspecified table when routes are
>> printed from different families.
>>
>> Examples after applying the patch:
>>
>> # ./ip route list
>> default via 192.168.0.2 dev eth0
>> 192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.1
>> 192.168.1.0/24 via 192.168.0.3 dev eth0
>> # ./ip -6 route list
>> fe80::/64 dev eth0 proto kernel metric 256 pref medium
>> fe80:1::/64 via fe80::3 dev eth0 metric 1024 pref medium
>> default via fe80::2 dev eth0 metric 1024 pref medium
>> # ./ip route list default
>> default via 192.168.0.2 dev eth0
>> # ./ip route list exact default
>> default via 192.168.0.2 dev eth0
>> # ./ip route list match default
>> default via 192.168.0.2 dev eth0
>> # ./ip route list root default
>> default via 192.168.0.2 dev eth0
>> 192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.1
>> 192.168.1.0/24 via 192.168.0.3 dev eth0
>> # ./ip route list default table all
>> default via 192.168.0.2 dev eth0
>> unreachable default dev lo proto kernel metric 4294967295 error -101 pref medium
>> default via fe80::2 dev eth0 metric 1024 pref medium
>> unreachable default dev lo proto kernel metric 4294967295 error -101 pref medium
>> unreachable default dev lo proto kernel metric 4294967295 error -101 pref medium
>> # ./ip route list root default table all
>> default via 192.168.0.2 dev eth0
>> 192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.1
>> ...
>> ff00::/8 dev eth0 table local metric 256 pref medium
>> unreachable default dev lo proto kernel metric 4294967295 error -101 pref medium
>>
>> And before patch:
>>
>> # ip route list default
>> default via 192.168.0.2 dev eth0
>> 192.168.0.0/24 dev eth0  proto kernel  scope link  src 192.168.0.1
>> 192.168.1.0/24 via 192.168.0.3 dev eth0
>> # ip route list match default
>> default via 192.168.0.2 dev eth0
>> 192.168.0.0/24 dev eth0  proto kernel  scope link  src 192.168.0.1
>> 192.168.1.0/24 via 192.168.0.3 dev eth0
>> # ip route list exact default
>> default via 192.168.0.2 dev eth0
>> 192.168.0.0/24 dev eth0  proto kernel  scope link  src 192.168.0.1
>> 192.168.1.0/24 via 192.168.0.3 dev eth0
>> # ip -4 route list exact default
>> default via 192.168.0.2 dev eth0
>> # ip -6 route list exact default
>> default via fe80::2 dev eth0  metric 1024
>> # ip route list exact default table all
>> default via 192.168.0.2 dev eth0
>> 192.168.0.0/24 dev eth0  proto kernel  scope link  src 192.168.0.1
>> ...
>> local fe80::3c09:19ff:feee:9866 dev lo  table local  proto none  metric 0
>> ff00::/8 dev eth0  table local  metric 256
>> unreachable default dev lo  table unspec  proto kernel  metric
>> 4294967295  error -101
Stephen Hemminger Dec. 6, 2017, 1:42 a.m. | #3
On Sat, 18 Nov 2017 17:56:32 +0100
Alexander Zubkov <zubkov318@gmail.com> wrote:

> I also opened earlier a ticket in bugzilla:
> https://bugzilla.kernel.org/show_bug.cgi?id=197899
> And Stephen Hemminger had couple of comments there which I want to argue:
> 
> > $ ip route list default
> > Means list all routes in any address family (ie same as any)
> > but
> >
> > $ ip route list 0/0
> > Means list all routes for IPv4 default route.  
> 
> This is not correct, because first command do not show routes in any
> address family. Now it do so only when table 0 is specified, otherwise
> only IPv4 routes are showed. Here is the code from iproute.c:
> 
>         if (do_ipv6 == AF_UNSPEC && filter.tb)
>                 do_ipv6 = AF_INET;
> 
> > It probably is worth a man page warning, but changing semantics
> > that have existed for many years is more likely to break some existing user.  
> 
> Yes, backward compatibility is a reason. But as I remember, that
> sematics already have changed earlier. Probably it was showing IPv4
> and IPv6 routes together without family specified - I do not remember
> exactly. And I have doubts that such feature could be lied on
> reliably.
> I as a end user would prefer to make the behaviour more consistent and
> without such excetptions. But of course there may be other opinions.

As a conservative maintainer, my preference is always to receive acknowledgments
from others before accepting something that may break existing users.

Backward compatibility is more important than the surprising result you discovered.
A confused new user is less of an issue than breaking some existing user
who may not even have contact back to network developers.

Since it has been that way for many years, waiting a couple of weeks for review
is not going to hurt anything.

Patch

--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -191,20 +191,42 @@  static int filter_nlmsg(struct nlmsghdr *n, struct rtattr **tb, int host_len)
 		return 0;
 	if ((filter.tos^r->rtm_tos)&filter.tosmask)
 		return 0;
-	if (filter.rdst.family &&
-	    (r->rtm_family != filter.rdst.family || filter.rdst.bitlen > r->rtm_dst_len))
-		return 0;
+	if (filter.rdst.family) {
+		if (r->rtm_family != filter.rdst.family ||
+		    filter.rdst.bitlen > r->rtm_dst_len)
+			return 0;
+	} else if (filter.rdst.flags & PREFIXLEN_SPECIFIED) {
+		if (filter.rdst.bitlen > r->rtm_dst_len)
+			return 0;
+	}
-	if (filter.mdst.family &&
-	    (r->rtm_family != filter.mdst.family ||
-	     (filter.mdst.bitlen >= 0 && filter.mdst.bitlen < r->rtm_dst_len)))
-		return 0;
+	if (filter.mdst.family) {
+		if (r->rtm_family != filter.mdst.family ||
+		    (filter.mdst.bitlen >= 0 &&
+		     filter.mdst.bitlen < r->rtm_dst_len))
+			return 0;
+	} else if (filter.mdst.flags & PREFIXLEN_SPECIFIED) {
+		if (filter.mdst.bitlen >= 0 &&
+		    filter.mdst.bitlen < r->rtm_dst_len)
+			return 0;
+	}
-	if (filter.rsrc.family &&
-	    (r->rtm_family != filter.rsrc.family || filter.rsrc.bitlen > r->rtm_src_len))
-		return 0;
+	if (filter.rsrc.family) {
+		if (r->rtm_family != filter.rsrc.family ||
+		    filter.rsrc.bitlen > r->rtm_src_len)
+			return 0;
+	} else if (filter.rsrc.flags & PREFIXLEN_SPECIFIED) {
+		if (filter.rsrc.bitlen > r->rtm_src_len)
+			return 0;
+	}
-	if (filter.msrc.family &&
-	    (r->rtm_family != filter.msrc.family ||
-	     (filter.msrc.bitlen >= 0 && filter.msrc.bitlen < r->rtm_src_len)))
-		return 0;
+	if (filter.msrc.family) {
+		if (r->rtm_family != filter.msrc.family ||
+		    (filter.msrc.bitlen >= 0 &&
+		     filter.msrc.bitlen < r->rtm_src_len))
+			return 0;
+	} else if (filter.msrc.flags & PREFIXLEN_SPECIFIED) {
+		if (filter.msrc.bitlen >= 0 &&
+		    filter.msrc.bitlen < r->rtm_src_len)
+			return 0;
+	}
 	if (filter.rvia.family) {
 		int family = r->rtm_family;
 
@@ -221,7 +243,9 @@  static int filter_nlmsg(struct nlmsghdr *n, struct rtattr **tb, int host_len)
 
 	if (tb[RTA_DST])
 		memcpy(&dst.data, RTA_DATA(tb[RTA_DST]), (r->rtm_dst_len+7)/8);
-	if (filter.rsrc.family || filter.msrc.family) {
+	if (filter.rsrc.family || filter.msrc.family ||
+	    filter.rsrc.flags & PREFIXLEN_SPECIFIED ||
+	    filter.msrc.flags & PREFIXLEN_SPECIFIED) {
 		if (tb[RTA_SRC])
 			memcpy(&src.data, RTA_DATA(tb[RTA_SRC]), (r->rtm_src_len+7)/8);
 	}
@@ -241,15 +265,18 @@  static int filter_nlmsg(struct nlmsghdr *n, struct rtattr **tb, int host_len)
 			memcpy(&prefsrc.data, RTA_DATA(tb[RTA_PREFSRC]), host_len/8);
 	}
 
-	if (filter.rdst.family && inet_addr_match(&dst, &filter.rdst, filter.rdst.bitlen))
+	if ((filter.rdst.family || filter.rdst.flags & PREFIXLEN_SPECIFIED) &&
+	    inet_addr_match(&dst, &filter.rdst, filter.rdst.bitlen))
 		return 0;
-	if (filter.mdst.family && filter.mdst.bitlen >= 0 &&
+	if ((filter.mdst.family || filter.mdst.flags & PREFIXLEN_SPECIFIED) &&
 	    inet_addr_match(&dst, &filter.mdst, r->rtm_dst_len))
 		return 0;
 
-	if (filter.rsrc.family && inet_addr_match(&src, &filter.rsrc, filter.rsrc.bitlen))
+	if ((filter.rsrc.family || filter.rsrc.flags & PREFIXLEN_SPECIFIED) &&
+	    inet_addr_match(&src, &filter.rsrc, filter.rsrc.bitlen))
 		return 0;
-	if (filter.msrc.family && filter.msrc.bitlen >= 0 &&
+	if ((filter.msrc.family || filter.msrc.flags & PREFIXLEN_SPECIFIED) &&
+	    filter.msrc.bitlen >= 0 &&
 	    inet_addr_match(&src, &filter.msrc, r->rtm_src_len))
 		return 0;
 
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -590,6 +590,7 @@  int get_prefix_1(inet_prefix *dst, char *arg, int family)
 		dst->family = family;
 		dst->bytelen = 0;
 		dst->bitlen = 0;
+		dst->flags |= PREFIXLEN_SPECIFIED;
 		return 0;
 	}