diff mbox

[net] ipv6: no need to return rt->dst.error if it is not null entry.

Message ID 20170724030907.GC2938@leo.usersys.redhat.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Hangbin Liu July 24, 2017, 3:09 a.m. UTC
Hi Cong,
On Fri, Jul 21, 2017 at 11:42:46AM -0700, Cong Wang wrote:
> On Thu, Jul 20, 2017 at 8:23 AM, Hangbin Liu <liuhangbin@gmail.com> wrote:
> > 2017-07-20 23:06 GMT+08:00 Hangbin Liu <liuhangbin@gmail.com>:
> >>> +++ b/net/ipv6/route.c
> >>> @@ -3637,12 +3637,6 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
> >>>                 dst = ip6_route_lookup(net, &fl6, 0);
> >>>
> >>>         rt = container_of(dst, struct rt6_info, dst);
> >>> -       if (rt->dst.error) {
> >>> -               err = rt->dst.error;
> >>> -               ip6_rt_put(rt);
> >>> -               goto errout;
> >>> -       }
> >>
> >> hmm... or instead of remove this check, should we check all the entry? Like
> >> if ((rt->dst.error && rt != net->ipv6.ip6_null_entry && rt !=
> >                                                     ^^ mistake here
> >> net->ipv6.ip6_blk_hole_entry) ||
> >>      rt == net->ipv6.ip6_null_entry )
> >
> > Sorry,  there should be no need to check ip6_null_entry since the
> > error is already
> > -ENETUNREACH. So how about
> 
> Hmm? All of these 3 entries have error set, right??
> So we should only check dst.error...

A question, since you have fixed the ip6_null_entry->rt6i_idev issue via

  ipv6: initialize route null entry in addrconf_init()
  ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf
  ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER
  ipv6: avoid unregistering inet6_dev for loopback

Do we still need this net->ipv6.ip6_null_entry check? How about remove all
the checks?



Before this patch:
+ ip netns exec client ip -6 route get 2003::1
RTNETLINK answers: Network is unreachable

After this patch:
+ ip netns exec client ip -6 route get 2003::1
unreachable 2003::1 dev lo table unspec proto kernel src 2001::1 metric 4294967295 error -101


Thanks
Hangbin

Comments

Cong Wang July 24, 2017, 7:57 p.m. UTC | #1
On Sun, Jul 23, 2017 at 8:09 PM, Hangbin Liu <liuhangbin@gmail.com> wrote:
> Do we still need this net->ipv6.ip6_null_entry check? How about remove all
> the checks?

I believe you only need to check for rt->dst.error, no need to check against
NULL or ip6_null_entry.

Take a look at other ip6_route_lookup() callers.
Hangbin Liu July 25, 2017, 12:08 a.m. UTC | #2
On Mon, Jul 24, 2017 at 12:57:43PM -0700, Cong Wang wrote:
> On Sun, Jul 23, 2017 at 8:09 PM, Hangbin Liu <liuhangbin@gmail.com> wrote:
> > Do we still need this net->ipv6.ip6_null_entry check? How about remove all
> > the checks?
> 
> I believe you only need to check for rt->dst.error, no need to check against
> NULL or ip6_null_entry.
> 
> Take a look at other ip6_route_lookup() callers.

Yes, I saw it. That why I send v2 patch to check both rt->dst.error and
ip6_null_entry.

The question is the other two caller are rpfilter_lookup_reverse6() and
nft_fib6_eval(). From the code it looks these two caller only care about
device match.

         if (rt->rt6i_idev->dev == dev || (flags & XT_RPFILTER_LOOSE))
                 ret = true;

And the device would be lo if it is ip6_null_entry. So they just discard it.
I'm not familiar with netfilter, Please correct me if I make any mistake.

But what we want in inet6_rtm_getroute() and rt6_dump_route() is to
get/dump the route info. So we should get the info even it's unreachable or
prohibit.

That's why I think we should remove both rt->dst.error and ip6_null_entry
check in inet6_rtm_getroute(). And even further, remove the ip6_null_entry
check in rt6_dump_route().

What do you think?

Thanks
Hangbin
David Ahern July 25, 2017, 3:28 a.m. UTC | #3
On 7/24/17 6:08 PM, Hangbin Liu wrote:
> That's why I think we should remove both rt->dst.error and ip6_null_entry
> check in inet6_rtm_getroute(). And even further, remove the ip6_null_entry
> check in rt6_dump_route().

git blame net/ipv6/route.c

find the commits and review.
Hangbin Liu July 25, 2017, 7:32 a.m. UTC | #4
On Mon, Jul 24, 2017 at 09:28:07PM -0600, David Ahern wrote:
> On 7/24/17 6:08 PM, Hangbin Liu wrote:
> > That's why I think we should remove both rt->dst.error and ip6_null_entry
> > check in inet6_rtm_getroute(). And even further, remove the ip6_null_entry
> > check in rt6_dump_route().
> 
> git blame net/ipv6/route.c
> 
> find the commits and review.

Hi David,

Sorry, would you like to give me more hints?

I saw your commit 1f17e2f2c8a8 ("net: ipv6: ignore null_entry on route
dumps"). But I think this issue has been fixed by

2f460933f58e ("ipv6: initialize route null entry in addrconf_init()") and
242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")

I use a reproducer which add unreachable route in netns with lo down. And I
could not trigger Panic in the fixed kernel. That's why I think we could
remove ip6_null_entry check in rt6_dump_route().

Please correct me if I make any mistake.

Thanks
Hangbin
Cong Wang July 25, 2017, 5:49 p.m. UTC | #5
On Mon, Jul 24, 2017 at 5:08 PM, Hangbin Liu <liuhangbin@gmail.com> wrote:
> But what we want in inet6_rtm_getroute() and rt6_dump_route() is to
> get/dump the route info. So we should get the info even it's unreachable or
> prohibit.


If you want to dump prohibit/blackhole entry, then you have to check
for null_entry, and rt->dst.error check is still needed because we
could return error on other normal entries too, IOW, your v2 is correct
if dumping prohibit/blackhole is expected.

I thought we don't dump them. I am not sure about the behavior either.
Hangbin Liu July 26, 2017, 9:18 a.m. UTC | #6
On Tue, Jul 25, 2017 at 10:49:05AM -0700, Cong Wang wrote:
> On Mon, Jul 24, 2017 at 5:08 PM, Hangbin Liu <liuhangbin@gmail.com> wrote:
> > But what we want in inet6_rtm_getroute() and rt6_dump_route() is to
> > get/dump the route info. So we should get the info even it's unreachable or
> > prohibit.
> 
> If you want to dump prohibit/blackhole entry, then you have to check
> for null_entry, and rt->dst.error check is still needed because we

I could not reproduce the NULL rt6i_idev issue after you route init fix, So
I think it's also safe to dump null_entry now. There is a v3 patch. After
the patch, we could dump unreachable route info like:

+ ip netns exec client ip -6 route get 2003::1
unreachable 2003::1 dev lo table unspec proto kernel metric 4294967295 error -101

> could return error on other normal entries too, IOW, your v2 is correct
> if dumping prohibit/blackhole is expected.

Would you hlep review the v3 patch? I prepare not touch the check in
function rt6_dump_route() now.

Thanks
Hangbin
David Ahern July 26, 2017, 5:18 p.m. UTC | #7
On 7/25/17 1:32 AM, Hangbin Liu wrote:
> On Mon, Jul 24, 2017 at 09:28:07PM -0600, David Ahern wrote:
>> On 7/24/17 6:08 PM, Hangbin Liu wrote:
>>> That's why I think we should remove both rt->dst.error and ip6_null_entry
>>> check in inet6_rtm_getroute(). And even further, remove the ip6_null_entry
>>> check in rt6_dump_route().
>>
>> git blame net/ipv6/route.c
>>
>> find the commits and review.
> 
> Hi David,
> 
> Sorry, would you like to give me more hints?
> 
> I saw your commit 1f17e2f2c8a8 ("net: ipv6: ignore null_entry on route
> dumps"). But I think this issue has been fixed by
> 
> 2f460933f58e ("ipv6: initialize route null entry in addrconf_init()") and
> 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")
> 
> I use a reproducer which add unreachable route in netns with lo down. And I
> could not trigger Panic in the fixed kernel. That's why I think we could
> remove ip6_null_entry check in rt6_dump_route().

Understood. Cong's patch to fix the intialization order (lo device
before route init) makes sure the idev is not null. That said, the
null_entry route is internal ipv6 logic and is not a route entry to be
returned to userspace.
Roopa Prabhu July 26, 2017, 6:27 p.m. UTC | #8
On Wed, Jul 26, 2017 at 10:18 AM, David Ahern <dsahern@gmail.com> wrote:
> On 7/25/17 1:32 AM, Hangbin Liu wrote:
>> On Mon, Jul 24, 2017 at 09:28:07PM -0600, David Ahern wrote:
>>> On 7/24/17 6:08 PM, Hangbin Liu wrote:
>>>> That's why I think we should remove both rt->dst.error and ip6_null_entry
>>>> check in inet6_rtm_getroute(). And even further, remove the ip6_null_entry
>>>> check in rt6_dump_route().
>>>
>>> git blame net/ipv6/route.c
>>>
>>> find the commits and review.
>>
>> Hi David,
>>
>> Sorry, would you like to give me more hints?
>>
>> I saw your commit 1f17e2f2c8a8 ("net: ipv6: ignore null_entry on route
>> dumps"). But I think this issue has been fixed by
>>
>> 2f460933f58e ("ipv6: initialize route null entry in addrconf_init()") and
>> 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")
>>
>> I use a reproducer which add unreachable route in netns with lo down. And I
>> could not trigger Panic in the fixed kernel. That's why I think we could
>> remove ip6_null_entry check in rt6_dump_route().
>
> Understood. Cong's patch to fix the intialization order (lo device
> before route init) makes sure the idev is not null. That said, the
> null_entry route is internal ipv6 logic and is not a route entry to be
> returned to userspace.

agreed...so looks like the check in v3 should be


+       if ( rt == net->ipv6.ip6_null_entry ||
+            (rt->dst.error &&
+ #ifdef CONFIG_IPV6_MULTIPLE_TABLES
+              rt != net->ipv6.ip6_prohibit_entry &&
+              rt != net->ipv6.ip6_blk_hole_entry &&
+#endif
+             )) {
                err = rt->dst.error;
                ip6_rt_put(rt);
                goto errout;
David Ahern July 26, 2017, 6:49 p.m. UTC | #9
On 7/26/17 12:27 PM, Roopa Prabhu wrote:
> agreed...so looks like the check in v3 should be
> 
> 
> +       if ( rt == net->ipv6.ip6_null_entry ||
> +            (rt->dst.error &&
> + #ifdef CONFIG_IPV6_MULTIPLE_TABLES
> +              rt != net->ipv6.ip6_prohibit_entry &&
> +              rt != net->ipv6.ip6_blk_hole_entry &&
> +#endif
> +             )) {
>                 err = rt->dst.error;
>                 ip6_rt_put(rt);
>                 goto errout;
> 

I don't think so. If I add a prohibit route and use the fibmatch
attribute, I want to see the route from the FIB that was matched.
Roopa Prabhu July 26, 2017, 6:55 p.m. UTC | #10
On Wed, Jul 26, 2017 at 11:49 AM, David Ahern <dsahern@gmail.com> wrote:
> On 7/26/17 12:27 PM, Roopa Prabhu wrote:
>> agreed...so looks like the check in v3 should be
>>
>>
>> +       if ( rt == net->ipv6.ip6_null_entry ||
>> +            (rt->dst.error &&
>> + #ifdef CONFIG_IPV6_MULTIPLE_TABLES
>> +              rt != net->ipv6.ip6_prohibit_entry &&
>> +              rt != net->ipv6.ip6_blk_hole_entry &&
>> +#endif
>> +             )) {
>>                 err = rt->dst.error;
>>                 ip6_rt_put(rt);
>>                 goto errout;
>>
>
> I don't think so. If I add a prohibit route and use the fibmatch
> attribute, I want to see the route from the FIB that was matched.


yes, exactly. wouldn't  'rt != net->ipv6.ip6_prohibit_entry' above let
it fall through to the route fill code ?

ah...but i guess you are saying that they will have rt6_info's of
their own and will not match. got it. ack.
Cong Wang July 28, 2017, 4:56 a.m. UTC | #11
On Wed, Jul 26, 2017 at 11:49 AM, David Ahern <dsahern@gmail.com> wrote:
> On 7/26/17 12:27 PM, Roopa Prabhu wrote:
>> agreed...so looks like the check in v3 should be
>>
>>
>> +       if ( rt == net->ipv6.ip6_null_entry ||
>> +            (rt->dst.error &&
>> + #ifdef CONFIG_IPV6_MULTIPLE_TABLES
>> +              rt != net->ipv6.ip6_prohibit_entry &&
>> +              rt != net->ipv6.ip6_blk_hole_entry &&
>> +#endif
>> +             )) {
>>                 err = rt->dst.error;
>>                 ip6_rt_put(rt);
>>                 goto errout;
>>
>
> I don't think so. If I add a prohibit route and use the fibmatch
> attribute, I want to see the route from the FIB that was matched.

But net->ipv6.ip6_prohibit_entry is not the prohibit route you can
add in user-space, it is only used by rule actions. So do you really
want to dump it?? My gut feeling is no, but I am definitely not sure.

When you add a prohibit route, a new rt is allocated dynamically,
net->ipv6.ip6_prohibit_entry is relatively static, internal and is the
only one per netns. (Same for net->ipv6.ip6_blk_hole_entry)

I think Hangbin's example doesn't have ip rules, so this case
is not shown up.
Hangbin Liu July 28, 2017, 11:04 a.m. UTC | #12
On Thu, Jul 27, 2017 at 09:56:08PM -0700, Cong Wang wrote:
> On Wed, Jul 26, 2017 at 11:49 AM, David Ahern <dsahern@gmail.com> wrote:
> > On 7/26/17 12:27 PM, Roopa Prabhu wrote:
> >> agreed...so looks like the check in v3 should be
> >>
> >>
> >> +       if ( rt == net->ipv6.ip6_null_entry ||
> >> +            (rt->dst.error &&
> >> + #ifdef CONFIG_IPV6_MULTIPLE_TABLES
> >> +              rt != net->ipv6.ip6_prohibit_entry &&
> >> +              rt != net->ipv6.ip6_blk_hole_entry &&
> >> +#endif
> >> +             )) {
> >>                 err = rt->dst.error;
> >>                 ip6_rt_put(rt);
> >>                 goto errout;
> >>
> >
> > I don't think so. If I add a prohibit route and use the fibmatch
> > attribute, I want to see the route from the FIB that was matched.
> 
> But net->ipv6.ip6_prohibit_entry is not the prohibit route you can
> add in user-space, it is only used by rule actions. So do you really
> want to dump it?? My gut feeling is no, but I am definitely not sure.
> 
> When you add a prohibit route, a new rt is allocated dynamically,
> net->ipv6.ip6_prohibit_entry is relatively static, internal and is the
> only one per netns. (Same for net->ipv6.ip6_blk_hole_entry)
> 
> I think Hangbin's example doesn't have ip rules, so this case
> is not shown up.

I mixed the rule entry and route entry these days. And with your help I can
separate them now.

When first time I find the rt->dst.error return directly issue, I was testing
ip rule actually.

e.g.
+ ip netns exec client ip -6 rule add to 2003::1/64 table 100 unreachable
+ ip netns exec server ip -6 rule add to 2001::1/64 table 100 prohibit
+ ip netns exec client ip -6 route get 2003::1
RTNETLINK answers: Network is unreachable
+ ip netns exec client ip -6 route get 2001::1
RTNETLINK answers: Permission denied

After check I thought we returned net->ipv6.ip6_null_entry /
net->ipv6.ip6_blk_hole_entry in function fib6_rule_action().

That's the reason I want to delete both rt->dst.error and
net->ipv6.ip6_null_entry check in patch v2 and v3.

Then with David's comments I realise we also need to take care about ip route
entrys.

my last mail's comment:
> Thanks for your explains. Now I know where I made the mistake. I mis-looked
> FR_ACT_UNREACHABLE to RTN_UNREACHABLE and thought we return rt =
> net->ipv6.ip6_null_entry in fib6_rule_action().

But then I fall in to the code logic and get lost... And thought
FR_ACT_UNREACHABLE and RTN_UNREACHABLE are not same. Today I re-check the
code and realise RTN_UNREACHABLE is defined in user space and FR_ACT_UNREACHABLE
is in kernel. Actually they are the same.

So after PATCH v4, we fixed the route side. And part of ip rule(prohibit and
blk hole). I will think over of this.

Thanks
Hangbin
David Ahern July 28, 2017, 3:10 p.m. UTC | #13
On 7/27/17 10:56 PM, Cong Wang wrote:
> On Wed, Jul 26, 2017 at 11:49 AM, David Ahern <dsahern@gmail.com> wrote:
>> On 7/26/17 12:27 PM, Roopa Prabhu wrote:
>>> agreed...so looks like the check in v3 should be
>>>
>>>
>>> +       if ( rt == net->ipv6.ip6_null_entry ||
>>> +            (rt->dst.error &&
>>> + #ifdef CONFIG_IPV6_MULTIPLE_TABLES
>>> +              rt != net->ipv6.ip6_prohibit_entry &&
>>> +              rt != net->ipv6.ip6_blk_hole_entry &&
>>> +#endif
>>> +             )) {
>>>                 err = rt->dst.error;
>>>                 ip6_rt_put(rt);
>>>                 goto errout;
>>>
>>
>> I don't think so. If I add a prohibit route and use the fibmatch
>> attribute, I want to see the route from the FIB that was matched.
> 
> But net->ipv6.ip6_prohibit_entry is not the prohibit route you can
> add in user-space, it is only used by rule actions. So do you really
> want to dump it?? My gut feeling is no, but I am definitely not sure.
> 
> When you add a prohibit route, a new rt is allocated dynamically,
> net->ipv6.ip6_prohibit_entry is relatively static, internal and is the
> only one per netns. (Same for net->ipv6.ip6_blk_hole_entry)
> 
> I think Hangbin's example doesn't have ip rules, so this case
> is not shown up.
> 

Understood. The v4 patch returns getroute to the original behavior. The
original behavior returned a route entry not just an error code. The
following is at 5dafc87f40d7 which the commit before roopa's patch set:

# ip -6 ru add to 6000::/120 prohibit
# ip -6 ro get 6000::1
prohibit 6000::1 from :: dev lo proto kernel src 2001:db8::3 metric
4294967295  error -13 pref medium

# ip -6 ro add vrf red prohibit 5000::1/120
# ip -6 ro get vrf red 5000::1
prohibit 5000::1 from :: dev lo table red src 2001:db8::3 metric 1024
error -13 pref medium

Generically, the only time you get just an error response is when the
lookup fails to find a match and returns the null_entry which has
dst.error = -ENETUNREACH.


Now to your point about the new fibmatch option I have gone back and
forth but in the end I think returning the route associated with the FIB
rule is better than just failing with an error code.

Roopa?
Roopa Prabhu July 28, 2017, 5:13 p.m. UTC | #14
On Fri, Jul 28, 2017 at 8:10 AM, David Ahern <dsahern@gmail.com> wrote:
> On 7/27/17 10:56 PM, Cong Wang wrote:
>> On Wed, Jul 26, 2017 at 11:49 AM, David Ahern <dsahern@gmail.com> wrote:
>>> On 7/26/17 12:27 PM, Roopa Prabhu wrote:
>>>> agreed...so looks like the check in v3 should be
>>>>
>>>>
>>>> +       if ( rt == net->ipv6.ip6_null_entry ||
>>>> +            (rt->dst.error &&
>>>> + #ifdef CONFIG_IPV6_MULTIPLE_TABLES
>>>> +              rt != net->ipv6.ip6_prohibit_entry &&
>>>> +              rt != net->ipv6.ip6_blk_hole_entry &&
>>>> +#endif
>>>> +             )) {
>>>>                 err = rt->dst.error;
>>>>                 ip6_rt_put(rt);
>>>>                 goto errout;
>>>>
>>>
>>> I don't think so. If I add a prohibit route and use the fibmatch
>>> attribute, I want to see the route from the FIB that was matched.
>>
>> But net->ipv6.ip6_prohibit_entry is not the prohibit route you can
>> add in user-space, it is only used by rule actions. So do you really
>> want to dump it?? My gut feeling is no, but I am definitely not sure.
>>
>> When you add a prohibit route, a new rt is allocated dynamically,
>> net->ipv6.ip6_prohibit_entry is relatively static, internal and is the
>> only one per netns. (Same for net->ipv6.ip6_blk_hole_entry)
>>
>> I think Hangbin's example doesn't have ip rules, so this case
>> is not shown up.
>>
>
> Understood. The v4 patch returns getroute to the original behavior. The
> original behavior returned a route entry not just an error code. The
> following is at 5dafc87f40d7 which the commit before roopa's patch set:
>
> # ip -6 ru add to 6000::/120 prohibit
> # ip -6 ro get 6000::1
> prohibit 6000::1 from :: dev lo proto kernel src 2001:db8::3 metric
> 4294967295  error -13 pref medium
>
> # ip -6 ro add vrf red prohibit 5000::1/120
> # ip -6 ro get vrf red 5000::1
> prohibit 5000::1 from :: dev lo table red src 2001:db8::3 metric 1024
> error -13 pref medium
>
> Generically, the only time you get just an error response is when the
> lookup fails to find a match and returns the null_entry which has
> dst.error = -ENETUNREACH.
>
>
> Now to your point about the new fibmatch option I have gone back and
> forth but in the end I think returning the route associated with the FIB
> rule is better than just failing with an error code.
>
> Roopa?

for fibmatch, my original intent was to return with an error code.
This is similar
to the ipv4 behavior. One option is to keep the check in there and put
the 'fibmatch'
condition around it. But, i do want to make sure that for the fibmatch case,
it does not return an error directly on an existing prohibit route
entry in the fib.
This is probably doable by checking for appropriate
net->ipv6.ip6_prohibit_entry entries.
David Ahern July 28, 2017, 5:39 p.m. UTC | #15
On 7/28/17 11:13 AM, Roopa Prabhu wrote:
> for fibmatch, my original intent was to return with an error code.
> This is similar
> to the ipv4 behavior. One option is to keep the check in there and put
> the 'fibmatch'
> condition around it. But, i do want to make sure that for the fibmatch case,
> it does not return an error directly on an existing prohibit route
> entry in the fib.
> This is probably doable by checking for appropriate
> net->ipv6.ip6_prohibit_entry entries.
> 

IPv4 does not have the notion of null_entry or prohibit route entries
which makes IPv4 and IPv6 inconsistent - something we really need to be
avoiding from a user experience.

We have the following cases:

# ip -4 rule  add to 172.16.60.0/24 prohibit
# ip -4 route add prohibit 172.16.50.0/24
# ip -6 rule  add to 6000::/120 prohibit
# ip -6 route add prohibit 5000::/120


Behavior before Roopa's patch set:
  Rule match:
    # ip ro get 172.16.60.1
    RTNETLINK answers: Permission denied

    # ip -6 ro get 6000::1
    prohibit 6000::1 from :: dev lo proto kernel src 2001:db8::3 metric
4294967295  error -13 pref medium

  Route match:
    # ip ro get 172.16.50.1
    RTNETLINK answers: Permission denied

    # ip -6 ro get 5000::1
    prohibit 5000::1 from :: dev lo table red src 2001:db8::3 metric
1024  error -13 pref medium


Behavior after Roopa's patch set:
  Rule match:
    # ip ro get 172.16.60.1
    RTNETLINK answers: Permission denied

    # ip -6 ro get 6000::1
    RTNETLINK answers: Permission denied

  Route match:
    # ip ro get 172.16.50.1
    RTNETLINK answers: Permission denied

    # ip -6 ro get 5000::1
    RTNETLINK answers: Permission denied


So Roopa's fibmatch patches brings consistency between IPv4 and IPv6 at
the cost of breaking backwards compatibility for IPv6 when the prohibit
or blackhole routes are hit.

If that is not acceptable, then let's wrap the change in 'if (fibmatch)'
so that when fibmatch is requested we have consistency between IPv4 and
IPv6 when it is set.
Roopa Prabhu July 28, 2017, 7:52 p.m. UTC | #16
On Fri, Jul 28, 2017 at 10:39 AM, David Ahern <dsahern@gmail.com> wrote:
> On 7/28/17 11:13 AM, Roopa Prabhu wrote:
>> for fibmatch, my original intent was to return with an error code.
>> This is similar
>> to the ipv4 behavior. One option is to keep the check in there and put
>> the 'fibmatch'
>> condition around it. But, i do want to make sure that for the fibmatch case,
>> it does not return an error directly on an existing prohibit route
>> entry in the fib.
>> This is probably doable by checking for appropriate
>> net->ipv6.ip6_prohibit_entry entries.
>>
>
> IPv4 does not have the notion of null_entry or prohibit route entries
> which makes IPv4 and IPv6 inconsistent - something we really need to be
> avoiding from a user experience.
>
> We have the following cases:
>
> # ip -4 rule  add to 172.16.60.0/24 prohibit
> # ip -4 route add prohibit 172.16.50.0/24
> # ip -6 rule  add to 6000::/120 prohibit
> # ip -6 route add prohibit 5000::/120
>
>
> Behavior before Roopa's patch set:
>   Rule match:
>     # ip ro get 172.16.60.1
>     RTNETLINK answers: Permission denied
>
>     # ip -6 ro get 6000::1
>     prohibit 6000::1 from :: dev lo proto kernel src 2001:db8::3 metric
> 4294967295  error -13 pref medium
>
>   Route match:
>     # ip ro get 172.16.50.1
>     RTNETLINK answers: Permission denied
>
>     # ip -6 ro get 5000::1
>     prohibit 5000::1 from :: dev lo table red src 2001:db8::3 metric
> 1024  error -13 pref medium
>
>
> Behavior after Roopa's patch set:
>   Rule match:
>     # ip ro get 172.16.60.1
>     RTNETLINK answers: Permission denied
>
>     # ip -6 ro get 6000::1
>     RTNETLINK answers: Permission denied
>
>   Route match:
>     # ip ro get 172.16.50.1
>     RTNETLINK answers: Permission denied
>
>     # ip -6 ro get 5000::1
>     RTNETLINK answers: Permission denied
>
>
> So Roopa's fibmatch patches brings consistency between IPv4 and IPv6 at
> the cost of breaking backwards compatibility for IPv6 when the prohibit
> or blackhole routes are hit.
>
> If that is not acceptable, then let's wrap the change in 'if (fibmatch)'
> so that when fibmatch is requested we have consistency between IPv4 and
> IPv6 when it is set.


David, Thanks for listing all the cases and options.

for the route match fibmatch case, if a prohibit route entry exists
(added by user), I was hoping fibmatch can return that entry...

 # ip -6 ro get fibmatch 5000::1
    prohibit 5000::1 from :: dev lo

because the semantics of fibmatch is to return the matching route
entry if exists.
I am assuming that is possible with appropriate checks around the
dst.error check for fibmatch. what do you say ?
I need to verify if this can work for ipv4 the same way.
David Ahern July 29, 2017, 2:41 p.m. UTC | #17
On 7/28/17 1:52 PM, Roopa Prabhu wrote:
> On Fri, Jul 28, 2017 at 10:39 AM, David Ahern <dsahern@gmail.com> wrote:
>> IPv4 does not have the notion of null_entry or prohibit route entries
>> which makes IPv4 and IPv6 inconsistent - something we really need to be
>> avoiding from a user experience.
>>
>> We have the following cases:
>>
>> # ip -4 rule  add to 172.16.60.0/24 prohibit
>> # ip -4 route add prohibit 172.16.50.0/24
>> # ip -6 rule  add to 6000::/120 prohibit
>> # ip -6 route add prohibit 5000::/120
>>
>>
>> Behavior before Roopa's patch set:
>>   Rule match:
>>     # ip ro get 172.16.60.1
>>     RTNETLINK answers: Permission denied
>>
>>     # ip -6 ro get 6000::1
>>     prohibit 6000::1 from :: dev lo proto kernel src 2001:db8::3 metric
>> 4294967295  error -13 pref medium
>>
>>   Route match:
>>     # ip ro get 172.16.50.1
>>     RTNETLINK answers: Permission denied
>>
>>     # ip -6 ro get 5000::1
>>     prohibit 5000::1 from :: dev lo table red src 2001:db8::3 metric
>> 1024  error -13 pref medium
>>
>>
>> Behavior after Roopa's patch set:
>>   Rule match:
>>     # ip ro get 172.16.60.1
>>     RTNETLINK answers: Permission denied
>>
>>     # ip -6 ro get 6000::1
>>     RTNETLINK answers: Permission denied
>>
>>   Route match:
>>     # ip ro get 172.16.50.1
>>     RTNETLINK answers: Permission denied
>>
>>     # ip -6 ro get 5000::1
>>     RTNETLINK answers: Permission denied
>>
>>
>> So Roopa's fibmatch patches brings consistency between IPv4 and IPv6 at
>> the cost of breaking backwards compatibility for IPv6 when the prohibit
>> or blackhole routes are hit.
>>
>> If that is not acceptable, then let's wrap the change in 'if (fibmatch)'
>> so that when fibmatch is requested we have consistency between IPv4 and
>> IPv6 when it is set.
> 
> 
> David, Thanks for listing all the cases and options.
> 
> for the route match fibmatch case, if a prohibit route entry exists
> (added by user), I was hoping fibmatch can return that entry...
> 
>  # ip -6 ro get fibmatch 5000::1
>     prohibit 5000::1 from :: dev lo
> 
> because the semantics of fibmatch is to return the matching route
> entry if exists.
> I am assuming that is possible with appropriate checks around the
> dst.error check for fibmatch. what do you say ?
> I need to verify if this can work for ipv4 the same way.
> 

Routes such as prohibit, blackhole and unreachable cause
fib_table_lookup to return an error (see the fib_props reference in
fib_table_lookup) and the IPv4 code does not generate an rtable for
them. IMO it does not make sense to complicate the general IPv4 lookup
code to create an rtable for these 'special' routes just for the
fibmatch getroute case. Furthermore, I think consistency between IPv4
and IPv6 is important from a programmability perspective which is what
we have now.

DaveM: as I outlined above, Roopa's fibmatch patches caused a change in
user behavior in IPv6 getroute for prohibit, blackhole and unreachable
route entries. Opinions on whether we should limit that new behavior to
just the fibmatch lookup in which case a patch is needed or take the new
behavior and consistency in which case nothing is needed?
Cong Wang July 31, 2017, 6:37 p.m. UTC | #18
On Fri, Jul 28, 2017 at 10:39 AM, David Ahern <dsahern@gmail.com> wrote:
> On 7/28/17 11:13 AM, Roopa Prabhu wrote:
>> for fibmatch, my original intent was to return with an error code.
>> This is similar
>> to the ipv4 behavior. One option is to keep the check in there and put
>> the 'fibmatch'
>> condition around it. But, i do want to make sure that for the fibmatch case,
>> it does not return an error directly on an existing prohibit route
>> entry in the fib.
>> This is probably doable by checking for appropriate
>> net->ipv6.ip6_prohibit_entry entries.
>>
>
> IPv4 does not have the notion of null_entry or prohibit route entries
> which makes IPv4 and IPv6 inconsistent - something we really need to be
> avoiding from a user experience.
>
> We have the following cases:
>
> # ip -4 rule  add to 172.16.60.0/24 prohibit
> # ip -4 route add prohibit 172.16.50.0/24
> # ip -6 rule  add to 6000::/120 prohibit
> # ip -6 route add prohibit 5000::/120
>
>
> Behavior before Roopa's patch set:
>   Rule match:
>     # ip ro get 172.16.60.1
>     RTNETLINK answers: Permission denied
>
>     # ip -6 ro get 6000::1
>     prohibit 6000::1 from :: dev lo proto kernel src 2001:db8::3 metric
> 4294967295  error -13 pref medium
>
>   Route match:
>     # ip ro get 172.16.50.1
>     RTNETLINK answers: Permission denied
>
>     # ip -6 ro get 5000::1
>     prohibit 5000::1 from :: dev lo table red src 2001:db8::3 metric
> 1024  error -13 pref medium
>
>
> Behavior after Roopa's patch set:
>   Rule match:
>     # ip ro get 172.16.60.1
>     RTNETLINK answers: Permission denied
>
>     # ip -6 ro get 6000::1
>     RTNETLINK answers: Permission denied
>
>   Route match:
>     # ip ro get 172.16.50.1
>     RTNETLINK answers: Permission denied
>
>     # ip -6 ro get 5000::1
>     RTNETLINK answers: Permission denied
>

There must be a reason why we allocate prohibit entries
dynamically for IPv6 despite we already have a (relatively)
static one.

From this point of view, we need to dump them, that is,
restore the behavior before Roopa's patch.


>
> So Roopa's fibmatch patches brings consistency between IPv4 and IPv6 at
> the cost of breaking backwards compatibility for IPv6 when the prohibit
> or blackhole routes are hit.
>

There are already many differences between IPv4 and
IPv6 behaviors, I don't see why this one is so special
that we have to make it consistent.
David Ahern July 31, 2017, 6:40 p.m. UTC | #19
On 7/31/17 12:37 PM, Cong Wang wrote:
> 
>> So Roopa's fibmatch patches brings consistency between IPv4 and IPv6 at
>> the cost of breaking backwards compatibility for IPv6 when the prohibit
>> or blackhole routes are hit.
>>
> There are already many differences between IPv4 and
> IPv6 behaviors, I don't see why this one is so special
> that we have to make it consistent.

yes, and I have sent a number of patches closing the gap.
diff mbox

Patch

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 4d30c96..b5e3fe0 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3637,17 +3637,6 @@  static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
                dst = ip6_route_lookup(net, &fl6, 0);

        rt = container_of(dst, struct rt6_info, dst);
-       if (rt->dst.error) {
-               err = rt->dst.error;
-               ip6_rt_put(rt);
-               goto errout;
-       }
-
-       if (rt == net->ipv6.ip6_null_entry) {
-               err = rt->dst.error;
-               ip6_rt_put(rt);
-               goto errout;
-       }

        skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
        if (!skb) {