Patchwork [1/2] ipv6: avoid blackhole and prohibited entries upon prefix purge [v3]

Submitter: Romain KUNTZ
Date: Jan. 9, 2013, 2:37 p.m.
Message ID: <6A08EDC1-08A0-411D-90CF-6DB1CB7FA3A0@ipflavors.com>
Permalink: /patch/210715/
State: Superseded
Delegated to: David Miller

Comments

Romain KUNTZ - Jan. 9, 2013, 2:37 p.m.
On Jan 8, 2013, at 18:18 , YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> wrote:
> Nicolas Dichtel wrote:
>> On 08/01/2013 12:38, Romain KUNTZ wrote:
>>> On Jan 7, 2013, at 16:43 , Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>>>> On 07/01/2013 12:30, Romain KUNTZ wrote:
>>>>> Hello Nicolas,
>>>>> 
>>>>> On Jan 7, 2013, at 11:25 , Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>>>>> 
>>>>>> On 05/01/2013 22:44, Romain KUNTZ wrote:
>>>>>>> Mobile IPv6 provokes a kernel Oops since commit 64c6d08e (ipv6:
>>>>>>> del unreachable route when an addr is deleted on lo), because
>>>>>>> ip6_route_lookup() may also return blackhole and prohibited
>>>>>>> entries. However, these entries have a NULL rt6i_table field,
>>>>>>> which provokes an Oops in __ip6_del_rt() when trying to lock
>>>>>>> rt6i_table->tb6_lock.
>>>>>>> 
>>>>>>> Besides, when purging a prefix, blackhole and prohibited entries
>>>>>>> should not be selected because they are not what we are looking
>>>>>>> for.
>>>>>>> 
>>>>>>> We fix this by adding two new lookup flags (RT6_LOOKUP_F_NO_BLK_HOLE
>>>>>>> and RT6_LOOKUP_F_NO_PROHIBIT) in order to ensure that such entries
>>>>>>> are skipped during lookup and that the correct entry is returned.
>>>>>>> 
>>>>>>> [v2]: use 'goto out;' instead of 'goto again;' to avoid unnecessary
>>>>>>> operations on rt (as suggested by Eric Dumazet).
>>>>>>> 
>>>>>>> Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com>
>>>>>>> ---
>>>>>>>  include/net/ip6_route.h |    2 ++
>>>>>>>  net/ipv6/addrconf.c     |    4 +++-
>>>>>>>  net/ipv6/fib6_rules.c   |    4 ++++
>>>>>>>  3 files changed, 9 insertions(+), 1 deletions(-)
>>>>>>> 
>>>>>>> diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
>>>>>>> index 27d8318..3c93743 100644
>>>>>>> --- a/include/net/ip6_route.h
>>>>>>> +++ b/include/net/ip6_route.h
>>>>>>> @@ -30,6 +30,8 @@ struct route_info {
>>>>>>>  #define RT6_LOOKUP_F_SRCPREF_TMP    0x00000008
>>>>>>>  #define RT6_LOOKUP_F_SRCPREF_PUBLIC    0x00000010
>>>>>>>  #define RT6_LOOKUP_F_SRCPREF_COA    0x00000020
>>>>>>> +#define RT6_LOOKUP_F_NO_BLK_HOLE    0x00000040
>>>>>>> +#define RT6_LOOKUP_F_NO_PROHIBIT    0x00000080
>>>>>>> 
>>>>>>>  /*
>>>>>>>   * rt6_srcprefs2flags() and rt6_flags2srcprefs() translate
>>>>>>> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>>>>>>> index 408cac4a..1891e23 100644
>>>>>>> --- a/net/ipv6/addrconf.c
>>>>>>> +++ b/net/ipv6/addrconf.c
>>>>>>> @@ -948,7 +948,9 @@ static void ipv6_del_addr(struct inet6_ifaddr *ifp)
>>>>>>>          fl6.flowi6_oif = ifp->idev->dev->ifindex;
>>>>>>>          fl6.daddr = prefix;
>>>>>>>          rt = (struct rt6_info *)ip6_route_lookup(net, &fl6,
>>>>>>> -                             RT6_LOOKUP_F_IFACE);
>>>>>>> +                        RT6_LOOKUP_F_IFACE |
>>>>>>> +                        RT6_LOOKUP_F_NO_BLK_HOLE |
>>>>>>> +                        RT6_LOOKUP_F_NO_PROHIBIT);
>>>>>>> 
>>>>>>>          if (rt != net->ipv6.ip6_null_entry &&
>>>>>> Wouldn't it be simpler to test the result here (against net->ipv6.ip6_blk_hole_entry
>>>>>> and net->ipv6.ip6_prohibit_entry), as is already done for the null_entry?
>>>>>> It would also avoid adding more flags.
>>>>> 
>>>>> Your proposal would only solve part of the problem (the Oops in __ip6_del_rt()). Another problem here is that blackhole and prohibited rules should not be selected when trying to purge a prefix (correct me if I'm wrong) because they are not what we are looking for. This can prevent the targeted prefix from being purged.
>>>> Actually, I'm not sure I understand the scenario. This part of the code only tries
>>>> to remove the connected prefix route that the kernel added along with the address.
>>>> Can you describe your scenario?
>>> 
>>> 
>>> I should have given more details from the beginning, my mistake. The scenario where this happens is quite simple:
>>> 
>>> - install a blackhole rule (e.g. "from 2001:db8::1000 blackhole"; the source address does not matter at all) with the FIB_RULE_FIND_SADDR flag set (this flag cannot be set with iproute2, but for testing purposes you can apply the enclosed patch to the latest iproute2 tree and then run "./ip -6 rule add from 2001:db8::1000/128 blackhole prio 1000").
>>> 
>>> - try to delete an address from one of your interfaces (any address; it can differ from the one you used for the blackhole rule): "ip -6 addr del <v6-addr>/64 dev eth<x>"
>>> 
>>> and you get an Oops. When trying to remove the connected prefix, the fib6_rule_match() function will match the blackhole rule because RT6_LOOKUP_F_HAS_SADDR is not set and FIB_RULE_FIND_SADDR is set.
>>> 
>>> With your proposal, the Oops is fixed but the connected prefix route is not deleted. With my initial patch, the Oops is fixed and the connected prefix route is also deleted.
>> OK, I get it. I think there are two bugs: the Oops and the wrong lookup.
>> 
>> Your proposal fixes only a particular case. Try this (with your iproute2 patch):
>> ip -6 addr add 2002::1/64 dev eth0
>> ip -6 route add 2002::/64 table 257 dev eth0

(you also need to add a rule such as this one:)
ip -6 rule add to 2002::/64 table 257

>> ip -6 addr del 2002::1/64 dev eth0
>> 
>> The route deleted is not the connected prefix, but the route added in table 257.

You are right.

>> The connected prefix is still here in the main table. It's not what we want.
>> Maybe the lookup should be done directly in the right table, i.e. table RT6_TABLE_PREFIX. What do you think?
> 
> I agree.  I think we can use addrconf_get_prefix_route() here.
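
The rule-matching behaviour described above can be modelled in a few lines of C. This is a simplified user-space sketch for illustration only: `struct rule_model`, `rule_matches_src()`, `prefix_equal()` and the `MODEL_*` flag values are hypothetical stand-ins, not the kernel's `fib6_rule_match()` or its real constants. It shows why a rule with a source prefix and FIB_RULE_FIND_SADDR matches a lookup that carries no source address (RT6_LOOKUP_F_HAS_SADDR clear), which is the kind of lookup ipv6_del_addr() performs when purging the prefix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag values for this model only; the kernel's real
 * RT6_LOOKUP_F_HAS_SADDR / FIB_RULE_FIND_SADDR constants differ. */
#define MODEL_LOOKUP_F_HAS_SADDR 0x1u /* lookup carries a source address */
#define MODEL_RULE_FIND_SADDR    0x2u /* rule defers its source check */

struct rule_model {
	uint8_t src[16];     /* source prefix selector of the rule */
	int src_plen;        /* 0 means "from all" */
	uint32_t rule_flags; /* MODEL_RULE_FIND_SADDR or 0 */
};

/* Compare the first plen bits of two 128-bit addresses. */
static bool prefix_equal(const uint8_t *a, const uint8_t *b, int plen)
{
	int bytes = plen / 8, bits = plen % 8;

	assert(plen >= 0 && plen <= 128);
	if (memcmp(a, b, (size_t)bytes) != 0)
		return false;
	if (bits) {
		uint8_t mask = (uint8_t)(0xffu << (8 - bits));
		return (a[bytes] & mask) == (b[bytes] & mask);
	}
	return true;
}

/* Simplified model of the source-address part of rule matching. */
static bool rule_matches_src(const struct rule_model *r,
			     const uint8_t *saddr, uint32_t lookup_flags)
{
	if (r->src_plen == 0)
		return true; /* rule has no source selector */
	if (lookup_flags & MODEL_LOOKUP_F_HAS_SADDR)
		return prefix_equal(saddr, r->src, r->src_plen);
	/* No source address in the lookup: the rule matches only if it
	 * defers the source check. */
	return (r->rule_flags & MODEL_RULE_FIND_SADDR) != 0;
}
```

With the deferral flag cleared, the same source-less lookup no longer matches, which is consistent with the blackhole rule only being hit in the reproducer when FIB_RULE_FIND_SADDR is set.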

Right, thanks for the hint! What about the patch below?

Note that addrconf_get_prefix_route() also requires a fix (I believe it does not handle the 'noflags' parameter correctly); I have sent a patch in a separate mail (subject "ipv6: fix the noflags test in addrconf_get_prefix_route").
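
The `flags`/`noflags` pair can be sketched as a plain predicate, assuming the intended semantics are that a candidate route is accepted only if it carries every bit in `flags` and none of the bits in `noflags`. Under that assumption, `flags = 0` with `noflags = RTF_GATEWAY | RTF_DEFAULT` accepts exactly the routes that the removed addrconf_is_prefix_route() accepted. The helper names are hypothetical, and the two flag values are local copies intended to match the Linux UAPI headers `<linux/route.h>` and `<linux/ipv6_route.h>`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the route flags, intended to match the UAPI headers. */
#define MODEL_RTF_GATEWAY 0x00000002u
#define MODEL_RTF_DEFAULT 0x00010000u

/* Assumed semantics: require all bits of 'flags', reject any bit of
 * 'noflags'. The second test masks with 'noflags'; masking with 'flags'
 * there would silently ignore 'noflags' whenever flags == 0. */
static bool route_flags_ok(uint32_t rt_flags, uint32_t flags, uint32_t noflags)
{
	/* Requiring and forbidding the same bit would reject every route. */
	assert((flags & noflags) == 0);
	if ((rt_flags & flags) != flags)
		return false;
	if ((rt_flags & noflags) != 0)
		return false;
	return true;
}

/* Same predicate as the removed addrconf_is_prefix_route(). */
static bool is_prefix_route(uint32_t rt_flags)
{
	return (rt_flags & (MODEL_RTF_GATEWAY | MODEL_RTF_DEFAULT)) == 0;
}
```

Expressing the old helper's condition through `noflags` is what lets the patch drop addrconf_is_prefix_route() without changing which routes qualify, provided the `noflags` test itself is correct.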

Thanks,
Romain



From 2a79f191042ee8d48119b095b2ef7527a89817fc Mon Sep 17 00:00:00 2001
From: Romain Kuntz <r.kuntz@ipflavors.com>
Date: Wed, 9 Jan 2013 15:11:08 +0100
Subject: [PATCH 1/1] ipv6: use addrconf_get_prefix_route for prefix route lookup

Replace ip6_route_lookup() with addrconf_get_prefix_route() when
looking up a prefix route. This ensures that the connected prefix
is looked up in the main table, and avoids the selection of other
matching routes located in different tables.

As a consequence, the function addrconf_is_prefix_route() is not
used anymore and is removed.

Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com>
---
 net/ipv6/addrconf.c |   24 ++++++++++--------------
 1 files changed, 10 insertions(+), 14 deletions(-)
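
Nicolas's table 257 scenario can be reduced to a toy model that contrasts a rule-driven lookup with a lookup confined to one table. Everything here is a hypothetical simplification, not kernel code: prefixes are compared as exact strings, each rule names one table, and the priority values are illustrative. The higher-priority rule `to 2002::/64 table 257` makes the rule-driven lookup return the manually added route, while a lookup scoped to the main table (ID 254, RT_TABLE_MAIN) finds the connected prefix, which is the effect the commit message describes for addrconf_get_prefix_route().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_TABLE_MAIN 254 /* RT_TABLE_MAIN */

/* Toy route: prefixes are compared as exact strings for simplicity. */
struct toy_route {
	int table_id;
	const char *prefix;
	const char *label; /* for the demo only */
};

/* Toy rule: 'to' == NULL matches every destination. */
struct toy_rule {
	int prio;
	const char *to;
	int table_id;
};

/* Look up 'prefix' in a single table only. */
static const struct toy_route *table_lookup(const struct toy_route *rt, size_t nrt,
					    int table_id, const char *prefix)
{
	for (size_t i = 0; i < nrt; i++)
		if (rt[i].table_id == table_id && strcmp(rt[i].prefix, prefix) == 0)
			return &rt[i];
	return NULL;
}

/* Walk the rules (sorted by ascending prio) and return the first hit. */
static const struct toy_route *policy_lookup(const struct toy_rule *rules, size_t nrules,
					     const struct toy_route *rt, size_t nrt,
					     const char *prefix)
{
	for (size_t i = 0; i < nrules; i++) {
		const struct toy_route *hit;

		assert(i == 0 || rules[i - 1].prio <= rules[i].prio);
		if (rules[i].to && strcmp(rules[i].to, prefix) != 0)
			continue;
		hit = table_lookup(rt, nrt, rules[i].table_id, prefix);
		if (hit)
			return hit;
	}
	return NULL;
}
```

The rule-driven path never reaches the main table for this destination, which is why the original code deleted the table 257 route instead of the connected prefix.
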

Nicolas Dichtel - Jan. 9, 2013, 3:11 p.m.
On 09/01/2013 15:37, Romain KUNTZ wrote:
> Replace ip6_route_lookup() with addrconf_get_prefix_route() when
> looking up a prefix route. This ensures that the connected prefix
> is looked up in the main table, and avoids the selection of other
> matching routes located in different tables.
>
> As a consequence, the function addrconf_is_prefix_route() is not
> used anymore and is removed.
Because this patch also fixes an Oops, I think it is worth saying so in the
commit log and pointing to the commit that introduced it.
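
The Oops itself follows from the mechanism described at the start of the thread: the blackhole and prohibit entries have a NULL rt6i_table, and __ip6_del_rt() tries to lock rt6i_table->tb6_lock. The sketch below models that with hypothetical `sketch_*` types (not the kernel's structures) and shows the pointer comparison Nicolas first suggested as a guard. As Romain pointed out, such a guard avoids the crash but still leaves the connected prefix route in place when a blackhole rule matched, which is why the patch changes the lookup instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's fib6_table / rt6_info. */
struct sketch_table {
	int lock_taken; /* counts how often the table lock was taken */
};

struct sketch_rt {
	struct sketch_table *table; /* NULL for the special entries */
};

/* Special entries belong to no table, so their table pointer is NULL. */
static struct sketch_rt null_entry, blk_hole_entry, prohibit_entry;

/* The delete path needs the owning table; in the kernel a NULL table
 * here ends in an Oops, in this sketch it trips the assertion. */
static void sketch_del_rt(struct sketch_rt *rt)
{
	assert(rt->table != NULL);
	rt->table->lock_taken++; /* stands in for locking the table */
}

/* Guard in the style Nicolas first proposed: compare the lookup result
 * against every special entry, not only the null entry. */
static bool purge_prefix_route(struct sketch_rt *rt)
{
	if (rt == &null_entry || rt == &blk_hole_entry || rt == &prohibit_entry)
		return false;
	sketch_del_rt(rt);
	return true;
}
```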

>
> Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com>
> ---
>   net/ipv6/addrconf.c |   24 ++++++++++--------------
>   1 files changed, 10 insertions(+), 14 deletions(-)
>
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 29ba4ff..409dd47 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -154,6 +154,10 @@ static void addrconf_type_change(struct net_device *dev,
>   				 unsigned long event);
>   static int addrconf_ifdown(struct net_device *dev, int how);
>
> +static struct rt6_info *addrconf_get_prefix_route(const struct in6_addr *pfx,
> +				int plen, const struct net_device *dev,
> +				u32 flags, u32 noflags);
These args should be aligned to the previous '('.

> +
>   static void addrconf_dad_start(struct inet6_ifaddr *ifp);
>   static void addrconf_dad_timer(unsigned long data);
>   static void addrconf_dad_completed(struct inet6_ifaddr *ifp);
> @@ -250,12 +254,6 @@ static inline bool addrconf_qdisc_ok(const struct net_device *dev)
>   	return !qdisc_tx_is_noop(dev);
>   }
>
> -/* Check if a route is valid prefix route */
> -static inline int addrconf_is_prefix_route(const struct rt6_info *rt)
> -{
> -	return (rt->rt6i_flags & (RTF_GATEWAY | RTF_DEFAULT)) == 0;
> -}
> -
>   static void addrconf_del_timer(struct inet6_ifaddr *ifp)
>   {
>   	if (del_timer(&ifp->timer))
> @@ -941,17 +939,15 @@ static void ipv6_del_addr(struct inet6_ifaddr *ifp)
>   	if ((ifp->flags & IFA_F_PERMANENT) && onlink < 1) {
>   		struct in6_addr prefix;
>   		struct rt6_info *rt;
> -		struct net *net = dev_net(ifp->idev->dev);
> -		struct flowi6 fl6 = {};
>
>   		ipv6_addr_prefix(&prefix, &ifp->addr, ifp->prefix_len);
> -		fl6.flowi6_oif = ifp->idev->dev->ifindex;
> -		fl6.daddr = prefix;
> -		rt = (struct rt6_info *)ip6_route_lookup(net, &fl6,
> -							 RT6_LOOKUP_F_IFACE);
>
> -		if (rt != net->ipv6.ip6_null_entry &&
> -		    addrconf_is_prefix_route(rt)) {
> +		rt = addrconf_get_prefix_route(&prefix,
> +					ifp->prefix_len,
> +					ifp->idev->dev,
> +					0, RTF_GATEWAY | RTF_DEFAULT);
Same here.

> +
> +		if (rt) {
>   			if (onlink == 0) {
>   				ip6_del_rt(rt);
>   				rt = NULL;
>
After that, you can add my "Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>" ;-)

Patch

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 29ba4ff..409dd47 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -154,6 +154,10 @@  static void addrconf_type_change(struct net_device *dev,
 				 unsigned long event);
 static int addrconf_ifdown(struct net_device *dev, int how);
 
+static struct rt6_info *addrconf_get_prefix_route(const struct in6_addr *pfx,
+				int plen, const struct net_device *dev,
+				u32 flags, u32 noflags);
+
 static void addrconf_dad_start(struct inet6_ifaddr *ifp);
 static void addrconf_dad_timer(unsigned long data);
 static void addrconf_dad_completed(struct inet6_ifaddr *ifp);
@@ -250,12 +254,6 @@  static inline bool addrconf_qdisc_ok(const struct net_device *dev)
 	return !qdisc_tx_is_noop(dev);
 }
 
-/* Check if a route is valid prefix route */
-static inline int addrconf_is_prefix_route(const struct rt6_info *rt)
-{
-	return (rt->rt6i_flags & (RTF_GATEWAY | RTF_DEFAULT)) == 0;
-}
-
 static void addrconf_del_timer(struct inet6_ifaddr *ifp)
 {
 	if (del_timer(&ifp->timer))
@@ -941,17 +939,15 @@  static void ipv6_del_addr(struct inet6_ifaddr *ifp)
 	if ((ifp->flags & IFA_F_PERMANENT) && onlink < 1) {
 		struct in6_addr prefix;
 		struct rt6_info *rt;
-		struct net *net = dev_net(ifp->idev->dev);
-		struct flowi6 fl6 = {};
 
 		ipv6_addr_prefix(&prefix, &ifp->addr, ifp->prefix_len);
-		fl6.flowi6_oif = ifp->idev->dev->ifindex;
-		fl6.daddr = prefix;
-		rt = (struct rt6_info *)ip6_route_lookup(net, &fl6,
-							 RT6_LOOKUP_F_IFACE);
 
-		if (rt != net->ipv6.ip6_null_entry &&
-		    addrconf_is_prefix_route(rt)) {
+		rt = addrconf_get_prefix_route(&prefix,
+					ifp->prefix_len,
+					ifp->idev->dev,
+					0, RTF_GATEWAY | RTF_DEFAULT);
+
+		if (rt) {
 			if (onlink == 0) {
 				ip6_del_rt(rt);
 				rt = NULL;