diff mbox

[v2,6/6] ipv6: Do route updating for redirect in ndisc layer

Message ID 5232806B.6050601@cn.fujitsu.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Duan Jiong Sept. 13, 2013, 3:03 a.m. UTC
From: Duan Jiong <duanj.fnst@cn.fujitsu.com>

Do the whole verification and route updating in ndisc
lay and then just call into icmpv6_notify() to notify
the upper protocols.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
---
 include/net/ip6_route.h |  3 ---
 net/ipv6/ndisc.c        |  6 ++----
 net/ipv6/route.c        | 29 ++---------------------------
 3 files changed, 4 insertions(+), 34 deletions(-)

Comments

David Miller Sept. 18, 2013, 12:29 a.m. UTC | #1
From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Date: Fri, 13 Sep 2013 11:03:07 +0800

> From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
> 
> Do the whole verification and route updating in ndisc
> lay and then just call into icmpv6_notify() to notify
> the upper protocols.
> 
> Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>

This is completely broken, and I believe your patch set fundamentally
is too.

We absolutely _must_ handle the redirect at the socket level when
we are able to, otherwise we cannot specify the mark properly and
the mark is an essential part of the key used to find the correct
route to work with.

I am not applying this patch series until you deal with this
deficiency.  I am not willing to consider changes which stop using the
more precise keying information available from a socket.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Frederic Sowa Sept. 18, 2013, 1:39 a.m. UTC | #2
On Tue, Sep 17, 2013 at 08:29:36PM -0400, David Miller wrote:
> From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
> Date: Fri, 13 Sep 2013 11:03:07 +0800
> 
> > From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
> > 
> > Do the whole verification and route updating in ndisc
> > lay and then just call into icmpv6_notify() to notify
> > the upper protocols.
> > 
> > Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
> 
> This is completely broken, and I believe your patch set fundamentally
> is too.
> 
> We absolutely _must_ handle the redirect at the socket level when
> we are able to, otherwise we cannot specify the mark properly and
> the mark is an essential part of the key used to find the correct
> route to work with.
> 
> I am not applying this patch series until you deal with this
> deficiency.  I am not willing to consider changes which stop using the
> more precise keying information available from a socket.

Oh, Duan, I am very sorry for not catching this earlier. We use the
sk->mark to select the proper routing table where we clone the rt6_info into.
And we only get that value out of the sockets. I missed that. We should leave
the redirect logic in the socket layer where it is possible.

But parts of this series are still valid. We need to fix redirects for tunnels
and I do think we can still simplify some code in the error handlers.

Thanks for pointing this out,

  Hannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Duan Jiong Sept. 18, 2013, 1:52 a.m. UTC | #3
于 2013年09月18日 09:39, Hannes Frederic Sowa 写道:
> On Tue, Sep 17, 2013 at 08:29:36PM -0400, David Miller wrote:
>> From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
>> Date: Fri, 13 Sep 2013 11:03:07 +0800
>>
>>> From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
>>>
>>> Do the whole verification and route updating in ndisc
>>> lay and then just call into icmpv6_notify() to notify
>>> the upper protocols.
>>>
>>> Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
>>
>> This is completely broken, and I believe your patch set fundamentally
>> is too.
>>
>> We absolutely _must_ handle the redirect at the socket level when
>> we are able to, otherwise we cannot specify the mark properly and
>> the mark is an essential part of the key used to find the correct
>> route to work with.
>>
>> I am not applying this patch series until you deal with this
>> deficiency.  I am not willing to consider changes which stop using the
>> more precise keying information available from a socket.
> 
> Oh, Duan, I am very sorry for not catching this earlier. We use the
> sk->mark to select the proper routing table where we clone the rt6_info into.
> And we only get that value out of the sockets. I missed that. We should leave
> the redirect logic in the socket layer where it is possible.
> 
> But parts of this series are still valid. We need to fix redirects for tunnels
> and I do think we can still simplify some code in the error handlers.
> 

I got it.

Thanks,
  Duan

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Frederic Sowa Sept. 18, 2013, 4:13 a.m. UTC | #4
On Wed, Sep 18, 2013 at 09:52:42AM +0800, Duan Jiong wrote:
> 于 2013年09月18日 09:39, Hannes Frederic Sowa 写道:
> > On Tue, Sep 17, 2013 at 08:29:36PM -0400, David Miller wrote:
> >> From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
> >> Date: Fri, 13 Sep 2013 11:03:07 +0800
> >>
> >>> From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
> >>>
> >>> Do the whole verification and route updating in ndisc
> >>> lay and then just call into icmpv6_notify() to notify
> >>> the upper protocols.
> >>>
> >>> Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
> >>
> >> This is completely broken, and I believe your patch set fundamentally
> >> is too.
> >>
> >> We absolutely _must_ handle the redirect at the socket level when
> >> we are able to, otherwise we cannot specify the mark properly and
> >> the mark is an essential part of the key used to find the correct
> >> route to work with.
> >>
> >> I am not applying this patch series until you deal with this
> >> deficiency.  I am not willing to consider changes which stop using the
> >> more precise keying information available from a socket.
> > 
> > Oh, Duan, I am very sorry for not catching this earlier. We use the
> > sk->mark to select the proper routing table where we clone the rt6_info into.
> > And we only get that value out of the sockets. I missed that. We should leave
> > the redirect logic in the socket layer where it is possible.
> > 
> > But parts of this series are still valid. We need to fix redirects for tunnels
> > and I do think we can still simplify some code in the error handlers.
> > 
> 
> I got it.

I gave it a bit more thought:

RFC 4861 8.3:
"
   Redirect messages apply to all flows that are being sent to a given
   destination.  That is, upon receipt of a Redirect for a Destination
   Address, all Destination Cache entries to that address should be
   updated to use the specified next-hop, regardless of the contents of
   the Flow Label field that appears in the Redirected Header option.
"

Especially because redirects also help in the on-link determination (same
RFC, section 8), I changed my mind and am still in favour of updating it
in the ndisc layer. In my opinion we just have to consider all routing
tables and apply the update to every one which carries a valid next hop
to the source of the redirect (under consideration of the destination).

This will be important if we actually try to get linux to correctly
implement the ipv6 subnet model (RFC 5942, Section 4 Rule 1). In that
case we are not allowed to assume nodes on-link even if they would match
the same prefix as a locally configured address.

Greetings,

  Hannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Duan Jiong Sept. 18, 2013, 11:57 a.m. UTC | #5
于 2013年09月18日 12:13, Hannes Frederic Sowa 写道:
> On Wed, Sep 18, 2013 at 09:52:42AM +0800, Duan Jiong wrote:
>> 于 2013年09月18日 09:39, Hannes Frederic Sowa 写道:
>>> On Tue, Sep 17, 2013 at 08:29:36PM -0400, David Miller wrote:
>>>> From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
>>>> Date: Fri, 13 Sep 2013 11:03:07 +0800
>>>>
>>>>> From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
>>>>>
>>>>> Do the whole verification and route updating in ndisc
>>>>> lay and then just call into icmpv6_notify() to notify
>>>>> the upper protocols.
>>>>>
>>>>> Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
>>>>
>>>> This is completely broken, and I believe your patch set fundamentally
>>>> is too.
>>>>
>>>> We absolutely _must_ handle the redirect at the socket level when
>>>> we are able to, otherwise we cannot specify the mark properly and
>>>> the mark is an essential part of the key used to find the correct
>>>> route to work with.
>>>>
>>>> I am not applying this patch series until you deal with this
>>>> deficiency.  I am not willing to consider changes which stop using the
>>>> more precise keying information available from a socket.
>>>
>>> Oh, Duan, I am very sorry for not catching this earlier. We use the
>>> sk->mark to select the proper routing table where we clone the rt6_info into.
>>> And we only get that value out of the sockets. I missed that. We should leave
>>> the redirect logic in the socket layer where it is possible.
>>>
>>> But parts of this series are still valid. We need to fix redirects for tunnels
>>> and I do think we can still simplify some code in the error handlers.
>>>
>>
>> I got it.
> 
> I gave it a bit more thought:
> 
> RFC 4861 8.3:
> "
>    Redirect messages apply to all flows that are being sent to a given
>    destination.  That is, upon receipt of a Redirect for a Destination
>    Address, all Destination Cache entries to that address should be
>    updated to use the specified next-hop, regardless of the contents of
>    the Flow Label field that appears in the Redirected Header option.
> "
> 
> Especially because redirects also help in the on-link determination (same
> RFC, section 8), I changed my mind and am still in favour of updating it
> in the ndisc layer. In my opinion we just have to consider all routing
> tables and apply the update to every one which carries a valid next hop
> to the source of the redirect (under consideration of the destination).
> 
> This will be important if we actually try to get linux to correctly
> implement the ipv6 subnet model (RFC 5942, Section 4 Rule 1). In that
> case we are not allowed to assume nodes on-link even if they would match
> the same prefix as a locally configured address.
> 

I think this need a little time to discuss with David Miller, so i will send
the other patchs to fix redirects for tunnels and to fix the sk->sk_err
problems in udpv6_err()/rawv6_err().

Thanks,
  Duan
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Frederic Sowa Oct. 9, 2013, 1:43 a.m. UTC | #6
Hi Duan!

On Wed, Sep 18, 2013 at 06:13:37AM +0200, Hannes Frederic Sowa wrote:
> Especially because redirects also help in the on-link determination (same
> RFC, section 8), I changed my mind and am still in favour of updating it
> in the ndisc layer. In my opinion we just have to consider all routing
> tables and apply the update to every one which carries a valid next hop
> to the source of the redirect (under consideration of the destination).
> 
> This will be important if we actually try to get linux to correctly
> implement the ipv6 subnet model (RFC 5942, Section 4 Rule 1). In that
> case we are not allowed to assume nodes on-link even if they would match
> the same prefix as a locally configured address.

I am playing around with a simple patch which does suppress adding routing
information for the on-link assumption we currently do in linux.

Are you intereseted in following up on this? I still do think we should update
not only the routing table the socket uses but all routing tables which have a
valid route towards the router which emitted the redirect.

I try to check if we actually handle redirect messages when ECMP routes are in
use correctly.

Greetings,

  Hannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Duan Jiong Oct. 9, 2013, 7 a.m. UTC | #7
于 2013年10月09日 09:43, Hannes Frederic Sowa 写道:
> Hi Duan!
> 
> On Wed, Sep 18, 2013 at 06:13:37AM +0200, Hannes Frederic Sowa wrote:
>> Especially because redirects also help in the on-link determination (same
>> RFC, section 8), I changed my mind and am still in favour of updating it
>> in the ndisc layer. In my opinion we just have to consider all routing
>> tables and apply the update to every one which carries a valid next hop
>> to the source of the redirect (under consideration of the destination).
>>
>> This will be important if we actually try to get linux to correctly
>> implement the ipv6 subnet model (RFC 5942, Section 4 Rule 1). In that
>> case we are not allowed to assume nodes on-link even if they would match
>> the same prefix as a locally configured address.
> 
> I am playing around with a simple patch which does suppress adding routing
> information for the on-link assumption we currently do in linux.
> 
> Are you intereseted in following up on this? I still do think we should update
> not only the routing table the socket uses but all routing tables which have a
> valid route towards the router which emitted the redirect.
> 

No, thanks, i have got other things on my mind, more important, but i will
pay attention to it.

Thanks,
  Duan.

> I try to check if we actually handle redirect messages when ECMP routes are in
> use correctly.
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index f525e70..5db259e 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -133,9 +133,6 @@  extern void ip6_update_pmtu(struct sk_buff *skb, struct net *net, __be32 mtu,
 extern void ip6_sk_update_pmtu(struct sk_buff *skb, struct sock *sk,
 			       __be32 mtu);
 extern void ip6_redirect(struct sk_buff *skb, struct net *net, int oif, u32 mark);
-extern void ip6_redirect_no_header(struct sk_buff *skb, struct net *net, int oif,
-				   u32 mark);
-extern void ip6_sk_redirect(struct sk_buff *skb, struct sock *sk);
 
 struct netlink_callback;
 
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index f8a55ff..6bd1b41 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1368,11 +1368,9 @@  static void ndisc_redirect_rcv(struct sk_buff *skb)
 	if (!ndisc_parse_options(msg->opt, ndoptlen, &ndopts))
 		return;
 
-	if (!ndopts.nd_opts_rh) {
-		ip6_redirect_no_header(skb, dev_net(skb->dev),
-					skb->dev->ifindex, 0);
+	ip6_redirect(skb, dev_net(skb->dev), skb->dev->ifindex, 0);
+	if (!ndopts.nd_opts_rh)
 		return;
-	}
 
 	hdr = (u8 *)ndopts.nd_opts_rh;
 	hdr += 8;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c979dd9..151bd6c 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1227,27 +1227,7 @@  static struct dst_entry *ip6_route_redirect(struct net *net,
 				flags, __ip6_route_redirect);
 }
 
-void ip6_redirect(struct sk_buff *skb, struct net *net, int oif, u32 mark)
-{
-	const struct ipv6hdr *iph = (struct ipv6hdr *) skb->data;
-	struct dst_entry *dst;
-	struct flowi6 fl6;
-
-	memset(&fl6, 0, sizeof(fl6));
-	fl6.flowi6_oif = oif;
-	fl6.flowi6_mark = mark;
-	fl6.flowi6_flags = 0;
-	fl6.daddr = iph->daddr;
-	fl6.saddr = iph->saddr;
-	fl6.flowlabel = ip6_flowinfo(iph);
-
-	dst = ip6_route_redirect(net, &fl6, &ipv6_hdr(skb)->saddr);
-	rt6_do_redirect(dst, NULL, skb);
-	dst_release(dst);
-}
-EXPORT_SYMBOL_GPL(ip6_redirect);
-
-void ip6_redirect_no_header(struct sk_buff *skb, struct net *net, int oif,
+void ip6_redirect(struct sk_buff *skb, struct net *net, int oif,
 			    u32 mark)
 {
 	const struct ipv6hdr *iph = ipv6_hdr(skb);
@@ -1266,12 +1246,7 @@  void ip6_redirect_no_header(struct sk_buff *skb, struct net *net, int oif,
 	rt6_do_redirect(dst, NULL, skb);
 	dst_release(dst);
 }
-
-void ip6_sk_redirect(struct sk_buff *skb, struct sock *sk)
-{
-	ip6_redirect(skb, sock_net(sk), sk->sk_bound_dev_if, sk->sk_mark);
-}
-EXPORT_SYMBOL_GPL(ip6_sk_redirect);
+EXPORT_SYMBOL_GPL(ip6_redirect);
 
 static unsigned int ip6_default_advmss(const struct dst_entry *dst)
 {