[net-next,v6,3/3] ipv6: tcp_ipv6 policy route issue

Message ID 1396056451-5600-4-git-send-email-wangyufen@huawei.com
State Accepted, archived
Delegated to: David Miller

Commit Message

Wang Yufen March 29, 2014, 1:27 a.m. UTC
From: Wang Yufen <wangyufen@huawei.com>

The issue arises when a policy route is added that specifies a
particular NIC as oif: the policy route does not take effect. The
reason is that fl6.flowi6_oif is not set, so the route lookup fails
to match. In tcp_v6_send_response(), fl6.flowi6_oif is set when the
binding address is link-local, but not when it is a global address.

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
---
 net/ipv6/tcp_ipv6.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

Comments

Lorenzo Colitti April 10, 2014, 9:23 a.m. UTC | #1
On Sat, Mar 29, 2014 at 10:27 AM, Wangyufen <wangyufen@huawei.com> wrote:
> The issue raises when adding policy route, specify a particular
> NIC as oif, the policy route did not take effect. The reason is
> that fl6.oif is not set and route map failed. From the
> tcp_v6_send_response function, if the binding address is linklocal,
> fl6.oif is set, but not for global address.
>
> [...]
>
>         fl6.flowi6_proto = IPPROTO_TCP;
> -       if (ipv6_addr_type(&fl6.daddr) & IPV6_ADDR_LINKLOCAL)
> +       if (rt6_need_strict(&fl6.daddr) || !oif)
>                 fl6.flowi6_oif = inet6_iif(skb);

> +       else
> +               fl6.flowi6_oif = oif;

Shouldn't this be && !oif instead of || !oif? It seems to me that the
logic should be:

1. If sk->sk_bound_dev_if is set, use that interface.
2. Otherwise, if the connection came from a link-local address, use
the incoming interface.
3. Otherwise, use whatever route the system happens to have without
special regard to the incoming interface.
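
The three-step precedence above can be sketched as a small standalone C
function. This is a userspace illustration only, not kernel code:
select_oif() and its parameters are hypothetical stand-ins for
sk->sk_bound_dev_if, the result of rt6_need_strict(&fl6.daddr), and
inet6_iif(skb).

```c
#include <stdbool.h>

/*
 * Userspace sketch of the proposed precedence; not kernel code.
 * All names here are assumptions made for illustration:
 *   bound_dev_if     - stands in for sk->sk_bound_dev_if (0 = not bound)
 *   dst_needs_strict - stands in for rt6_need_strict(&fl6.daddr)
 *   iif              - stands in for inet6_iif(skb)
 * Returns the interface index to place in the flow; 0 means the lookup
 * is done without any interface constraint.
 */
static int select_oif(int bound_dev_if, bool dst_needs_strict, int iif)
{
	/* 1. A bound socket always uses its bound interface. */
	if (bound_dev_if)
		return bound_dev_if;

	/* 2. Link-local (strict) peers are answered on the incoming interface. */
	if (dst_needs_strict)
		return iif;

	/* 3. Otherwise leave the choice to the routing table. */
	return 0;
}
```

With these assumptions, a bound socket returns its bound interface
regardless of the peer address, and an unbound socket talking to a global
address returns 0.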

If so, then I think the code now does the wrong thing in two cases:

1. If the SYN comes from a global address, and sk->sk_bound_dev_if is
not set, the SYNACK is forced onto/prefers the interface the SYN came
in on instead of just doing a routing lookup with no interface.
2. If the SYN comes from a link-local address, and sk->sk_bound_dev_if
is set, then the SYNACK is forced onto/prefers the incoming interface
instead of the one specified by sk->sk_bound_dev_if.
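
To make the two cases concrete, here is a minimal userspace sketch that
puts the condition from this patch next to the && variant. It is not
kernel code: the function names are hypothetical, need_strict stands in
for rt6_need_strict(&fl6.daddr), and iif stands in for inet6_iif(skb).

```c
#include <stdbool.h>

/*
 * Userspace sketch; the function names are hypothetical.
 * oif         - sk->sk_bound_dev_if as passed by the caller (0 = not bound)
 * need_strict - stands in for rt6_need_strict(&fl6.daddr)
 * iif         - stands in for inet6_iif(skb)
 */

/* Condition as written in this patch: strict destination OR unbound. */
static int oif_with_or(int oif, bool need_strict, int iif)
{
	if (need_strict || !oif)
		return iif;
	return oif;
}

/* Suggested variant: strict destination AND unbound. */
static int oif_with_and(int oif, bool need_strict, int iif)
{
	if (need_strict && !oif)
		return iif;
	return oif;
}
```

Under these assumptions the two variants differ in exactly the two cases
listed above. For a global peer on an unbound socket, the || form yields
the incoming interface while the && form yields 0. For a link-local peer
on a bound socket, the || form yields the incoming interface while the &&
form yields the bound one. In the remaining two combinations they agree.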

If I am correct, then I'm happy to send out the trivial patch to fix
this. (Against what? net? net-next when the tree reopens?)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Frederic Sowa April 10, 2014, 9:57 p.m. UTC | #2
Hi Lorenzo!

On Thu, Apr 10, 2014 at 06:23:35PM +0900, Lorenzo Colitti wrote:
> On Sat, Mar 29, 2014 at 10:27 AM, Wangyufen <wangyufen@huawei.com> wrote:
> > The issue raises when adding policy route, specify a particular
> > NIC as oif, the policy route did not take effect. The reason is
> > that fl6.oif is not set and route map failed. From the
> > tcp_v6_send_response function, if the binding address is linklocal,
> > fl6.oif is set, but not for global address.
> >
> > [...]
> >
> >         fl6.flowi6_proto = IPPROTO_TCP;
> > -       if (ipv6_addr_type(&fl6.daddr) & IPV6_ADDR_LINKLOCAL)
> > +       if (rt6_need_strict(&fl6.daddr) || !oif)
> >                 fl6.flowi6_oif = inet6_iif(skb);
> 
> > +       else
> > +               fl6.flowi6_oif = oif;
> 
> Shouldn't this be && !oif instead of || !oif? It seems to me that the
> logic should be:
> 
> 1. If sk->sk_bound_dev_if is set, use that interface.
> 2. Otherwise, if the connection came from a link-local address, use
> the incoming interface.
> 3. Otherwise, use whatever route the system happens to have without
> special regard to the incoming interface.
> 
> If so, then I think the code now does the wrong thing in two cases:
> 
> 1. If the SYN comes from a global address, and sk->sk_bound_dev_if is
> not set, the SYNACK is forced onto/prefers the interface the SYN came
> in on instead of just doing a routing lookup with no interface.

First a rule lookup is done on the oif (if needed). After that an address
lookup is done in the fib, and only if rt6_need_strict evaluates to
true in the routing code do we take the flowi6_oif match as mandatory (we
may evaluate sk_bound_dev_if!=0 there to make sure we really only use the
bound interface for global addresses but keep the interface id which
is set in the above code).

So we still would send out the syn packet on the path the global address
dictates in most cases (or in case of multipath routes, prefer the
incoming interface).  We differ if bound_dev is set or policy routes
are in place.

So it depends on what we give precedence and I have to agree, I would
prefer sk_bound_dev_if as we do in other output paths. I misjudged that
when I proposed the code snippet. Thanks for the heads-up.

> 2. If the SYN comes from a link-local address, and sk->sk_bound_dev_if
> is set, then the SYNACK is forced onto/prefers the incoming interface
> instead of the one specified by sk->sk_bound_dev_if.
> 
> If I am correct, then I'm happy to send out the trivial patch to fix
> this. (Against what? net? net-next when the tree reopens?)

-net tree is always open and I would welcome a patch very much.

Thank you,

  Hannes

Lorenzo Colitti April 11, 2014, 4:21 a.m. UTC | #3
On Fri, Apr 11, 2014 at 6:57 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> > If I am correct, then I'm happy to send out the trivial patch to fix
> > this. (Against what? net? net-next when the tree reopens?)
>
> -net tree is always open and I would welcome a patch very much.

Ack. I sent out http://patchwork.ozlabs.org/patch/338343/ .

Patch

diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 10b7c04..5ca56ce 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -726,7 +726,7 @@  static const struct tcp_request_sock_ops tcp_request_sock_ipv6_ops = {
 #endif
 
 static void tcp_v6_send_response(struct sk_buff *skb, u32 seq, u32 ack, u32 win,
-				 u32 tsval, u32 tsecr,
+				 u32 tsval, u32 tsecr, int oif,
 				 struct tcp_md5sig_key *key, int rst, u8 tclass,
 				 u32 label)
 {
@@ -798,8 +798,10 @@  static void tcp_v6_send_response(struct sk_buff *skb, u32 seq, u32 ack, u32 win,
 	__tcp_v6_send_check(buff, &fl6.saddr, &fl6.daddr);
 
 	fl6.flowi6_proto = IPPROTO_TCP;
-	if (ipv6_addr_type(&fl6.daddr) & IPV6_ADDR_LINKLOCAL)
+	if (rt6_need_strict(&fl6.daddr) || !oif)
 		fl6.flowi6_oif = inet6_iif(skb);
+	else
+		fl6.flowi6_oif = oif;
 	fl6.fl6_dport = t1->dest;
 	fl6.fl6_sport = t1->source;
 	security_skb_classify_flow(skb, flowi6_to_flowi(&fl6));
@@ -833,6 +835,7 @@  static void tcp_v6_send_reset(struct sock *sk, struct sk_buff *skb)
 	int genhash;
 	struct sock *sk1 = NULL;
 #endif
+	int oif;
 
 	if (th->rst)
 		return;
@@ -876,7 +879,8 @@  static void tcp_v6_send_reset(struct sock *sk, struct sk_buff *skb)
 		ack_seq = ntohl(th->seq) + th->syn + th->fin + skb->len -
 			  (th->doff << 2);
 
-	tcp_v6_send_response(skb, seq, ack_seq, 0, 0, 0, key, 1, 0, 0);
+	oif = sk ? sk->sk_bound_dev_if : 0;
+	tcp_v6_send_response(skb, seq, ack_seq, 0, 0, 0, oif, key, 1, 0, 0);
 
 #ifdef CONFIG_TCP_MD5SIG
 release_sk1:
@@ -888,11 +892,11 @@  release_sk1:
 }
 
 static void tcp_v6_send_ack(struct sk_buff *skb, u32 seq, u32 ack,
-			    u32 win, u32 tsval, u32 tsecr,
+			    u32 win, u32 tsval, u32 tsecr, int oif,
 			    struct tcp_md5sig_key *key, u8 tclass,
 			    u32 label)
 {
-	tcp_v6_send_response(skb, seq, ack, win, tsval, tsecr, key, 0, tclass,
+	tcp_v6_send_response(skb, seq, ack, win, tsval, tsecr, oif, key, 0, tclass,
 			     label);
 }
 
@@ -904,7 +908,7 @@  static void tcp_v6_timewait_ack(struct sock *sk, struct sk_buff *skb)
 	tcp_v6_send_ack(skb, tcptw->tw_snd_nxt, tcptw->tw_rcv_nxt,
 			tcptw->tw_rcv_wnd >> tw->tw_rcv_wscale,
 			tcp_time_stamp + tcptw->tw_ts_offset,
-			tcptw->tw_ts_recent, tcp_twsk_md5_key(tcptw),
+			tcptw->tw_ts_recent, tw->tw_bound_dev_if, tcp_twsk_md5_key(tcptw),
 			tw->tw_tclass, (tw->tw_flowlabel << 12));
 
 	inet_twsk_put(tw);
@@ -914,7 +918,7 @@  static void tcp_v6_reqsk_send_ack(struct sock *sk, struct sk_buff *skb,
 				  struct request_sock *req)
 {
 	tcp_v6_send_ack(skb, tcp_rsk(req)->snt_isn + 1, tcp_rsk(req)->rcv_isn + 1,
-			req->rcv_wnd, tcp_time_stamp, req->ts_recent,
+			req->rcv_wnd, tcp_time_stamp, req->ts_recent, sk->sk_bound_dev_if,
 			tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->daddr),
 			0, 0);
 }