diff mbox

[2/3] route: set iif and oif information in flowi struct

Message ID 1322511292-1413-3-git-send-email-ulrich.weber@sophos.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Ulrich Weber Nov. 28, 2011, 8:14 p.m. UTC
Outgoing packets have loopback interface as incoming interface.

Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com>
---
 net/ipv4/route.c        |    4 ++++
 net/ipv4/xfrm4_policy.c |   19 +++++++++++++++++--
 net/ipv6/fib6_rules.c   |   10 ++++++++--
 net/ipv6/xfrm6_policy.c |   18 ++++++++++++++++--
 4 files changed, 45 insertions(+), 6 deletions(-)

Comments

Julian Anastasov Nov. 28, 2011, 11:53 p.m. UTC | #1
Hello,

On Mon, 28 Nov 2011, Ulrich Weber wrote:

> Outgoing packets have loopback interface as incoming interface.
> 
> Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com>
> ---
>  net/ipv4/route.c        |    4 ++++
>  net/ipv4/xfrm4_policy.c |   19 +++++++++++++++++--
>  net/ipv6/fib6_rules.c   |   10 ++++++++--
>  net/ipv6/xfrm6_policy.c |   18 ++++++++++++++++--
>  4 files changed, 45 insertions(+), 6 deletions(-)
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index fb47c8f..1702ec0 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -2744,6 +2744,10 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *flp4)
>  				flp4->saddr = rth->rt_src;
>  			if (!flp4->daddr)
>  				flp4->daddr = rth->rt_dst;
> +			if (!flp4->flowi4_iif)
> +				flp4->flowi4_iif = net->loopback_dev->ifindex;
> +			if (!flp4->flowi4_oif)
> +				flp4->flowi4_oif = rth->rt_iif;

	May be setting flowi4_oif unconditionally here is more
correct because ip_route_output_slow fills flowi4_oif with
the selected oif, it can even change the provided original
oif in flowi4_oif. What about this?:

			flp4->flowi4_oif = rth->dst.dev->ifindex;

	OTOH, rt_iif has some complex semantic: original oif
or the selected oif. May be you prefer flowi4_oif to hold
the selected oif, right?

	I see one dangerous place that must be checked:
icmp_route_lookup. Before now __ip_route_output_key was
called after xfrm_decode_session_reverse with 0 in
flowi4_oif, i.e. no oif binding was used. But now when
decode_session sets flowi4_oif we will restrict the route
via this interface?

>  			return rth;
>  		}
>  		RT_CACHE_STAT_INC(out_hlist_search);

Regards

--
Julian Anastasov <ja@ssi.bg>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Nov. 30, 2011, 12:01 a.m. UTC | #2
From: Ulrich Weber <ulrich.weber@Sophos.com>
Date: Mon, 28 Nov 2011 21:14:51 +0100

> +	}
> +	else {
 ...
> +	}
> +	else {

Besides Julian's comments, these are not properly formatted, use:

	} else {

instead.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulrich Weber Nov. 30, 2011, 5:21 p.m. UTC | #3
On 29.11.2011 00:53, Julian Anastasov wrote:
>
> 	May be setting flowi4_oif unconditionally here is more
> correct because ip_route_output_slow fills flowi4_oif with
> the selected oif, it can even change the provided original
> oif in flowi4_oif. What about this?:
>
> 			flp4->flowi4_oif = rth->dst.dev->ifindex;
>
> 	OTOH, rt_iif has some complex semantic: original oif
> or the selected oif. May be you prefer flowi4_oif to hold
> the selected oif, right?
I wasn't aware the ip_route_output_slow() might change the original oif.
You know why this might happen? Shouldn't fib_lookup only return
a route matching the given oif?

Anyway, if thats the case your code above is more correct. The packet
should always match the xfrm policy where it was originally routed.

> 	I see one dangerous place that must be checked:
> icmp_route_lookup. Before now __ip_route_output_key was
> called after xfrm_decode_session_reverse with 0 in
> flowi4_oif, i.e. no oif binding was used. But now when
> decode_session sets flowi4_oif we will restrict the route
> via this interface?
Thanks for the hint! Yes the current patch will force the ICMP packet
over the received interface.

Will add "fl4_dec.flowi4_oif = 0;" in case the saddr is local, so the
behavior will be the same. fl4_dec.flowi4_oif will then be set in
_ip_route_output_key() again.

Cheers
Ulrich
Julian Anastasov Nov. 30, 2011, 10:37 p.m. UTC | #4
Hello,

On Wed, 30 Nov 2011, Ulrich Weber wrote:

> > 	OTOH, rt_iif has some complex semantic: original oif
> > or the selected oif. May be you prefer flowi4_oif to hold
> > the selected oif, right?
> I wasn't aware the ip_route_output_slow() might change the original oif.
> You know why this might happen? Shouldn't fib_lookup only return
> a route matching the given oif?

	There are exceptions but only in this form:

	dev_out = net->loopback_dev;

	i.e. when traffic is diverted to loopback device.
In all others cases the provided oif is considered and never
changed.

> Anyway, if thats the case your code above is more correct. The packet
> should always match the xfrm policy where it was originally routed.
> 
> > 	I see one dangerous place that must be checked:
> > icmp_route_lookup. Before now __ip_route_output_key was
> > called after xfrm_decode_session_reverse with 0 in
> > flowi4_oif, i.e. no oif binding was used. But now when
> > decode_session sets flowi4_oif we will restrict the route
> > via this interface?
> Thanks for the hint! Yes the current patch will force the ICMP packet
> over the received interface.
> 
> Will add "fl4_dec.flowi4_oif = 0;" in case the saddr is local, so the
> behavior will be the same. fl4_dec.flowi4_oif will then be set in
> _ip_route_output_key() again.

	Yes, I think this should work. I now see another
unrelated problem with this code. _decode_session4 uses
fl4->flowi4_tos = iph->tos; But we then feed fl4_dec
to __ip_route_output_key without applying IPTOS_RT_MASK.
__ip_route_output_key does not work with plain TOS from IP
header.

	Currently, the situation around flowi4_tos is as follows:

- flowi4_tos:
	- (IPTOS_RT_MASK | RTO_ONLINK) bits when provided to
	__ip_route_output_key

	- on return above value is preserved if route is
	in cache, it is not changed

	- only IPTOS_RT_MASK bits are preserved by
	ip_route_output_slow if the route is not cached
	because flowi4_tos holds the bits that are
	matched by ip rules tos XX

	As result, we can not rely on the RTO_ONLINK bit
	being valid on return. IPTOS_RT_MASK bits are preserved.

- rt_key_tos:
	- (IPTOS_RT_MASK | RTO_ONLINK) bits

	- we have a bug here: ip_route_output_slow filters
	the provided flowi4_tos by removing RTO_ONLINK bit,
	then __mkroute_output tries to get original value
	with the RTO_ONLINK bit but ends up with the
	modified value. We need to provide orig_tos
	to __mkroute_output.

	In XFRM. Currently only xfrm_bundle_create calls
xfrm_get_tos to get a routable TOS value and such value is
provided to xfrm_dst_lookup. It means __xfrm4_dst_lookup gets
such filtered TOS value, lets name it rtos (routing tos) and
provides it to output routing correctly.

	xfrm4_fill_dst probably correctly copies
flowi4_tos into rt_key_tos if fl is result of output route
and not returned by decode_session. Of course, RTO_ONLINK
can be lost. But may be problem can happen at least for icmp.c
for the xfrm_decode_session_reverse -> xfrm_lookup code
where a full IP TOS can be copied by xfrm4_fill_dst:

	rt->rt_key_tos = fl4->flowi4_tos;

	What significance has this assignment? Should we have
valid RTO_ONLINK bit in flowi4_tos ?

	So, the fl4->flowi4_tos = iph->tos code does not
look nice, I don't know if this field is used for some
matching and we need to hold all IP TOS bits. If not, may be
we do not need to play games and have to use this instead:

	fl4->flowi4_tos = iph->tos & IPTOS_RT_MASK;

	because flowi4_tos is a rtos, not an IP TOS field.
Then we can reach xfrm_lookup or __ip_route_output_key safely
after _decode_session4.

Regards

--
Julian Anastasov <ja@ssi.bg>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index fb47c8f..1702ec0 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2744,6 +2744,10 @@  struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *flp4)
 				flp4->saddr = rth->rt_src;
 			if (!flp4->daddr)
 				flp4->daddr = rth->rt_dst;
+			if (!flp4->flowi4_iif)
+				flp4->flowi4_iif = net->loopback_dev->ifindex;
+			if (!flp4->flowi4_oif)
+				flp4->flowi4_oif = rth->rt_iif;
 			return rth;
 		}
 		RT_CACHE_STAT_INC(out_hlist_search);
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index a0b4c5d..ad9c620 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -110,6 +110,8 @@  static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
 static void
 _decode_session4(struct sk_buff *skb, struct flowi *fl, int reverse)
 {
+	const struct net *net = dev_net(skb->dev);
+	const struct rtable *rt = skb_rtable(skb);
 	const struct iphdr *iph = ip_hdr(skb);
 	u8 *xprth = skb_network_header(skb) + iph->ihl * 4;
 	struct flowi4 *fl4 = &fl->u.ip4;
@@ -185,9 +187,22 @@  _decode_session4(struct sk_buff *skb, struct flowi *fl, int reverse)
 		}
 	}
 	fl4->flowi4_proto = iph->protocol;
-	fl4->daddr = reverse ? iph->saddr : iph->daddr;
-	fl4->saddr = reverse ? iph->daddr : iph->saddr;
 	fl4->flowi4_tos = iph->tos;
+
+	if (reverse) {
+		fl4->daddr = iph->saddr;
+		fl4->saddr = iph->daddr;
+		fl4->flowi4_oif = skb->skb_iif ?: net->loopback_dev->ifindex;
+		if (rt)
+			fl4->flowi4_iif = rt->rt_oif ?: rt->dst.dev->ifindex;
+	}
+	else {
+		fl4->daddr = iph->daddr;
+		fl4->saddr = iph->saddr;
+		fl4->flowi4_iif = skb->skb_iif ?: net->loopback_dev->ifindex;
+		if (rt)
+			fl4->flowi4_oif = rt->rt_oif ?: rt->dst.dev->ifindex;
+	}
 }
 
 static inline int xfrm4_garbage_collect(struct dst_ops *ops)
diff --git a/net/ipv6/fib6_rules.c b/net/ipv6/fib6_rules.c
index b6c5731..fd520b5 100644
--- a/net/ipv6/fib6_rules.c
+++ b/net/ipv6/fib6_rules.c
@@ -108,8 +108,14 @@  again:
 discard_pkt:
 	dst_hold(&rt->dst);
 out:
-	arg->result = rt;
-	return rt == NULL ? -EAGAIN : 0;
+	if ((arg->result = rt)) {
+		if (!flp6->flowi6_iif)
+			flp6->flowi6_iif = net->loopback_dev->ifindex;
+		if (!flp6->flowi6_oif)
+			flp6->flowi6_oif = rt->rt6i_dev->ifindex;
+		return 0;
+	}
+	return -EAGAIN;
 }
 
 
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 8ea65e0..7c0196f 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -121,6 +121,8 @@  static int xfrm6_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
 static inline void
 _decode_session6(struct sk_buff *skb, struct flowi *fl, int reverse)
 {
+	const struct net *net = dev_net(skb->dev);
+	const struct rt6_info *rt = (struct rt6_info*)skb_dst(skb);
 	struct flowi6 *fl6 = &fl->u.ip6;
 	int onlyproto = 0;
 	u16 offset = skb_network_header_len(skb);
@@ -132,8 +134,20 @@  _decode_session6(struct sk_buff *skb, struct flowi *fl, int reverse)
 	memset(fl6, 0, sizeof(struct flowi6));
 	fl6->flowi6_mark = skb->mark;
 
-	fl6->daddr = reverse ? hdr->saddr : hdr->daddr;
-	fl6->saddr = reverse ? hdr->daddr : hdr->saddr;
+	if (reverse) {
+		fl6->daddr = hdr->saddr;
+		fl6->saddr = hdr->daddr;
+		fl6->flowi6_oif = skb->skb_iif ?: net->loopback_dev->ifindex;
+		if (rt)
+			fl6->flowi6_iif = rt->rt6i_dev->ifindex;
+	}
+	else {
+		fl6->daddr = hdr->daddr;
+		fl6->saddr = hdr->saddr;
+		fl6->flowi6_iif = skb->skb_iif ?: net->loopback_dev->ifindex;
+		if (rt)
+			fl6->flowi6_oif = rt->rt6i_dev->ifindex;
+	}
 
 	while (nh + offset + 1 < skb->data ||
 	       pskb_may_pull(skb, nh + offset + 1 - skb->data)) {