Message ID | 1322511292-1413-3-git-send-email-ulrich.weber@sophos.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
Hello, On Mon, 28 Nov 2011, Ulrich Weber wrote: > Outgoing packets have loopback interface as incoming interface. > > Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com> > --- > net/ipv4/route.c | 4 ++++ > net/ipv4/xfrm4_policy.c | 19 +++++++++++++++++-- > net/ipv6/fib6_rules.c | 10 ++++++++-- > net/ipv6/xfrm6_policy.c | 18 ++++++++++++++++-- > 4 files changed, 45 insertions(+), 6 deletions(-) > > diff --git a/net/ipv4/route.c b/net/ipv4/route.c > index fb47c8f..1702ec0 100644 > --- a/net/ipv4/route.c > +++ b/net/ipv4/route.c > @@ -2744,6 +2744,10 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *flp4) > flp4->saddr = rth->rt_src; > if (!flp4->daddr) > flp4->daddr = rth->rt_dst; > + if (!flp4->flowi4_iif) > + flp4->flowi4_iif = net->loopback_dev->ifindex; > + if (!flp4->flowi4_oif) > + flp4->flowi4_oif = rth->rt_iif; May be setting flowi4_oif unconditionally here is more correct because ip_route_output_slow fills flowi4_oif with the selected oif, it can even change the provided original oif in flowi4_oif. What about this?: flp4->flowi4_oif = rth->dst.dev->ifindex; OTOH, rt_iif has some complex semantic: original oif or the selected oif. May be you prefer flowi4_oif to hold the selected oif, right? I see one dangerous place that must be checked: icmp_route_lookup. Before now __ip_route_output_key was called after xfrm_decode_session_reverse with 0 in flowi4_oif, i.e. no oif binding was used. But now when decode_session sets flowi4_oif we will restrict the route via this interface? > return rth; > } > RT_CACHE_STAT_INC(out_hlist_search); Regards -- Julian Anastasov <ja@ssi.bg> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Ulrich Weber <ulrich.weber@Sophos.com> Date: Mon, 28 Nov 2011 21:14:51 +0100 > + } > + else { ... > + } > + else { Besides Julian's comments, these are not properly formatted, use: } else { instead. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 29.11.2011 00:53, Julian Anastasov wrote: > > May be setting flowi4_oif unconditionally here is more > correct because ip_route_output_slow fills flowi4_oif with > the selected oif, it can even change the provided original > oif in flowi4_oif. What about this?: > > flp4->flowi4_oif = rth->dst.dev->ifindex; > > OTOH, rt_iif has some complex semantic: original oif > or the selected oif. May be you prefer flowi4_oif to hold > the selected oif, right? I wasn't aware the ip_route_output_slow() might change the original oif. You know why this might happen? Shouldn't fib_lookup only return a route matching the given oif? Anyway, if thats the case your code above is more correct. The packet should always match the xfrm policy where it was originally routed. > I see one dangerous place that must be checked: > icmp_route_lookup. Before now __ip_route_output_key was > called after xfrm_decode_session_reverse with 0 in > flowi4_oif, i.e. no oif binding was used. But now when > decode_session sets flowi4_oif we will restrict the route > via this interface? Thanks for the hint! Yes the current patch will force the ICMP packet over the received interface. Will add "fl4_dec.flowi4_oif = 0;" in case the saddr is local, so the behavior will be the same. fl4_dec.flowi4_oif will then be set in _ip_route_output_key() again. Cheers Ulrich
Hello, On Wed, 30 Nov 2011, Ulrich Weber wrote: > > OTOH, rt_iif has some complex semantic: original oif > > or the selected oif. May be you prefer flowi4_oif to hold > > the selected oif, right? > I wasn't aware the ip_route_output_slow() might change the original oif. > You know why this might happen? Shouldn't fib_lookup only return > a route matching the given oif? There are exceptions but only in this form: dev_out = net->loopback_dev; i.e. when traffic is diverted to loopback device. In all others cases the provided oif is considered and never changed. > Anyway, if thats the case your code above is more correct. The packet > should always match the xfrm policy where it was originally routed. > > > I see one dangerous place that must be checked: > > icmp_route_lookup. Before now __ip_route_output_key was > > called after xfrm_decode_session_reverse with 0 in > > flowi4_oif, i.e. no oif binding was used. But now when > > decode_session sets flowi4_oif we will restrict the route > > via this interface? > Thanks for the hint! Yes the current patch will force the ICMP packet > over the received interface. > > Will add "fl4_dec.flowi4_oif = 0;" in case the saddr is local, so the > behavior will be the same. fl4_dec.flowi4_oif will then be set in > _ip_route_output_key() again. Yes, I think this should work. I now see another unrelated problem with this code. _decode_session4 uses fl4->flowi4_tos = iph->tos; But we then feed fl4_dec to __ip_route_output_key without applying IPTOS_RT_MASK. __ip_route_output_key does not work with plain TOS from IP header. Currently, the situation around flowi4_tos is as follows: - flowi4_tos: - (IPTOS_RT_MASK | RTO_ONLINK) bits when provided to __ip_route_output_key - on return above value is preserved if route is in cache, it is not changed - only IPTOS_RT_MASK bits are preserved by ip_route_output_slow if the route is not cached because flowi4_tos holds the bits that are matched by ip rules tos XX As result, we can not rely on the RTO_ONLINK bit being valid on return. IPTOS_RT_MASK bits are preserved. - rt_key_tos: - (IPTOS_RT_MASK | RTO_ONLINK) bits - we have a bug here: ip_route_output_slow filters the provided flowi4_tos by removing RTO_ONLINK bit, then __mkroute_output tries to get original value with the RTO_ONLINK bit but ends up with the modified value. We need to provide orig_tos to __mkroute_output. In XFRM. Currently only xfrm_bundle_create calls xfrm_get_tos to get a routable TOS value and such value is provided to xfrm_dst_lookup. It means __xfrm4_dst_lookup gets such filtered TOS value, lets name it rtos (routing tos) and provides it to output routing correctly. xfrm4_fill_dst probably correctly copies flowi4_tos into rt_key_tos if fl is result of output route and not returned by decode_session. Of course, RTO_ONLINK can be lost. But may be problem can happen at least for icmp.c for the xfrm_decode_session_reverse -> xfrm_lookup code where a full IP TOS can be copied by xfrm4_fill_dst: rt->rt_key_tos = fl4->flowi4_tos; What significance has this assignment? Should we have valid RTO_ONLINK bit in flowi4_tos ? So, the fl4->flowi4_tos = iph->tos code does not look nice, I don't know if this field is used for some matching and we need to hold all IP TOS bits. If not, may be we do not need to play games and have to use this instead: fl4->flowi4_tos = iph->tos & IPTOS_RT_MASK; because flowi4_tos is a rtos, not an IP TOS field. Then we can reach xfrm_lookup or __ip_route_output_key safely after _decode_session4. Regards -- Julian Anastasov <ja@ssi.bg> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv4/route.c b/net/ipv4/route.c index fb47c8f..1702ec0 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2744,6 +2744,10 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *flp4) flp4->saddr = rth->rt_src; if (!flp4->daddr) flp4->daddr = rth->rt_dst; + if (!flp4->flowi4_iif) + flp4->flowi4_iif = net->loopback_dev->ifindex; + if (!flp4->flowi4_oif) + flp4->flowi4_oif = rth->rt_iif; return rth; } RT_CACHE_STAT_INC(out_hlist_search); diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c index a0b4c5d..ad9c620 100644 --- a/net/ipv4/xfrm4_policy.c +++ b/net/ipv4/xfrm4_policy.c @@ -110,6 +110,8 @@ static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev, static void _decode_session4(struct sk_buff *skb, struct flowi *fl, int reverse) { + const struct net *net = dev_net(skb->dev); + const struct rtable *rt = skb_rtable(skb); const struct iphdr *iph = ip_hdr(skb); u8 *xprth = skb_network_header(skb) + iph->ihl * 4; struct flowi4 *fl4 = &fl->u.ip4; @@ -185,9 +187,22 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, int reverse) } } fl4->flowi4_proto = iph->protocol; - fl4->daddr = reverse ? iph->saddr : iph->daddr; - fl4->saddr = reverse ? iph->daddr : iph->saddr; fl4->flowi4_tos = iph->tos; + + if (reverse) { + fl4->daddr = iph->saddr; + fl4->saddr = iph->daddr; + fl4->flowi4_oif = skb->skb_iif ?: net->loopback_dev->ifindex; + if (rt) + fl4->flowi4_iif = rt->rt_oif ?: rt->dst.dev->ifindex; + } + else { + fl4->daddr = iph->daddr; + fl4->saddr = iph->saddr; + fl4->flowi4_iif = skb->skb_iif ?: net->loopback_dev->ifindex; + if (rt) + fl4->flowi4_oif = rt->rt_oif ?: rt->dst.dev->ifindex; + } } static inline int xfrm4_garbage_collect(struct dst_ops *ops) diff --git a/net/ipv6/fib6_rules.c b/net/ipv6/fib6_rules.c index b6c5731..fd520b5 100644 --- a/net/ipv6/fib6_rules.c +++ b/net/ipv6/fib6_rules.c @@ -108,8 +108,14 @@ again: discard_pkt: dst_hold(&rt->dst); out: - arg->result = rt; - return rt == NULL ? -EAGAIN : 0; + if ((arg->result = rt)) { + if (!flp6->flowi6_iif) + flp6->flowi6_iif = net->loopback_dev->ifindex; + if (!flp6->flowi6_oif) + flp6->flowi6_oif = rt->rt6i_dev->ifindex; + return 0; + } + return -EAGAIN; } diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c index 8ea65e0..7c0196f 100644 --- a/net/ipv6/xfrm6_policy.c +++ b/net/ipv6/xfrm6_policy.c @@ -121,6 +121,8 @@ static int xfrm6_fill_dst(struct xfrm_dst *xdst, struct net_device *dev, static inline void _decode_session6(struct sk_buff *skb, struct flowi *fl, int reverse) { + const struct net *net = dev_net(skb->dev); + const struct rt6_info *rt = (struct rt6_info*)skb_dst(skb); struct flowi6 *fl6 = &fl->u.ip6; int onlyproto = 0; u16 offset = skb_network_header_len(skb); @@ -132,8 +134,20 @@ _decode_session6(struct sk_buff *skb, struct flowi *fl, int reverse) memset(fl6, 0, sizeof(struct flowi6)); fl6->flowi6_mark = skb->mark; - fl6->daddr = reverse ? hdr->saddr : hdr->daddr; - fl6->saddr = reverse ? hdr->daddr : hdr->saddr; + if (reverse) { + fl6->daddr = hdr->saddr; + fl6->saddr = hdr->daddr; + fl6->flowi6_oif = skb->skb_iif ?: net->loopback_dev->ifindex; + if (rt) + fl6->flowi6_iif = rt->rt6i_dev->ifindex; + } + else { + fl6->daddr = hdr->daddr; + fl6->saddr = hdr->saddr; + fl6->flowi6_iif = skb->skb_iif ?: net->loopback_dev->ifindex; + if (rt) + fl6->flowi6_oif = rt->rt6i_dev->ifindex; + } while (nh + offset + 1 < skb->data || pskb_may_pull(skb, nh + offset + 1 - skb->data)) {
Outgoing packets have loopback interface as incoming interface. Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com> --- net/ipv4/route.c | 4 ++++ net/ipv4/xfrm4_policy.c | 19 +++++++++++++++++-- net/ipv6/fib6_rules.c | 10 ++++++++-- net/ipv6/xfrm6_policy.c | 18 ++++++++++++++++-- 4 files changed, 45 insertions(+), 6 deletions(-)