return of ip_rt_bug()

Message ID alpine.LFD.2.00.1108070104440.1413@ja.ssi.bg
State Accepted, archived
Delegated to: David Miller

Commit Message

Julian Anastasov Aug. 6, 2011, 10:14 p.m. UTC
Hello,

	OK, after a bit of digging, here is the problem.
ip_rt_bug reports skb->dev = NULL, and a packet with a NULL
skb->dev cannot pass through ip_route_input. It means we got
this input route even though our skb->dev is NULL. Here is how
that happened.

	For the routing cache, compare_keys matches on:
rt_key_dst, rt_key_src, rt_mark, rt_key_tos, rt_oif, rt_iif

	Consider the following two examples:

1. Received traffic from 0.0.0.0 to 255.255.255.255, one example is DHCP

	ip_route_input_slow caches the things as follows:

	rt_key_dst = 255.255.255.255 (iph->daddr)
	rt_key_src = 0.0.0.0 (iph->saddr)
	rt_mark = 0
	rt_key_tos = 0 (RT TOS from iph->tos)
	rt_oif = 0 (always for input route)
	rt_iif = eth0 (input device)

	not compared by compare_keys:
	rt_route_iif = eth0 (input device)

	use hash chain based on some keys and iif

2. Local traffic from ANY LOCAL IP to 255.255.255.255, our example
	is broadcast for EPSON printer where the socket is not
	bound to source address

	__mkroute_output caches the things as follows:

	rt_key_dst = 255.255.255.255 (orig_daddr)
	rt_key_src = 0.0.0.0 (orig_saddr), because not bound
	rt_mark = 0
	rt_key_tos = 0 (RT TOS from iph->tos)
	rt_oif = 0 (orig_oif), because not bound to output device
	rt_iif = eth0 (orig_oif or dev_out->ifindex), dev_out in our case

	not compared by compare_keys:
	rt_route_iif = 0 (always for output route)

	use hash chain based on some keys and orig_oif

	Now when we put rt_intern_hash in the game, it tries to
reuse existing entries in the cache by using compare_keys.
It is hard to hit the problem because input and output
routes use different hashing based on iif/orig_oif.

	The problem: if we have input route in the cache
it can be returned to callers that request output route.
That is why dst_output points to ip_rt_bug.

	As noted above, compare_keys does not compare rt_route_iif,
and it must. It must also be considered by ip_route_input_common.
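
The collision between the two examples can be sketched in user space.
This is a minimal model, not kernel code: struct rt_keys is a
hypothetical stand-in that holds only the fields named above,
ETH0_IFINDEX = 2 is an assumed ifindex for eth0, and the two
comparison functions mirror compare_keys without and with the
rt_route_iif term.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { ETH0_IFINDEX = 2 };	/* assumed ifindex for eth0 */

/* Hypothetical stand-in for the key fields of struct rtable. */
struct rt_keys {
	uint32_t rt_key_dst;
	uint32_t rt_key_src;
	uint32_t rt_mark;
	uint8_t  rt_key_tos;
	int      rt_route_iif;	/* not compared before the fix */
	int      rt_oif;
	int      rt_iif;
};

/* Example 1: input route for 0.0.0.0 -> 255.255.255.255 on eth0. */
const struct rt_keys dhcp_input_route = {
	.rt_key_dst   = 0xffffffffu,
	.rt_key_src   = 0,
	.rt_mark      = 0,
	.rt_key_tos   = 0,
	.rt_route_iif = ETH0_IFINDEX,
	.rt_oif       = 0,		/* always 0 for input routes */
	.rt_iif       = ETH0_IFINDEX,
};

/* Example 2: output route for an unbound socket sending to broadcast. */
const struct rt_keys bcast_output_route = {
	.rt_key_dst   = 0xffffffffu,
	.rt_key_src   = 0,		/* socket not bound to a source */
	.rt_mark      = 0,
	.rt_key_tos   = 0,
	.rt_route_iif = 0,		/* always 0 for output routes */
	.rt_oif       = 0,		/* not bound to an output device */
	.rt_iif       = ETH0_IFINDEX,	/* dev_out->ifindex */
};

/* Comparison without rt_route_iif, as before the fix. */
bool compare_keys_old(const struct rt_keys *a, const struct rt_keys *b)
{
	return ((a->rt_key_dst ^ b->rt_key_dst) |
		(a->rt_key_src ^ b->rt_key_src) |
		(a->rt_mark ^ b->rt_mark) |
		(uint32_t)(a->rt_key_tos ^ b->rt_key_tos) |
		(uint32_t)(a->rt_oif ^ b->rt_oif) |
		(uint32_t)(a->rt_iif ^ b->rt_iif)) == 0;
}

/* Comparison that also considers rt_route_iif. */
bool compare_keys_new(const struct rt_keys *a, const struct rt_keys *b)
{
	return ((a->rt_key_dst ^ b->rt_key_dst) |
		(a->rt_key_src ^ b->rt_key_src) |
		(a->rt_mark ^ b->rt_mark) |
		(uint32_t)(a->rt_key_tos ^ b->rt_key_tos) |
		(uint32_t)(a->rt_route_iif ^ b->rt_route_iif) |
		(uint32_t)(a->rt_oif ^ b->rt_oif) |
		(uint32_t)(a->rt_iif ^ b->rt_iif)) == 0;
}

/* The two example entries are equal under the old comparison only. */
void demo_compare_keys(void)
{
	assert(compare_keys_old(&dhcp_input_route, &bcast_output_route));
	assert(!compare_keys_new(&dhcp_input_route, &bcast_output_route));
}
```

In this model the two entries differ only in rt_route_iif, so once
they share a hash chain the old comparison treats them as the same
route.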

	The appended patch fixes the problem for me. I was
able to reproduce ip_rt_bug by using rhash_entries=1 (resulting
in rt_hash_mask=1) and increasing gc_thresh to 8, so that
I can send these 2 packets with custom programs and the
cache entries live longer in the cache.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller Aug. 8, 2011, 5:20 a.m. UTC | #1
From: Julian Anastasov <ja@ssi.bg>
Date: Sun, 7 Aug 2011 01:14:22 +0300 (EEST)

> 	The problem: if we have input route in the cache
> it can be returned to callers that request output route.
> That is why dst_output points to ip_rt_bug.

Good spotting Julian.

This is my fault entirely.  First I removed the thing we now call
->rt_route_iif which led to bug fix:

commit 1b86a58f9d7ce4fe2377687f378fbfb53bdc9b6c
Author: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Date:   Thu Apr 7 14:04:08 2011 -0700

    ipv4: Fix "Set rt->rt_iif more sanely on output routes."

but I forgot to make sure we also added back the key comparison
on lookups as well :-/

Applied and queued up for -stable, thanks!

Julian Anastasov Aug. 9, 2011, 1:51 p.m. UTC | #2
Hello,

On Sun, 7 Aug 2011, David Miller wrote:

> commit 1b86a58f9d7ce4fe2377687f378fbfb53bdc9b6c
> Author: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
> Date:   Thu Apr 7 14:04:08 2011 -0700
> 
>     ipv4: Fix "Set rt->rt_iif more sanely on output routes."

	I have now checked these changes back to 2.6.38.
As rt_iif is used to provide the input device even for loopback
packets that come with an output route, maybe we can optimize
the code further to save some CPU cycles. In fact, this restores
some route.c functions to 2.6.38 semantics. The conversion was:

fl.iif -> rt_route_iif
rt_iif -> preserved

	There are other places that used fl.iif (0 for output
routes) but are now using rt_iif instead of rt_route_iif,
not sure if this change is fatal for them because:

- net/sched/cls_route.c, route4_classify() gets optional
iif, so it can be 0, may be to match output route? And
later route4_classify does exact match for rt_iif. Does
it mean that now we can not match output packets without
providing "fromif OUTDEV" ?

- net/sched/em_meta.c: now int_rtiif (being rt_iif) is
always != 0, may be not good to match output routes?

	In short, the fl.iif -> rt_iif conversion is risky
at some places.

	For now posting patch for route.c in another thread...

Regards

--
Julian Anastasov <ja@ssi.bg>
David Miller Aug. 11, 2011, 1 p.m. UTC | #3
From: Julian Anastasov <ja@ssi.bg>
Date: Tue, 9 Aug 2011 16:51:26 +0300 (EEST)

> 	There are other places that used fl.iif (0 for output
> routes) but are now using rt_iif instead of rt_route_iif,
> not sure if this change is fatal for them because:
> 
> - net/sched/cls_route.c, route4_classify() gets optional
> iif, so it can be 0, may be to match output route? And
> later route4_classify does exact match for rt_iif. Does
> it mean that now we can not match output packets without
> providing "fromif OUTDEV" ?
> 
> - net/sched/em_meta.c: now int_rtiif (being rt_iif) is
> always != 0, may be not good to match output routes?
> 
> 	In short, the fl.iif -> rt_iif conversion is risky
> at some places.

If we convert em_meta.c and cls_route.c to use rt_route_iif
we should be OK, right?  Please send patches to do this if so.

Thanks.
Julian Anastasov Aug. 11, 2011, 4:36 p.m. UTC | #4
Hello,

On Thu, 11 Aug 2011, David Miller wrote:

> From: Julian Anastasov <ja@ssi.bg>
> Date: Tue, 9 Aug 2011 16:51:26 +0300 (EEST)
> 
> > 	There are other places that used fl.iif (0 for output
> > routes) but are now using rt_iif instead of rt_route_iif,
> > not sure if this change is fatal for them because:
> > 
> > - net/sched/cls_route.c, route4_classify() gets optional
> > iif, so it can be 0, may be to match output route? And
> > later route4_classify does exact match for rt_iif. Does
> > it mean that now we can not match output packets without
> > providing "fromif OUTDEV" ?

	It seems the user space for the route filter treats
0 as an error, so "fromif if0" was never supported. So, using
rt_iif is the better choice here.

> > - net/sched/em_meta.c: now int_rtiif (being rt_iif) is
> > always != 0, may be not good to match output routes?

	Maybe using 'rt_iif eq 0' is silly for the meta match.
It is preferable to use rt_iif instead of rt_route_iif so that
one can match even packets from loopback.

> > 	In short, the fl.iif -> rt_iif conversion is risky
> > at some places.
> 
> If we convert em_meta.c and cls_route.c to use rt_route_iif
> we should be OK, right?  Please send patches to do this if so.

	It seems no patches are needed. Sorry for the confusion.

Regards

--
Julian Anastasov <ja@ssi.bg>
David Miller Aug. 12, 2011, 1:01 a.m. UTC | #5
From: Julian Anastasov <ja@ssi.bg>
Date: Thu, 11 Aug 2011 19:36:37 +0300 (EEST)

> 	It seems no patches are needed. Sorry for the confusion.

Ok, thanks for the clarification.

Patch

===============================================================

[PATCH] ipv4: fix the reusing of routing cache entries

	compare_keys and ip_route_input_common rely on
rt_oif to distinguish input and output routes
with the same key values. But sometimes the input route
shares a hash chain (keyed by iif != 0) with the output
routes (keyed by orig_oif=0). The problem is visible when
running with a small number of rhash_entries.

	Fix them to use rt_route_iif instead. This way an
input route can not be returned to users that request
an output route.

	The patch fixes the ip_rt_bug errors that were
reported in ip_local_out context, mostly for 255.255.255.255
destinations.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---

	This is for 3.0, haven't checked net-next yet.

diff -urp v3.0/linux/net/ipv4/route.c linux/net/ipv4/route.c
--- v3.0/linux/net/ipv4/route.c	2011-07-22 09:43:33.000000000 +0300
+++ linux/net/ipv4/route.c	2011-08-06 18:15:17.841066642 +0300
@@ -725,6 +725,7 @@  static inline int compare_keys(struct rt
 		((__force u32)rt1->rt_key_src ^ (__force u32)rt2->rt_key_src) |
 		(rt1->rt_mark ^ rt2->rt_mark) |
 		(rt1->rt_key_tos ^ rt2->rt_key_tos) |
+		(rt1->rt_route_iif ^ rt2->rt_route_iif) |
 		(rt1->rt_oif ^ rt2->rt_oif) |
 		(rt1->rt_iif ^ rt2->rt_iif)) == 0;
 }
@@ -2281,8 +2282,8 @@  int ip_route_input_common(struct sk_buff
 		if ((((__force u32)rth->rt_key_dst ^ (__force u32)daddr) |
 		     ((__force u32)rth->rt_key_src ^ (__force u32)saddr) |
 		     (rth->rt_iif ^ iif) |
-		     rth->rt_oif |
 		     (rth->rt_key_tos ^ tos)) == 0 &&
+		    rt_is_input_route(rth) &&
 		    rth->rt_mark == skb->mark &&
 		    net_eq(dev_net(rth->dst.dev), net) &&
 		    !rt_is_expired(rth)) {
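
The effect of the second hunk can be sketched the same way. This is
a simplified user-space model under stated assumptions: struct
rt_entry is a hypothetical stand-in for struct rtable, the mark,
namespace and expiry checks are left out, ifindex 2 for eth0 is
assumed, and rt_is_input_route is assumed to test rt_route_iif != 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields used by the input lookup. */
struct rt_entry {
	uint32_t rt_key_dst;
	uint32_t rt_key_src;
	uint8_t  rt_key_tos;
	int      rt_route_iif;
	int      rt_oif;
	int      rt_iif;
};

/* Assumed behaviour of the helper: input routes have rt_route_iif set. */
bool rt_is_input_route(const struct rt_entry *rt)
{
	return rt->rt_route_iif != 0;
}

/* Old test: an entry counts as an input route when rt_oif is 0. */
bool input_match_old(const struct rt_entry *rth, uint32_t daddr,
		     uint32_t saddr, int iif, uint8_t tos)
{
	return ((rth->rt_key_dst ^ daddr) |
		(rth->rt_key_src ^ saddr) |
		(uint32_t)(rth->rt_iif ^ iif) |
		(uint32_t)rth->rt_oif |
		(uint32_t)(rth->rt_key_tos ^ tos)) == 0;
}

/* New test: rt_oif is dropped and rt_is_input_route() decides. */
bool input_match_new(const struct rt_entry *rth, uint32_t daddr,
		     uint32_t saddr, int iif, uint8_t tos)
{
	return ((rth->rt_key_dst ^ daddr) |
		(rth->rt_key_src ^ saddr) |
		(uint32_t)(rth->rt_iif ^ iif) |
		(uint32_t)(rth->rt_key_tos ^ tos)) == 0 &&
	       rt_is_input_route(rth);
}

/* Output route cached for an unbound broadcast sender (rt_oif = 0). */
const struct rt_entry cached_output_route = {
	.rt_key_dst   = 0xffffffffu,
	.rt_key_src   = 0,
	.rt_key_tos   = 0,
	.rt_route_iif = 0,
	.rt_oif       = 0,
	.rt_iif       = 2,	/* assumed ifindex of eth0 */
};

/* A packet 0.0.0.0 -> 255.255.255.255 arriving on eth0 hits the
 * cached output route with the old test but not with the new one. */
void demo_input_lookup(void)
{
	assert(input_match_old(&cached_output_route, 0xffffffffu, 0, 2, 0));
	assert(!input_match_new(&cached_output_route, 0xffffffffu, 0, 2, 0));
}
```

An output route from an unbound socket has rt_oif = 0, so the old
rt_oif test cannot tell it apart from an input route; in this model
only the rt_route_iif based check rejects it.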