diff mbox

[DANGER,8/7] : ipv4: Cache output routes in fib_info nexthops.

Message ID 20120712.104758.253504708560401123.davem@davemloft.net
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

David Miller July 12, 2012, 5:47 p.m. UTC
Signed-off-by: David S. Miller <davem@davemloft.net>
---

If you really feel like playing with fire, try this patch
on top of the routing cache removal patches.

It gets the output route lookup down to 888 cycles for me.

Something is flaky about it, when I ssh into my test system
for the first time after a boot there is a strange delay of
some sort.  It's as if the SYN-ACK is dropped on the way out
of the test machine, and my desktop has to retry the initial
SYN.  I plan to investigate this after some sleep.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Vijay Subramanian July 13, 2012, 12:52 a.m. UTC | #1
>
> Something is flaky about it, when I ssh into my test system
> for the first time after a boot there is a strange delay of
> some sort.  It's as if the SYN-ACK is dropped on the way out
> of the test machine, and my desktop has to retry the initial
> SYN.  I plan to investigate this after some sleep.


Dave,
I applied these patches and got the same symptoms. It takes a long
time for ssh to work right after boot but it starts working after
about a minute.
(I am sshing into the machine with the patches applied). My  knowledge
of routing code is rudimentary but I traced the following.

 I think this is because the SYN packets do not even reach the TCP
handler. It looks like ip_route_input_slow() sets the dst.input
function to ip_error().

The code path I saw was as follows:
ip_rcv_finish() eventually calls ip_route_input_slow() wherein
fib_lookup() initially returns a res.type of RTN_UNICAST (why not
RTN_LOCAL?).
However, the following code
 if (!IN_DEV_FORWARD(in_dev))
                goto no_route;

is executed and sets the res.type to RTN_UNREACHABLE.
After the jump to local_input, rth->dst.input is set first to
ip_local_deliver() but again to ip_error().
Due to this, the SYN packet does not even make it to ip_local_deliver
and so the TCP handler is never called.

I did not get a chance to see why it suddenly starts working. Hope
this helps. I will dig around more.

Thanks,
Vijay
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller July 13, 2012, 1:06 a.m. UTC | #2
From: Vijay Subramanian <subramanian.vijay@gmail.com>
Date: Thu, 12 Jul 2012 17:52:54 -0700

> ip_rcv_finish() eventually calls ip_route_input_slow() wherein
> fib_lookup() initially returns a res.type of RTN_UNICAST (why not
> RTN_LOCAL?).

Strange, since only this specific patch causes the problem there must
be some issue with the size of struct fib_info or something like that
triggering the problem.  Something might be corrupted in the nexthops.

I wonder if there is some uninitialized data somewhere, or something
silly like that.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index e91fedd..d133110 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -45,6 +45,7 @@  struct fib_config {
  };
 
 struct fib_info;
+struct rtable;
 
 struct fib_nh {
 	struct net_device	*nh_dev;
@@ -63,6 +64,8 @@  struct fib_nh {
 	__be32			nh_gw;
 	__be32			nh_saddr;
 	int			nh_saddr_genid;
+
+	struct rtable		*rth;
 };
 
 /*
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index d71bfbd..d1240a0 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -148,6 +148,8 @@  static void free_fib_info_rcu(struct rcu_head *head)
 	change_nexthops(fi) {
 		if (nexthop_nh->nh_dev)
 			dev_put(nexthop_nh->nh_dev);
+		if (nexthop_nh->rth)
+			dst_release(&nexthop_nh->rth->dst);
 	} endfor_nexthops(fi);
 
 	release_net(fi->fib_net);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index c4b2df6..53b006a 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -983,6 +983,8 @@  static void rt_set_nexthop(struct rtable *rt, const struct flowi4 *fl4,
 #ifdef CONFIG_IP_ROUTE_CLASSID
 		rt->dst.tclassid = FIB_RES_NH(*res).nh_tclassid;
 #endif
+		FIB_RES_NH(*res).rth = rt;
+		dst_clone(&rt->dst);
 	}
 
 #ifdef CONFIG_IP_ROUTE_CLASSID
@@ -1468,6 +1470,13 @@  static struct rtable *__mkroute_output(const struct fib_result *res,
 			fi = NULL;
 	}
 
+	if (fi) {
+		rth = FIB_RES_NH(*res).rth;
+		if (rth) {
+			dst_use(&rth->dst, jiffies);
+			return rth;
+		}
+	}
 	rth = rt_dst_alloc(dev_out,
 			   IN_DEV_CONF_GET(in_dev, NOPOLICY),
 			   IN_DEV_CONF_GET(in_dev, NOXFRM));