[11/16] ipv4: Cache input routes in fib_info nexthops.

Message ID 20120720.142622.1447419081262029885.davem@davemloft.net
State Accepted, archived
Delegated to: David Miller

Commit Message

David Miller July 20, 2012, 9:26 p.m. UTC
Caching input routes is slightly simpler than output routes, since we
don't need to be concerned with nexthop exceptions.  (locally
destined, and routed packets, never trigger PMTU events or redirects
that will be processed by us).

However, we have to elide caching for the DIRECTSRC and non-zero itag
cases.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/ip_fib.h     |    1 +
 net/ipv4/fib_semantics.c |    2 ++
 net/ipv4/route.c         |   55 ++++++++++++++++++++++++++++++++++++----------
 3 files changed, 46 insertions(+), 12 deletions(-)
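
The lookup rule described in the commit message can be sketched in isolation. This is a minimal user-space model, not the kernel code: the `*_model` structs and the `input_cacheable()` and `lookup_input_cache()` helpers are hypothetical names introduced here for illustration, and the two constants are assumed to carry the kernel's values. Only `rt_cache_valid()`, `nh_rth_input`, `RTCF_DIRECTSRC`, `itag` and `do_cache` come from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values, used here only to make the model self-contained. */
#define DST_OBSOLETE_FORCE_CHK (-1)
#define RTCF_DIRECTSRC         0x04000000u

/* Hypothetical reduced models of struct rtable and struct fib_nh. */
struct rtable_model { int obsolete; };
struct fib_nh_model { struct rtable_model *nh_rth_input; };

/* Same test as the rt_cache_valid() helper added by the patch. */
static bool rt_cache_valid(const struct rtable_model *rt)
{
	return rt && rt->obsolete == DST_OBSOLETE_FORCE_CHK;
}

/* Caching is skipped for DIRECTSRC routes and for a non-zero itag. */
static bool input_cacheable(unsigned int flags, unsigned int itag)
{
	return !(flags & RTCF_DIRECTSRC) && itag == 0;
}

/*
 * Returns the cached input route when it may be reused, else NULL.
 * *do_cache tells the caller whether a freshly built route should be
 * stored in the nexthop afterwards.
 */
static struct rtable_model *lookup_input_cache(struct fib_nh_model *nh,
					       unsigned int flags,
					       unsigned int itag,
					       bool *do_cache)
{
	assert(do_cache != NULL);
	*do_cache = false;

	if (!nh || !input_cacheable(flags, itag))
		return NULL;
	if (rt_cache_valid(nh->nh_rth_input))
		return nh->nh_rth_input;

	*do_cache = true;
	return NULL;
}
```

In this model a NULL return with `*do_cache == true` corresponds to the slow path that allocates a route and then calls `rt_cache_route()`, while a NULL return with `*do_cache == false` corresponds to the uncached DIRECTSRC and itag cases.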

Comments

Julian Anastasov July 23, 2012, 9:13 a.m. UTC | #1
Hello,

On Fri, 20 Jul 2012, David Miller wrote:

> 
> Caching input routes is slightly simpler than output routes, since we
> don't need to be concerned with nexthop exceptions.  (locally
> destined, and routed packets, never trigger PMTU events or redirects
> that will be processed by us).
> 
> However, we have to elide caching for the DIRECTSRC and non-zero itag
> cases.

	I see only one user for RTCF_DIRECTSRC:
icmp_address_reply. Maybe we can do some magic there and
avoid using this flag. That way we can cache in
nh_rth_input without depending on it.

	The problem with rt_iif is worse. Maybe we
can cache only the first iif; other packets will see a
different iif in nh_rth_input and will get a non-cached
result. On boxes with 2 or more interfaces only one
can use the cache. One setup can have heavy traffic
from the LAN, another can be a server for remote clients.

	For forwarding, such ambiguity should be lower,
and rt_iif is mostly used for local targets anyway.

>  local_input:
> +	do_cache = false;
> +	if (res.fi) {
> +		if (!(flags & RTCF_DIRECTSRC) && !itag) {
> +			rth = FIB_RES_NH(res).nh_rth_input;

			rt_iif here should be the same!!!

> +			if (rt_cache_valid(rth)) {
> +				dst_use(&rth->dst, jiffies);
> +				goto set_and_out;
> +			}
> +			do_cache = true;
> +		}
> +	}
> +
>  	rth = rt_dst_alloc(net->loopback_dev,
> -			   IN_DEV_CONF_GET(in_dev, NOPOLICY), false, false);
> +			   IN_DEV_CONF_GET(in_dev, NOPOLICY), false, do_cache);
>  	if (!rth)
>  		goto e_nobufs;
>  
> @@ -1622,6 +1651,9 @@ local_input:
>  		rth->dst.error= -err;
>  		rth->rt_flags 	&= ~RTCF_LOCAL;
>  	}
> +	if (do_cache)
> +		rt_cache_route(&FIB_RES_NH(res), rth);

Regards

--
Julian Anastasov <ja@ssi.bg>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller July 23, 2012, 7:58 p.m. UTC | #2
From: Julian Anastasov <ja@ssi.bg>
Date: Mon, 23 Jul 2012 23:00:57 +0300 (EEST)

> 	Maybe we can replace rt_iif with skb->dev->ifindex?
> If we have a dst, there must be an skb->dev? Only for
> output routes should we check the places that rely on
> inet_iif().

Right, just as I was drinking my first coffee after reading
your previous email I was thinking about this.

The only time skb->dev->ifindex can differ from rt->rt_iif is when we
demux a tunnel, but at that point we would first reinject and do
another route lookup, at which point rt->rt_iif would match again.

And this is the intended semantic of this field anyways.

I'll look into what we can do here, and I'll also try to come up
with some ideas wrt. the DIRECTSRC issue as well.

Thanks Julian.
Julian Anastasov July 23, 2012, 8 p.m. UTC | #3
Hello,

On Mon, 23 Jul 2012, Julian Anastasov wrote:

> 	The problem with rt_iif is worse. Maybe we
> can cache only the first iif; other packets will see a
> different iif in nh_rth_input and will get a non-cached
> result. On boxes with 2 or more interfaces only one
> can use the cache. One setup can have heavy traffic
> from the LAN, another can be a server for remote clients.

	Maybe we can replace rt_iif with skb->dev->ifindex?
If we have a dst, there must be an skb->dev? Only for
output routes should we check the places that rely on
inet_iif().
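
The difference between the two ways of reading the input interface can be illustrated with a small user-space model. This is only a sketch of the idea discussed above, not kernel code: the `*_model` structs and the `iif_from_route()` and `iif_from_dev()` helpers are hypothetical names made up for this example, and the ifindex values are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced models of net_device, rtable and sk_buff. */
struct net_device_model { int ifindex; };
struct rtable_model { int rt_iif; };
struct sk_buff_model {
	const struct net_device_model *dev;	/* receiving device */
	const struct rtable_model *rt;		/* attached (possibly shared) route */
};

/*
 * Reading the input interface from the route: a route shared through
 * nh_rth_input reports the iif of whichever packet created it.
 */
static int iif_from_route(const struct sk_buff_model *skb)
{
	assert(skb != NULL && skb->rt != NULL);
	return skb->rt->rt_iif;
}

/*
 * Reading it from the receiving device instead: the answer follows the
 * packet, so one cached route can serve several input interfaces.
 */
static int iif_from_dev(const struct sk_buff_model *skb)
{
	assert(skb != NULL && skb->dev != NULL);
	return skb->dev->ifindex;
}
```

With one route created for a packet from a device with ifindex 2 and then reused for a packet from a device with ifindex 3, `iif_from_route()` returns 2 for both packets, whereas `iif_from_dev()` returns 2 and 3 respectively.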

Regards

--
Julian Anastasov <ja@ssi.bg>
David Miller July 23, 2012, 9:06 p.m. UTC | #4
From: David Miller <davem@davemloft.net>
Date: Mon, 23 Jul 2012 12:58:37 -0700 (PDT)

> The only time skb->dev->ifindex can differ from rt->rt_iif is when we
> demux a tunnel, but at that point we would first reinject and do
> another route lookup, at which point rt->rt_iif would match again.
> 
> And this is the intended semantic of this field anyways.

Ok Julian, you probably already saw the DIRECTSRC patches, and two
patches to take care of the rt->rt_iif thing are upcoming.

Patch

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index fb62c59..e69c3a4 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -82,6 +82,7 @@  struct fib_nh {
 	__be32			nh_saddr;
 	int			nh_saddr_genid;
 	struct rtable		*nh_rth_output;
+	struct rtable		*nh_rth_input;
 	struct fnhe_hash_bucket	*nh_exceptions;
 };
 
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 83d0f42..e55171f 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -173,6 +173,8 @@  static void free_fib_info_rcu(struct rcu_head *head)
 			free_nh_exceptions(nexthop_nh);
 		if (nexthop_nh->nh_rth_output)
 			dst_release(&nexthop_nh->nh_rth_output->dst);
+		if (nexthop_nh->nh_rth_input)
+			dst_release(&nexthop_nh->nh_rth_input->dst);
 	} endfor_nexthops(fi);
 
 	release_net(fi->fib_net);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 8a02600..97cca8a 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1231,6 +1231,9 @@  static void rt_cache_route(struct fib_nh *nh, struct rtable *rt)
 {
 	struct rtable *orig, *prev, **p = &nh->nh_rth_output;
 
+	if (rt_is_input_route(rt))
+		p = &nh->nh_rth_input;
+
 	orig = *p;
 
 	prev = cmpxchg(p, orig, rt);
@@ -1241,6 +1244,11 @@  static void rt_cache_route(struct fib_nh *nh, struct rtable *rt)
 	}
 }
 
+static bool rt_cache_valid(struct rtable *rt)
+{
+	return (rt && rt->dst.obsolete == DST_OBSOLETE_FORCE_CHK);
+}
+
 static void rt_set_nexthop(struct rtable *rt, __be32 daddr,
 			   const struct fib_result *res,
 			   struct fib_nh_exception *fnhe,
@@ -1257,8 +1265,7 @@  static void rt_set_nexthop(struct rtable *rt, __be32 daddr,
 #ifdef CONFIG_IP_ROUTE_CLASSID
 		rt->dst.tclassid = nh->nh_tclassid;
 #endif
-		if (!(rt->dst.flags & DST_HOST) &&
-		    rt_is_output_route(rt))
+		if (!(rt->dst.flags & DST_HOST))
 			rt_cache_route(nh, rt);
 	}
 
@@ -1384,11 +1391,11 @@  static int __mkroute_input(struct sk_buff *skb,
 			   __be32 daddr, __be32 saddr, u32 tos,
 			   struct rtable **result)
 {
-	struct fib_nh_exception *fnhe;
 	struct rtable *rth;
 	int err;
 	struct in_device *out_dev;
 	unsigned int flags = 0;
+	bool do_cache;
 	u32 itag;
 
 	/* get a working reference to the output device */
@@ -1431,13 +1438,21 @@  static int __mkroute_input(struct sk_buff *skb,
 		}
 	}
 
-	fnhe = NULL;
-	if (res->fi)
-		fnhe = find_exception(&FIB_RES_NH(*res), daddr);
+	do_cache = false;
+	if (res->fi) {
+		if (!(flags & RTCF_DIRECTSRC) && !itag) {
+			rth = FIB_RES_NH(*res).nh_rth_input;
+			if (rt_cache_valid(rth)) {
+				dst_use(&rth->dst, jiffies);
+				goto out;
+			}
+			do_cache = true;
+		}
+	}
 
 	rth = rt_dst_alloc(out_dev->dev,
 			   IN_DEV_CONF_GET(in_dev, NOPOLICY),
-			   IN_DEV_CONF_GET(out_dev, NOXFRM), false);
+			   IN_DEV_CONF_GET(out_dev, NOXFRM), do_cache);
 	if (!rth) {
 		err = -ENOBUFS;
 		goto cleanup;
@@ -1456,8 +1471,8 @@  static int __mkroute_input(struct sk_buff *skb,
 	rth->dst.input = ip_forward;
 	rth->dst.output = ip_output;
 
-	rt_set_nexthop(rth, daddr, res, fnhe, res->fi, res->type, itag);
-
+	rt_set_nexthop(rth, daddr, res, NULL, res->fi, res->type, itag);
+out:
 	*result = rth;
 	err = 0;
  cleanup:
@@ -1509,6 +1524,7 @@  static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	struct rtable	*rth;
 	int		err = -EINVAL;
 	struct net    *net = dev_net(dev);
+	bool do_cache;
 
 	/* IP on this device is disabled. */
 
@@ -1522,6 +1538,7 @@  static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr))
 		goto martian_source;
 
+	res.fi = NULL;
 	if (ipv4_is_lbcast(daddr) || (saddr == 0 && daddr == 0))
 		goto brd_input;
 
@@ -1597,8 +1614,20 @@  brd_input:
 	RT_CACHE_STAT_INC(in_brd);
 
 local_input:
+	do_cache = false;
+	if (res.fi) {
+		if (!(flags & RTCF_DIRECTSRC) && !itag) {
+			rth = FIB_RES_NH(res).nh_rth_input;
+			if (rt_cache_valid(rth)) {
+				dst_use(&rth->dst, jiffies);
+				goto set_and_out;
+			}
+			do_cache = true;
+		}
+	}
+
 	rth = rt_dst_alloc(net->loopback_dev,
-			   IN_DEV_CONF_GET(in_dev, NOPOLICY), false, false);
+			   IN_DEV_CONF_GET(in_dev, NOPOLICY), false, do_cache);
 	if (!rth)
 		goto e_nobufs;
 
@@ -1622,6 +1651,9 @@  local_input:
 		rth->dst.error= -err;
 		rth->rt_flags 	&= ~RTCF_LOCAL;
 	}
+	if (do_cache)
+		rt_cache_route(&FIB_RES_NH(res), rth);
+set_and_out:
 	skb_dst_set(skb, &rth->dst);
 	err = 0;
 	goto out;
@@ -1756,8 +1788,7 @@  static struct rtable *__mkroute_output(const struct fib_result *res,
 		fnhe = find_exception(&FIB_RES_NH(*res), fl4->daddr);
 		if (!fnhe) {
 			rth = FIB_RES_NH(*res).nh_rth_output;
-			if (rth &&
-			    rth->dst.obsolete == DST_OBSOLETE_FORCE_CHK) {
+			if (rt_cache_valid(rth)) {
 				dst_use(&rth->dst, jiffies);
 				return rth;
 			}