diff mbox

[net] ipv6: do not overwrite inetpeer metrics prematurely

Message ID 20140310.010313.703650662340510015.davem@davemloft.net
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

David Miller March 10, 2014, 5:03 a.m. UTC
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Mon, 10 Mar 2014 01:52:20 +0100

> So we need to change __ip6_ins_rt to take fib6_config as the second
> argument instead, still have the info pointer in fc_nlinfo and only have
> to wrap the info pointer in ip6_ins_rt into another structure. fib6_add
> seems to be straightforward from there.

fib6_config is too large to place on the stack I think, I was more
thinking something along these lines:

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Michal Kubecek March 10, 2014, 8:15 a.m. UTC | #1
On Mon, Mar 10, 2014 at 01:03:13AM -0400, David Miller wrote:
> @@ -743,6 +743,30 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
>  		BUG_ON(rt6i_nsiblings != rt->rt6i_nsiblings);
>  	}
>  
> +	if (mx) {
> +		struct nlattr *nla;
> +		bool was_writable;
> +		int remaining;
> +		u32 *mp;
> +
> +		was_writable = !dst_metrics_read_only(&rt->dst);
> +		mp = dst_metrics_write_ptr(&rt->dst);
> +
> +		if (was_writable)
> +			memset(mp, 0, RTAX_MAX * sizeof(u32));
> +
> +		nla_for_each_attr(nla, mx, mx_len, remaining) {
> +			int type = nla_type(nla);
> +
> +			if (type) {
> +				if (type > RTAX_MAX)
> +					return -EINVAL;
> +
> +				mp[type - 1] = nla_get_u32(nla);
> +			}
> +		}
> +	}
> +
>  	/*
>  	 *	insert node
>  	 */

This is too early. After this point, the insertion can still fail if
(!found && !add) (i.e. when trying to modify a non-existent route with
"ip route change").

> @@ -1546,14 +1547,6 @@ int ip6_route_add(struct fib6_config *cfg)
>  	if (rt->rt6i_dst.plen == 128)
>  	       rt->dst.flags |= DST_HOST;
>  
> -	if (!(rt->dst.flags & DST_HOST) && cfg->fc_mx) {
> -		u32 *metrics = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL);
> -		if (!metrics) {
> -			err = -ENOMEM;
> -			goto out;
> -		}
> -		dst_init_metrics(&rt->dst, metrics, 0);
> -	}
>  #ifdef CONFIG_IPV6_SUBTREES
>  	ipv6_addr_prefix(&rt->rt6i_src.addr, &cfg->fc_src, cfg->fc_src_len);
>  	rt->rt6i_src.plen = cfg->fc_src_len;

I think this should stay. We still need to allocate a separate copy of
metrics in the non-DST_HOST case, otherwise we would reintroduce the 
issue fixed by commit 8e2ec6391 ("ipv6: don't use inetpeer to store
metrics for routes."). Or rather get a null pointer dereference in 
fib6_add_rt2node() as dst_metrics_write_ptr() would return NULL. An 
alternative solution (and perhaps a more suitable one) would be to 
allocate the block in ip6_cow_metrics() instead of returning NULL.

Other than that, I believe this should work. Actually, I was also 
considering this approach but I wasn't brave enough to propose passing 
those extra parameters all the way down to rt6_add_rt2node(). But if you
are OK with it, I agree that saving the extra kzalloc()/kfree() is worth
that bit of ugliness. We could also extract the metrics from the info 
parameter we already have but it would be inefficient to parse the whole
message again.

                                                        Michal Kubecek

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Frederic Sowa March 10, 2014, noon UTC | #2
On Mon, Mar 10, 2014 at 09:15:33AM +0100, Michal Kubecek wrote:
> On Mon, Mar 10, 2014 at 01:03:13AM -0400, David Miller wrote:
> > @@ -743,6 +743,30 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
> >  		BUG_ON(rt6i_nsiblings != rt->rt6i_nsiblings);
> >  	}
> >  
> > +	if (mx) {
> > +		struct nlattr *nla;
> > +		bool was_writable;
> > +		int remaining;
> > +		u32 *mp;
> > +
> > +		was_writable = !dst_metrics_read_only(&rt->dst);
> > +		mp = dst_metrics_write_ptr(&rt->dst);
> > +
> > +		if (was_writable)
> > +			memset(mp, 0, RTAX_MAX * sizeof(u32));
> > +
> > +		nla_for_each_attr(nla, mx, mx_len, remaining) {
> > +			int type = nla_type(nla);
> > +
> > +			if (type) {
> > +				if (type > RTAX_MAX)
> > +					return -EINVAL;
> > +
> > +				mp[type - 1] = nla_get_u32(nla);
> > +			}
> > +		}
> > +	}
> > +
> >  	/*
> >  	 *	insert node
> >  	 */
> 
> This is too early. After this point, the insertion can still fail if
> (!found && !add) (i.e. when trying to modify a non-existent route with
> "ip route change").

Yep, but we can make this a static function to ip6_fib.c and call it from the
code points you already proposed in your original patch?

> > @@ -1546,14 +1547,6 @@ int ip6_route_add(struct fib6_config *cfg)
> >  	if (rt->rt6i_dst.plen == 128)
> >  	       rt->dst.flags |= DST_HOST;
> >  
> > -	if (!(rt->dst.flags & DST_HOST) && cfg->fc_mx) {
> > -		u32 *metrics = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL);
> > -		if (!metrics) {
> > -			err = -ENOMEM;
> > -			goto out;
> > -		}
> > -		dst_init_metrics(&rt->dst, metrics, 0);
> > -	}
> >  #ifdef CONFIG_IPV6_SUBTREES
> >  	ipv6_addr_prefix(&rt->rt6i_src.addr, &cfg->fc_src, cfg->fc_src_len);
> >  	rt->rt6i_src.plen = cfg->fc_src_len;
> 
> I think this should stay. We still need to allocate a separate copy of
> metrics in the non-DST_HOST case, otherwise we would reintroduce the 
> issue fixed by commit 8e2ec6391 ("ipv6: don't use inetpeer to store
> metrics for routes.").

Full ACK, we must only intantiate inetpeers for DST_HOST dsts but it seems to
be possible at this point, too.

The reason why I liked David's proposal is that it removes the ambiguity
when looking up the inetpeer (we only get the inetpeer if we are sure
a new route is added). I don't mind the kzalloc that much. The only
overhead it produces would be if a user inserts a route for a /128 target
and specifies metrics.

Maybe we can remove this ambiguity by checking the already present
info pointer.

> Or rather get a null pointer dereference in 
> fib6_add_rt2node() as dst_metrics_write_ptr() would return NULL. An 
> alternative solution (and perhaps a more suitable one) would be to 
> allocate the block in ip6_cow_metrics() instead of returning NULL.

I am not sure of the advantage here. If this patch targets net I would
like to keep the changes a bit more straightforward. In case it gives
a nice advantage, maybe another patch for net-next? Haven't had a close look
on that yet.

> Other than that, I believe this should work. Actually, I was also 
> considering this approach but I wasn't brave enough to propose passing 
> those extra parameters all the way down to rt6_add_rt2node(). But if you
> are OK with it, I agree that saving the extra kzalloc()/kfree() is worth
> that bit of ugliness. We could also extract the metrics from the info 
> parameter we already have but it would be inefficient to parse the whole
> message again.

Great! :)

Thanks,

  Hannes


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michal Kubecek March 10, 2014, 1:15 p.m. UTC | #3
On Mon, Mar 10, 2014 at 01:00:16PM +0100, Hannes Frederic Sowa wrote:
> On Mon, Mar 10, 2014 at 09:15:33AM +0100, Michal Kubecek wrote:
> > On Mon, Mar 10, 2014 at 01:03:13AM -0400, David Miller wrote:
> > > @@ -743,6 +743,30 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
> > >  		BUG_ON(rt6i_nsiblings != rt->rt6i_nsiblings);
> > >  	}
> > >  
> > > +	if (mx) {
> > > +		struct nlattr *nla;
> > > +		bool was_writable;
> > > +		int remaining;
> > > +		u32 *mp;
> > > +
> > > +		was_writable = !dst_metrics_read_only(&rt->dst);
> > > +		mp = dst_metrics_write_ptr(&rt->dst);
> > > +
> > > +		if (was_writable)
> > > +			memset(mp, 0, RTAX_MAX * sizeof(u32));
> > > +
> > > +		nla_for_each_attr(nla, mx, mx_len, remaining) {
> > > +			int type = nla_type(nla);
> > > +
> > > +			if (type) {
> > > +				if (type > RTAX_MAX)
> > > +					return -EINVAL;
> > > +
> > > +				mp[type - 1] = nla_get_u32(nla);
> > > +			}
> > > +		}
> > > +	}
> > > +
> > >  	/*
> > >  	 *	insert node
> > >  	 */
> > 
> > This is too early. After this point, the insertion can still fail if
> > (!found && !add) (i.e. when trying to modify a non-existent route with
> > "ip route change").
> 
> Yep, but we can make this a static function to ip6_fib.c and call it from the
> code points you already proposed in your original patch?

Yes, this sounds like the best option.

> The reason why I liked David's proposal is that it removes the ambiguity
> when looking up the inetpeer (we only get the inetpeer if we are sure
> a new route is added). I don't mind the kzalloc that much. The only
> overhead it produces would be if a user inserts a route for a /128 target
> and specifies metrics.

Right, I didn't realize that.

> > Or rather get a null pointer dereference in 
> > fib6_add_rt2node() as dst_metrics_write_ptr() would return NULL. An 
> > alternative solution (and perhaps a more suitable one) would be to 
> > allocate the block in ip6_cow_metrics() instead of returning NULL.
> 
> I am not sure of the advantage here. If this patch targets net I would
> like to keep the changes a bit more straightforward. In case it gives
> a nice advantage, maybe another patch for net-next? Haven't had a close look
> on that yet.

Agreed. It would only delay the allocation for non-DST_HOST routes to
the point we are sure we need it; on the other hand, it would require
reorganizing the ip6_cow_metrics() code so I guess it can wait for
net-next (if at all).

                                                       Michal Kubecek

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index aca0c27..9bcb220 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -284,7 +284,8 @@  struct fib6_node *fib6_locate(struct fib6_node *root,
 void fib6_clean_all(struct net *net, int (*func)(struct rt6_info *, void *arg),
 		    void *arg);
 
-int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info);
+int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info,
+	     struct nlattr *mx, int mx_len);
 
 int fib6_del(struct rt6_info *rt, struct nl_info *info);
 
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 075602f..941df4f 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -643,7 +643,7 @@  static inline bool rt6_qualify_for_ecmp(struct rt6_info *rt)
  */
 
 static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
-			    struct nl_info *info)
+			    struct nl_info *info, struct nlattr *mx, int mx_len)
 {
 	struct rt6_info *iter = NULL;
 	struct rt6_info **ins;
@@ -743,6 +743,30 @@  static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
 		BUG_ON(rt6i_nsiblings != rt->rt6i_nsiblings);
 	}
 
+	if (mx) {
+		struct nlattr *nla;
+		bool was_writable;
+		int remaining;
+		u32 *mp;
+
+		was_writable = !dst_metrics_read_only(&rt->dst);
+		mp = dst_metrics_write_ptr(&rt->dst);
+
+		if (was_writable)
+			memset(mp, 0, RTAX_MAX * sizeof(u32));
+
+		nla_for_each_attr(nla, mx, mx_len, remaining) {
+			int type = nla_type(nla);
+
+			if (type) {
+				if (type > RTAX_MAX)
+					return -EINVAL;
+
+				mp[type - 1] = nla_get_u32(nla);
+			}
+		}
+	}
+
 	/*
 	 *	insert node
 	 */
@@ -806,7 +830,8 @@  void fib6_force_start_gc(struct net *net)
  *	with source addr info in sub-trees
  */
 
-int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info)
+int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info,
+	     struct nlattr *mx, int mx_len)
 {
 	struct fib6_node *fn, *pn = NULL;
 	int err = -ENOMEM;
@@ -900,7 +925,7 @@  int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info)
 	}
 #endif
 
-	err = fib6_add_rt2node(fn, rt, info);
+	err = fib6_add_rt2node(fn, rt, info, mx, mx_len);
 	if (!err) {
 		fib6_start_gc(info->nl_net, rt);
 		if (!(rt->rt6i_flags & RTF_CACHE))
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index fba54a4..a39d3da 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -857,14 +857,15 @@  EXPORT_SYMBOL(rt6_lookup);
    be destroyed.
  */
 
-static int __ip6_ins_rt(struct rt6_info *rt, struct nl_info *info)
+static int __ip6_ins_rt(struct rt6_info *rt, struct nl_info *info,
+			struct nlattr *mx, int mx_len)
 {
 	int err;
 	struct fib6_table *table;
 
 	table = rt->rt6i_table;
 	write_lock_bh(&table->tb6_lock);
-	err = fib6_add(&table->tb6_root, rt, info);
+	err = fib6_add(&table->tb6_root, rt, info, mx, mx_len);
 	write_unlock_bh(&table->tb6_lock);
 
 	return err;
@@ -875,7 +876,7 @@  int ip6_ins_rt(struct rt6_info *rt)
 	struct nl_info info = {
 		.nl_net = dev_net(rt->dst.dev),
 	};
-	return __ip6_ins_rt(rt, &info);
+	return __ip6_ins_rt(rt, &info, NULL, 0);
 }
 
 static struct rt6_info *rt6_alloc_cow(struct rt6_info *ort,
@@ -1546,14 +1547,6 @@  int ip6_route_add(struct fib6_config *cfg)
 	if (rt->rt6i_dst.plen == 128)
 	       rt->dst.flags |= DST_HOST;
 
-	if (!(rt->dst.flags & DST_HOST) && cfg->fc_mx) {
-		u32 *metrics = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL);
-		if (!metrics) {
-			err = -ENOMEM;
-			goto out;
-		}
-		dst_init_metrics(&rt->dst, metrics, 0);
-	}
 #ifdef CONFIG_IPV6_SUBTREES
 	ipv6_addr_prefix(&rt->rt6i_src.addr, &cfg->fc_src, cfg->fc_src_len);
 	rt->rt6i_src.plen = cfg->fc_src_len;
@@ -1672,31 +1665,13 @@  int ip6_route_add(struct fib6_config *cfg)
 	rt->rt6i_flags = cfg->fc_flags;
 
 install_route:
-	if (cfg->fc_mx) {
-		struct nlattr *nla;
-		int remaining;
-
-		nla_for_each_attr(nla, cfg->fc_mx, cfg->fc_mx_len, remaining) {
-			int type = nla_type(nla);
-
-			if (type) {
-				if (type > RTAX_MAX) {
-					err = -EINVAL;
-					goto out;
-				}
-
-				dst_metric_set(&rt->dst, type, nla_get_u32(nla));
-			}
-		}
-	}
-
 	rt->dst.dev = dev;
 	rt->rt6i_idev = idev;
 	rt->rt6i_table = table;
 
 	cfg->fc_nlinfo.nl_net = dev_net(dev);
 
-	return __ip6_ins_rt(rt, &cfg->fc_nlinfo);
+	return __ip6_ins_rt(rt, &cfg->fc_nlinfo, cfg->fc_mx, cfg->fc_mx_len);
 
 out:
 	if (dev)