diff mbox

[RFC] ipv6: Split from and expires field in dst_entry out of union [net-next]

Message ID 1361305694-8303-1-git-send-email-nhorman@tuxdriver.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Neil Horman Feb. 19, 2013, 8:28 p.m. UTC
I've been looking into some random ipv6 crashes lately that occur around
ip6_link_failure.  This and simmilar partial stack traces:
ip6_link_failure+0xbe/0xd0
ipip6_tunnel_xmit+0x7e7/0x860 [sit]
dev_hard_start_xmit+0x3e3/0x690
dev_queue_xmit+0x38f/0x610
neigh_direct_output+0x11/0x20
ip6_finish_output2+0x90/0x340
? ac6_proc_exit+0x20/0x20
ip6_finish_output+0x98/0xc0
ip6_output
? __ip6_local_out
ip6_local_out
ip6_push_pending
? ip6_append_data
udp_v6_push_pending_frames
udpv6_sendmsg

were all I had.  Eric D. Made this click with me this morning however, noting a
possible/likely race condition in the ipv6 code.  It appears that, if a dst
entry is accessed by multiple cpus (and it appears that certainly can happen, as
routes created via ip6_rt_copy are hashed back into the fib, for future
lookups), then the use of the rt6i_flags field can race, leading to multiple
conflicting uses of the aforementioned union (jiffies being interpreted as the
from pointer and vice versa).

Eric suggested that this be fixed in rt6_update_expires, but looking at the
other uses of this union I don't think thats a complete fix.  All the accessors
for the expired|from union in the dst entry seem to rely on the RTF_EXPIRES flag
being accessed and updated atomically, and thats just not the case.

I think the only fix here, that doesn't involve additional locking, is to
separate the expires and from fields into their own storage, so as not to
trample one another.

This is currently untested, but I've given it to several people who can
reproduce this problem and are testing it now.  I'll post again when I have
results from them.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: eric.dumazet@gmail.com
CC: David Miller <davem@davemloft.net>
CC: Gao feng <gaofeng@cn.fujitsu.com>
CC: Jiri Bohac <jbohac@suse.cz>
---
 include/net/dst.h     |  9 +++------
 include/net/ip6_fib.h | 13 ++++++++-----
 2 files changed, 11 insertions(+), 11 deletions(-)

Comments

Eric Dumazet Feb. 19, 2013, 9:17 p.m. UTC | #1
On Tue, 2013-02-19 at 15:28 -0500, Neil Horman wrote:

>  static inline void rt6_update_expires(struct rt6_info *rt, int timeout)
>  {
>  	if (!(rt->rt6i_flags & RTF_EXPIRES)) {
> -		if (rt->dst.from)
> -			dst_release(rt->dst.from);
> +		dst_release(rt->dst.from);
>  		/* dst_set_expires relies on expires == 0 
>  		 * if it has not been set previously.
>  		 */
>  		rt->dst.expires = 0;
> +		rt6->dst.from = NULL;
>  	}
>  

Sorry you didnt really address the problem, only reduce the race window.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman Feb. 19, 2013, 9:49 p.m. UTC | #2
On Tue, Feb 19, 2013 at 01:17:45PM -0800, Eric Dumazet wrote:
> On Tue, 2013-02-19 at 15:28 -0500, Neil Horman wrote:
> 
> >  static inline void rt6_update_expires(struct rt6_info *rt, int timeout)
> >  {
> >  	if (!(rt->rt6i_flags & RTF_EXPIRES)) {
> > -		if (rt->dst.from)
> > -			dst_release(rt->dst.from);
> > +		dst_release(rt->dst.from);
> >  		/* dst_set_expires relies on expires == 0 
> >  		 * if it has not been set previously.
> >  		 */
> >  		rt->dst.expires = 0;
> > +		rt6->dst.from = NULL;
> >  	}
> >  
> 
> Sorry you didnt really address the problem, only reduce the race window.
> 
I kinda had a feeling you would say that, but the only other solution I see here
is to either introduce some locking to protect the from pointer, or two revert
the patch that introduced the from pointer alltogether, neither of which sounds
appealing to me.  I suppose we could use an xchng to atomically update the from
pointer, so there was only ever one context that was able to free it in
rt6_update_path.  Does that seem reasonable to you?

Neil

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Feb. 19, 2013, 9:55 p.m. UTC | #3
On Tue, 2013-02-19 at 16:49 -0500, Neil Horman wrote:
> On Tue, Feb 19, 2013 at 01:17:45PM -0800, Eric Dumazet wrote:
> > On Tue, 2013-02-19 at 15:28 -0500, Neil Horman wrote:
> > 
> > >  static inline void rt6_update_expires(struct rt6_info *rt, int timeout)
> > >  {
> > >  	if (!(rt->rt6i_flags & RTF_EXPIRES)) {
> > > -		if (rt->dst.from)
> > > -			dst_release(rt->dst.from);
> > > +		dst_release(rt->dst.from);
> > >  		/* dst_set_expires relies on expires == 0 
> > >  		 * if it has not been set previously.
> > >  		 */
> > >  		rt->dst.expires = 0;
> > > +		rt6->dst.from = NULL;
> > >  	}
> > >  
> > 
> > Sorry you didnt really address the problem, only reduce the race window.
> > 
> I kinda had a feeling you would say that, but the only other solution I see here
> is to either introduce some locking to protect the from pointer, or two revert
> the patch that introduced the from pointer alltogether, neither of which sounds
> appealing to me.  I suppose we could use an xchng to atomically update the from
> pointer, so there was only ever one context that was able to free it in
> rt6_update_path.  Does that seem reasonable to you?

I believe the setting of rt6->dst.from is safe :
It should be done at :
- dst creation time, when we are the only user.
- dst destry time, when we are the only user.

We only have to do the dst_release() at the right place, when we are the
last user of the dst.

So rt6_update_expires() should not mess with rt6->dst.from at all.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gao feng Feb. 20, 2013, 6:33 a.m. UTC | #4
Hi Eric & Neil,
On 2013/02/20 05:55, Eric Dumazet wrote:
> On Tue, 2013-02-19 at 16:49 -0500, Neil Horman wrote:
>> On Tue, Feb 19, 2013 at 01:17:45PM -0800, Eric Dumazet wrote:
>>> On Tue, 2013-02-19 at 15:28 -0500, Neil Horman wrote:
>>>
>>>>  static inline void rt6_update_expires(struct rt6_info *rt, int timeout)
>>>>  {
>>>>  	if (!(rt->rt6i_flags & RTF_EXPIRES)) {
>>>> -		if (rt->dst.from)
>>>> -			dst_release(rt->dst.from);
>>>> +		dst_release(rt->dst.from);
>>>>  		/* dst_set_expires relies on expires == 0 
>>>>  		 * if it has not been set previously.
>>>>  		 */
>>>>  		rt->dst.expires = 0;
>>>> +		rt6->dst.from = NULL;
>>>>  	}
>>>>  
>>>
>>> Sorry you didnt really address the problem, only reduce the race window.
>>>
>> I kinda had a feeling you would say that, but the only other solution I see here
>> is to either introduce some locking to protect the from pointer, or two revert
>> the patch that introduced the from pointer alltogether, neither of which sounds
>> appealing to me.  I suppose we could use an xchng to atomically update the from
>> pointer, so there was only ever one context that was able to free it in
>> rt6_update_path.  Does that seem reasonable to you?
> 

Thanks for your research on this problem.

> I believe the setting of rt6->dst.from is safe :
> It should be done at :
> - dst creation time, when we are the only user.
> - dst destry time, when we are the only user.
> 
> We only have to do the dst_release() at the right place, when we are the
> last user of the dst.
> 
> So rt6_update_expires() should not mess with rt6->dst.from at all.
> 

How can we?
one usage of rt6_update_expires and rt6_set_expires
is changing rt6->dst.from to rt6->dst.expires, we should release the
already holded reference of rt6->dst.from.

I think maybe we should rework the commit 1716a96101c49186b
(ipv6: fix problem with expired dst cache). just force setting flag
RTF_EXPIRES and expires for the rt6 which copied from the rt6(which
is generated from RA). The only problem of this situation is this
copied rt6's expire time may be imprecise, we may receive a new
RA with a new expire time.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Laight Feb. 20, 2013, 10:55 a.m. UTC | #5
>  static inline void rt6_clean_expires(struct rt6_info *rt)
>  {
> -	if (!(rt->rt6i_flags & RTF_EXPIRES) && rt->dst.from)
> +	if (!rt->rt6i_flags & RTF_EXPIRES)
>  		dst_release(rt->dst.from);
> 
>  	rt->rt6i_flags &= ~RTF_EXPIRES;
>  	rt->dst.from = NULL;
> +	rt->dst.expires = 0;
>  }
> 
>  static inline void rt6_set_expires(struct rt6_info *rt, unsigned long expires)
>  {
> -	if (!(rt->rt6i_flags & RTF_EXPIRES) && rt->dst.from)
> +	if (!rt->rt6i_flags & RTF_EXPIRES)
>  		dst_release(rt->dst.from);
> 
>  	rt->rt6i_flags |= RTF_EXPIRES;
> +	rt->dst.from = NULL;
>  	rt->dst.expires = expires;
>  }
> 
>  static inline void rt6_update_expires(struct rt6_info *rt, int timeout)
>  {
>  	if (!(rt->rt6i_flags & RTF_EXPIRES)) {
> -		if (rt->dst.from)
> -			dst_release(rt->dst.from);
> +		dst_release(rt->dst.from);
>  		/* dst_set_expires relies on expires == 0
>  		 * if it has not been set previously.
>  		 */
>  		rt->dst.expires = 0;
> +		rt6->dst.from = NULL;
>  	}

Aren't there also problems with setting and clearing RTF_EXPIRES?
Since that flag looks as though it was the descriminant for the union
it probably isn't needed - provided dst.expires is never 0 when valid.

	David




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman Feb. 20, 2013, 12:02 p.m. UTC | #6
On Wed, Feb 20, 2013 at 10:55:35AM -0000, David Laight wrote:
> >  static inline void rt6_clean_expires(struct rt6_info *rt)
> >  {
> > -	if (!(rt->rt6i_flags & RTF_EXPIRES) && rt->dst.from)
> > +	if (!rt->rt6i_flags & RTF_EXPIRES)
> >  		dst_release(rt->dst.from);
> > 
> >  	rt->rt6i_flags &= ~RTF_EXPIRES;
> >  	rt->dst.from = NULL;
> > +	rt->dst.expires = 0;
> >  }
> > 
> >  static inline void rt6_set_expires(struct rt6_info *rt, unsigned long expires)
> >  {
> > -	if (!(rt->rt6i_flags & RTF_EXPIRES) && rt->dst.from)
> > +	if (!rt->rt6i_flags & RTF_EXPIRES)
> >  		dst_release(rt->dst.from);
> > 
> >  	rt->rt6i_flags |= RTF_EXPIRES;
> > +	rt->dst.from = NULL;
> >  	rt->dst.expires = expires;
> >  }
> > 
> >  static inline void rt6_update_expires(struct rt6_info *rt, int timeout)
> >  {
> >  	if (!(rt->rt6i_flags & RTF_EXPIRES)) {
> > -		if (rt->dst.from)
> > -			dst_release(rt->dst.from);
> > +		dst_release(rt->dst.from);
> >  		/* dst_set_expires relies on expires == 0
> >  		 * if it has not been set previously.
> >  		 */
> >  		rt->dst.expires = 0;
> > +		rt6->dst.from = NULL;
> >  	}
> 
> Aren't there also problems with setting and clearing RTF_EXPIRES?
> Since that flag looks as though it was the descriminant for the union
> it probably isn't needed - provided dst.expires is never 0 when valid.
> 
> 	David
The use of RTF_EXPIRES is weak at this point, if there are multiple accessors,
but I think the point is moot, in that the only thing we ever do when we change
the flag is release dst.from, which is safe.
Neil

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Laight Feb. 20, 2013, 12:08 p.m. UTC | #7
> > Aren't there also problems with setting and clearing RTF_EXPIRES?
> > Since that flag looks as though it was the descriminant for the union
> > it probably isn't needed - provided dst.expires is never 0 when valid.
> >
> > 	David
>
> The use of RTF_EXPIRES is weak at this point, if there are multiple accessors,
> but I think the point is moot, in that the only thing we ever do when we change
> the flag is release dst.from, which is safe.
>
> Neil

I was also worried about the RMW cycles of rt6i_flags itself.

	David



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/net/dst.h b/include/net/dst.h
index 3da47e0..6b7ebcf 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -36,13 +36,10 @@  struct dst_entry {
 	struct net_device       *dev;
 	struct  dst_ops	        *ops;
 	unsigned long		_metrics;
-	union {
-		unsigned long           expires;
-		/* point to where the dst_entry copied from */
-		struct dst_entry        *from;
-	};
+	unsigned long           expires;
+	/* point to where the dst_entry copied from */
+	struct dst_entry        *from;
 	struct dst_entry	*path;
-	void			*__pad0;
 #ifdef CONFIG_XFRM
 	struct xfrm_state	*xfrm;
 #else
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 6919a50..a285e37 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -164,31 +164,33 @@  static inline struct inet6_dev *ip6_dst_idev(struct dst_entry *dst)
 
 static inline void rt6_clean_expires(struct rt6_info *rt)
 {
-	if (!(rt->rt6i_flags & RTF_EXPIRES) && rt->dst.from)
+	if (!rt->rt6i_flags & RTF_EXPIRES)
 		dst_release(rt->dst.from);
 
 	rt->rt6i_flags &= ~RTF_EXPIRES;
 	rt->dst.from = NULL;
+	rt->dst.expires = 0;
 }
 
 static inline void rt6_set_expires(struct rt6_info *rt, unsigned long expires)
 {
-	if (!(rt->rt6i_flags & RTF_EXPIRES) && rt->dst.from)
+	if (!rt->rt6i_flags & RTF_EXPIRES)
 		dst_release(rt->dst.from);
 
 	rt->rt6i_flags |= RTF_EXPIRES;
+	rt->dst.from = NULL;
 	rt->dst.expires = expires;
 }
 
 static inline void rt6_update_expires(struct rt6_info *rt, int timeout)
 {
 	if (!(rt->rt6i_flags & RTF_EXPIRES)) {
-		if (rt->dst.from)
-			dst_release(rt->dst.from);
+		dst_release(rt->dst.from);
 		/* dst_set_expires relies on expires == 0 
 		 * if it has not been set previously.
 		 */
 		rt->dst.expires = 0;
+		rt6->dst.from = NULL;
 	}
 
 	dst_set_expires(&rt->dst, timeout);
@@ -199,7 +201,7 @@  static inline void rt6_set_from(struct rt6_info *rt, struct rt6_info *from)
 {
 	struct dst_entry *new = (struct dst_entry *) from;
 
-	if (!(rt->rt6i_flags & RTF_EXPIRES) && rt->dst.from) {
+	if (!rt->rt6i_flags & RTF_EXPIRES) {
 		if (new == rt->dst.from)
 			return;
 		dst_release(rt->dst.from);
@@ -207,6 +209,7 @@  static inline void rt6_set_from(struct rt6_info *rt, struct rt6_info *from)
 
 	rt->rt6i_flags &= ~RTF_EXPIRES;
 	rt->dst.from = new;
+	rt->dst.expires = 0;
 	dst_hold(new);
 }