
[RFC] udp: don't rereference dst_entry dev pointer on rcv

Message ID 1363223884.29475.0.camel@edumazet-glaptop
State RFC, archived
Delegated to: David Miller

Commit Message

Eric Dumazet March 14, 2013, 1:18 a.m. UTC
On Wed, 2013-03-13 at 23:27 +0000, Tom Parkin wrote:

> I've been working to this end, and while I don't have a root cause as
> yet, I do have some more information.
> 
> I think what's happening is that the dst_entry refcounting is getting
> screwed up either by the ip defrag code, or by something before that
> in the rcv path.  What I see from ftrace debugging is that an skb
> fragment ends up queued on the reassembly queue while pointing to a
> dst_entry with a refcount of 0.  If the dst_entry should be deleted before
> the final fragment in the frame arrives, then we end up accessing
> free'd memory.
> 
> So far as I can make out, the l2tp code isn't doing anything untoward
> which is causing this bug.  My stress test simply makes it easier to
> reproduce because I'm setting up and tearing down routes and devices
> a lot while passing data.  I'm lucky in that my dev branch seems to
> reproduce this more easily than net-next master does, although the
> same oops occurs on master if you're prepared to wait around for long
> enough.
> 
> This ftrace debug log snippet shows the sort of behaviour I'm seeing.
> The numbers in brackets after some dst pointer values represent the
> refcount for that dst:
> 
> # The dst_entry is created with a refcount of 1
>   <idle>-0     [000] ..s2   112.770192: dst_alloc: dst ffff880012bbb0c0, refcnt 1
> # First fragment is queued
>   <idle>-0     [000] ..s2   112.770193: ip_local_deliver: skb ffff880012864600, dst ffff880012bbb0c0(1) : is fragment
>   <idle>-0     [000] ..s2   112.770206: ip_local_deliver: skb ffff880012864600, dst ffff880012bbb0c0 : fragment queued
> # Second and final fragment arrives, reassemble
>       ip-10970 [000] ..s1   112.770678: ip_local_deliver: skb ffff880010937e00, dst ffff880012bbb0c0(1) : is fragment
> # skb_morph bumps refcount to 2, skb_consume drops it back down to 1
>       ip-10970 [000] ..s2   112.770682: ip_defrag: >>> clone skb ffff880010937e00 with dst ffff880012bbb0c0
>       ip-10970 [000] ..s2   112.770691: __copy_skb_header: don't dst_clone ffff880012bbb0c0
>       ip-10970 [000] ..s2   112.770691: ip_defrag: >>> morph skb ffff880010937e00 from ffff880012864600
>       ip-10970 [000] ..s2   112.770692: skb_release_head_state: drop skb ffff880010937e00 dst ref
>       ip-10970 [000] ..s2   112.770692: __copy_skb_header: cloning dst ffff880012bbb0c0 (skb ffff880012864600 -> skb ffff880010937e00)
>       ip-10970 [000] ..s2   112.770692: ip_defrag: >>> consume skb ffff880012864600
>       ip-10970 [000] ..s2   112.770693: skb_release_head_state: drop skb ffff880012864600 dst ref
>       ip-10970 [000] ..s2   112.770693: dst_release: dst ffff880012bbb0c0 newrefcnt 1
>       ip-10970 [000] ..s2   112.770698: ip_defrag: >>> coalesce loop
>       ip-10970 [000] ..s2   112.770698: ip_defrag: kfree_skb_partial(ffff880010937500, false)
>       ip-10970 [000] ..s2   112.770699: skb_release_head_state: drop skb ffff880010937500 dst ref
> # skb is reassembled and delivered, dst has refcount of 1 now
>       ip-10970 [000] ..s1   112.770705: ip_local_deliver: skb ffff880010937e00, dst ffff880012bbb0c0(1) : queue defragmented
> # l2tp_eth uses dev_forward_skb, which calls skb_dst_drop
>       ip-10970 [000] ..s1   112.770707: skb_release_head_state: drop skb ffff880010937e00 dst ref
>       ip-10970 [000] ..s1   112.770708: dst_release: dst ffff880012bbb0c0 newrefcnt 0
> # Another skb arrives; dst refcount remains at 0
>   <idle>-0     [000] ..s2   112.771481: ip_local_deliver: skb ffff880012864500, dst ffff880012bbb0c0(0) : is fragment
>   <idle>-0     [000] ..s2   112.771494: ip_local_deliver: skb ffff880012864500, dst ffff880012bbb0c0 : fragment queued
> 
> The strange thing is that once the dst refcount reaches zero, another
> skb hitting ip_input doesn't bump the refcount back up.  This is
> partially why I'm not sure whether the error is caused by the defrag
> code, or by something prior to that in the rcv path.

Ah thanks for this, as this definitely makes more sense ;)

Could you try the following fix ?



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
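Tom notes that a newly arrived fragment leaves the refcount at 0. That fits a receive path that attaches the route without taking a reference. In kernels of this period, the IPv4 input path typically attaches a "noref" dst via ip_route_input_noref(), and such a dst is only guaranteed valid under RCU.

The sketch below is a simplified user-space model of that distinction, not kernel code. Every model_* name, the MODEL_DST_NOREF tag bit and the plain integer refcount are hypothetical stand-ins for the sk_buff/dst_entry machinery. It shows that attaching or dropping a borrowed route never moves the count, so a fragment parked on the reassembly queue with a borrowed pointer can outlive its dst_entry.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified user-space MODEL, not kernel code; all model_* names are
 * hypothetical. The route pointer is stored with a tag bit: bit 0 set
 * means "noref", i.e. the pointer is borrowed and no reference was taken. */
#define MODEL_DST_NOREF ((uintptr_t)1)

struct model_dst {
	int refcnt; /* stand-in for the dst_entry reference count */
};

struct model_skb {
	uintptr_t refdst; /* route pointer | optional MODEL_DST_NOREF tag */
};

static struct model_dst *model_skb_dst(const struct model_skb *skb)
{
	return (struct model_dst *)(skb->refdst & ~MODEL_DST_NOREF);
}

/* Borrow a route: the refcount is left untouched. */
static void model_skb_dst_set_noref(struct model_skb *skb, struct model_dst *dst)
{
	skb->refdst = (uintptr_t)dst | MODEL_DST_NOREF;
}

/* Own a route: take a reference (taking one on a dead route is a bug). */
static void model_skb_dst_set(struct model_skb *skb, struct model_dst *dst)
{
	assert(dst->refcnt > 0);
	dst->refcnt++;
	skb->refdst = (uintptr_t)dst;
}

/* Detach the route; only an owned route gives its reference back. */
static void model_skb_dst_drop(struct model_skb *skb)
{
	if (skb->refdst && !(skb->refdst & MODEL_DST_NOREF)) {
		struct model_dst *dst = model_skb_dst(skb);

		assert(dst->refcnt > 0);
		dst->refcnt--;
	}
	skb->refdst = 0;
}
```

In this model, a queued fragment holding a borrowed pointer contributes nothing to refcnt. Once the last owner drops its reference, nothing keeps the route alive for the fragment.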

Comments

Tom Parkin March 14, 2013, 2:45 p.m. UTC | #1
On Thu, Mar 14, 2013 at 02:18:04AM +0100, Eric Dumazet wrote:
> Ah thanks for this, as this definitely makes more sense ;)
> 
> Could you try the following fix ?
> 
> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> index b6d30ac..87f4ecb 100644
> --- a/net/ipv4/ip_fragment.c
> +++ b/net/ipv4/ip_fragment.c
> @@ -529,6 +529,7 @@ found:
>  	    qp->q.meat == qp->q.len)
>  		return ip_frag_reasm(qp, prev, dev);
>  
> +	skb_dst_force(skb);
>  	inet_frag_lru_move(&qp->q);
>  	return -EINPROGRESS;
>

Thanks Eric, with this patch I can no longer reproduce the oops :-)

Eric Dumazet March 14, 2013, 3:05 p.m. UTC | #2
On Thu, 2013-03-14 at 14:45 +0000, Tom Parkin wrote:
> On Thu, Mar 14, 2013 at 02:18:04AM +0100, Eric Dumazet wrote:
> > Ah thanks for this, as this definitely makes more sense ;)
> > 
> > Could you try the following fix ?
> > 
> > diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> > index b6d30ac..87f4ecb 100644
> > --- a/net/ipv4/ip_fragment.c
> > +++ b/net/ipv4/ip_fragment.c
> > @@ -529,6 +529,7 @@ found:
> >  	    qp->q.meat == qp->q.len)
> >  		return ip_frag_reasm(qp, prev, dev);
> >  
> > +	skb_dst_force(skb);
> >  	inet_frag_lru_move(&qp->q);
> >  	return -EINPROGRESS;
> >
> 
> Thanks Eric, with this patch I can no longer reproduce the oops :-)

Thanks for testing.

I am considering an alternative patch :

We can drop the reference instead, and use the dst of the last skb.

This would help to not dirty the dst refcount.

I'll send an updated version.
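Eric's alternative can be sketched with the same kind of hypothetical user-space model. This is not kernel code, and it is not the updated patch itself, which this thread does not show.

In the model, a non-final fragment has its route dropped before it is queued, so the queue never holds a route pointer. The reassembled packet then takes over the route of the fragment that completed it, which is still valid in the current receive context. Because nothing is cloned, the shared refcount is never touched, which matches the "not dirty the dst refcount" benefit described above. The fixed-size array queue and all model_* names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space MODEL of "drop the dst on queued fragments and
 * use the dst of the last skb"; not kernel code. */
#define MODEL_DST_NOREF ((uintptr_t)1)
#define MODEL_MAX_FRAGS 8

struct model_dst {
	int refcnt;
};

struct model_skb {
	uintptr_t refdst; /* route pointer, bit 0 set if borrowed (noref) */
};

struct model_fragq {
	struct model_skb *frags[MODEL_MAX_FRAGS];
	size_t nfrags;
};

/* Detach the route; only an owned route gives its reference back. */
static void model_skb_dst_drop(struct model_skb *skb)
{
	if (skb->refdst && !(skb->refdst & MODEL_DST_NOREF)) {
		struct model_dst *dst = (struct model_dst *)skb->refdst;

		assert(dst->refcnt > 0);
		dst->refcnt--;
	}
	skb->refdst = 0;
}

/* Non-final fragment: forget its route before parking it.
 * Returns 0 on success, -1 if the model queue is full. */
static int model_queue_fragment(struct model_fragq *q, struct model_skb *frag)
{
	if (q->nfrags == MODEL_MAX_FRAGS)
		return -1;
	model_skb_dst_drop(frag);
	q->frags[q->nfrags++] = frag;
	return 0;
}

/* Final fragment: the reassembled packet inherits the last fragment's
 * route as-is (ownership transferred, refcount unchanged). */
static void model_reassemble(struct model_fragq *q, struct model_skb *last,
			     struct model_skb *head)
{
	head->refdst = last->refdst;
	last->refdst = 0;
	q->nfrags = 0; /* queued fragments are consumed (freeing omitted) */
}
```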




Patch

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index b6d30ac..87f4ecb 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -529,6 +529,7 @@ found:
 	    qp->q.meat == qp->q.len)
 		return ip_frag_reasm(qp, prev, dev);
 
+	skb_dst_force(skb);
 	inet_frag_lru_move(&qp->q);
 	return -EINPROGRESS;
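In rough terms, skb_dst_force() turns a borrowed (noref) dst into a refcounted one. In kernels of this era it clears the SKB_DST_NOREF bit in skb->_skb_refdst and takes a reference with dst_clone().

The sketch below models that upgrade in user space. The model_* names and the integer refcount are hypothetical, and the kernel helper's RCU sanity check is omitted. After the upgrade, the original owner can release its reference without freeing the route underneath the queued fragment.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified user-space MODEL of the idea behind skb_dst_force();
 * not the kernel implementation. All model_* names are hypothetical. */
#define MODEL_DST_NOREF ((uintptr_t)1)

struct model_dst {
	int refcnt;
};

struct model_skb {
	uintptr_t refdst; /* route pointer, bit 0 set if borrowed (noref) */
};

static struct model_dst *model_skb_dst(const struct model_skb *skb)
{
	return (struct model_dst *)(skb->refdst & ~MODEL_DST_NOREF);
}

/* Upgrade a borrowed route to an owned one before the packet outlives the
 * context in which the borrow was valid; a no-op if already owned. */
static void model_skb_dst_force(struct model_skb *skb)
{
	if (skb->refdst & MODEL_DST_NOREF) {
		skb->refdst &= ~MODEL_DST_NOREF;
		model_skb_dst(skb)->refcnt++;
	}
}

/* Another holder (e.g. the routing table in this model) lets go. */
static void model_dst_release(struct model_dst *dst)
{
	assert(dst->refcnt > 0);
	dst->refcnt--;
}
```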