diff mbox

net: Frag list lost on head expansion.

Message ID 20100902.204332.02275687.davem@davemloft.net
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

David Miller Sept. 3, 2010, 3:43 a.m. UTC
When pskb_expand_head() releases the data, with skb_release_data(), it
tries to properly preserve any fragment list using
skb_clone_fraglist().

Although skb_clone_fraglist() will properly grab a reference to all of
the fragment list SKBs, it will not block skb_release_data() from
NULL'ing out the ->frag_list pointer when it calls skb_drop_list() via
skb_drop_fraglist().

As a result we lose the fragment SKBs and they are leaked forever.

Instead, hide the fragment list pointer around the skb_release_data()
call and restore it afterwards.  This fixes the bug and also makes
it cheaper since we won't grab and release every single fragment
list SKB reference.

Signed-off-by: David S. Miller <davem@davemloft.net>
---

I found this via pure code inspection, please review.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Dumazet Sept. 3, 2010, 5:48 a.m. UTC | #1
Le jeudi 02 septembre 2010 à 20:43 -0700, David Miller a écrit :
> When pskb_expand_head() releases the data, with skb_release_data(), it
> tries to properly preserve any fragment list using
> skb_clone_fraglist().
> 
> Although skb_clone_fraglist() will properly grab a reference to all of
> the fragment list SKBs, it will not block skb_release_data() from
> NULL'ing out the ->frag_list pointer when it calls skb_drop_list() via
> skb_drop_fraglist().
> 
> As a result we lose the fragment SKBs and they are leaked forever.
> 
> Instead, hide the fragment list pointer around the skb_release_data()
> call and restore it afterwards.  This fixes the bug and also makes
> it cheaper since we won't grab and release every single fragment
> list SKB reference.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
> 
> I found this via pure code inspection, please review.
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 26396ff..def2e49 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -789,6 +789,7 @@ EXPORT_SYMBOL(pskb_copy);
>  int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>  		     gfp_t gfp_mask)
>  {
> +	struct sk_buff *frag_list;
>  	int i;
>  	u8 *data;
>  #ifdef NET_SKBUFF_DATA_USES_OFFSET
> @@ -822,11 +823,13 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>  	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
>  		get_page(skb_shinfo(skb)->frags[i].page);
>  
> -	if (skb_has_frags(skb))
> -		skb_clone_fraglist(skb);
> +	frag_list = skb_shinfo(skb)->frag_list;
> +	skb_shinfo(skb)->frag_list = NULL;
>  
>  	skb_release_data(skb);
>  
> +	skb_shinfo(skb)->frag_list = frag_list;
> +
>  	off = (data + nhead) - skb->head;
>  
>  	skb->head     = data;

David, I had this same idea some days ago when reviewing this code,
but when I came to conclusion we could not avoid the get_page /
put_page() on skb_shinfo(skb)->frags[i].page. I thought it was not worth
trying to avoid the frag_list grab/release operation.

But we are the only user of this skb and skb_shinfo(skb)->frag_list, so
your patch seems good to me.

Please note you are not fixing a bug, because the new frag_list pointer
was correctly copied in the

memcpy(data + size, skb_end_pointer(skb), 
	offsetof(struct skb_shared_info,
		frags[skb_shinfo(skb)->nr_frags]));

Please rewrite the changelog to say this patch is an optimization, to
avoid the atomic ops on each skb found in frag_list ?

Thanks !


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Sept. 3, 2010, 6:19 a.m. UTC | #2
Le vendredi 03 septembre 2010 à 07:48 +0200, Eric Dumazet a écrit :

> David, I had this same idea some days ago when reviewing this code,
> but when I came to conclusion we could not avoid the get_page /
> put_page() on skb_shinfo(skb)->frags[i].page. I thought it was not worth
> trying to avoid the frag_list grab/release operation.
> 
> But we are the only user of this skb and skb_shinfo(skb)->frag_list, so
> your patch seems good to me.
> 
> Please note you are not fixing a bug, because the new frag_list pointer
> was correctly copied in the
> 
> memcpy(data + size, skb_end_pointer(skb), 
> 	offsetof(struct skb_shared_info,
> 		frags[skb_shinfo(skb)->nr_frags]));
> 
> Please rewrite the changelog to say this patch is an optimization, to
> avoid the atomic ops on each skb found in frag_list ?
> 

Well, skb is not shared, but we have no guarantee skb_shinfo(skb) is
not.

To optimize this thing, you'll need to add a new parameter to
skb_release_data() ?


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 26396ff..def2e49 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -789,6 +789,7 @@  EXPORT_SYMBOL(pskb_copy);
 int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 		     gfp_t gfp_mask)
 {
+	struct sk_buff *frag_list;
 	int i;
 	u8 *data;
 #ifdef NET_SKBUFF_DATA_USES_OFFSET
@@ -822,11 +823,13 @@  int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
 		get_page(skb_shinfo(skb)->frags[i].page);
 
-	if (skb_has_frags(skb))
-		skb_clone_fraglist(skb);
+	frag_list = skb_shinfo(skb)->frag_list;
+	skb_shinfo(skb)->frag_list = NULL;
 
 	skb_release_data(skb);
 
+	skb_shinfo(skb)->frag_list = frag_list;
+
 	off = (data + nhead) - skb->head;
 
 	skb->head     = data;