Message ID | 20100902.204332.02275687.davem@davemloft.net |
---|---|
State | Rejected, archived |
Delegated to: | David Miller |
Headers | show |
Le jeudi 02 septembre 2010 à 20:43 -0700, David Miller a écrit : > When pskb_expand_head() releases the data, with skb_release_data(), it > tries to properly preserve any fragment list using > skb_clone_fraglist(). > > Although skb_clone_fraglist() will properly grab a reference to all of > the fragment list SKBs, it will not block skb_release_data() from > NULL'ing out the ->frag_list pointer when it calls skb_drop_list() via > skb_drop_fraglist(). > > As a result we lose the fragment SKBs and they are leaked forever. > > Instead, hide the fragment list pointer around the skb_release_data() > call and restore it afterwards. This fixes the bug and also makes > it cheaper since we won't grab and release every single fragment > list SKB reference. > > Signed-off-by: David S. Miller <davem@davemloft.net> > --- > > I found this via pure code inspection, please review. > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index 26396ff..def2e49 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -789,6 +789,7 @@ EXPORT_SYMBOL(pskb_copy); > int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, > gfp_t gfp_mask) > { > + struct sk_buff *frag_list; > int i; > u8 *data; > #ifdef NET_SKBUFF_DATA_USES_OFFSET > @@ -822,11 +823,13 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, > for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) > get_page(skb_shinfo(skb)->frags[i].page); > > - if (skb_has_frags(skb)) > - skb_clone_fraglist(skb); > + frag_list = skb_shinfo(skb)->frag_list; > + skb_shinfo(skb)->frag_list = NULL; > > skb_release_data(skb); > > + skb_shinfo(skb)->frag_list = frag_list; > + > off = (data + nhead) - skb->head; > > skb->head = data; David, I had this same idea some days ago when reviewing this code, but when I came to conclusion we could not avoid the get_page / put_page() on skb_shinfo(skb)->frags[i].page. I thought it was not worth trying to avoid the frag_list grab/release operation. But we are the only user of this skb and skb_shinfo(skb)->frag_list, so your patch seems good to me. Please note you are not fixing a bug, because the new frag_list pointer was correctly copied in the memcpy(data + size, skb_end_pointer(skb), offsetof(struct skb_shared_info, frags[skb_shinfo(skb)->nr_frags])); Please rewrite the changelog to say this patch is an optimization, to avoid the atomic ops on each skb found in frag_list ? Thanks ! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le vendredi 03 septembre 2010 à 07:48 +0200, Eric Dumazet a écrit : > David, I had this same idea some days ago when reviewing this code, > but when I came to conclusion we could not avoid the get_page / > put_page() on skb_shinfo(skb)->frags[i].page. I thought it was not worth > trying to avoid the frag_list grab/release operation. > > But we are the only user of this skb and skb_shinfo(skb)->frag_list, so > your patch seems good to me. > > Please note you are not fixing a bug, because the new frag_list pointer > was correctly copied in the > > memcpy(data + size, skb_end_pointer(skb), > offsetof(struct skb_shared_info, > frags[skb_shinfo(skb)->nr_frags])); > > Please rewrite the changelog to say this patch is an optimization, to > avoid the atomic ops on each skb found in frag_list ? > Well, skb is not shared, but we have no guarantee skb_shinfo(skb) is not. To optimize this thing, you'll need to add a new parameter to skb_release_data() ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 26396ff..def2e49 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -789,6 +789,7 @@ EXPORT_SYMBOL(pskb_copy); int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, gfp_t gfp_mask) { + struct sk_buff *frag_list; int i; u8 *data; #ifdef NET_SKBUFF_DATA_USES_OFFSET @@ -822,11 +823,13 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) get_page(skb_shinfo(skb)->frags[i].page); - if (skb_has_frags(skb)) - skb_clone_fraglist(skb); + frag_list = skb_shinfo(skb)->frag_list; + skb_shinfo(skb)->frag_list = NULL; skb_release_data(skb); + skb_shinfo(skb)->frag_list = frag_list; + off = (data + nhead) - skb->head; skb->head = data;
When pskb_expand_head() releases the data, with skb_release_data(), it tries to properly preserve any fragment list using skb_clone_fraglist(). Although skb_clone_fraglist() will properly grab a reference to all of the fragment list SKBs, it will not block skb_release_data() from NULL'ing out the ->frag_list pointer when it calls skb_drop_list() via skb_drop_fraglist(). As a result we lose the fragment SKBs and they are leaked forever. Instead, hide the fragment list pointer around the skb_release_data() call and restore it afterwards. This fixes the bug and also makes it cheaper since we won't grab and release every single fragment list SKB reference. Signed-off-by: David S. Miller <davem@davemloft.net> --- I found this via pure code inspection, please review. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html