diff mbox

[net-next-2.6] net: pskb_expand_head() optimization

Message ID 1283504972.2453.257.camel@edumazet-laptop
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Sept. 3, 2010, 9:09 a.m. UTC
Le vendredi 03 septembre 2010 à 07:48 +0200, Eric Dumazet a écrit :

> David, I had this same idea some days ago when reviewing this code,
> but when I came to conclusion we could not avoid the get_page /
> put_page() on skb_shinfo(skb)->frags[i].page. I thought it was not worth
> trying to avoid the frag_list grab/release operation.
> 

Here is the patch I cooked, for net-next-2.6

[PATCH net-next-2.6] net: pskb_expand_head() optimization

pskb_expand_head() blindly takes references on fragments before calling
skb_release_data(), potentially releasing these references.

We can add a fast path, avoiding these atomic operations, if we own the
last reference on skb->head.

Based on a previous patch from David

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/skbuff.c |   25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller Sept. 3, 2010, 1:46 p.m. UTC | #1
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 03 Sep 2010 11:09:32 +0200

> Le vendredi 03 septembre 2010 à 07:48 +0200, Eric Dumazet a écrit :
> 
>> David, I had this same idea some days ago when reviewing this code,
>> but when I came to conclusion we could not avoid the get_page /
>> put_page() on skb_shinfo(skb)->frags[i].page. I thought it was not worth
>> trying to avoid the frag_list grab/release operation.
>> 
> 
> Here is the patch I cooked, for net-next-2.6
> 
> [PATCH net-next-2.6] net: pskb_expand_head() optimization
> 
> pskb_expand_head() blindly takes references on fragments before calling
> skb_release_data(), potentially releasing these references.
> 
> We can add a fast path, avoiding these atomic operations, if we own the
> last reference on skb->head.
> 
> Based on a previous patch from David
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Ok, I see how this works now, thanks Eric.  But this looks to be a bit
too complicated to be worthwhile, I think, so let's hold off on this
optimization for a while.

Let me tell you why I was swimming around in this area and what my
ultimate motivation is :-)

In trying to eventually convert sk_buff to list_head one of the
troubling areas is this frag_list business.

Amusingly, if you look, virtually all of the code which constructs
frag_list SKBs wants a doubly linked list so it can insert at the tail
(all LRO gathering, GRO, you name it).

There are only two operations that make a double-linked list currently
impossible.  This pskb_expand_head() thing, and pskb_copy().

They cause the situation where the list is shared between multiple SKB
data shared areas.

And because of this a doubly-linked list like list_head won't work at
all.

For the optimized case of pskb_expand_head() you discovered we can avoid
this sharing.  And at this point I imagine that for the remaining cases
we can simply copy the full SKBs in the frag list to avoid this list
sharing.

So that's where I'm trying to go in all of this.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Sept. 7, 2010, 1:25 a.m. UTC | #2
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 03 Sep 2010 11:09:32 +0200

> [PATCH net-next-2.6] net: pskb_expand_head() optimization
> 
> pskb_expand_head() blindly takes references on fragments before calling
> skb_release_data(), potentially releasing these references.
> 
> We can add a fast path, avoiding these atomic operations, if we own the
> last reference on skb->head.
> 
> Based on a previous patch from David
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Eric, I ended up applying this patch after all, thanks!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 231dff0..59b96fe 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -781,6 +781,7 @@  int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 	u8 *data;
 	int size = nhead + (skb_end_pointer(skb) - skb->head) + ntail;
 	long off;
+	bool fastpath;
 
 	BUG_ON(nhead < 0);
 
@@ -802,14 +803,28 @@  int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 	       skb_shinfo(skb),
 	       offsetof(struct skb_shared_info, frags[skb_shinfo(skb)->nr_frags]));
 
-	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
-		get_page(skb_shinfo(skb)->frags[i].page);
+	/* Check if we can avoid taking references on fragments if we own
+	 * the last reference on skb->head. (see skb_release_data())
+	 */
+	if (!skb->cloned)
+		fastpath = true;
+	else {
+		int delta = skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1;
 
-	if (skb_has_frag_list(skb))
-		skb_clone_fraglist(skb);
+		fastpath = atomic_read(&skb_shinfo(skb)->dataref) == delta;
+	}
 
-	skb_release_data(skb);
+	if (fastpath) {
+		kfree(skb->head);
+	} else {
+		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
+			get_page(skb_shinfo(skb)->frags[i].page);
 
+		if (skb_has_frag_list(skb))
+			skb_clone_fraglist(skb);
+
+		skb_release_data(skb);
+	}
 	off = (data + nhead) - skb->head;
 
 	skb->head     = data;