diff mbox

[PATCHv2,1/2,NET] add skb_recycle_check() to enable netdriver skb recycling

Message ID 20080928204633.GB19070@xi.wantstofly.org
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Lennert Buytenhek Sept. 28, 2008, 8:46 p.m. UTC
This patch adds skb_recycle_check(), which can be used by a network
driver after transmitting an skb to check whether this skb can be
recycled as a receive buffer.

skb_recycle_check() checks that the skb is not shared or cloned, and
that it is linear and its head portion large enough (as determined by
the driver) to be recycled as a receive buffer.  If these conditions
are met, it does any necessary reference count dropping and cleans
up the skbuff as if it just came from __alloc_skb().

Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
---
 include/linux/skbuff.h |    2 ++
 net/core/skbuff.c      |   41 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 41 insertions(+), 2 deletions(-)

Comments

Evgeniy Polyakov Sept. 28, 2008, 9:38 p.m. UTC | #1
Hi Lennert.

On Sun, Sep 28, 2008 at 10:46:33PM +0200, Lennert Buytenhek (buytenh@wantstofly.org) wrote:
> This patch adds skb_recycle_check(), which can be used by a network
> driver after transmitting an skb to check whether this skb can be
> recycled as a receive buffer.
> 
> skb_recycle_check() checks that the skb is not shared or cloned, and
> that it is linear and its head portion large enough (as determined by
> the driver) to be recycled as a receive buffer.  If these conditions
> are met, it does any necessary reference count dropping and cleans
> up the skbuff as if it just came from __alloc_skb().

Shouldn't it also perfrom all actions kfree_skb() does except actual
freeing, since given skb can be from socket, so it may leak dst entries
and break socket memory accounting?
Lennert Buytenhek Sept. 28, 2008, 9:50 p.m. UTC | #2
On Mon, Sep 29, 2008 at 01:38:24AM +0400, Evgeniy Polyakov wrote:

> Hi Lennert.

Hi,


> > This patch adds skb_recycle_check(), which can be used by a network
> > driver after transmitting an skb to check whether this skb can be
> > recycled as a receive buffer.
> > 
> > skb_recycle_check() checks that the skb is not shared or cloned, and
> > that it is linear and its head portion large enough (as determined by
> > the driver) to be recycled as a receive buffer.  If these conditions
> > are met, it does any necessary reference count dropping and cleans
> > up the skbuff as if it just came from __alloc_skb().
> 
> Shouldn't it also perfrom all actions kfree_skb() does except actual
> freeing, since given skb can be from socket, so it may leak dst entries
> and break socket memory accounting?

Doesn't it?  It calls skb_release_head_state() (new function), which
should do that.


cheers,
Lennert
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Evgeniy Polyakov Sept. 29, 2008, 5:39 a.m. UTC | #3
Hi Lennert.

On Sun, Sep 28, 2008 at 11:50:35PM +0200, Lennert Buytenhek (buytenh@wantstofly.org) wrote:
> Doesn't it?  It calls skb_release_head_state() (new function), which
> should do that.

Argh, you are right, I missed this hunk:
-/* Free everything but the sk_buff shell. */
-static void skb_release_all(struct sk_buff *skb)
+static void skb_release_head_state(struct sk_buff *skb) 

Patches look good.
Lennert Buytenhek Sept. 29, 2008, 9:46 a.m. UTC | #4
On Mon, Sep 29, 2008 at 09:39:31AM +0400, Evgeniy Polyakov wrote:

> Hi Lennert.

Hi Evgeniy,


> > Doesn't it?  It calls skb_release_head_state() (new function), which
> > should do that.
> 
> Argh, you are right, I missed this hunk:
> -/* Free everything but the sk_buff shell. */
> -static void skb_release_all(struct sk_buff *skb)
> +static void skb_release_head_state(struct sk_buff *skb) 
> 
> Patches look good.

Cool, thanks for having a look.

Could you (or anyone else) try this patch on one of your boxes and
see what kind of effect it has on performance?


cheers,
Lennert
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Evgeniy Polyakov Sept. 29, 2008, 2:33 p.m. UTC | #5
Hi Lennert.

On Mon, Sep 29, 2008 at 11:46:50AM +0200, Lennert Buytenhek (buytenh@wantstofly.org) wrote:
> Could you (or anyone else) try this patch on one of your boxes and
> see what kind of effect it has on performance?

Well, this one does not have any effect, since it is not used in any
datapath, and I do not have mv643 boards :)
Jesse Brandeburg Sept. 29, 2008, 8:21 p.m. UTC | #6
Lennert Buytenhek wrote:
> Could you (or anyone else) try this patch on one of your boxes and
> see what kind of effect it has on performance?

Lennert, we're very interested in this kind of work, and I'll see what
we can come up with by porting ixgbe or igb to use the patch.  Bad news
is it may be as far away as a couple of weeks, because of this e1000e
corrupt NVM thing I'm working on.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lennert Buytenhek Sept. 30, 2008, 2:57 a.m. UTC | #7
On Mon, Sep 29, 2008 at 01:21:05PM -0700, Brandeburg, Jesse wrote:

> > Could you (or anyone else) try this patch on one of your boxes and
> > see what kind of effect it has on performance?
> 
> Lennert, we're very interested in this kind of work, and I'll see what
> we can come up with by porting ixgbe or igb to use the patch. 

Cool, thanks.  I'd be very interested what kind of difference it
makes (if any) on larger systems than the ones I've tried it on.


> Bad news is it may be as far away as a couple of weeks, because of
> this e1000e corrupt NVM thing I'm working on.

I really hope you'll figure that one out soon..
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lennert Buytenhek Sept. 30, 2008, 3:08 a.m. UTC | #8
On Mon, Sep 29, 2008 at 06:33:27PM +0400, Evgeniy Polyakov wrote:

> Hi Lennert.

Hi Evgeniy,


> On Mon, Sep 29, 2008 at 11:46:50AM +0200, Lennert Buytenhek (buytenh@wantstofly.org) wrote:
> > Could you (or anyone else) try this patch on one of your boxes and
> > see what kind of effect it has on performance?
> 
> Well, this one does not have any effect, since it is not used in any
> datapath, and I do not have mv643 boards :)

I meant to port it to one of the drivers you use, of course. :)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Evgeniy Polyakov Oct. 2, 2008, 7:17 a.m. UTC | #9
Hi Lennert.

On Tue, Sep 30, 2008 at 05:08:36AM +0200, Lennert Buytenhek (buytenh@wantstofly.org) wrote:
> > Well, this one does not have any effect, since it is not used in any
> > datapath, and I do not have mv643 boards :)
> 
> I meant to port it to one of the drivers you use, of course. :)

I can try to work with e1000 if time permits, that's the only NICs I
have connected to fast links. But main part of the sentence is "if time
permits" :)
Terry Oct. 8, 2008, 1:24 a.m. UTC | #10
Brandeburg, Jesse wrote:
> Lennert Buytenhek wrote:
>   
>> Could you (or anyone else) try this patch on one of your boxes and
>> see what kind of effect it has on performance?
>>     
>
> Lennert, we're very interested in this kind of work, and I'll see what
> we can come up with by porting ixgbe or igb to use the patch.  Bad news
> is it may be as far away as a couple of weeks, because of this e1000e
> corrupt NVM thing I'm working on.
>
>   
ixgbe's skb recycle patch means a lot to me :-)

Terry

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 9099237..b8a5ac7 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -369,6 +369,8 @@  static inline struct sk_buff *alloc_skb_fclone(unsigned int size,
 	return __alloc_skb(size, priority, 1, -1);
 }
 
+extern int skb_recycle_check(struct sk_buff *skb, int skb_size);
+
 extern struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src);
 extern struct sk_buff *skb_clone(struct sk_buff *skb,
 				 gfp_t priority);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index ca1ccdf..2c218a0 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -363,8 +363,7 @@  static void kfree_skbmem(struct sk_buff *skb)
 	}
 }
 
-/* Free everything but the sk_buff shell. */
-static void skb_release_all(struct sk_buff *skb)
+static void skb_release_head_state(struct sk_buff *skb)
 {
 	dst_release(skb->dst);
 #ifdef CONFIG_XFRM
@@ -388,6 +387,12 @@  static void skb_release_all(struct sk_buff *skb)
 	skb->tc_verd = 0;
 #endif
 #endif
+}
+
+/* Free everything but the sk_buff shell. */
+static void skb_release_all(struct sk_buff *skb)
+{
+	skb_release_head_state(skb);
 	skb_release_data(skb);
 }
 
@@ -424,6 +429,38 @@  void kfree_skb(struct sk_buff *skb)
 	__kfree_skb(skb);
 }
 
+int skb_recycle_check(struct sk_buff *skb, int skb_size)
+{
+	struct skb_shared_info *shinfo;
+
+	if (skb_is_nonlinear(skb) || skb->fclone != SKB_FCLONE_UNAVAILABLE)
+		return 0;
+
+	skb_size = SKB_DATA_ALIGN(skb_size + NET_SKB_PAD);
+	if (skb_end_pointer(skb) - skb->head < skb_size)
+		return 0;
+
+	if (skb_shared(skb) || skb_cloned(skb))
+		return 0;
+
+	skb_release_head_state(skb);
+	shinfo = skb_shinfo(skb);
+	atomic_set(&shinfo->dataref, 1);
+	shinfo->nr_frags = 0;
+	shinfo->gso_size = 0;
+	shinfo->gso_segs = 0;
+	shinfo->gso_type = 0;
+	shinfo->ip6_frag_id = 0;
+	shinfo->frag_list = NULL;
+
+	memset(skb, 0, offsetof(struct sk_buff, tail));
+	skb_reset_tail_pointer(skb);
+	skb->data = skb->head + NET_SKB_PAD;
+
+	return 1;
+}
+EXPORT_SYMBOL(skb_recycle_check);
+
 static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
 {
 	new->tstamp		= old->tstamp;