Question about __alloc_skb() speedup

Submitted by Junchang Wang on Dec. 3, 2010, 10:14 a.m.

Details

Message ID 20101203101450.GA9573@Desktop-Junchang
State RFC
Delegated to: David Miller

Commit Message

Junchang Wang Dec. 3, 2010, 10:14 a.m.
Hi Eric,

I'm reading your patch (ec7d2f2cf3a1 __alloc_skb() speedup),
in which you prefetch skb and the shinfo part. I'm very
curious why we don't prefetch skb->data. It seems that would
help the tx path a lot.

I added the following code (see the patch below)

and pktgen on my server (an Intel SR1625 server with two E5530
4-core processors and a single ixgbe-based NIC) goes from 7.6Mpps to
8.4Mpps (64-byte packets), a performance gain of about 10%.

For the rx path, I ran experiments on both ixgbe and igb with pktgen+kute,
and saw no change in system performance.

Any suggestions and corrections are welcome.

Thanks.

--Junchang
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Dumazet Dec. 3, 2010, 10:50 a.m.
On Friday, 3 December 2010 at 18:14 +0800, Junchang Wang wrote:
> Hi Eric,
> 
> I'm reading your patch (ec7d2f2cf3a1 __alloc_skb() speedup),
> in which you prefetch skb and the shinfo part. I'm very
> curious why we don't prefetch skb->data. It seems that will
> help tx path a lot.
> 
> I added the following code
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 104f844..c60a808 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -222,6 +222,8 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
>  
>  		child->fclone = SKB_FCLONE_UNAVAILABLE;
>  	}
> +	prefetchw(data);
> +
>  out:
>  	return skb;
>  nodata:
> 
> and the pktgen in my server (A Intel SR1625 server with two E5530 
> 4-core processors and a single ixgbe-based NIC) goes from 7.6Mpps to
> 8.4Mpps (64 byte), with 10% performance gain.
> 
> For rx path, I did experiments on both ixgbe and igb with pktgen+kute,
> and there is no change in system performance.
> 
> welcome any suggestions and corrections.
> 
> Thanks.

This is because __alloc_skb() is generic:

We don't know whether skb->data is going to be used right away or not at
all.

For example, NIC drivers call __alloc_skb() to refill their RX ring
buffer. There is no gain in prefetching data in this case, since the data
is going to be written by the NIC hardware. Actually, the reverse would be
needed: ask the local CPU to evict the data from its cache, so that the
device can DMA it faster (fewer bus transactions).

By the way, adding prefetchw() right before the "return skb;" is
probably not very useful. You can certainly try to add the prefetchw()
in pktgen itself, since you know for sure you are going to write the
data.
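
To illustrate the point about where the hint belongs, here is a minimal
userspace sketch, not kernel code. generic_alloc() and build_packet() are
hypothetical names standing in for a generic allocator and for a caller
that knows it will write the buffer; __builtin_prefetch() is the GCC/Clang
builtin, used here as a rough userspace analogue of prefetchw().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a generic allocator: it cannot know whether
 * the caller will touch the buffer, so it issues no prefetch. */
static unsigned char *generic_alloc(size_t size)
{
	return malloc(size);
}

/* Hypothetical stand-in for a packet generator: this caller knows it is
 * about to write the payload, so the prefetch-for-write hint lives here. */
static unsigned char *build_packet(size_t size, unsigned char fill)
{
	unsigned char *data;

	assert(size > 0);
	data = generic_alloc(size);
	if (!data)
		return NULL;

	/* Second argument 1 = prepare for a write, third argument 3 = high
	 * temporal locality. This is only a hint; it never changes results. */
	__builtin_prefetch(data, 1, 3);

	memset(data, fill, size);
	return data;
}
```

A caller would use build_packet(64, 0xab) and free() the result. Whether
the hint helps at all depends on how much work sits between the prefetch
and the first write, so any gain has to be measured.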

I don't understand your 10% speedup, because pktgen actually uses
__netdev_alloc_skb(), which calls skb_reserve(skb, NET_SKB_PAD): your
prefetchw() is bringing in a cache line that won't be used at all by pktgen.
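
The cache-line argument can be checked with a little offset arithmetic.
The sketch below is plain userspace C. It assumes a 64-byte cache line, a
headroom of 64 bytes standing in for NET_SKB_PAD, and a buffer that starts
on a cache-line boundary; all three are assumptions for illustration, and
the real values depend on the architecture and kernel configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, for illustration only. */
#define CACHE_LINE_BYTES 64u	/* assumed L1 cache line size */
#define ASSUMED_SKB_PAD  64u	/* assumed headroom, stands in for NET_SKB_PAD */

/* Index of the cache line holding byte `offset` of a buffer whose start
 * is cache-line aligned. */
static size_t cache_line_of(size_t offset, size_t line_bytes)
{
	assert(line_bytes != 0);
	return offset / line_bytes;
}

/* Returns 1 if a prefetch issued at `prefetch_off` fetches the same cache
 * line that the first write at `write_off` will touch, 0 otherwise. */
static int prefetch_covers_write(size_t prefetch_off, size_t write_off,
				 size_t line_bytes)
{
	return cache_line_of(prefetch_off, line_bytes) ==
	       cache_line_of(write_off, line_bytes);
}
```

Under these assumptions, prefetch_covers_write(0, ASSUMED_SKB_PAD,
CACHE_LINE_BYTES) is 0: the prefetch at the start of the buffer fetches
line 0, while the first write after the reserve lands in line 1. With a
headroom smaller than the cache line the two would share a line, so the
conclusion depends on the actual values.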

I would say 10% sounds highly suspect to me...
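
For scale, the reported rates can be converted into time per packet. The
sketch below uses only the two figures quoted in this thread (7.6Mpps and
8.4Mpps); the helper names are hypothetical, and integer arithmetic is
used so the results are exact up to truncation.

```c
#include <assert.h>
#include <stdint.h>

/* Time per packet, in picoseconds, for a rate in packets per second. */
static uint64_t ps_per_packet(uint64_t packets_per_second)
{
	assert(packets_per_second != 0);
	return UINT64_C(1000000000000) / packets_per_second;
}

/* Relative gain from `before` to `after`, in tenths of a percent. */
static uint64_t gain_permille(uint64_t before, uint64_t after)
{
	assert(before != 0 && after >= before);
	return (after - before) * 1000 / before;
}
```

With these helpers, 7.6Mpps is about 131.6 ns per packet and 8.4Mpps is
about 119.0 ns per packet, so the reported gain of roughly 10.5% amounts
to about 12.5 ns saved per packet.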




Patch

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 104f844..c60a808 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -222,6 +222,8 @@  struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 
 		child->fclone = SKB_FCLONE_UNAVAILABLE;
 	}
+	prefetchw(data);
+
 out:
 	return skb;
 nodata: