diff mbox

[net-next,09/11] sfc: Prefetch RX skb header area

Message ID 1371680432.1956.116.camel@bwh-desktop.uk.level5networks.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Ben Hutchings June 19, 2013, 10:20 p.m. UTC
When we pre-allocate skbs, the sk_buff structures themselves will be
cache-hot (having just been filled in) but the header area will be
cache-cold.  The skb cache is small and we're likely to use each skb
quite soon, so prefetch the start of the header area, where we will
copy the headers, and the skb_shared_info structure, where we will
store page fragments.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/rx.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

Comments

Eric Dumazet June 19, 2013, 10:36 p.m. UTC | #1
On Wed, 2013-06-19 at 23:20 +0100, Ben Hutchings wrote:
> When we pre-allocate skbs, the sk_buff structures themselves will be
> cache-hot (having just been filled in) but the header area will be
> cache-cold.  The skb cache is small and we're likely to use each skb
> quite soon, so prefetch the start of the header area, where we will
> copy the headers, and the skb_shared_info structure, where we will
> store page fragments.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> ---
>  drivers/net/ethernet/sfc/rx.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
> index a28f2fe..0dc4a80 100644
> --- a/drivers/net/ethernet/sfc/rx.c
> +++ b/drivers/net/ethernet/sfc/rx.c
> @@ -106,11 +106,17 @@ void efx_rx_config_page_split(struct efx_nic *efx)
>  
>  static void efx_refill_skb_cache(struct efx_rx_queue *rx_queue)
>  {
> +	struct sk_buff *skb;
>  	int i;
> +
>  	for (i = 0; i < SKB_CACHE_SIZE; ++i) {
> -		rx_queue->skb_cache[i] =
> -			netdev_alloc_skb(rx_queue->efx->net_dev,
> -					 EFX_SKB_HEADERS + EFX_PAGE_SKB_ALIGN);
> +		skb = netdev_alloc_skb(rx_queue->efx->net_dev,
> +				       EFX_SKB_HEADERS + EFX_PAGE_SKB_ALIGN);
> +		rx_queue->skb_cache[i] = skb;
> +		if (skb) {
> +			prefetch(skb->data);

This is really suspect.

This part will be dirtied anyway by the NIC once frame is received.

> +			prefetch(skb_shinfo(skb));
> +		}
>  	}
>  	rx_queue->skb_cache_next_unused = 0;
>  }
> 
> 

Quite frankly this cache sounds suspect to me.

Why is it needed only for sfc ?

Have you tried build_skb() instead ?



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Hutchings June 19, 2013, 10:39 p.m. UTC | #2
On Wed, 2013-06-19 at 15:36 -0700, Eric Dumazet wrote:
> On Wed, 2013-06-19 at 23:20 +0100, Ben Hutchings wrote:
> > When we pre-allocate skbs, the sk_buff structures themselves will be
> > cache-hot (having just been filled in) but the header area will be
> > cache-cold.  The skb cache is small and we're likely to use each skb
> > quite soon, so prefetch the start of the header area, where we will
> > copy the headers, and the skb_shared_info structure, where we will
> > store page fragments.
> > 
> > Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> > ---
> >  drivers/net/ethernet/sfc/rx.c | 12 +++++++++---
> >  1 file changed, 9 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
> > index a28f2fe..0dc4a80 100644
> > --- a/drivers/net/ethernet/sfc/rx.c
> > +++ b/drivers/net/ethernet/sfc/rx.c
> > @@ -106,11 +106,17 @@ void efx_rx_config_page_split(struct efx_nic *efx)
> >  
> >  static void efx_refill_skb_cache(struct efx_rx_queue *rx_queue)
> >  {
> > +	struct sk_buff *skb;
> >  	int i;
> > +
> >  	for (i = 0; i < SKB_CACHE_SIZE; ++i) {
> > -		rx_queue->skb_cache[i] =
> > -			netdev_alloc_skb(rx_queue->efx->net_dev,
> > -					 EFX_SKB_HEADERS + EFX_PAGE_SKB_ALIGN);
> > +		skb = netdev_alloc_skb(rx_queue->efx->net_dev,
> > +				       EFX_SKB_HEADERS + EFX_PAGE_SKB_ALIGN);
> > +		rx_queue->skb_cache[i] = skb;
> > +		if (skb) {
> > +			prefetch(skb->data);
> 
> This is really suspect.
> 
> This part will be dirtied anyway by the NIC once frame is received.

This is not used for DMA.

> > +			prefetch(skb_shinfo(skb));
> > +		}
> >  	}
> >  	rx_queue->skb_cache_next_unused = 0;
> >  }
> > 
> > 
> 
> Quite frankly this cache sounds suspect to me.
> 
> Why is it needed only for sfc ?
> 
> Have you tried build_skb() instead ?

You can't use build_skb() if you put multiple RX buffers in a page.  See
<1365805319.2791.18.camel@bwh-desktop.uk.solarflarecom.com>.

Ben.
Eric Dumazet June 19, 2013, 10:56 p.m. UTC | #3
On Wed, 2013-06-19 at 23:39 +0100, Ben Hutchings wrote:

> You can't use build_skb() if you put multiple RX buffers in a page.  See
> <1365805319.2791.18.camel@bwh-desktop.uk.solarflarecom.com>.

Well, you use small skb (used to have 64 bytes headroom and you changed
it to 128) and attach a frag on it, so anyway build_skb() brings
nothing.

I suspect you have a win on workloads with flood of rx, but on moderate
load sk_buff will be cold, so the cache adds latencies.

My feeling is that this kind of tradeoff should not be per driver.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Hutchings June 19, 2013, 11:10 p.m. UTC | #4
On Wed, 2013-06-19 at 15:56 -0700, Eric Dumazet wrote:
> On Wed, 2013-06-19 at 23:39 +0100, Ben Hutchings wrote:
> 
> > You can't use build_skb() if you put multiple RX buffers in a page.  See
> > <1365805319.2791.18.camel@bwh-desktop.uk.solarflarecom.com>.
> 
> Well, you use small skb (used to have 64 bytes headroom and you changed
> it to 128) and attach a frag on it, so anyway build_skb() brings
> nothing.
> 
> I suspect you have a win on workloads with flood of rx, but on moderate
> load sk_buff will be cold, so the cache adds latencies.
> 
> My feeling is that this kind of tradeoff should not be per driver.

Jon's testing showed a significant latency improvement.

In Linux 3.10 sfc has reduced UDP performance (both latency and
throughput) as we removed the support for skb pre-allocation in
preparation for the implementation for DMA scatter and recycling of
IOMMU-mapped buffers.  The RX changes in this series are intended to
recover that, and our benchmark results suggest that we succeeded.

Ben.
diff mbox

Patch

diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index a28f2fe..0dc4a80 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -106,11 +106,17 @@  void efx_rx_config_page_split(struct efx_nic *efx)
 
 static void efx_refill_skb_cache(struct efx_rx_queue *rx_queue)
 {
+	struct sk_buff *skb;
 	int i;
+
 	for (i = 0; i < SKB_CACHE_SIZE; ++i) {
-		rx_queue->skb_cache[i] =
-			netdev_alloc_skb(rx_queue->efx->net_dev,
-					 EFX_SKB_HEADERS + EFX_PAGE_SKB_ALIGN);
+		skb = netdev_alloc_skb(rx_queue->efx->net_dev,
+				       EFX_SKB_HEADERS + EFX_PAGE_SKB_ALIGN);
+		rx_queue->skb_cache[i] = skb;
+		if (skb) {
+			prefetch(skb->data);
+			prefetch(skb_shinfo(skb));
+		}
 	}
 	rx_queue->skb_cache_next_unused = 0;
 }