diff mbox

[net-next] niu: fix skb truesize underestimation

Message ID 1318545567.2533.46.camel@edumazet-laptop
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Oct. 13, 2011, 10:39 p.m. UTC
Add a 'truesize' argument to niu_rx_skb_append(), filled with rcr_size
by the caller to properly account frag sizes in skb->truesize

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
Please double-check this one, David, as I am not very familiar with
the NIU code. Thanks!

 drivers/net/ethernet/sun/niu.c |   12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller Oct. 14, 2011, 2:26 a.m. UTC | #1
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 14 Oct 2011 00:39:27 +0200

> Add a 'truesize' argument to niu_rx_skb_append(), filled with rcr_size
> by the caller to properly account frag sizes in skb->truesize
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> Please double-check this one, David, as I am not very familiar with
> the NIU code. Thanks!

It looks perfect!  And if it's not I'll soon find out :-)

Applied.
Eric Dumazet Oct. 14, 2011, 3:33 a.m. UTC | #2
On Thursday, October 13, 2011 at 22:26 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 14 Oct 2011 00:39:27 +0200
> 
> > Add a 'truesize' argument to niu_rx_skb_append(), filled with rcr_size
> > by the caller to properly account frag sizes in skb->truesize
> > 
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > ---
> > Please double-check this one, David, as I am not very familiar with
> > the NIU code. Thanks!
> 
> It looks perfect!  And if it's not I'll soon find out :-)
> 

Thanks !

By the way, I noticed NIU does a get_page() every time a chunk is
attached to an skb (only the last chunk of a page is handed off without
the get_page()).

So I thought this might incur false sharing if a previous SKB using a
chunk from the same page is being processed by another CPU.

But then I see that in niu_rbr_add_page(), right after the
alloc_page(), you already do the thing I was thinking of adding
(perform all needed get_page() in a single shot):

atomic_add(rp->rbr_blocks_per_page - 1,
	&compound_head(page)->_count);

So I am a bit lost here. Aren't you doing too many page->_count
increases?

Thanks !


David Miller Oct. 14, 2011, 4:34 a.m. UTC | #3
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 14 Oct 2011 05:33:51 +0200

> But then I see that in niu_rbr_add_page(), right after the
> alloc_page(), you already do the thing I was thinking of adding
> (perform all needed get_page() in a single shot):
> 
> atomic_add(rp->rbr_blocks_per_page - 1,
> 	&compound_head(page)->_count);
> 
> So I am a bit lost here. Aren't you doing too many page->_count
> increases?

It would be pretty amazing for a leak of this magnitude to exist for
so long. :-)

A page can be split into multiple blocks, each block is some power
of two in size.

The chip splits up "blocks" into smaller (also power of two)
fragments, and these fragments are what we attach to the SKBs.

So at the top level we give the chip blocks.  We try to make this
equal to PAGE_SIZE.  But if PAGE_SIZE is really large we limit the
block size to 1 << 15.  Note that it is only when we enforce this
block size limit that the compound_head(page)->_count atomic increment
will occur.  As long as PAGE_SIZE <= 1 << 15, rbr_blocks_per_page
will be 1.

When the chip takes a block and starts using it, it decides which
fragment size to use for that block.  Once a fragment size has been
chosen for a block, it will not change.

The fragment sizes the chip can use are stored in rp->rbr_sizes[].  We
always configure the chip to use 256 byte and 1024 byte fragments, then
depending upon the MTU and the PAGE_SIZE we'll optionally enable other
sizes such as 2048, 4096, and 8192.

When we get an RX packet the descriptor tells us the DMA address
and the fragment size in use for the block that the memory at
DMA address belongs to.

So the two separate page reference count grabs you see are handling
references for memory being chopped up at two different levels.

I can't see how we could optimize the intra-block refcounts any
further.  Part of the problem is that we don't know a priori what
fragment size the chip will use for a given block.

Eric Dumazet Oct. 14, 2011, 6:27 p.m. UTC | #4
On Friday, October 14, 2011 at 00:34 -0400, David Miller wrote:
> So the two separate page reference count grabs you see are handling
> references for memory being chopped up at two different levels.
> 
> I can't see how we could optimize the intra-block refcounts any
> further.  Part of the problem is that we don't know a priori what
> fragment size the chip will use for a given block.

Thanks for taking the time to explain this David :)




Patch

diff --git a/drivers/net/ethernet/sun/niu.c b/drivers/net/ethernet/sun/niu.c
index d133888..23740e8 100644
--- a/drivers/net/ethernet/sun/niu.c
+++ b/drivers/net/ethernet/sun/niu.c
@@ -3287,17 +3287,13 @@  static u16 tcam_get_valid_entry_cnt(struct niu *np)
 }
 
 static void niu_rx_skb_append(struct sk_buff *skb, struct page *page,
-			      u32 offset, u32 size)
+			      u32 offset, u32 size, u32 truesize)
 {
-	int i = skb_shinfo(skb)->nr_frags;
-
-	__skb_fill_page_desc(skb, i, page, offset, size);
+	skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags, page, offset, size);
 
 	skb->len += size;
 	skb->data_len += size;
-	skb->truesize += size;
-
-	skb_shinfo(skb)->nr_frags = i + 1;
+	skb->truesize += truesize;
 }
 
 static unsigned int niu_hash_rxaddr(struct rx_ring_info *rp, u64 a)
@@ -3480,7 +3476,7 @@  static int niu_process_rx_pkt(struct napi_struct *napi, struct niu *np,
 		} else if (!(val & RCR_ENTRY_MULTI))
 			append_size = len - skb->len;
 
-		niu_rx_skb_append(skb, page, off, append_size);
+		niu_rx_skb_append(skb, page, off, append_size, rcr_size);
 		if ((page->index + rp->rbr_block_size) - rcr_size == addr) {
 			*link = (struct page *) page->mapping;
 			np->ops->unmap_page(np->device, page->index,