Message ID | 1318545567.2533.46.camel@edumazet-laptop |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Fri, 14 Oct 2011 00:39:27 +0200 > Add a 'truesize' argument to niu_rx_skb_append(), filled with rcr_size > by the caller to properly account frag sizes in skb->truesize > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > --- > Please David double check this one as I am not very familiar with NIU > code. Thanks ! It looks perfect! And if it's not I'll soon find out :-) Applied. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le jeudi 13 octobre 2011 à 22:26 -0400, David Miller a écrit : > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Fri, 14 Oct 2011 00:39:27 +0200 > > > Add a 'truesize' argument to niu_rx_skb_append(), filled with rcr_size > > by the caller to properly account frag sizes in skb->truesize > > > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > > --- > > Please David double check this one as I am not very familiar with NIU > > code. Thanks ! > > It looks perfect! And if it's not I'll soon find out :-) > Thanks ! By the way, I noticed NIU uses a get_page() every time a chunk is attached to a skb (only the last chunk of a page is given without the get_page()) So I thought it might incur false sharing if a previous SKB using a chunk from same page is processed by another CPU. But then I see you also do in niu_rbr_add_page(), rigth after the alloc_page(), the thing I was thinking to add : (perform all needed get_page() in a single shot) atomic_add(rp->rbr_blocks_per_page - 1, &compound_head(page)->_count); So I am a bit lost here. Arent you doing too many page->_count increases ? Thanks ! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Fri, 14 Oct 2011 05:33:51 +0200 > But then I see you also do in niu_rbr_add_page(), rigth after the > alloc_page(), the thing I was thinking to add : (perform all needed > get_page() in a single shot) > > atomic_add(rp->rbr_blocks_per_page - 1, > &compound_head(page)->_count); > > So I am a bit lost here. Arent you doing too many page->_count > increases ? It would be pretty amazing for a leak of this magnitude to exist for so long. :-) A page can be split into multiple blocks, each block is some power of two in size. The chip splits up "blocks" into smaller (also power of two) fragments, and these fragments are what we en-tail to the SKBs. So at the top level we give the chip blocks. We try to make this equal to PAGE_SIZE. But if PAGE_SIZE is really large we limit the block size to 1 << 15. Note that it is only when we enforce this block size limit that the compount_page(page)->_count atomic increment will occur. As long as PAGE_SIZE <= 1 << 15, rbr_blocks_per_page will be 1. When the chip takes a block and starts using it, it decides which fragment size to use for that block. Once a fragment size has been choosen for a block, it will not change. The fragment sizes the chip can use is stored in rp->rbr_sizes[]. We always configure the chip to use 256 byte and 1024 byte blocks, then depending upon the MTU and the PAGE_SIZE we'll optionally enable other sizes such as 2048, 4096, and 8192. When we get an RX packet the descriptor tells us the DMA address and the fragment size in use for the block that the memory at DMA address belongs to. So the two seperate page reference count grabs you see are handling references for memory being chopped up at two different levels. I can't see how we could optimize the intra-block refcounts any further. Part of the problem is that we don't know apriori what fragment size the chip will use for a given block. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le vendredi 14 octobre 2011 à 00:34 -0400, David Miller a écrit : > > It would be pretty amazing for a leak of this magnitude to exist for > so long. :-) > > A page can be split into multiple blocks, each block is some power > of two in size. > > The chip splits up "blocks" into smaller (also power of two) > fragments, and these fragments are what we en-tail to the SKBs. > > So at the top level we give the chip blocks. We try to make this > equal to PAGE_SIZE. But if PAGE_SIZE is really large we limit the > block size to 1 << 15. Note that it is only when we enforce this > block size limit that the compount_page(page)->_count atomic increment > will occur. As long as PAGE_SIZE <= 1 << 15, rbr_blocks_per_page > will be 1. > > When the chip takes a block and starts using it, it decides which > fragment size to use for that block. Once a fragment size has been > choosen for a block, it will not change. > > The fragment sizes the chip can use is stored in rp->rbr_sizes[]. We > always configure the chip to use 256 byte and 1024 byte blocks, then > depending upon the MTU and the PAGE_SIZE we'll optionally enable other > sizes such as 2048, 4096, and 8192. > > When we get an RX packet the descriptor tells us the DMA address > and the fragment size in use for the block that the memory at > DMA address belongs to. > > So the two seperate page reference count grabs you see are handling > references for memory being chopped up at two different levels. > > I can't see how we could optimize the intra-block refcounts any > further. Part of the problem is that we don't know apriori what > fragment size the chip will use for a given block. > Thanks for taking the time to explain this David :) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/ethernet/sun/niu.c b/drivers/net/ethernet/sun/niu.c index d133888..23740e8 100644 --- a/drivers/net/ethernet/sun/niu.c +++ b/drivers/net/ethernet/sun/niu.c @@ -3287,17 +3287,13 @@ static u16 tcam_get_valid_entry_cnt(struct niu *np) } static void niu_rx_skb_append(struct sk_buff *skb, struct page *page, - u32 offset, u32 size) + u32 offset, u32 size, u32 truesize) { - int i = skb_shinfo(skb)->nr_frags; - - __skb_fill_page_desc(skb, i, page, offset, size); + skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags, page, offset, size); skb->len += size; skb->data_len += size; - skb->truesize += size; - - skb_shinfo(skb)->nr_frags = i + 1; + skb->truesize += truesize; } static unsigned int niu_hash_rxaddr(struct rx_ring_info *rp, u64 a) @@ -3480,7 +3476,7 @@ static int niu_process_rx_pkt(struct napi_struct *napi, struct niu *np, } else if (!(val & RCR_ENTRY_MULTI)) append_size = len - skb->len; - niu_rx_skb_append(skb, page, off, append_size); + niu_rx_skb_append(skb, page, off, append_size, rcr_size); if ((page->index + rp->rbr_block_size) - rcr_size == addr) { *link = (struct page *) page->mapping; np->ops->unmap_page(np->device, page->index,
Add a 'truesize' argument to niu_rx_skb_append(), filled with rcr_size by the caller to properly account frag sizes in skb->truesize Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> --- Please David double check this one as I am not very familiar with NIU code. Thanks ! drivers/net/ethernet/sun/niu.c | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html