
net: introduce alloc_skb_order0

Message ID 1286639996.2692.148.camel@edumazet-laptop
State RFC, archived
Delegated to: David Miller

Commit Message

Eric Dumazet Oct. 9, 2010, 3:59 p.m. UTC
On Friday, October 8, 2010 at 18:03 +0200, Stanislaw Gruszka wrote:
> On Fri, Oct 08, 2010 at 05:04:07PM +0200, Eric Dumazet wrote:

> > Switch to SLAB -> no more problem ;)
> 
> yeah, I wish to, but Fedora uses SLUB because of some debugging
> capabilities.

Yes, of course, I was kidding :)

echo 0 >/sys/kernel/slab/kmalloc-2048/order
echo 0 >/sys/kernel/slab/kmalloc-1024/order
echo 0 >/sys/kernel/slab/kmalloc-512/order

Should do the trick: no more high-order allocations for MTU=1500
frames.


For MTU=9000 frames, we probably need something like this patch:

(Not yet for inclusion; this is an RFC and will need two separate
patches.)

[PATCH] net: introduce alloc_skb_order0()

Reception of big frames hits a memory allocation problem, because of
high-order page allocations (sometimes order-3 for MTU=9000). This patch
introduces alloc_skb_order0(), to build skbs with order-0 pages only.

Their headlen is at most SKB_MAX_HEAD(NET_SKB_PAD + NET_IP_ALIGN)
(3648 bytes on x86_64, 3840 bytes on x86_32)

As net drivers might use skb_store_bits() to copy data to this newly
allocated skb, we could perhaps even use __GFP_HIGHMEM for the fragments?

Note: use the GFP_NOWAIT | __GFP_NOWARN mask to allocate pages, since we
don't want to let big packets exhaust the GFP_ATOMIC pool.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 drivers/net/r8169.c    |   19 ++++++---------
 include/linux/skbuff.h |    1 
 net/core/skbuff.c      |   47 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 56 insertions(+), 11 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Stanislaw Gruszka Oct. 11, 2010, 3:55 p.m. UTC | #1
On Sat, Oct 09, 2010 at 05:59:56PM +0200, Eric Dumazet wrote:
> On Friday, October 8, 2010 at 18:03 +0200, Stanislaw Gruszka wrote:
> > On Fri, Oct 08, 2010 at 05:04:07PM +0200, Eric Dumazet wrote:
> 
> > > Switch to SLAB -> no more problem ;)
> > 
> > yeah, I wish to, but Fedora uses SLUB because of some debugging
> > capabilities.
> 
> Yes, of course, I was kidding :)
> 
> echo 0 >/sys/kernel/slab/kmalloc-2048/order
> echo 0 >/sys/kernel/slab/kmalloc-1024/order
> echo 0 >/sys/kernel/slab/kmalloc-512/order
> 
> Should do the trick: no more high-order allocations for MTU=1500
> frames.

So the SLUB is great, but we need a patch to avoid using it :-)

> For MTU=9000 frames, we probably need something like this patch :
>
> Reception of big frames hits a memory allocation problem, because of
> high-order page allocations (sometimes order-3 for MTU=9000). This patch
> introduces alloc_skb_order0(), to build skbs with order-0 pages only.

I had never seen allocation problems in rtl8169_try_rx_copy or in any
other driver's rx path (except iwlwifi, but that is now solved by using
skb_add_rx_frag), so I'm not sure we need this patch.

However, I see another benefit of that patch: we save memory. Allocating
for MTU 9000 gives something like skb->data = kmalloc(9000 + 32 + 2
+ 334), so we take data from the kmalloc-16384 cache and waste about 7kB
on every allocation. With the patch, wastage would be about 2kB per
allocation (assuming a 4kB or 8kB page size).

However, I started this thread thinking about a different memory wastage:
in rtl8169_alloc_rx_skb, skb->data = kmalloc(16383 + 32 + 2 + 334) is
taken from kmalloc-32768, almost 50% wastage.
 
> +struct sk_buff *alloc_skb_order0(int pkt_size)
> +{
> +	int head = min_t(int, pkt_size, SKB_MAX_HEAD(NET_SKB_PAD + NET_IP_ALIGN));
> +	struct sk_buff *skb;
> +
> +	skb = alloc_skb(head + NET_SKB_PAD + NET_IP_ALIGN,
> +			GFP_ATOMIC | __GFP_NOWARN);
> +	if (!skb)
> +		return NULL;
> +	skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
> +	skb_put(skb, head);
> +	pkt_size -= head;
> +
> +	skb->len += pkt_size;
> +	skb->data_len += pkt_size;
> +	skb->truesize += pkt_size;
> +	while (pkt_size) {

if (skb_shinfo(skb)->nr_frags == MAX_SKB_FRAGS - 1)
	goto error;

> +		int i = skb_shinfo(skb)->nr_frags++;
> +		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
> +		int fragsize = min_t(int, pkt_size, PAGE_SIZE);
> +		struct page *page = alloc_page(GFP_NOWAIT | __GFP_NOWARN);
> +
> +		if (!page)
> +			goto error;
> +		frag->page = page;
> +		frag->size = fragsize;
> +		frag->page_offset = 0;
> +		pkt_size -= fragsize;
> +	}
> +	return skb;
> +
> +error:
> +	kfree_skb(skb);
> +	return NULL;	
> +}
> +EXPORT_SYMBOL(alloc_skb_order0);
> +
>  /* Checksum skb data. */
>  
>  __wsum skb_checksum(const struct sk_buff *skb, int offset,
Eric Dumazet Oct. 11, 2010, 4:05 p.m. UTC | #2
On Monday, October 11, 2010 at 17:55 +0200, Stanislaw Gruszka wrote:
> On Sat, Oct 09, 2010 at 05:59:56PM +0200, Eric Dumazet wrote:
> > On Friday, October 8, 2010 at 18:03 +0200, Stanislaw Gruszka wrote:
> > > On Fri, Oct 08, 2010 at 05:04:07PM +0200, Eric Dumazet wrote:
> > 
> > > > Switch to SLAB -> no more problem ;)
> > > 
> > > yeah, I wish to, but Fedora uses SLUB because of some debugging
> > > capabilities.
> > 
> > Yes, of course, I was kidding :)
> > 
> > echo 0 >/sys/kernel/slab/kmalloc-2048/order
> > echo 0 >/sys/kernel/slab/kmalloc-1024/order
> > echo 0 >/sys/kernel/slab/kmalloc-512/order
> > 
> > Should do the trick: no more high-order allocations for MTU=1500
> > frames.
> 
> So the SLUB is great, but we need a patch to avoid using it :-)
> 
> > For MTU=9000 frames, we probably need something like this patch :
> >
> > Reception of big frames hits a memory allocation problem, because of
> > high-order page allocations (sometimes order-3 for MTU=9000). This patch
> > introduces alloc_skb_order0(), to build skbs with order-0 pages only.
> 
> I had never seen allocation problems in rtl8169_try_rx_copy or in any
> other driver's rx path (except iwlwifi, but that is now solved by using
> skb_add_rx_frag), so I'm not sure we need this patch.
> 
> However, I see another benefit of that patch: we save memory. Allocating
> for MTU 9000 gives something like skb->data = kmalloc(9000 + 32 + 2
> + 334), so we take data from the kmalloc-16384 cache and waste about 7kB
> on every allocation. With the patch, wastage would be about 2kB per
> allocation (assuming a 4kB or 8kB page size).
> 
> However, I started this thread thinking about a different memory wastage:
> in rtl8169_alloc_rx_skb, skb->data = kmalloc(16383 + 32 + 2 + 334) is
> taken from kmalloc-32768, almost 50% wastage.
>  

You cannot use my patch to avoid this waste. Really.

You have two different things in this driver:

1) Allocation of a physically contiguous 16-Kbyte block for the rx ring,
at device initialization (GFP_KERNEL is OK here).

   Here, the only thing you could do is to not allocate real skbs but
only 16KB data blocks (no need for the sk_buff, only the ->data part), and
force copybreak for all incoming packets (removing the rx_copybreak
tunable).

2) Allocation of an order-0 skb to perform the copybreak in the rx path
(GFP_ATOMIC): my patch.


> > +struct sk_buff *alloc_skb_order0(int pkt_size)
> > +{
> > +	int head = min_t(int, pkt_size, SKB_MAX_HEAD(NET_SKB_PAD + NET_IP_ALIGN));
> > +	struct sk_buff *skb;
> > +
> > +	skb = alloc_skb(head + NET_SKB_PAD + NET_IP_ALIGN,
> > +			GFP_ATOMIC | __GFP_NOWARN);
> > +	if (!skb)
> > +		return NULL;
> > +	skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
> > +	skb_put(skb, head);
> > +	pkt_size -= head;
> > +
> > +	skb->len += pkt_size;
> > +	skb->data_len += pkt_size;
> > +	skb->truesize += pkt_size;
> > +	while (pkt_size) {
> 
> if (skb_shinfo(skb)->nr_frags == MAX_SKB_FRAGS - 1)
> 	goto error;

Not needed. A frame is < 16383 bytes, so it _must_ fit in an skb
(an skb can hold up to 64 Kbytes).

> 
> > +		int i = skb_shinfo(skb)->nr_frags++;
> > +		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
> > +		int fragsize = min_t(int, pkt_size, PAGE_SIZE);
> > +		struct page *page = alloc_page(GFP_NOWAIT | __GFP_NOWARN);
> > +
> > +		if (!page)
> > +			goto error;
> > +		frag->page = page;
> > +		frag->size = fragsize;
> > +		frag->page_offset = 0;
> > +		pkt_size -= fragsize;
> > +	}
> > +	return skb;
> > +
> > +error:
> > +	kfree_skb(skb);
> > +	return NULL;	
> > +}
> > +EXPORT_SYMBOL(alloc_skb_order0);
> > +
> >  /* Checksum skb data. */
> >  
> >  __wsum skb_checksum(const struct sk_buff *skb, int offset,



Patch

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index fe3b762..f4220db 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -4468,27 +4468,24 @@  static inline void rtl8169_rx_csum(struct sk_buff *skb, u32 opts1)
 		skb_checksum_none_assert(skb);
 }
 
-static inline bool rtl8169_try_rx_copy(struct sk_buff **sk_buff,
+static inline bool rtl8169_try_rx_copy(struct sk_buff **pskb,
 				       struct rtl8169_private *tp, int pkt_size,
 				       dma_addr_t addr)
 {
 	struct sk_buff *skb;
-	bool done = false;
 
 	if (pkt_size >= rx_copybreak)
-		goto out;
+		return false;
 
-	skb = netdev_alloc_skb_ip_align(tp->dev, pkt_size);
+	skb = alloc_skb_order0(pkt_size);
 	if (!skb)
-		goto out;
+		return false;
 
 	pci_dma_sync_single_for_cpu(tp->pci_dev, addr, pkt_size,
 				    PCI_DMA_FROMDEVICE);
-	skb_copy_from_linear_data(*sk_buff, skb->data, pkt_size);
-	*sk_buff = skb;
-	done = true;
-out:
-	return done;
+	skb_store_bits(skb, 0, (*pskb)->data, pkt_size);
+	*pskb = skb;
+	return true;
 }
 
 /*
@@ -4559,10 +4556,10 @@  static int rtl8169_rx_interrupt(struct net_device *dev,
 				pci_unmap_single(pdev, addr, tp->rx_buf_sz,
 						 PCI_DMA_FROMDEVICE);
 				tp->Rx_skbuff[entry] = NULL;
+				skb_put(skb, pkt_size);
 			}
 
 			rtl8169_rx_csum(skb, status);
-			skb_put(skb, pkt_size);
 			skb->protocol = eth_type_trans(skb, dev);
 
 			if (rtl8169_rx_vlan_skb(tp, desc, skb, polling) < 0) {
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0b53c43..2cc161a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1841,6 +1841,7 @@  extern int	       skb_copy_bits(const struct sk_buff *skb, int offset,
 				     void *to, int len);
 extern int	       skb_store_bits(struct sk_buff *skb, int offset,
 				      const void *from, int len);
+extern struct sk_buff *alloc_skb_order0(int pkt_size);
 extern __wsum	       skb_copy_and_csum_bits(const struct sk_buff *skb,
 					      int offset, u8 *to, int len,
 					      __wsum csum);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 752c197..4a6195d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1664,6 +1664,53 @@  fault:
 }
 EXPORT_SYMBOL(skb_store_bits);
 
+/**
+ * alloc_skb_order0 - allocate an skb using only order-0 pages
+ * @pkt_size: packet size
+ *
+ * Allocate an skb with a head small enough that skb->data should not
+ * require a high-order page allocation, completed with page fragments
+ * if pkt_size is too big. Intended for driver RX paths: we reserve
+ * NET_SKB_PAD + NET_IP_ALIGN bytes and use GFP_ATOMIC allocations.
+ * We also set skb->len to pkt_size, so the driver should not call skb_put().
+ */
+struct sk_buff *alloc_skb_order0(int pkt_size)
+{
+	int head = min_t(int, pkt_size, SKB_MAX_HEAD(NET_SKB_PAD + NET_IP_ALIGN));
+	struct sk_buff *skb;
+
+	skb = alloc_skb(head + NET_SKB_PAD + NET_IP_ALIGN,
+			GFP_ATOMIC | __GFP_NOWARN);
+	if (!skb)
+		return NULL;
+	skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
+	skb_put(skb, head);
+	pkt_size -= head;
+
+	skb->len += pkt_size;
+	skb->data_len += pkt_size;
+	skb->truesize += pkt_size;
+	while (pkt_size) {
+		int i = skb_shinfo(skb)->nr_frags++;
+		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+		int fragsize = min_t(int, pkt_size, PAGE_SIZE);
+		struct page *page = alloc_page(GFP_NOWAIT | __GFP_NOWARN);
+
+		if (!page)
+			goto error;
+		frag->page = page;
+		frag->size = fragsize;
+		frag->page_offset = 0;
+		pkt_size -= fragsize;
+	}
+	return skb;
+
+error:
+	kfree_skb(skb);
+	return NULL;	
+}
+EXPORT_SYMBOL(alloc_skb_order0);
+
 /* Checksum skb data. */
 
 __wsum skb_checksum(const struct sk_buff *skb, int offset,