
[v3,1/6] net: pad skb data and shinfo as a whole rather than individually

Message ID 1327494434-21566-1-git-send-email-ian.campbell@citrix.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Ian Campbell Jan. 25, 2012, 12:27 p.m. UTC
This reduces the minimum overhead required for this allocation such that the
shinfo can be grown in the following patch without overflowing 2048 bytes for a
1500 byte frame.

Reducing this overhead while also growing the shinfo means that sometimes the
tail end of the data can end up in the same cache line as the beginning of the
shinfo. Specifically in the case of the 64 byte cache lines on a 64 bit system
the first 8 bytes of shinfo can overlap the tail cacheline of the data. In many
cases the allocation slop means that there is no overlap.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/skbuff.h |    4 ++--
 net/core/skbuff.c      |    3 +--
 2 files changed, 3 insertions(+), 4 deletions(-)

Comments

Eric Dumazet Jan. 25, 2012, 12:51 p.m. UTC | #1
Le mercredi 25 janvier 2012 à 12:27 +0000, Ian Campbell a écrit :
> This reduces the minimum overhead required for this allocation such that the
> shinfo can be grown in the following patch without overflowing 2048 bytes for a
> 1500 byte frame.
> 
> Reducing this overhead while also growing the shinfo means that sometimes the
> tail end of the data can end up in the same cache line as the beginning of the
> shinfo. Specifically in the case of the 64 byte cache lines on a 64 bit system
> the first 8 bytes of shinfo can overlap the tail cacheline of the data. In many
> cases the allocation slop means that there is no overlap.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> ---

Hmm... you missed build_skb() and all places we use
SKB_DATA_ALIGN(sizeof(struct skb_shared_info))

(for example in some drivers)

If you want to see possible performance impact of your changes, see
commit e52fcb2462ac (bnx2x: uses build_skb() in receive path), and
expect a drop from 820.000 pps to 720.000 pps



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ian Campbell Jan. 25, 2012, 1:09 p.m. UTC | #2
On Wed, 2012-01-25 at 12:51 +0000, Eric Dumazet wrote:
> Le mercredi 25 janvier 2012 à 12:27 +0000, Ian Campbell a écrit :
> > This reduces the minimum overhead required for this allocation such that the
> > shinfo can be grown in the following patch without overflowing 2048 bytes for a
> > 1500 byte frame.
> > 
> > Reducing this overhead while also growing the shinfo means that sometimes the
> > tail end of the data can end up in the same cache line as the beginning of the
> > shinfo. Specifically in the case of the 64 byte cache lines on a 64 bit system
> > the first 8 bytes of shinfo can overlap the tail cacheline of the data. In many
> > cases the allocation slop means that there is no overlap.
> > 
> > Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Eric Dumazet <eric.dumazet@gmail.com>
> > ---
> 
> Hmm... you missed build_skb()

Presumably build_skb should be using SKB_WITH_OVERHEAD instead of open
coding it? (which I think is how I managed to miss it)

>  and all places we use
> SKB_DATA_ALIGN(sizeof(struct skb_shared_info))

> (for example in some drivers)

I'm not sure how I missed these with my grep. Obviously it is necessary
to catch them all in order to avoid the performance drop you mention
below. Fortunately there are fewer than a dozen instances across half a dozen
drivers to check.

> If you want to see possible performance impact of your changes, see
> commit e52fcb2462ac (bnx2x: uses build_skb() in receive path), and
> expect a drop from 820.000 pps to 720.000 pps

Can you elaborate on the specific benchmark you used there?

Thanks,
Ian

Eric Dumazet Jan. 25, 2012, 1:12 p.m. UTC | #3
Le mercredi 25 janvier 2012 à 13:09 +0000, Ian Campbell a écrit :

> Can you elaborate on the specific benchmark you used there?

One machine was sending udp frames on my target (using pktgen)

Target was running a mono threaded udp receiver (one socket)



Ian Campbell Jan. 31, 2012, 2:35 p.m. UTC | #4
Hi Eric,

On Wed, 2012-01-25 at 13:12 +0000, Eric Dumazet wrote:
> Le mercredi 25 janvier 2012 à 13:09 +0000, Ian Campbell a écrit :
> 
> > Can you elaborate on the specific benchmark you used there?
> 
> One machine was sending udp frames on my target (using pktgen)
> 
> Target was running a mono threaded udp receiver (one socket)

I've been playing with pktgen and I'm seeing more like 81,600-81,800 pps
from a UDP transmitter, measuring on the rx side using "bwm-ng -u
packets" and sinking the traffic with "nc -l -u -p 9 > /dev/null". The
numbers are the same with or without this series.

You mentioned numbers in the 820pps region -- is that really kilo-pps
(in which case I'm an order of magnitude down) or actually 820pps (in
which case I'm somehow a couple of orders of magnitude up).

I'm using a single NIC transmitter, no delay, 1000000 clones of each skb
and I've tried 60 and 1500 byte packets. In the 60 byte case I see more
like 50k pps

I'm in the process of setting up a receiver with a bnx2 but in the
meantime I feel like I'm making some obvious or fundamental flaw in my
method...

Any tips greatly appreciated.

Thanks,
Ian.


Eric Dumazet Jan. 31, 2012, 2:45 p.m. UTC | #5
Le mardi 31 janvier 2012 à 14:35 +0000, Ian Campbell a écrit :
> Hi Eric,
> 
> On Wed, 2012-01-25 at 13:12 +0000, Eric Dumazet wrote:
> > Le mercredi 25 janvier 2012 à 13:09 +0000, Ian Campbell a écrit :
> > 
> > > Can you elaborate on the specific benchmark you used there?
> > 
> > One machine was sending udp frames on my target (using pktgen)
> > 
> > Target was running a mono threaded udp receiver (one socket)
> 
> I've been playing with pktgen and I'm seeing more like 81,600-81,800 pps
> from a UDP transmitter, measuring on the rx side using "bwm-ng -u
> packets" and sinking the traffic with "nc -l -u -p 9 > /dev/null". The
> numbers are the same with or without this series.
> 
> You mentioned numbers in the 820pps region -- is that really kilo-pps
> (in which case I'm an order of magnitude down) or actually 820pps (in
> which case I'm somehow a couple of orders of magnitude up).
> 
> I'm using a single NIC transmitter, no delay, 1000000 clones of each skb
> and I've tried 60 and 1500 byte packets. In the 60 byte case I see more
> like 50k pps
> 
> I'm in the process of setting up a receiver with a bnx2 but in the
> meantime I feel like I'm making some obvious or fundamental flaw in my
> method...
> 
> Any tips greatly appreciated.

I confirm I reach 820.000 packets per second, on a Gigabit link.

Sender can easily reach line rate (more than 1.000.000 packets per
second)

Check how many packet drops you have on the receiver:

ifconfig eth0

or "ethtool -S eth0"



Francois Romieu Jan. 31, 2012, 2:54 p.m. UTC | #6
Ian Campbell <Ian.Campbell@citrix.com> :
[...]
> I'm using a single NIC transmitter, no delay, 1000000 clones of each skb
> and I've tried 60 and 1500 byte packets. In the 60 byte case I see more
> like 50k pps

At this stage 'tools/perf/perf top -G graph' sometimes helps me discover
debug options that I had left enabled.
Ian Campbell Feb. 1, 2012, 9:38 a.m. UTC | #7
On Tue, 2012-01-31 at 14:54 +0000, Francois Romieu wrote:
> Ian Campbell <Ian.Campbell@citrix.com> :
> [...]
> > I'm using a single NIC transmitter, no delay, 1000000 clones of each skb
> > and I've tried 60 and 1500 byte packets. In the 60 byte case I see more
> > like 50k pps
> 
> At this stage 'tools/perf/perf top -G graph' sometimes helps me discover
> debug options that I had left enabled.

Thanks for the tip, I'll give it a look.

Ian.

Ian Campbell Feb. 1, 2012, 9:39 a.m. UTC | #8
On Tue, 2012-01-31 at 14:45 +0000, Eric Dumazet wrote:
> Le mardi 31 janvier 2012 à 14:35 +0000, Ian Campbell a écrit :
> > Hi Eric,
> > 
> > On Wed, 2012-01-25 at 13:12 +0000, Eric Dumazet wrote:
> > > Le mercredi 25 janvier 2012 à 13:09 +0000, Ian Campbell a écrit :
> > > 
> > > > Can you elaborate on the specific benchmark you used there?
> > > 
> > > One machine was sending udp frames on my target (using pktgen)
> > > 
> > > Target was running a mono threaded udp receiver (one socket)
> > 
> > I've been playing with pktgen and I'm seeing more like 81,600-81,800 pps
> > from a UDP transmitter, measuring on the rx side using "bwm-ng -u
> > packets" and sinking the traffic with "nc -l -u -p 9 > /dev/null". The
> > numbers are the same with or without this series.
> > 
> > You mentioned numbers in the 820pps region -- is that really kilo-pps
> > (in which case I'm an order of magnitude down) or actually 820pps (in
> > which case I'm somehow a couple of orders of magnitude up).
> > 
> > I'm using a single NIC transmitter, no delay, 1000000 clones of each skb
> > and I've tried 60 and 1500 byte packets. In the 60 byte case I see more
> > like 50k pps
> > 
> > I'm in the process of setting up a receiver with a bnx2 but in the
> > meantime I feel like I'm making some obvious or fundamental flaw in my
> > method...
> > 
> > Any tips greatly appreciated.
> 
> I confirm I reach 820.000 packets per second, on a Gigabit link.

Heh, this is where $LOCALE steps up and confuses things again (commas vs
periods for thousands separators) ;-). However I know what you mean.

> Sender can easily reach line rate (more than 1.000.000 packets per
> second)

Right, this is where I seem to have failed -- 10,000,000 packets took more
like 2 minutes to send rather than the expected 10s...

> Check how many packet drops you have on receiver ?
> 
> ifconfig eth0
> 
> or "ethtool -S eth0"

After a full 10,000,000 packet run of pktgen I see a difference in
rx_packets of +10,001,561 (there is some other traffic). Various
rx_*_error do not increase and neither does rx_no_buffer_count or
rx_missed_errors (although both of the latter two are non-zero to start
with). ifconfig tells the same story.

I guess this isn't surprising given the send rate since it's not really
stressing the receiver all that much.

I'll investigate the sending side. The sender is running a 2.6.32 distro
kernel. Maybe I need to tweak it somewhat.

Thanks,

Ian.


Eric Dumazet Feb. 1, 2012, 10:02 a.m. UTC | #9
Le mercredi 01 février 2012 à 09:39 +0000, Ian Campbell a écrit :

> I'll investigate the sending side. The sender is running a 2.6.32 distro
> kernel. Maybe I need to tweak it somewhat.

That's strange, even a very old kernel is able to sustain Gigabit line
rate with pktgen. Maybe a problem with a switch and Ethernet pauses?


Ian Campbell Feb. 1, 2012, 10:09 a.m. UTC | #10
On Wed, 2012-02-01 at 10:02 +0000, Eric Dumazet wrote:
> Le mercredi 01 février 2012 à 09:39 +0000, Ian Campbell a écrit :
> 
> > I'll investigate the sending side. The sender is running a 2.6.32 distro
> > kernel. Maybe I need to tweak it somewhat.
> 
> That's strange, even a very old kernel is able to sustain Gigabit line
> rate with pktgen. Maybe a problem with a switch and Ethernet pauses?

Thanks, that was something I was going to investigate. These machines are
in our lab so I need to figure out if they are even directly attached to
the same switch etc.

Ian.


Patch

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 50db9b0..f04333b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -41,7 +41,7 @@ 
 #define SKB_DATA_ALIGN(X)	(((X) + (SMP_CACHE_BYTES - 1)) & \
 				 ~(SMP_CACHE_BYTES - 1))
 #define SKB_WITH_OVERHEAD(X)	\
-	((X) - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
+	((X) - sizeof(struct skb_shared_info))
 #define SKB_MAX_ORDER(X, ORDER) \
 	SKB_WITH_OVERHEAD((PAGE_SIZE << (ORDER)) - (X))
 #define SKB_MAX_HEAD(X)		(SKB_MAX_ORDER((X), 0))
@@ -50,7 +50,7 @@ 
 /* return minimum truesize of one skb containing X bytes of data */
 #define SKB_TRUESIZE(X) ((X) +						\
 			 SKB_DATA_ALIGN(sizeof(struct sk_buff)) +	\
-			 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
+			 sizeof(struct skb_shared_info))
 
 /* A. Checksumming of received packets by device.
  *
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index da0c97f..b6de604 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -189,8 +189,7 @@  struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 	 * aligned memory blocks, unless SLUB/SLAB debug is enabled.
 	 * Both skb->head and skb_shared_info are cache line aligned.
 	 */
-	size = SKB_DATA_ALIGN(size);
-	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	size = SKB_DATA_ALIGN(size + sizeof(struct skb_shared_info));
 	data = kmalloc_node_track_caller(size, gfp_mask, node);
 	if (!data)
 		goto nodata;