diff mbox

[net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell

Message ID 1411654669.16953.17.camel@edumazet-glaptop2.roam.corp.google.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Sept. 25, 2014, 2:17 p.m. UTC
From: Eric Dumazet <edumazet@google.com>

skb->xmit_more tells us if another skb is coming next.

We need to send doorbell when : xmit_more is not set,
or txqueue is stopped (preventing next skb to come immediately)

Tested with a modified pktgen version, I got a 40% increase of
throughput.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_tx.c |   23 +++++++++++++------
 1 file changed, 16 insertions(+), 7 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jesper Dangaard Brouer Sept. 25, 2014, 2:43 p.m. UTC | #1
On Thu, 25 Sep 2014 07:17:49 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> From: Eric Dumazet <edumazet@google.com>
> 
> Tested with a modified pktgen version, I got a 40% increase of
> throughput.

Pktgen is artificial benchmarking, you know ;-) :-P

If you really must use pktgen for this, you can use an unmodified
pktgen against the qdisc layer by adding a VLAN interface on-top of the
device you want to test.  I blame John, for telling me this ;-)

/me running away
Eric Dumazet Sept. 25, 2014, 3:06 p.m. UTC | #2
On Thu, 2014-09-25 at 16:43 +0200, Jesper Dangaard Brouer wrote:
> On Thu, 25 Sep 2014 07:17:49 -0700
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > Tested with a modified pktgen version, I got a 40% increase of
> > throughput.
> 
> Pktgen is artificial benchmarking, you know ;-) :-P
> 
> If you really must use pktgen for this, you can use an unmodified
> pktgen against the qdisc layer by adding a VLAN interface on-top of the
> device you want to test.  I blame John, for telling me this ;-)
> 
> /me running away


But... Nothing yet in qdisc layer sets xmit_more, unless TSO is
disabled, and a GSO packet is segmented.

So you might notice some improvement with this patch, but not if you use
TSO, and a 40GB NIC is better with TSO on you know ;)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Sept. 25, 2014, 3:36 p.m. UTC | #3
On Thu, 2014-09-25 at 08:06 -0700, Eric Dumazet wrote:

> 
> But... Nothing yet in qdisc layer sets xmit_more, unless TSO is
> disabled, and a GSO packet is segmented.
> 
> So you might notice some improvement with this patch, but not if you use
> TSO, and a 40GB NIC is better with TSO on you know ;)
> 

So playing with TSO=off (but GSO=on), we see about 12 % increase of
throughput on a TCP_STREAM flow.

And kernel profile clearly shows the difference of sending or not the
doorbell, as mlx4_en_xmit() shifts. Note also mlx4_en_free_tx_desc()
disappears, this might point to some false sharing or something worth
investigating.

Before patch :

    26.36%  [kernel]  [k] __copy_skb_header             
    16.59%  [kernel]  [k] mlx4_en_xmit                  
     5.62%  [kernel]  [k] __alloc_skb                   
     5.48%  [kernel]  [k] copy_user_enhanced_fast_string
     4.48%  [kernel]  [k] skb_segment                   
     2.46%  [kernel]  [k] mlx4_en_free_tx_desc.isra.27  
     2.31%  [kernel]  [k] _raw_spin_lock                
     2.10%  [kernel]  [k] memcpy                        
     2.01%  [kernel]  [k] tcp_sendmsg                   
     1.62%  [kernel]  [k] __iowrite64_copy              


After patch :

    32.78%  [kernel]  [k] __copy_skb_header             
     8.26%  [kernel]  [k] mlx4_en_xmit                  
     7.25%  [kernel]  [k] __alloc_skb                   
     7.18%  [kernel]  [k] copy_user_enhanced_fast_string
     4.39%  [kernel]  [k] skb_segment                   
     2.87%  [kernel]  [k] memcpy                        
     2.59%  [kernel]  [k] tcp_sendmsg                   
     2.50%  [kernel]  [k] _raw_spin_lock                
     2.38%  [kernel]  [k] dev_hard_start_xmit           
     1.52%  [kernel]  [k] tcp_gso_segment               
     1.50%  [kernel]  [k] kmem_cache_alloc_node_trace   
     1.40%  [kernel]  [k] kmem_cache_alloc_node         
     1.16%  [kernel]  [k] ip_send_check                 


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Sept. 25, 2014, 3:40 p.m. UTC | #4
On Thu, 2014-09-25 at 08:36 -0700, Eric Dumazet wrote:

> So playing with TSO=off (but GSO=on), we see about 12 % increase of
> throughput on a TCP_STREAM flow.

Correction, this was a 20 % increase.

( 849284 pps -> 1023469 pps)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov Sept. 25, 2014, 3:53 p.m. UTC | #5
On Thu, Sep 25, 2014 at 7:17 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> skb->xmit_more tells us if another skb is coming next.
>
> We need to send doorbell when : xmit_more is not set,
> or txqueue is stopped (preventing next skb to come immediately)
>
> Tested with a modified pktgen version, I got a 40% increase of
> throughput.

this is awesome!

I've been hacking pktgen as well based on Jesper's earlier patch,
but in slightly different way. Sounds like you already have working
pktgen with xmit_more. Do you mind sharing it? even rough patch
would be great.

Thanks
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Sept. 25, 2014, 4:05 p.m. UTC | #6
On Thu, 2014-09-25 at 08:53 -0700, Alexei Starovoitov wrote:
> On Thu, Sep 25, 2014 at 7:17 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > skb->xmit_more tells us if another skb is coming next.
> >
> > We need to send doorbell when : xmit_more is not set,
> > or txqueue is stopped (preventing next skb to come immediately)
> >
> > Tested with a modified pktgen version, I got a 40% increase of
> > throughput.
> 
> this is awesome!
> 
> I've been hacking pktgen as well based on Jesper's earlier patch,
> but in slightly different way. Sounds like you already have working
> pktgen with xmit_more. Do you mind sharing it? even rough patch
> would be great.

The plan was to add a 'pburst x' parameter.

Because not only we want to avoid sending the doorbell, but we could
also not doing the spinlock for every start_xmit().

We also can enforce a global [p]rate (say 100000 packets per second),
but still allow pktgen to send burts of x packets to use (and test) this
xmit_more

Nothing ready yet ;)




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov Sept. 25, 2014, 4:08 p.m. UTC | #7
On Thu, Sep 25, 2014 at 9:05 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2014-09-25 at 08:53 -0700, Alexei Starovoitov wrote:
>> On Thu, Sep 25, 2014 at 7:17 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > From: Eric Dumazet <edumazet@google.com>
>> >
>> > skb->xmit_more tells us if another skb is coming next.
>> >
>> > We need to send doorbell when : xmit_more is not set,
>> > or txqueue is stopped (preventing next skb to come immediately)
>> >
>> > Tested with a modified pktgen version, I got a 40% increase of
>> > throughput.
>>
>> this is awesome!
>>
>> I've been hacking pktgen as well based on Jesper's earlier patch,
>> but in slightly different way. Sounds like you already have working
>> pktgen with xmit_more. Do you mind sharing it? even rough patch
>> would be great.
>
> The plan was to add a 'pburst x' parameter.
>
> Because not only we want to avoid sending the doorbell, but we could
> also not doing the spinlock for every start_xmit().

exactly!
I also hacked
atomic_inc(&(pkt_dev->skb->users))
into
single atomic_add of N packets for bursting to amortize the cost
of both spin_lock and lock xadd

> We also can enforce a global [p]rate (say 100000 packets per second),
> but still allow pktgen to send burts of x packets to use (and test) this
> xmit_more

that would be nice. I'm not sure how yet.

> Nothing ready yet ;)

can't wait :)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Sept. 25, 2014, 4:19 p.m. UTC | #8
On Thu, 2014-09-25 at 08:36 -0700, Eric Dumazet wrote:


>     26.36%  [kernel]  [k] __copy_skb_header     

Note to myself :

Time to optimize this thing, doing a memset() to copy a bunch of
consecutive fields.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Sept. 25, 2014, 8:47 p.m. UTC | #9
On Thu, 2014-09-25 at 08:36 -0700, Eric Dumazet wrote:

> After patch :
> 
>     32.78%  [kernel]  [k] __copy_skb_header             
>      8.26%  [kernel]  [k] mlx4_en_xmit                  
>      7.25%  [kernel]  [k] __alloc_skb                   
>      7.18%  [kernel]  [k] copy_user_enhanced_fast_string
>      4.39%  [kernel]  [k] skb_segment                   
>      2.87%  [kernel]  [k] memcpy                        
>      2.59%  [kernel]  [k] tcp_sendmsg                   
>      2.50%  [kernel]  [k] _raw_spin_lock                
>      2.38%  [kernel]  [k] dev_hard_start_xmit           
>      1.52%  [kernel]  [k] tcp_gso_segment               
>      1.50%  [kernel]  [k] kmem_cache_alloc_node_trace   
>      1.40%  [kernel]  [k] kmem_cache_alloc_node         
>      1.16%  [kernel]  [k] ip_send_check                 
> 

After __copy_skb_header() optimization I get something even nicer ;)

    19.90%  [kernel]          [k] __copy_skb_header             
    15.16%  [kernel]          [k] skb_segment                   
     7.02%  [kernel]          [k] __alloc_skb                   
     6.89%  [kernel]          [k] copy_user_enhanced_fast_string
     6.60%  [kernel]          [k] mlx4_en_xmit                  
     2.70%  [kernel]          [k] _raw_spin_lock                
     2.68%  [kernel]          [k] memcpy                        
     2.41%  [kernel]          [k] tcp_sendmsg                   

I'll post this patch asap.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Joe Perches Sept. 25, 2014, 8:58 p.m. UTC | #10
On Thu, 2014-09-25 at 13:47 -0700, Eric Dumazet wrote:
> After patch :
>     32.78%  [kernel]  [k] __copy_skb_header             
> After __copy_skb_header() optimization I get something even nicer ;)
>     19.90%  [kernel]          [k] __copy_skb_header             
> I'll post this patch asap.

Eric, you are _killing_ it.  network ninja indeed...

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Sept. 28, 2014, 9:27 p.m. UTC | #11
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 25 Sep 2014 07:17:49 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> skb->xmit_more tells us if another skb is coming next.
> 
> We need to send doorbell when : xmit_more is not set,
> or txqueue is stopped (preventing next skb to come immediately)
> 
> Tested with a modified pktgen version, I got a 40% increase of
> throughput.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index c44f4237b9be..adedc47e947d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -667,6 +667,7 @@  netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 	int lso_header_size;
 	void *fragptr;
 	bool bounce = false;
+	bool send_doorbell;
 
 	if (!priv->port_up)
 		goto tx_drop;
@@ -878,12 +879,16 @@  netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	skb_tx_timestamp(skb);
 
-	if (ring->bf_enabled && desc_size <= MAX_BF && !bounce && !vlan_tx_tag_present(skb)) {
+	send_doorbell = !skb->xmit_more || netif_xmit_stopped(ring->tx_queue);
+
+	if (ring->bf_enabled && desc_size <= MAX_BF && !bounce &&
+	    !vlan_tx_tag_present(skb) && send_doorbell) {
 		tx_desc->ctrl.bf_qpn |= cpu_to_be32(ring->doorbell_qpn);
 
 		op_own |= htonl((bf_index & 0xffff) << 8);
-		/* Ensure new descirptor hits memory
-		* before setting ownership of this descriptor to HW */
+		/* Ensure new descriptor hits memory
+		 * before setting ownership of this descriptor to HW
+		 */
 		wmb();
 		tx_desc->ctrl.owner_opcode = op_own;
 
@@ -896,12 +901,16 @@  netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 
 		ring->bf.offset ^= ring->bf.buf_size;
 	} else {
-		/* Ensure new descirptor hits memory
-		* before setting ownership of this descriptor to HW */
+		/* Ensure new descriptor hits memory
+		 * before setting ownership of this descriptor to HW
+		 */
 		wmb();
 		tx_desc->ctrl.owner_opcode = op_own;
-		wmb();
-		iowrite32be(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);
+		if (send_doorbell) {
+			wmb();
+			iowrite32be(ring->doorbell_qpn,
+				    ring->bf.uar->map + MLX4_SEND_DOORBELL);
+		}
 	}
 
 	return NETDEV_TX_OK;