
pktgen: Avoid dirtying skb->users when txq is full

Message ID 4AC3E3C5.1090108@gmail.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Sept. 30, 2009, 11:03 p.m. UTC
Stephen Hemminger wrote:
> On Tue, 22 Sep 2009 22:49:02 -0700
> Stephen Hemminger <shemminger@vyatta.com> wrote:
> 
>> I thought others would want to know how to get maximum speed out of pktgen.
>>
>> 1. Run nothing else (even X11), just a command line
>> 2. Make sure ethernet flow control is disabled
>>    ethtool -A eth0 autoneg off rx off tx off
>> 3. Make sure the clocksource is TSC.  On my old SMP Opterons I
>>    needed a patch, since in 2.6.30 or later the clock gurus
>>    decided to remove it on all non-Intel machines.  Look for the
>>    patch that enables "tsc=reliable"
>> 4. Compile Ethernet drivers in, the overhead of the indirect
>>    function call required for modules (or cache footprint),
>>    slows things down.
>> 5. Increase transmit ring size to 1000
>>    ethtool -G eth0 tx 1000
>>

Thanks a lot Stephen.
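For reference, the quoted tuning steps that map to commands (2, 3 and 5) can be collected into one small script — a sketch only, assuming the interface is eth0 and the box is TSC-capable:

```shell
#!/bin/sh
# Sketch of the tuning steps quoted above; "eth0" is an assumption.
IF=eth0

# Step 2: disable ethernet flow control
ethtool -A "$IF" autoneg off rx off tx off

# Step 3: check that the current clocksource really is TSC
cat /sys/devices/system/clocksource/clocksource0/current_clocksource

# Step 5: increase the transmit ring size
ethtool -G "$IF" tx 1000
```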

I did a pktgen session tonight and found one contention point on the skb->users
field that the following patch avoids.


Before patch:
------------------------------------------------------------------------------
   PerfTop:    5187 irqs/sec  kernel:100.0% [100000 cycles],  (all, cpu: 0)
------------------------------------------------------------------------------

             samples    pcnt   kernel function
             _______   _____   _______________

            16688.00 - 50.9% : consume_skb
             6541.00 - 20.0% : skb_dma_unmap
             3277.00 - 10.0% : tg3_poll
             1968.00 -  6.0% : mwait_idle
              651.00 -  2.0% : irq_entries_start
              466.00 -  1.4% : _spin_lock
              442.00 -  1.3% : mix_pool_bytes_extract
              373.00 -  1.1% : tg3_msi
              353.00 -  1.1% : read_tsc
              177.00 -  0.5% : sched_clock_local
              176.00 -  0.5% : sched_clock
              137.00 -  0.4% : tick_nohz_stop_sched_tick

After patch:
------------------------------------------------------------------------------
   PerfTop:    3530 irqs/sec  kernel:99.9% [100000 cycles],  (all, cpu: 0)
------------------------------------------------------------------------------

             samples    pcnt   kernel function
             _______   _____   _______________

            17127.00 - 34.0% : tg3_poll
            12691.00 - 25.2% : consume_skb
             5299.00 - 10.5% : skb_dma_unmap
             4179.00 -  8.3% : mwait_idle
             1583.00 -  3.1% : irq_entries_start
             1288.00 -  2.6% : mix_pool_bytes_extract
             1239.00 -  2.5% : tg3_msi
             1062.00 -  2.1% : read_tsc
              583.00 -  1.2% : _spin_lock
              432.00 -  0.9% : sched_clock
              416.00 -  0.8% : sched_clock_local
              360.00 -  0.7% : tick_nohz_stop_sched_tick
              329.00 -  0.7% : ktime_get
              263.00 -  0.5% : _spin_lock_irqsave

I believe we could go further, batching the atomic_inc(&skb->users) we do all the
time, which competes with the atomic_dec() done by the tx completion handler (possibly
run on another cpu): reserve XXXXXXX units on skb->users, decrement a pktgen-local
variable instead, and refill the reserve when necessary, once in a while...
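The batching idea above could look roughly like this in userspace C with C11 atomics — a minimal sketch, not kernel code; the names (USERS_RESERVE, local_budget, get_ref_batched, drain_reserve) are illustrative, and the reserve size stands in for the unspecified XXXXXXX:

```c
#include <stdatomic.h>

#define USERS_RESERVE 64            /* hypothetical batch size */

static atomic_int skb_users;        /* stands in for skb->users */
static int local_budget;            /* pktgen-local, touched by one cpu only */

/* Take one reference; the shared counter is dirtied only once per
 * USERS_RESERVE packets instead of once per packet. */
static void get_ref_batched(void)
{
	if (local_budget == 0) {
		atomic_fetch_add(&skb_users, USERS_RESERVE);
		local_budget = USERS_RESERVE;
	}
	local_budget--;
}

/* Give back the unused part of the reserve, e.g. before finally
 * freeing the skb. */
static void drain_reserve(void)
{
	atomic_fetch_sub(&skb_users, local_budget);
	local_budget = 0;
}
```

After 100 calls to get_ref_batched() the shared counter has been touched only twice (bumped to 128), and drain_reserve() hands back the 28 unused units, leaving 100 outstanding references — versus 100 atomic operations on the shared counter without batching.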


[PATCH] pktgen: Avoid dirtying skb->users when txq is full

We can avoid two atomic ops on skb->users if the packet is not going to be
sent to the device (because the hardware tx queue is full).

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Stephen Hemminger Oct. 1, 2009, 12:25 a.m. UTC | #1
On Thu, 01 Oct 2009 01:03:33 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Stephen Hemminger wrote:
> > On Tue, 22 Sep 2009 22:49:02 -0700
> > Stephen Hemminger <shemminger@vyatta.com> wrote:
> > 
> >> I thought others would want to know how to get maximum speed out of pktgen.
> >>
> >> 1. Run nothing else (even X11), just a command line
> >> 2. Make sure ethernet flow control is disabled
> >>    ethtool -A eth0 autoneg off rx off tx off
> >> 3. Make sure the clocksource is TSC.  On my old SMP Opterons I
> >>    needed a patch, since in 2.6.30 or later the clock gurus
> >>    decided to remove it on all non-Intel machines.  Look for the
> >>    patch that enables "tsc=reliable"
> >> 4. Compile Ethernet drivers in, the overhead of the indirect
> >>    function call required for modules (or cache footprint),
> >>    slows things down.
> >> 5. Increase transmit ring size to 1000
> >>    ethtool -G eth0 tx 1000
> >>
> 
> Thanks a lot Stephen.
> 
> I did a pktgen session tonight and found one contention point on the skb->users
> field that the following patch avoids.
> 
> 
> Before patch:
> ------------------------------------------------------------------------------
>    PerfTop:    5187 irqs/sec  kernel:100.0% [100000 cycles],  (all, cpu: 0)
> ------------------------------------------------------------------------------
> 
>              samples    pcnt   kernel function
>              _______   _____   _______________
> 
>             16688.00 - 50.9% : consume_skb
>              6541.00 - 20.0% : skb_dma_unmap
>              3277.00 - 10.0% : tg3_poll
>              1968.00 -  6.0% : mwait_idle
>               651.00 -  2.0% : irq_entries_start
>               466.00 -  1.4% : _spin_lock
>               442.00 -  1.3% : mix_pool_bytes_extract
>               373.00 -  1.1% : tg3_msi
>               353.00 -  1.1% : read_tsc
>               177.00 -  0.5% : sched_clock_local
>               176.00 -  0.5% : sched_clock
>               137.00 -  0.4% : tick_nohz_stop_sched_tick
> 
> After patch:
> ------------------------------------------------------------------------------
>    PerfTop:    3530 irqs/sec  kernel:99.9% [100000 cycles],  (all, cpu: 0)
> ------------------------------------------------------------------------------
> 
>              samples    pcnt   kernel function
>              _______   _____   _______________
> 
>             17127.00 - 34.0% : tg3_poll
>             12691.00 - 25.2% : consume_skb
>              5299.00 - 10.5% : skb_dma_unmap
>              4179.00 -  8.3% : mwait_idle
>              1583.00 -  3.1% : irq_entries_start
>              1288.00 -  2.6% : mix_pool_bytes_extract
>              1239.00 -  2.5% : tg3_msi
>              1062.00 -  2.1% : read_tsc
>               583.00 -  1.2% : _spin_lock
>               432.00 -  0.9% : sched_clock
>               416.00 -  0.8% : sched_clock_local
>               360.00 -  0.7% : tick_nohz_stop_sched_tick
>               329.00 -  0.7% : ktime_get
>               263.00 -  0.5% : _spin_lock_irqsave
> 
> I believe we could go further, batching the atomic_inc(&skb->users) we do all the
> time, which competes with the atomic_dec() done by the tx completion handler (possibly
> run on another cpu): reserve XXXXXXX units on skb->users, decrement a pktgen-local
> variable instead, and refill the reserve when necessary, once in a while...

When the buffer is allocated we know the number of times it will be cloned, so why
not set the count there? We would need to clean up on interrupt, but that should be possible.

Alternatively, just change skb->destructor on the last packet and use a proper
completion mechanism.
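The destructor-plus-completion alternative could be sketched like this — a very loose userspace analogue, not kernel code; struct fake_skb, tx_done and all function names here are illustrative stand-ins for struct sk_buff and struct completion:

```c
#include <stdbool.h>

/* Userspace stand-in for struct sk_buff: only the destructor hook. */
struct fake_skb {
	void (*destructor)(struct fake_skb *skb);
};

static bool tx_done;

static void noop_destructor(struct fake_skb *skb) { (void)skb; }

/* Installed on the last packet only: instead of per-packet refcount
 * traffic, the generator waits for this single notification. */
static void last_pkt_destructor(struct fake_skb *skb)
{
	(void)skb;
	tx_done = true;         /* real code: complete(&pkt_dev->done) */
}

/* Simulated tx-completion path freeing a batch of skbs. */
static void tx_complete_all(struct fake_skb *pkts, int n)
{
	for (int i = 0; i < n; i++)
		pkts[i].destructor(&pkts[i]);
}
```

Only the final packet carries the signalling destructor, so the generator touches no shared counter on the fast path and is woken exactly once when the queue drains.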

> [PATCH] pktgen: Avoid dirtying skb->users when txq is full
> 
> We can avoid two atomic ops on skb->users if the packet is not going to be
> sent to the device (because the hardware tx queue is full).
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> 
> diff --git a/net/core/pktgen.c b/net/core/pktgen.c
> index 4d11c28..6a9ab28 100644
> --- a/net/core/pktgen.c
> +++ b/net/core/pktgen.c
> @@ -3439,12 +3439,14 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
>  	txq = netdev_get_tx_queue(odev, queue_map);
>  
>  	__netif_tx_lock_bh(txq);
> -	atomic_inc(&(pkt_dev->skb->users));
>  
> -	if (unlikely(netif_tx_queue_stopped(txq) || netif_tx_queue_frozen(txq)))
> +	if (unlikely(netif_tx_queue_stopped(txq) || netif_tx_queue_frozen(txq))) {
>  		ret = NETDEV_TX_BUSY;
> -	else
> -		ret = (*xmit)(pkt_dev->skb, odev);
> +		pkt_dev->last_ok = 0;
> +		goto unlock;
> +	}
> +	atomic_inc(&(pkt_dev->skb->users));
> +	ret = (*xmit)(pkt_dev->skb, odev);
>  
>  	switch (ret) {
>  	case NETDEV_TX_OK:
> @@ -3466,6 +3468,7 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
>  		atomic_dec(&(pkt_dev->skb->users));
>  		pkt_dev->last_ok = 0;
>  	}
> +unlock:
>  	__netif_tx_unlock_bh(txq);
>  
>  	/* If pkt_dev->count is zero, then run forever */

Acked-by: Stephen Hemminger <shemminger@vyatta.com>

Patch

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 4d11c28..6a9ab28 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -3439,12 +3439,14 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
 	txq = netdev_get_tx_queue(odev, queue_map);
 
 	__netif_tx_lock_bh(txq);
-	atomic_inc(&(pkt_dev->skb->users));
 
-	if (unlikely(netif_tx_queue_stopped(txq) || netif_tx_queue_frozen(txq)))
+	if (unlikely(netif_tx_queue_stopped(txq) || netif_tx_queue_frozen(txq))) {
 		ret = NETDEV_TX_BUSY;
-	else
-		ret = (*xmit)(pkt_dev->skb, odev);
+		pkt_dev->last_ok = 0;
+		goto unlock;
+	}
+	atomic_inc(&(pkt_dev->skb->users));
+	ret = (*xmit)(pkt_dev->skb, odev);
 
 	switch (ret) {
 	case NETDEV_TX_OK:
@@ -3466,6 +3468,7 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
 		atomic_dec(&(pkt_dev->skb->users));
 		pkt_dev->last_ok = 0;
 	}
+unlock:
 	__netif_tx_unlock_bh(txq);
 
 	/* If pkt_dev->count is zero, then run forever */