diff mbox

[net-next.git,3/8,(V2)] stmmac: add the initial tx coalesce schema

Message ID 1347346514-23411-4-git-send-email-peppe.cavallaro@st.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Giuseppe CAVALLARO Sept. 11, 2012, 6:55 a.m. UTC
This patch adds a new schema used for mitigating the
number of transmit interrupts.
It is based on a sw timer and a threshold value.
The timer is used to periodically call the stmmac_tx
function that can be invoked by the ISR but only for
the descriptors where the interrupt on completion
field has been set. This is tuned by a threshold.

Next step is to add the ability to tune these coalesce
values by ethtool.

Till now I have put a default that showed a real gain
on all the platforms ARM/SH4 where I performed benchmarks.

V2: review the logic to manage the IC bit in the TDESC
that was bugged because it didn't take care about the
fragments. Also fix the tx_count_frames that has not to be
limited to TX DMA ring. Thanks to Ben Hutchings.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
 drivers/net/ethernet/stmicro/stmmac/common.h       |    8 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac.h       |    4 +
 .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c   |    9 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |   92 ++++++++++++--------
 4 files changed, 69 insertions(+), 44 deletions(-)

Comments

David Miller Sept. 13, 2012, 8:23 p.m. UTC | #1
From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
Date: Tue, 11 Sep 2012 08:55:09 +0200

> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&priv->tx_lock, flags);
>  
> -	spin_lock(&priv->tx_lock);
> +	priv->xstats.tx_clean++;

You are changing the locking here for the sake of the new timer.

But timers run in software interrupt context, so this change is
completely unnecessary since NAPI runs in software interrupt context
as well, and neither timers nor NAPI run in hardware interrupts
context.

Therefore, disabling hardware interrupts for this lock is unnecessary
and will decrease performance.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Hutchings Sept. 13, 2012, 8:42 p.m. UTC | #2
On Thu, 2012-09-13 at 16:23 -0400, David Miller wrote:
> From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
> Date: Tue, 11 Sep 2012 08:55:09 +0200
> 
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&priv->tx_lock, flags);
> >  
> > -	spin_lock(&priv->tx_lock);
> > +	priv->xstats.tx_clean++;
> 
> You are changing the locking here for the sake of the new timer.
> 
> But timers run in software interrupt context, so this change is
> completely unnecessary since NAPI runs in software interrupt context
> as well, and neither timers nor NAPI run in hardware interrupts
> context.

NAPI pollers can be called by netpoll in hardware interrupt context, and
this driver supports netpoll.

Ben.

> Therefore, disabling hardware interrupts for this lock is unnecessary
> and will decrease performance.
David Miller Sept. 13, 2012, 8:46 p.m. UTC | #3
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Thu, 13 Sep 2012 21:42:51 +0100

> On Thu, 2012-09-13 at 16:23 -0400, David Miller wrote:
>> From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
>> Date: Tue, 11 Sep 2012 08:55:09 +0200
>> 
>> > +	unsigned long flags;
>> > +
>> > +	spin_lock_irqsave(&priv->tx_lock, flags);
>> >  
>> > -	spin_lock(&priv->tx_lock);
>> > +	priv->xstats.tx_clean++;
>> 
>> You are changing the locking here for the sake of the new timer.
>> 
>> But timers run in software interrupt context, so this change is
>> completely unnecessary since NAPI runs in software interrupt context
>> as well, and neither timers nor NAPI run in hardware interrupts
>> context.
> 
> NAPI pollers can be called by netpoll in hardware interrupt context, and
> this driver supports netpoll.

And the netpoll handler need to make amends to make sure that
hardware the environment expected by the software interrupt
code is preserved.

Well written NAPI drivers never need to disable hardware interrupts
in their ->poll() method and it's callers, neither should you.

I'm not considering your patches until you implement this properly.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Hutchings Sept. 13, 2012, 9:10 p.m. UTC | #4
On Thu, 2012-09-13 at 16:46 -0400, David Miller wrote:
> From: Ben Hutchings <bhutchings@solarflare.com>
> Date: Thu, 13 Sep 2012 21:42:51 +0100
> 
> > On Thu, 2012-09-13 at 16:23 -0400, David Miller wrote:
> >> From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
> >> Date: Tue, 11 Sep 2012 08:55:09 +0200
> >> 
> >> > +	unsigned long flags;
> >> > +
> >> > +	spin_lock_irqsave(&priv->tx_lock, flags);
> >> >  
> >> > -	spin_lock(&priv->tx_lock);
> >> > +	priv->xstats.tx_clean++;
> >> 
> >> You are changing the locking here for the sake of the new timer.
> >> 
> >> But timers run in software interrupt context, so this change is
> >> completely unnecessary since NAPI runs in software interrupt context
> >> as well, and neither timers nor NAPI run in hardware interrupts
> >> context.
> > 
> > NAPI pollers can be called by netpoll in hardware interrupt context, and
> > this driver supports netpoll.
> 
> And the netpoll handler need to make amends to make sure that
> hardware the environment expected by the software interrupt
> code is preserved.
>
> Well written NAPI drivers never need to disable hardware interrupts
> in their ->poll() method and it's callers, neither should you.

Perhaps you should get round to reviewing netpoll, because it does
exactly this.

> I'm not considering your patches until you implement this properly.

These aren't my patches.

Ben.
David Miller Sept. 13, 2012, 9:37 p.m. UTC | #5
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Thu, 13 Sep 2012 22:10:50 +0100

> On Thu, 2012-09-13 at 16:46 -0400, David Miller wrote:
>> From: Ben Hutchings <bhutchings@solarflare.com>
>> Date: Thu, 13 Sep 2012 21:42:51 +0100
>> 
>> Well written NAPI drivers never need to disable hardware interrupts
>> in their ->poll() method and it's callers, neither should you.
> 
> Perhaps you should get round to reviewing netpoll, because it does
> exactly this.

Then I don't understand the point you're trying to make.

Hardware interrupt disabling has absolutely no place in the
NAPI polling fast paths.

If NAPI drivers can't be implemented without hardware interrupt
toggling in ->poll(), we've failed.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Hutchings Sept. 13, 2012, 11:11 p.m. UTC | #6
On Thu, 2012-09-13 at 17:37 -0400, David Miller wrote:
> From: Ben Hutchings <bhutchings@solarflare.com>
> Date: Thu, 13 Sep 2012 22:10:50 +0100
> 
> > On Thu, 2012-09-13 at 16:46 -0400, David Miller wrote:
> >> From: Ben Hutchings <bhutchings@solarflare.com>
> >> Date: Thu, 13 Sep 2012 21:42:51 +0100
> >> 
> >> Well written NAPI drivers never need to disable hardware interrupts
> >> in their ->poll() method and it's callers, neither should you.
> > 
> > Perhaps you should get round to reviewing netpoll, because it does
> > exactly this.
> 
> Then I don't understand the point you're trying to make.
> 
> Hardware interrupt disabling has absolutely no place in the
> NAPI polling fast paths.
> 
> If NAPI drivers can't be implemented without hardware interrupt
> toggling in ->poll(), we've failed.

Right.

The problem being that NAPI poll functions *are* sometimes called in
hardware interrupt context.  Thus, any spinlock that may be taken by a
NAPI handler, may well need to be taken with spinlock_irq or
spinlock_irqsave elsewhere.  (This is horrible and I think it's well
past time that we ripped the NAPI polling out of netpoll.)

I think you're right that stmmac_tx() (completion handler?) doesn't need
to disable hardware interrupts, but sadly stmmac_xmit() does right now
unless Giuseppe can work out how to make their interaction lockless.

Ben.
Giuseppe CAVALLARO Sept. 14, 2012, 7:36 a.m. UTC | #7
On 9/13/2012 10:23 PM, David Miller wrote:
> From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
> Date: Tue, 11 Sep 2012 08:55:09 +0200
>
>> +	unsigned long flags;
>> +
>> +	spin_lock_irqsave(&priv->tx_lock, flags);
>>
>> -	spin_lock(&priv->tx_lock);
>> +	priv->xstats.tx_clean++;
>
> You are changing the locking here for the sake of the new timer.
>
> But timers run in software interrupt context, so this change is
> completely unnecessary since NAPI runs in software interrupt context
> as well, and neither timers nor NAPI run in hardware interrupts
> context.

Indeed It can be called by the ISR too in this new implementation.
I have added the spin_lock_irqsave/restore otherwise, testing with 
CONFIG_PROVE_LOOKING, I get the following warning on ARM SMP.

[    8.030000]
[    8.030000] =================================
[    8.030000] [ INFO: inconsistent lock state ]
[    8.030000] 3.4.7_stm24_0302-b2000+ #103 Not tainted
[    8.030000] ---------------------------------
[    8.030000] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
[    8.030000] swapper/0/1 [HC1[1]:SC0[0]:HE0:SE1] takes:
[    8.030000]  (&(&priv->tx_lock)->rlock){?.-...}, at: [<802651d8>] 
stmmac_tx+0x1c/0x388
[    8.030000] {HARDIRQ-ON-W} state was registered at:
[    8.030000]   [<800562b4>] __lock_acquire+0x638/0x179c
[    8.030000]   [<80057884>] lock_acquire+0x60/0x74
[    8.030000]   [<80428a08>] _raw_spin_lock+0x40/0x50
[    8.030000]   [<802651d8>] stmmac_tx+0x1c/0x388
[    8.030000]   [<80026be0>] run_timer_softirq+0x180/0x23c
[    8.030000]   [<80020ccc>] __do_softirq+0xa0/0x114
[    8.030000]   [<80021204>] irq_exit+0x58/0x7c
[    8.030000]   [<8000dc80>] handle_IRQ+0x7c/0xb8
[    8.030000]   [<80008464>] gic_handle_irq+0x34/0x58
[    8.030000]   [<80429684>] __irq_svc+0x44/0x78
[    8.030000]   [<8001c3f4>] vprintk+0x41c/0x480
[    8.030000]   [<8042097c>] printk+0x18/0x24
[    8.030000]   [<805aef6c>] prepare_namespace+0x1c/0x1a4
[    8.030000]   [<805ae980>] kernel_init+0x1c8/0x20c
[    8.030000]   [<8000deb8>] kernel_thread_exit+0x0/0x8
[    8.030000] irq event stamp: 254745
[    8.030000] hardirqs last  enabled at (254744): [<80429240>] 
_raw_spin_unlock_irqrestore+0x3c/0x6c
[    8.030000] hardirqs last disabled at (254745): [<80429674>] 
__irq_svc+0x34/0x78
[    8.030000] softirqs last  enabled at (254741): [<8035d964>] 
dev_queue_xmit+0x6a4/0x724
[    8.030000] softirqs last disabled at (254737): [<8035d2d4>] 
dev_queue_xmit+0x14/0x724
[    8.030000]
[    8.030000] other info that might help us debug this:
[    8.030000]  Possible unsafe locking scenario:
[    8.030000]
[    8.030000]        CPU0
[    8.030000]        ----
[    8.030000]   lock(&(&priv->tx_lock)->rlock);
[    8.030000]   <Interrupt>
[    8.030000]     lock(&(&priv->tx_lock)->rlock);
[    8.030000]
[    8.030000]  *** DEADLOCK ***

> Therefore, disabling hardware interrupts for this lock is unnecessary
> and will decrease performance.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Giuseppe CAVALLARO Sept. 18, 2012, 5:41 a.m. UTC | #8
Hello David, Ben,

On 9/14/2012 9:36 AM, Giuseppe CAVALLARO wrote:
> On 9/13/2012 10:23 PM, David Miller wrote:
>> From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
>> Date: Tue, 11 Sep 2012 08:55:09 +0200
>>
>>> +    unsigned long flags;
>>> +
>>> +    spin_lock_irqsave(&priv->tx_lock, flags);
>>>
>>> -    spin_lock(&priv->tx_lock);
>>> +    priv->xstats.tx_clean++;
>>
>> You are changing the locking here for the sake of the new timer.
>>
>> But timers run in software interrupt context, so this change is
>> completely unnecessary since NAPI runs in software interrupt context
>> as well, and neither timers nor NAPI run in hardware interrupts
>> context.
>
> Indeed It can be called by the ISR too in this new implementation.
> I have added the spin_lock_irqsave/restore otherwise, testing with
> CONFIG_PROVE_LOOKING, I get the following warning on ARM SMP.

sorry if I disturb you again, any news on these patches?
Please, let me know.

Regards
Peppe

>
> [    8.030000]
> [    8.030000] =================================
> [    8.030000] [ INFO: inconsistent lock state ]
> [    8.030000] 3.4.7_stm24_0302-b2000+ #103 Not tainted
> [    8.030000] ---------------------------------
> [    8.030000] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> [    8.030000] swapper/0/1 [HC1[1]:SC0[0]:HE0:SE1] takes:
> [    8.030000]  (&(&priv->tx_lock)->rlock){?.-...}, at: [<802651d8>]
> stmmac_tx+0x1c/0x388
> [    8.030000] {HARDIRQ-ON-W} state was registered at:
> [    8.030000]   [<800562b4>] __lock_acquire+0x638/0x179c
> [    8.030000]   [<80057884>] lock_acquire+0x60/0x74
> [    8.030000]   [<80428a08>] _raw_spin_lock+0x40/0x50
> [    8.030000]   [<802651d8>] stmmac_tx+0x1c/0x388
> [    8.030000]   [<80026be0>] run_timer_softirq+0x180/0x23c
> [    8.030000]   [<80020ccc>] __do_softirq+0xa0/0x114
> [    8.030000]   [<80021204>] irq_exit+0x58/0x7c
> [    8.030000]   [<8000dc80>] handle_IRQ+0x7c/0xb8
> [    8.030000]   [<80008464>] gic_handle_irq+0x34/0x58
> [    8.030000]   [<80429684>] __irq_svc+0x44/0x78
> [    8.030000]   [<8001c3f4>] vprintk+0x41c/0x480
> [    8.030000]   [<8042097c>] printk+0x18/0x24
> [    8.030000]   [<805aef6c>] prepare_namespace+0x1c/0x1a4
> [    8.030000]   [<805ae980>] kernel_init+0x1c8/0x20c
> [    8.030000]   [<8000deb8>] kernel_thread_exit+0x0/0x8
> [    8.030000] irq event stamp: 254745
> [    8.030000] hardirqs last  enabled at (254744): [<80429240>]
> _raw_spin_unlock_irqrestore+0x3c/0x6c
> [    8.030000] hardirqs last disabled at (254745): [<80429674>]
> __irq_svc+0x34/0x78
> [    8.030000] softirqs last  enabled at (254741): [<8035d964>]
> dev_queue_xmit+0x6a4/0x724
> [    8.030000] softirqs last disabled at (254737): [<8035d2d4>]
> dev_queue_xmit+0x14/0x724
> [    8.030000]
> [    8.030000] other info that might help us debug this:
> [    8.030000]  Possible unsafe locking scenario:
> [    8.030000]
> [    8.030000]        CPU0
> [    8.030000]        ----
> [    8.030000]   lock(&(&priv->tx_lock)->rlock);
> [    8.030000]   <Interrupt>
> [    8.030000]     lock(&(&priv->tx_lock)->rlock);
> [    8.030000]
> [    8.030000]  *** DEADLOCK ***
>
>> Therefore, disabling hardware interrupts for this lock is unnecessary
>> and will decrease performance.
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Giuseppe CAVALLARO Oct. 4, 2012, 10:06 a.m. UTC | #9
On 9/18/2012 7:41 AM, Giuseppe CAVALLARO wrote:
> Hello David, Ben,
>
> On 9/14/2012 9:36 AM, Giuseppe CAVALLARO wrote:
>> On 9/13/2012 10:23 PM, David Miller wrote:
>>> From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
>>> Date: Tue, 11 Sep 2012 08:55:09 +0200
>>>
>>>> +    unsigned long flags;
>>>> +
>>>> +    spin_lock_irqsave(&priv->tx_lock, flags);
>>>>
>>>> -    spin_lock(&priv->tx_lock);
>>>> +    priv->xstats.tx_clean++;
>>>
>>> You are changing the locking here for the sake of the new timer.
>>>
>>> But timers run in software interrupt context, so this change is
>>> completely unnecessary since NAPI runs in software interrupt context
>>> as well, and neither timers nor NAPI run in hardware interrupts
>>> context.
>>
>> Indeed It can be called by the ISR too in this new implementation.
>> I have added the spin_lock_irqsave/restore otherwise, testing with
>> CONFIG_PROVE_LOOKING, I get the following warning on ARM SMP.
>
> sorry if I disturb you again, any news on these patches?
> Please, let me know.

Hello David,

I'd like to review these patches and resend them again (not now that 
net-next is closed) especially because they improve the driver 
performances and other people can get these benefits.

I've some doubts about what exactly I have to do. Sorry if I bother you.

Some of these patches were also reviewed by Ben long time ago; now there 
is only pending the spinlock modifications where you've had some concerns.

Maybe, I was not so clear in my comments so let me provide you further 
details and please let me know.

The stmmac_tx (to clear the tx resources) is now called by both the sw 
timer and the Interrupt handler.

   Note:
   I have also added a threshold that allows to set the interrupt on
   completion bit after a number of frames (Ben also game me some useful
   info to fix this code). On my tests, using both SH4 and ARM (SMP)
   platforms, this helped to make the driver more reactive on heavy
   traffic. Only using the timer I noticed that the driver losses frames
   in some network benchmarks (with iperf, netperf, nuttcp).

On ARM SMP, with  CONFIG_PROVE_LOOKING enabled, I got the info about the 
inconsistent lock state. Indeed, it makes sense to me to protect the 
stmmac_tx from irq in this new implementation.

What do you think, welcome your feedback.

Best Regards
Peppe

>
> Regards
> Peppe
>
>>
>> [    8.030000]
>> [    8.030000] =================================
>> [    8.030000] [ INFO: inconsistent lock state ]
>> [    8.030000] 3.4.7_stm24_0302-b2000+ #103 Not tainted
>> [    8.030000] ---------------------------------
>> [    8.030000] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
>> [    8.030000] swapper/0/1 [HC1[1]:SC0[0]:HE0:SE1] takes:
>> [    8.030000]  (&(&priv->tx_lock)->rlock){?.-...}, at: [<802651d8>]
>> stmmac_tx+0x1c/0x388
>> [    8.030000] {HARDIRQ-ON-W} state was registered at:
>> [    8.030000]   [<800562b4>] __lock_acquire+0x638/0x179c
>> [    8.030000]   [<80057884>] lock_acquire+0x60/0x74
>> [    8.030000]   [<80428a08>] _raw_spin_lock+0x40/0x50
>> [    8.030000]   [<802651d8>] stmmac_tx+0x1c/0x388
>> [    8.030000]   [<80026be0>] run_timer_softirq+0x180/0x23c
>> [    8.030000]   [<80020ccc>] __do_softirq+0xa0/0x114
>> [    8.030000]   [<80021204>] irq_exit+0x58/0x7c
>> [    8.030000]   [<8000dc80>] handle_IRQ+0x7c/0xb8
>> [    8.030000]   [<80008464>] gic_handle_irq+0x34/0x58
>> [    8.030000]   [<80429684>] __irq_svc+0x44/0x78
>> [    8.030000]   [<8001c3f4>] vprintk+0x41c/0x480
>> [    8.030000]   [<8042097c>] printk+0x18/0x24
>> [    8.030000]   [<805aef6c>] prepare_namespace+0x1c/0x1a4
>> [    8.030000]   [<805ae980>] kernel_init+0x1c8/0x20c
>> [    8.030000]   [<8000deb8>] kernel_thread_exit+0x0/0x8
>> [    8.030000] irq event stamp: 254745
>> [    8.030000] hardirqs last  enabled at (254744): [<80429240>]
>> _raw_spin_unlock_irqrestore+0x3c/0x6c
>> [    8.030000] hardirqs last disabled at (254745): [<80429674>]
>> __irq_svc+0x34/0x78
>> [    8.030000] softirqs last  enabled at (254741): [<8035d964>]
>> dev_queue_xmit+0x6a4/0x724
>> [    8.030000] softirqs last disabled at (254737): [<8035d2d4>]
>> dev_queue_xmit+0x14/0x724
>> [    8.030000]
>> [    8.030000] other info that might help us debug this:
>> [    8.030000]  Possible unsafe locking scenario:
>> [    8.030000]
>> [    8.030000]        CPU0
>> [    8.030000]        ----
>> [    8.030000]   lock(&(&priv->tx_lock)->rlock);
>> [    8.030000]   <Interrupt>
>> [    8.030000]     lock(&(&priv->tx_lock)->rlock);
>> [    8.030000]
>> [    8.030000]  *** DEADLOCK ***
>>
>>> Therefore, disabling hardware interrupts for this lock is unnecessary
>>> and will decrease performance.
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index bd32fe6..1d6bd3e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -95,11 +95,13 @@  struct stmmac_extra_stats {
 	unsigned long threshold;
 	unsigned long tx_pkt_n;
 	unsigned long rx_pkt_n;
-	unsigned long rx_napi_poll;
+	unsigned long normal_irq_n;
 	unsigned long rx_normal_irq_n;
+	unsigned long rx_napi_poll;
 	unsigned long tx_normal_irq_n;
-	unsigned long sched_timer_n;
-	unsigned long normal_irq_n;
+	unsigned long txtimer;
+	unsigned long tx_clean;
+	unsigned long tx_reset_ic_bit;
 	unsigned long mmc_tx_irq_n;
 	unsigned long mmc_rx_irq_n;
 	unsigned long mmc_rx_csum_offload_irq_n;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index 9f35769..0f5ab28 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -88,6 +88,10 @@  struct stmmac_priv {
 	int eee_enabled;
 	int eee_active;
 	int tx_lpi_timer;
+	struct timer_list txtimer;
+	u32 tx_count_frames;
+	u32 tx_coal_frames;
+	u32 tx_coal_timer;
 };
 
 extern int phyaddr;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
index 505fe71..48ad0bc 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
@@ -90,12 +90,13 @@  static const struct stmmac_stats stmmac_gstrings_stats[] = {
 	STMMAC_STAT(threshold),
 	STMMAC_STAT(tx_pkt_n),
 	STMMAC_STAT(rx_pkt_n),
-	STMMAC_STAT(rx_napi_poll),
+	STMMAC_STAT(normal_irq_n),
 	STMMAC_STAT(rx_normal_irq_n),
+	STMMAC_STAT(rx_napi_poll),
 	STMMAC_STAT(tx_normal_irq_n),
-	STMMAC_STAT(sched_timer_n),
-	STMMAC_STAT(normal_irq_n),
-	STMMAC_STAT(normal_irq_n),
+	STMMAC_STAT(txtimer),
+	STMMAC_STAT(tx_clean),
+	STMMAC_STAT(tx_reset_ic_bit),
 	STMMAC_STAT(mmc_tx_irq_n),
 	STMMAC_STAT(mmc_rx_irq_n),
 	STMMAC_STAT(mmc_rx_csum_offload_irq_n),
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 8e1e53e..3df3c3b 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -77,6 +77,8 @@ 
 
 #define STMMAC_ALIGN(x)	L1_CACHE_ALIGN(x)
 #define JUMBO_LEN	9000
+#define	STMMAC_TX_TM	40000
+#define STMMAC_TX_MAX_FRAMES	32	/* Max coalesced frame */
 
 /* Module parameters */
 #define TX_TIMEO 5000 /* default 5 seconds */
@@ -695,8 +697,11 @@  static void stmmac_dma_operation_mode(struct stmmac_priv *priv)
 static void stmmac_tx(struct stmmac_priv *priv)
 {
 	unsigned int txsize = priv->dma_tx_size;
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->tx_lock, flags);
 
-	spin_lock(&priv->tx_lock);
+	priv->xstats.tx_clean++;
 
 	while (priv->dirty_tx != priv->cur_tx) {
 		int last;
@@ -741,7 +746,7 @@  static void stmmac_tx(struct stmmac_priv *priv)
 				skb_recycle_check(skb, priv->dma_buf_sz))
 				__skb_queue_head(&priv->rx_recycle, skb);
 			else
-				dev_kfree_skb(skb);
+				dev_kfree_skb_any(skb);
 
 			priv->tx_skbuff[entry] = NULL;
 		}
@@ -765,7 +770,7 @@  static void stmmac_tx(struct stmmac_priv *priv)
 		stmmac_enable_eee_mode(priv);
 		mod_timer(&priv->eee_ctrl_timer, STMMAC_LPI_TIMER(eee_timer));
 	}
-	spin_unlock(&priv->tx_lock);
+	spin_unlock_irqrestore(&priv->tx_lock, flags);
 }
 
 static inline void stmmac_enable_irq(struct stmmac_priv *priv)
@@ -778,29 +783,12 @@  static inline void stmmac_disable_irq(struct stmmac_priv *priv)
 	priv->hw->dma->disable_dma_irq(priv->ioaddr);
 }
 
-static int stmmac_has_work(struct stmmac_priv *priv)
+static void stmmac_txtimer(unsigned long data)
 {
-	unsigned int has_work = 0;
-	int rxret, tx_work = 0;
-
-	rxret = priv->hw->desc->get_rx_owner(priv->dma_rx +
-		(priv->cur_rx % priv->dma_rx_size));
-
-	if (priv->dirty_tx != priv->cur_tx)
-		tx_work = 1;
-
-	if (likely(!rxret || tx_work))
-		has_work = 1;
+	struct stmmac_priv *priv = (struct stmmac_priv *)data;
 
-	return has_work;
-}
-
-static inline void _stmmac_schedule(struct stmmac_priv *priv)
-{
-	if (likely(stmmac_has_work(priv))) {
-		stmmac_disable_irq(priv);
-		napi_schedule(&priv->napi);
-	}
+	priv->xstats.txtimer++;
+	stmmac_tx(priv);
 }
 
 /**
@@ -824,7 +812,7 @@  static void stmmac_tx_err(struct stmmac_priv *priv)
 	netif_wake_queue(priv->dev);
 }
 
-static inline void stmmac_rx_schedule(struct stmmac_priv *priv)
+static void stmmac_rx_schedule(struct stmmac_priv *priv)
 {
 	if (likely(napi_schedule_prep(&priv->napi))) {
 		stmmac_disable_irq(priv);
@@ -1001,6 +989,18 @@  static int stmmac_init_dma_engine(struct stmmac_priv *priv)
 				   priv->dma_rx_phy);
 }
 
+static void stmmac_init_tx_coalesce(struct stmmac_priv *priv)
+{
+	/* Set Tx coalesce parameters and timers */
+	priv->tx_coal_frames = STMMAC_TX_MAX_FRAMES;
+	priv->tx_coal_timer = jiffies + usecs_to_jiffies(STMMAC_TX_TM);
+	init_timer(&priv->txtimer);
+	priv->txtimer.expires = priv->tx_coal_timer;
+	priv->txtimer.data = (unsigned long)priv;
+	priv->txtimer.function = stmmac_txtimer;
+	add_timer(&priv->txtimer);
+}
+
 /**
  *  stmmac_open - open entry point of the driver
  *  @dev : pointer to the device structure.
@@ -1113,6 +1113,8 @@  static int stmmac_open(struct net_device *dev)
 	priv->tx_lpi_timer = STMMAC_DEFAULT_TWT_LS_TIMER;
 	priv->eee_enabled = stmmac_eee_init(priv);
 
+	stmmac_init_tx_coalesce(priv);
+
 	napi_enable(&priv->napi);
 	skb_queue_head_init(&priv->rx_recycle);
 	netif_start_queue(dev);
@@ -1160,6 +1162,8 @@  static int stmmac_release(struct net_device *dev)
 	napi_disable(&priv->napi);
 	skb_queue_purge(&priv->rx_recycle);
 
+	del_timer_sync(&priv->txtimer);
+
 	/* Free the IRQ lines */
 	free_irq(dev->irq, dev);
 	if (priv->wol_irq != dev->irq)
@@ -1202,6 +1206,7 @@  static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 	int nfrags = skb_shinfo(skb)->nr_frags;
 	struct dma_desc *desc, *first;
 	unsigned int nopaged_len = skb_headlen(skb);
+	unsigned long flags;
 
 	if (unlikely(stmmac_tx_avail(priv) < nfrags + 1)) {
 		if (!netif_queue_stopped(dev)) {
@@ -1213,7 +1218,7 @@  static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 		return NETDEV_TX_BUSY;
 	}
 
-	spin_lock(&priv->tx_lock);
+	spin_lock_irqsave(&priv->tx_lock, flags);
 
 	if (priv->tx_path_in_lpi_mode)
 		stmmac_disable_eee_mode(priv);
@@ -1222,11 +1227,13 @@  static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 
 #ifdef STMMAC_XMIT_DEBUG
 	if ((skb->len > ETH_FRAME_LEN) || nfrags)
-		pr_info("stmmac xmit:\n"
-		       "\tskb addr %p - len: %d - nopaged_len: %d\n"
-		       "\tn_frags: %d - ip_summed: %d - %s gso\n",
-		       skb, skb->len, nopaged_len, nfrags, skb->ip_summed,
-		       !skb_is_gso(skb) ? "isn't" : "is");
+		pr_debug("stmmac xmit: [entry %d]\n"
+			 "\tskb addr %p - len: %d - nopaged_len: %d\n"
+			 "\tn_frags: %d - ip_summed: %d - %s gso\n"
+			 "\ttx_count_frames %d\n", entry,
+			 skb, skb->len, nopaged_len, nfrags, skb->ip_summed,
+			 !skb_is_gso(skb) ? "isn't" : "is",
+			 priv->tx_count_frames);
 #endif
 
 	csum_insertion = (skb->ip_summed == CHECKSUM_PARTIAL);
@@ -1236,9 +1243,9 @@  static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 
 #ifdef STMMAC_XMIT_DEBUG
 	if ((nfrags > 0) || (skb->len > ETH_FRAME_LEN))
-		pr_debug("stmmac xmit: skb len: %d, nopaged_len: %d,\n"
-		       "\t\tn_frags: %d, ip_summed: %d\n",
-		       skb->len, nopaged_len, nfrags, skb->ip_summed);
+		pr_debug("\tskb len: %d, nopaged_len: %d,\n"
+			 "\t\tn_frags: %d, ip_summed: %d\n",
+			 skb->len, nopaged_len, nfrags, skb->ip_summed);
 #endif
 	priv->tx_skbuff[entry] = skb;
 
@@ -1269,10 +1276,22 @@  static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 		wmb();
 	}
 
-	/* Interrupt on completition only for the latest segment */
+	/* Finalize the latest segment. */
 	priv->hw->desc->close_tx_desc(desc);
 
 	wmb();
+	/* According to the coalesce parameter the IC bit for the latest
+	 * segment could be reset and the timer re-started to invoke the
+	 * stmmac_tx function. This approach takes care about the fragments. */
+	priv->tx_count_frames += nfrags + 1;
+	if (priv->tx_coal_frames > priv->tx_count_frames) {
+		priv->hw->desc->clear_tx_ic(desc);
+		priv->xstats.tx_reset_ic_bit++;
+		TX_DBG("\t[entry %d]: tx_count_frames %d\n", entry,
+		       priv->tx_count_frames);
+		mod_timer(&priv->txtimer, priv->tx_coal_timer);
+	} else
+		priv->tx_count_frames = 0;
 
 	/* To avoid raise condition */
 	priv->hw->desc->set_tx_owner(first);
@@ -1302,7 +1321,7 @@  static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	priv->hw->dma->enable_dma_transmission(priv->ioaddr);
 
-	spin_unlock(&priv->tx_lock);
+	spin_unlock_irqrestore(&priv->tx_lock, flags);
 
 	return NETDEV_TX_OK;
 }
@@ -1447,7 +1466,6 @@  static int stmmac_rx(struct stmmac_priv *priv, int limit)
  *	      all interfaces.
  *  Description :
  *   This function implements the the reception process.
- *   Also it runs the TX completion thread
  */
 static int stmmac_poll(struct napi_struct *napi, int budget)
 {