Message ID | 1347346514-23411-4-git-send-email-peppe.cavallaro@st.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
From: Giuseppe CAVALLARO <peppe.cavallaro@st.com> Date: Tue, 11 Sep 2012 08:55:09 +0200 > + unsigned long flags; > + > + spin_lock_irqsave(&priv->tx_lock, flags); > > - spin_lock(&priv->tx_lock); > + priv->xstats.tx_clean++; You are changing the locking here for the sake of the new timer. But timers run in software interrupt context, so this change is completely unnecessary since NAPI runs in software interrupt context as well, and neither timers nor NAPI run in hardware interrupts context. Therefore, disabling hardware interrupts for this lock is unnecessary and will decrease performance. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2012-09-13 at 16:23 -0400, David Miller wrote: > From: Giuseppe CAVALLARO <peppe.cavallaro@st.com> > Date: Tue, 11 Sep 2012 08:55:09 +0200 > > > + unsigned long flags; > > + > > + spin_lock_irqsave(&priv->tx_lock, flags); > > > > - spin_lock(&priv->tx_lock); > > + priv->xstats.tx_clean++; > > You are changing the locking here for the sake of the new timer. > > But timers run in software interrupt context, so this change is > completely unnecessary since NAPI runs in software interrupt context > as well, and neither timers nor NAPI run in hardware interrupts > context. NAPI pollers can be called by netpoll in hardware interrupt context, and this driver supports netpoll. Ben. > Therefore, disabling hardware interrupts for this lock is unnecessary > and will decrease performance.
From: Ben Hutchings <bhutchings@solarflare.com> Date: Thu, 13 Sep 2012 21:42:51 +0100 > On Thu, 2012-09-13 at 16:23 -0400, David Miller wrote: >> From: Giuseppe CAVALLARO <peppe.cavallaro@st.com> >> Date: Tue, 11 Sep 2012 08:55:09 +0200 >> >> > + unsigned long flags; >> > + >> > + spin_lock_irqsave(&priv->tx_lock, flags); >> > >> > - spin_lock(&priv->tx_lock); >> > + priv->xstats.tx_clean++; >> >> You are changing the locking here for the sake of the new timer. >> >> But timers run in software interrupt context, so this change is >> completely unnecessary since NAPI runs in software interrupt context >> as well, and neither timers nor NAPI run in hardware interrupts >> context. > > NAPI pollers can be called by netpoll in hardware interrupt context, and > this driver supports netpoll. And the netpoll handler need to make amends to make sure that hardware the environment expected by the software interrupt code is preserved. Well written NAPI drivers never need to disable hardware interrupts in their ->poll() method and it's callers, neither should you. I'm not considering your patches until you implement this properly. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2012-09-13 at 16:46 -0400, David Miller wrote: > From: Ben Hutchings <bhutchings@solarflare.com> > Date: Thu, 13 Sep 2012 21:42:51 +0100 > > > On Thu, 2012-09-13 at 16:23 -0400, David Miller wrote: > >> From: Giuseppe CAVALLARO <peppe.cavallaro@st.com> > >> Date: Tue, 11 Sep 2012 08:55:09 +0200 > >> > >> > + unsigned long flags; > >> > + > >> > + spin_lock_irqsave(&priv->tx_lock, flags); > >> > > >> > - spin_lock(&priv->tx_lock); > >> > + priv->xstats.tx_clean++; > >> > >> You are changing the locking here for the sake of the new timer. > >> > >> But timers run in software interrupt context, so this change is > >> completely unnecessary since NAPI runs in software interrupt context > >> as well, and neither timers nor NAPI run in hardware interrupts > >> context. > > > > NAPI pollers can be called by netpoll in hardware interrupt context, and > > this driver supports netpoll. > > And the netpoll handler need to make amends to make sure that > hardware the environment expected by the software interrupt > code is preserved. > > Well written NAPI drivers never need to disable hardware interrupts > in their ->poll() method and it's callers, neither should you. Perhaps you should get round to reviewing netpoll, because it does exactly this. > I'm not considering your patches until you implement this properly. These aren't my patches. Ben.
From: Ben Hutchings <bhutchings@solarflare.com> Date: Thu, 13 Sep 2012 22:10:50 +0100 > On Thu, 2012-09-13 at 16:46 -0400, David Miller wrote: >> From: Ben Hutchings <bhutchings@solarflare.com> >> Date: Thu, 13 Sep 2012 21:42:51 +0100 >> >> Well written NAPI drivers never need to disable hardware interrupts >> in their ->poll() method and it's callers, neither should you. > > Perhaps you should get round to reviewing netpoll, because it does > exactly this. Then I don't understand the point you're trying to make. Hardware interrupt disabling has absolutely no place in the NAPI polling fast paths. If NAPI drivers can't be implemented without hardware interrupt toggling in ->poll(), we've failed. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2012-09-13 at 17:37 -0400, David Miller wrote: > From: Ben Hutchings <bhutchings@solarflare.com> > Date: Thu, 13 Sep 2012 22:10:50 +0100 > > > On Thu, 2012-09-13 at 16:46 -0400, David Miller wrote: > >> From: Ben Hutchings <bhutchings@solarflare.com> > >> Date: Thu, 13 Sep 2012 21:42:51 +0100 > >> > >> Well written NAPI drivers never need to disable hardware interrupts > >> in their ->poll() method and it's callers, neither should you. > > > > Perhaps you should get round to reviewing netpoll, because it does > > exactly this. > > Then I don't understand the point you're trying to make. > > Hardware interrupt disabling has absolutely no place in the > NAPI polling fast paths. > > If NAPI drivers can't be implemented without hardware interrupt > toggling in ->poll(), we've failed. Right. The problem being that NAPI poll functions *are* sometimes called in hardware interrupt context. Thus, any spinlock that may be taken by a NAPI handler, may well need to be taken with spinlock_irq or spinlock_irqsave elsewhere. (This is horrible and I think it's well past time that we ripped the NAPI polling out of netpoll.) I think you're right that stmmac_tx() (completion handler?) doesn't need to disable hardware interrupts, but sadly stmmac_xmit() does right now unless Giuseppe can work out how to make their interaction lockless. Ben.
On 9/13/2012 10:23 PM, David Miller wrote: > From: Giuseppe CAVALLARO <peppe.cavallaro@st.com> > Date: Tue, 11 Sep 2012 08:55:09 +0200 > >> + unsigned long flags; >> + >> + spin_lock_irqsave(&priv->tx_lock, flags); >> >> - spin_lock(&priv->tx_lock); >> + priv->xstats.tx_clean++; > > You are changing the locking here for the sake of the new timer. > > But timers run in software interrupt context, so this change is > completely unnecessary since NAPI runs in software interrupt context > as well, and neither timers nor NAPI run in hardware interrupts > context. Indeed It can be called by the ISR too in this new implementation. I have added the spin_lock_irqsave/restore otherwise, testing with CONFIG_PROVE_LOOKING, I get the following warning on ARM SMP. [ 8.030000] [ 8.030000] ================================= [ 8.030000] [ INFO: inconsistent lock state ] [ 8.030000] 3.4.7_stm24_0302-b2000+ #103 Not tainted [ 8.030000] --------------------------------- [ 8.030000] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. [ 8.030000] swapper/0/1 [HC1[1]:SC0[0]:HE0:SE1] takes: [ 8.030000] (&(&priv->tx_lock)->rlock){?.-...}, at: [<802651d8>] stmmac_tx+0x1c/0x388 [ 8.030000] {HARDIRQ-ON-W} state was registered at: [ 8.030000] [<800562b4>] __lock_acquire+0x638/0x179c [ 8.030000] [<80057884>] lock_acquire+0x60/0x74 [ 8.030000] [<80428a08>] _raw_spin_lock+0x40/0x50 [ 8.030000] [<802651d8>] stmmac_tx+0x1c/0x388 [ 8.030000] [<80026be0>] run_timer_softirq+0x180/0x23c [ 8.030000] [<80020ccc>] __do_softirq+0xa0/0x114 [ 8.030000] [<80021204>] irq_exit+0x58/0x7c [ 8.030000] [<8000dc80>] handle_IRQ+0x7c/0xb8 [ 8.030000] [<80008464>] gic_handle_irq+0x34/0x58 [ 8.030000] [<80429684>] __irq_svc+0x44/0x78 [ 8.030000] [<8001c3f4>] vprintk+0x41c/0x480 [ 8.030000] [<8042097c>] printk+0x18/0x24 [ 8.030000] [<805aef6c>] prepare_namespace+0x1c/0x1a4 [ 8.030000] [<805ae980>] kernel_init+0x1c8/0x20c [ 8.030000] [<8000deb8>] kernel_thread_exit+0x0/0x8 [ 8.030000] irq event stamp: 254745 [ 8.030000] hardirqs last enabled at (254744): [<80429240>] _raw_spin_unlock_irqrestore+0x3c/0x6c [ 8.030000] hardirqs last disabled at (254745): [<80429674>] __irq_svc+0x34/0x78 [ 8.030000] softirqs last enabled at (254741): [<8035d964>] dev_queue_xmit+0x6a4/0x724 [ 8.030000] softirqs last disabled at (254737): [<8035d2d4>] dev_queue_xmit+0x14/0x724 [ 8.030000] [ 8.030000] other info that might help us debug this: [ 8.030000] Possible unsafe locking scenario: [ 8.030000] [ 8.030000] CPU0 [ 8.030000] ---- [ 8.030000] lock(&(&priv->tx_lock)->rlock); [ 8.030000] <Interrupt> [ 8.030000] lock(&(&priv->tx_lock)->rlock); [ 8.030000] [ 8.030000] *** DEADLOCK *** > Therefore, disabling hardware interrupts for this lock is unnecessary > and will decrease performance. > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello David, Ben, On 9/14/2012 9:36 AM, Giuseppe CAVALLARO wrote: > On 9/13/2012 10:23 PM, David Miller wrote: >> From: Giuseppe CAVALLARO <peppe.cavallaro@st.com> >> Date: Tue, 11 Sep 2012 08:55:09 +0200 >> >>> + unsigned long flags; >>> + >>> + spin_lock_irqsave(&priv->tx_lock, flags); >>> >>> - spin_lock(&priv->tx_lock); >>> + priv->xstats.tx_clean++; >> >> You are changing the locking here for the sake of the new timer. >> >> But timers run in software interrupt context, so this change is >> completely unnecessary since NAPI runs in software interrupt context >> as well, and neither timers nor NAPI run in hardware interrupts >> context. > > Indeed It can be called by the ISR too in this new implementation. > I have added the spin_lock_irqsave/restore otherwise, testing with > CONFIG_PROVE_LOOKING, I get the following warning on ARM SMP. sorry if I disturb you again, any news on these patches? Please, let me know. Regards Peppe > > [ 8.030000] > [ 8.030000] ================================= > [ 8.030000] [ INFO: inconsistent lock state ] > [ 8.030000] 3.4.7_stm24_0302-b2000+ #103 Not tainted > [ 8.030000] --------------------------------- > [ 8.030000] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. > [ 8.030000] swapper/0/1 [HC1[1]:SC0[0]:HE0:SE1] takes: > [ 8.030000] (&(&priv->tx_lock)->rlock){?.-...}, at: [<802651d8>] > stmmac_tx+0x1c/0x388 > [ 8.030000] {HARDIRQ-ON-W} state was registered at: > [ 8.030000] [<800562b4>] __lock_acquire+0x638/0x179c > [ 8.030000] [<80057884>] lock_acquire+0x60/0x74 > [ 8.030000] [<80428a08>] _raw_spin_lock+0x40/0x50 > [ 8.030000] [<802651d8>] stmmac_tx+0x1c/0x388 > [ 8.030000] [<80026be0>] run_timer_softirq+0x180/0x23c > [ 8.030000] [<80020ccc>] __do_softirq+0xa0/0x114 > [ 8.030000] [<80021204>] irq_exit+0x58/0x7c > [ 8.030000] [<8000dc80>] handle_IRQ+0x7c/0xb8 > [ 8.030000] [<80008464>] gic_handle_irq+0x34/0x58 > [ 8.030000] [<80429684>] __irq_svc+0x44/0x78 > [ 8.030000] [<8001c3f4>] vprintk+0x41c/0x480 > [ 8.030000] [<8042097c>] printk+0x18/0x24 > [ 8.030000] [<805aef6c>] prepare_namespace+0x1c/0x1a4 > [ 8.030000] [<805ae980>] kernel_init+0x1c8/0x20c > [ 8.030000] [<8000deb8>] kernel_thread_exit+0x0/0x8 > [ 8.030000] irq event stamp: 254745 > [ 8.030000] hardirqs last enabled at (254744): [<80429240>] > _raw_spin_unlock_irqrestore+0x3c/0x6c > [ 8.030000] hardirqs last disabled at (254745): [<80429674>] > __irq_svc+0x34/0x78 > [ 8.030000] softirqs last enabled at (254741): [<8035d964>] > dev_queue_xmit+0x6a4/0x724 > [ 8.030000] softirqs last disabled at (254737): [<8035d2d4>] > dev_queue_xmit+0x14/0x724 > [ 8.030000] > [ 8.030000] other info that might help us debug this: > [ 8.030000] Possible unsafe locking scenario: > [ 8.030000] > [ 8.030000] CPU0 > [ 8.030000] ---- > [ 8.030000] lock(&(&priv->tx_lock)->rlock); > [ 8.030000] <Interrupt> > [ 8.030000] lock(&(&priv->tx_lock)->rlock); > [ 8.030000] > [ 8.030000] *** DEADLOCK *** > >> Therefore, disabling hardware interrupts for this lock is unnecessary >> and will decrease performance. >> -- >> To unsubscribe from this list: send the line "unsubscribe netdev" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 9/18/2012 7:41 AM, Giuseppe CAVALLARO wrote: > Hello David, Ben, > > On 9/14/2012 9:36 AM, Giuseppe CAVALLARO wrote: >> On 9/13/2012 10:23 PM, David Miller wrote: >>> From: Giuseppe CAVALLARO <peppe.cavallaro@st.com> >>> Date: Tue, 11 Sep 2012 08:55:09 +0200 >>> >>>> + unsigned long flags; >>>> + >>>> + spin_lock_irqsave(&priv->tx_lock, flags); >>>> >>>> - spin_lock(&priv->tx_lock); >>>> + priv->xstats.tx_clean++; >>> >>> You are changing the locking here for the sake of the new timer. >>> >>> But timers run in software interrupt context, so this change is >>> completely unnecessary since NAPI runs in software interrupt context >>> as well, and neither timers nor NAPI run in hardware interrupts >>> context. >> >> Indeed It can be called by the ISR too in this new implementation. >> I have added the spin_lock_irqsave/restore otherwise, testing with >> CONFIG_PROVE_LOOKING, I get the following warning on ARM SMP. > > sorry if I disturb you again, any news on these patches? > Please, let me know. Hello David, I'd like to review these patches and resend them again (not now that net-next is closed) especially because they improve the driver performances and other people can get these benefits. I've some doubts about what exactly I have to do. Sorry if I bother you. Some of these patches were also reviewed by Ben long time ago; now there is only pending the spinlock modifications where you've had some concerns. Maybe, I was not so clear in my comments so let me provide you further details and please let me know. The stmmac_tx (to clear the tx resources) is now called by both the sw timer and the Interrupt handler. Note: I have also added a threshold that allows to set the interrupt on completion bit after a number of frames (Ben also game me some useful info to fix this code). On my tests, using both SH4 and ARM (SMP) platforms, this helped to make the driver more reactive on heavy traffic. Only using the timer I noticed that the driver losses frames in some network benchmarks (with iperf, netperf, nuttcp). On ARM SMP, with CONFIG_PROVE_LOOKING enabled, I got the info about the inconsistent lock state. Indeed, it makes sense to me to protect the stmmac_tx from irq in this new implementation. What do you think, welcome your feedback. Best Regards Peppe > > Regards > Peppe > >> >> [ 8.030000] >> [ 8.030000] ================================= >> [ 8.030000] [ INFO: inconsistent lock state ] >> [ 8.030000] 3.4.7_stm24_0302-b2000+ #103 Not tainted >> [ 8.030000] --------------------------------- >> [ 8.030000] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. >> [ 8.030000] swapper/0/1 [HC1[1]:SC0[0]:HE0:SE1] takes: >> [ 8.030000] (&(&priv->tx_lock)->rlock){?.-...}, at: [<802651d8>] >> stmmac_tx+0x1c/0x388 >> [ 8.030000] {HARDIRQ-ON-W} state was registered at: >> [ 8.030000] [<800562b4>] __lock_acquire+0x638/0x179c >> [ 8.030000] [<80057884>] lock_acquire+0x60/0x74 >> [ 8.030000] [<80428a08>] _raw_spin_lock+0x40/0x50 >> [ 8.030000] [<802651d8>] stmmac_tx+0x1c/0x388 >> [ 8.030000] [<80026be0>] run_timer_softirq+0x180/0x23c >> [ 8.030000] [<80020ccc>] __do_softirq+0xa0/0x114 >> [ 8.030000] [<80021204>] irq_exit+0x58/0x7c >> [ 8.030000] [<8000dc80>] handle_IRQ+0x7c/0xb8 >> [ 8.030000] [<80008464>] gic_handle_irq+0x34/0x58 >> [ 8.030000] [<80429684>] __irq_svc+0x44/0x78 >> [ 8.030000] [<8001c3f4>] vprintk+0x41c/0x480 >> [ 8.030000] [<8042097c>] printk+0x18/0x24 >> [ 8.030000] [<805aef6c>] prepare_namespace+0x1c/0x1a4 >> [ 8.030000] [<805ae980>] kernel_init+0x1c8/0x20c >> [ 8.030000] [<8000deb8>] kernel_thread_exit+0x0/0x8 >> [ 8.030000] irq event stamp: 254745 >> [ 8.030000] hardirqs last enabled at (254744): [<80429240>] >> _raw_spin_unlock_irqrestore+0x3c/0x6c >> [ 8.030000] hardirqs last disabled at (254745): [<80429674>] >> __irq_svc+0x34/0x78 >> [ 8.030000] softirqs last enabled at (254741): [<8035d964>] >> dev_queue_xmit+0x6a4/0x724 >> [ 8.030000] softirqs last disabled at (254737): [<8035d2d4>] >> dev_queue_xmit+0x14/0x724 >> [ 8.030000] >> [ 8.030000] other info that might help us debug this: >> [ 8.030000] Possible unsafe locking scenario: >> [ 8.030000] >> [ 8.030000] CPU0 >> [ 8.030000] ---- >> [ 8.030000] lock(&(&priv->tx_lock)->rlock); >> [ 8.030000] <Interrupt> >> [ 8.030000] lock(&(&priv->tx_lock)->rlock); >> [ 8.030000] >> [ 8.030000] *** DEADLOCK *** >> >>> Therefore, disabling hardware interrupts for this lock is unnecessary >>> and will decrease performance. >>> -- >>> To unsubscribe from this list: send the line "unsubscribe netdev" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >> >> -- >> To unsubscribe from this list: send the line "unsubscribe netdev" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h index bd32fe6..1d6bd3e 100644 --- a/drivers/net/ethernet/stmicro/stmmac/common.h +++ b/drivers/net/ethernet/stmicro/stmmac/common.h @@ -95,11 +95,13 @@ struct stmmac_extra_stats { unsigned long threshold; unsigned long tx_pkt_n; unsigned long rx_pkt_n; - unsigned long rx_napi_poll; + unsigned long normal_irq_n; unsigned long rx_normal_irq_n; + unsigned long rx_napi_poll; unsigned long tx_normal_irq_n; - unsigned long sched_timer_n; - unsigned long normal_irq_n; + unsigned long txtimer; + unsigned long tx_clean; + unsigned long tx_reset_ic_bit; unsigned long mmc_tx_irq_n; unsigned long mmc_rx_irq_n; unsigned long mmc_rx_csum_offload_irq_n; diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h index 9f35769..0f5ab28 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h @@ -88,6 +88,10 @@ struct stmmac_priv { int eee_enabled; int eee_active; int tx_lpi_timer; + struct timer_list txtimer; + u32 tx_count_frames; + u32 tx_coal_frames; + u32 tx_coal_timer; }; extern int phyaddr; diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c index 505fe71..48ad0bc 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c @@ -90,12 +90,13 @@ static const struct stmmac_stats stmmac_gstrings_stats[] = { STMMAC_STAT(threshold), STMMAC_STAT(tx_pkt_n), STMMAC_STAT(rx_pkt_n), - STMMAC_STAT(rx_napi_poll), + STMMAC_STAT(normal_irq_n), STMMAC_STAT(rx_normal_irq_n), + STMMAC_STAT(rx_napi_poll), STMMAC_STAT(tx_normal_irq_n), - STMMAC_STAT(sched_timer_n), - STMMAC_STAT(normal_irq_n), - STMMAC_STAT(normal_irq_n), + STMMAC_STAT(txtimer), + STMMAC_STAT(tx_clean), + STMMAC_STAT(tx_reset_ic_bit), STMMAC_STAT(mmc_tx_irq_n), STMMAC_STAT(mmc_rx_irq_n), STMMAC_STAT(mmc_rx_csum_offload_irq_n), diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 8e1e53e..3df3c3b 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -77,6 +77,8 @@ #define STMMAC_ALIGN(x) L1_CACHE_ALIGN(x) #define JUMBO_LEN 9000 +#define STMMAC_TX_TM 40000 +#define STMMAC_TX_MAX_FRAMES 32 /* Max coalesced frame */ /* Module parameters */ #define TX_TIMEO 5000 /* default 5 seconds */ @@ -695,8 +697,11 @@ static void stmmac_dma_operation_mode(struct stmmac_priv *priv) static void stmmac_tx(struct stmmac_priv *priv) { unsigned int txsize = priv->dma_tx_size; + unsigned long flags; + + spin_lock_irqsave(&priv->tx_lock, flags); - spin_lock(&priv->tx_lock); + priv->xstats.tx_clean++; while (priv->dirty_tx != priv->cur_tx) { int last; @@ -741,7 +746,7 @@ static void stmmac_tx(struct stmmac_priv *priv) skb_recycle_check(skb, priv->dma_buf_sz)) __skb_queue_head(&priv->rx_recycle, skb); else - dev_kfree_skb(skb); + dev_kfree_skb_any(skb); priv->tx_skbuff[entry] = NULL; } @@ -765,7 +770,7 @@ static void stmmac_tx(struct stmmac_priv *priv) stmmac_enable_eee_mode(priv); mod_timer(&priv->eee_ctrl_timer, STMMAC_LPI_TIMER(eee_timer)); } - spin_unlock(&priv->tx_lock); + spin_unlock_irqrestore(&priv->tx_lock, flags); } static inline void stmmac_enable_irq(struct stmmac_priv *priv) @@ -778,29 +783,12 @@ static inline void stmmac_disable_irq(struct stmmac_priv *priv) priv->hw->dma->disable_dma_irq(priv->ioaddr); } -static int stmmac_has_work(struct stmmac_priv *priv) +static void stmmac_txtimer(unsigned long data) { - unsigned int has_work = 0; - int rxret, tx_work = 0; - - rxret = priv->hw->desc->get_rx_owner(priv->dma_rx + - (priv->cur_rx % priv->dma_rx_size)); - - if (priv->dirty_tx != priv->cur_tx) - tx_work = 1; - - if (likely(!rxret || tx_work)) - has_work = 1; + struct stmmac_priv *priv = (struct stmmac_priv *)data; - return has_work; -} - -static inline void _stmmac_schedule(struct stmmac_priv *priv) -{ - if (likely(stmmac_has_work(priv))) { - stmmac_disable_irq(priv); - napi_schedule(&priv->napi); - } + priv->xstats.txtimer++; + stmmac_tx(priv); } /** @@ -824,7 +812,7 @@ static void stmmac_tx_err(struct stmmac_priv *priv) netif_wake_queue(priv->dev); } -static inline void stmmac_rx_schedule(struct stmmac_priv *priv) +static void stmmac_rx_schedule(struct stmmac_priv *priv) { if (likely(napi_schedule_prep(&priv->napi))) { stmmac_disable_irq(priv); @@ -1001,6 +989,18 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv) priv->dma_rx_phy); } +static void stmmac_init_tx_coalesce(struct stmmac_priv *priv) +{ + /* Set Tx coalesce parameters and timers */ + priv->tx_coal_frames = STMMAC_TX_MAX_FRAMES; + priv->tx_coal_timer = jiffies + usecs_to_jiffies(STMMAC_TX_TM); + init_timer(&priv->txtimer); + priv->txtimer.expires = priv->tx_coal_timer; + priv->txtimer.data = (unsigned long)priv; + priv->txtimer.function = stmmac_txtimer; + add_timer(&priv->txtimer); +} + /** * stmmac_open - open entry point of the driver * @dev : pointer to the device structure. @@ -1113,6 +1113,8 @@ static int stmmac_open(struct net_device *dev) priv->tx_lpi_timer = STMMAC_DEFAULT_TWT_LS_TIMER; priv->eee_enabled = stmmac_eee_init(priv); + stmmac_init_tx_coalesce(priv); + napi_enable(&priv->napi); skb_queue_head_init(&priv->rx_recycle); netif_start_queue(dev); @@ -1160,6 +1162,8 @@ static int stmmac_release(struct net_device *dev) napi_disable(&priv->napi); skb_queue_purge(&priv->rx_recycle); + del_timer_sync(&priv->txtimer); + /* Free the IRQ lines */ free_irq(dev->irq, dev); if (priv->wol_irq != dev->irq) @@ -1202,6 +1206,7 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev) int nfrags = skb_shinfo(skb)->nr_frags; struct dma_desc *desc, *first; unsigned int nopaged_len = skb_headlen(skb); + unsigned long flags; if (unlikely(stmmac_tx_avail(priv) < nfrags + 1)) { if (!netif_queue_stopped(dev)) { @@ -1213,7 +1218,7 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev) return NETDEV_TX_BUSY; } - spin_lock(&priv->tx_lock); + spin_lock_irqsave(&priv->tx_lock, flags); if (priv->tx_path_in_lpi_mode) stmmac_disable_eee_mode(priv); @@ -1222,11 +1227,13 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev) #ifdef STMMAC_XMIT_DEBUG if ((skb->len > ETH_FRAME_LEN) || nfrags) - pr_info("stmmac xmit:\n" - "\tskb addr %p - len: %d - nopaged_len: %d\n" - "\tn_frags: %d - ip_summed: %d - %s gso\n", - skb, skb->len, nopaged_len, nfrags, skb->ip_summed, - !skb_is_gso(skb) ? "isn't" : "is"); + pr_debug("stmmac xmit: [entry %d]\n" + "\tskb addr %p - len: %d - nopaged_len: %d\n" + "\tn_frags: %d - ip_summed: %d - %s gso\n" + "\ttx_count_frames %d\n", entry, + skb, skb->len, nopaged_len, nfrags, skb->ip_summed, + !skb_is_gso(skb) ? "isn't" : "is", + priv->tx_count_frames); #endif csum_insertion = (skb->ip_summed == CHECKSUM_PARTIAL); @@ -1236,9 +1243,9 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev) #ifdef STMMAC_XMIT_DEBUG if ((nfrags > 0) || (skb->len > ETH_FRAME_LEN)) - pr_debug("stmmac xmit: skb len: %d, nopaged_len: %d,\n" - "\t\tn_frags: %d, ip_summed: %d\n", - skb->len, nopaged_len, nfrags, skb->ip_summed); + pr_debug("\tskb len: %d, nopaged_len: %d,\n" + "\t\tn_frags: %d, ip_summed: %d\n", + skb->len, nopaged_len, nfrags, skb->ip_summed); #endif priv->tx_skbuff[entry] = skb; @@ -1269,10 +1276,22 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev) wmb(); } - /* Interrupt on completition only for the latest segment */ + /* Finalize the latest segment. */ priv->hw->desc->close_tx_desc(desc); wmb(); + /* According to the coalesce parameter the IC bit for the latest + * segment could be reset and the timer re-started to invoke the + * stmmac_tx function. This approach takes care about the fragments. */ + priv->tx_count_frames += nfrags + 1; + if (priv->tx_coal_frames > priv->tx_count_frames) { + priv->hw->desc->clear_tx_ic(desc); + priv->xstats.tx_reset_ic_bit++; + TX_DBG("\t[entry %d]: tx_count_frames %d\n", entry, + priv->tx_count_frames); + mod_timer(&priv->txtimer, priv->tx_coal_timer); + } else + priv->tx_count_frames = 0; /* To avoid raise condition */ priv->hw->desc->set_tx_owner(first); @@ -1302,7 +1321,7 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev) priv->hw->dma->enable_dma_transmission(priv->ioaddr); - spin_unlock(&priv->tx_lock); + spin_unlock_irqrestore(&priv->tx_lock, flags); return NETDEV_TX_OK; } @@ -1447,7 +1466,6 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit) * all interfaces. * Description : * This function implements the the reception process. - * Also it runs the TX completion thread */ static int stmmac_poll(struct napi_struct *napi, int budget) {
This patch adds a new schema used for mitigating the number of transmit interrupts. It is based on a sw timer and a threshold value. The timer is used to periodically call the stmmac_tx function that can be invoked by the ISR but only for the descriptors where the interrupt on completion field has been set. This is tuned by a threshold. Next step is to add the ability to tune these coalesce values by ethtool. Till now I have put a default that showed a real gain on all the platforms ARM/SH4 where I performed benchmarks. V2: review the logic to manage the IC bit in the TDESC that was bugged because it didn't take care about the fragments. Also fix the tx_count_frames that has not to be limited to TX DMA ring. Thanks to Ben Hutchings. Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> --- drivers/net/ethernet/stmicro/stmmac/common.h | 8 +- drivers/net/ethernet/stmicro/stmmac/stmmac.h | 4 + .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 9 +- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 92 ++++++++++++-------- 4 files changed, 69 insertions(+), 44 deletions(-)