diff mbox

[RFC] Re: Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.

Message ID 20161207123140.GA8785@amd
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Pavel Machek Dec. 7, 2016, 12:31 p.m. UTC
On Fri 2016-12-02 15:05:21, Lino Sanfilippo wrote:
> Hi,
> 
> 
> > 
> > There's nothing that protect stmmac_poll() from running concurently
> > with stmmac_dma_interrupt(), right?
> > 
> 
> could it be that there is also another issue concerned locking?:
> The tx completion handler takes the xmit_lock in case that the
> netif_queue is stopped. This is AFAICS unnecessary, since both
> xmit and completion handler are already synchronized by the private
> tx lock. But it is IMHO also dangerous:
> 
> In the xmit handler we have the locking order
> 1. xmit_lock
> 2. private tx lock
> 
> while in the completion handler its the reverse:
> 
> 1. private tx lock
> 2. xmit lock.
> 
> Do I miss something?

No, it seems you are right. Something like this?

Hmm. And can priv->tx_lock be removed, as we already rely on
netif_tx_lock?

(I copied the "lock already held" annotations from forcedeth. I hope
they are right....)

Best regards,
								Pavel


commit a6f21255dfc11fcadc5062dfd0c5f3d77ca4f634
Author: Pavel <pavel@ucw.cz>
Date:   Wed Dec 7 13:29:15 2016 +0100

    Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
    
    The tx completion handler takes the xmit_lock in case that the
    netif_queue is stopped. This is AFAICS unnecessary, since both
    xmit and completion handler are already synchronized by the private
    tx lock. But it is IMHO also dangerous:
    
    In the xmit handler we have the locking order
    1. xmit_lock
    2. private tx lock
    
    while in the completion handler its the reverse:
    
    1. private tx lock
    2. xmit lock.
    
    Signed-off-by: Pavel Machek <pavel@denx.de>

Comments

Lino Sanfilippo Dec. 7, 2016, 1:18 p.m. UTC | #1
Hi,

On 07.12.2016 13:31, Pavel Machek wrote:
> On Fri 2016-12-02 15:05:21, Lino Sanfilippo wrote:
>> Hi,
>>
>>
>>>
>>> There's nothing that protect stmmac_poll() from running concurently
>>> with stmmac_dma_interrupt(), right?
>>>
>>
>> could it be that there is also another issue concerned locking?:
>> The tx completion handler takes the xmit_lock in case that the
>> netif_queue is stopped. This is AFAICS unnecessary, since both
>> xmit and completion handler are already synchronized by the private
>> tx lock. But it is IMHO also dangerous:
>>
>> In the xmit handler we have the locking order
>> 1. xmit_lock
>> 2. private tx lock
>>
>> while in the completion handler its the reverse:
>>
>> 1. private tx lock
>> 2. xmit lock.
>>
>> Do I miss something?
>
> No, it seems you are right. Something like this?
>
> Hmm. And can priv->tx_lock be removed, as we already rely on
> netif_tx_lock?
>

David wanted indeed the private lock to be removed completely. See
this thread.

https://marc.info/?l=linux-netdev&m=148072001900620&w=2

If you can wait a bit, I already have patches prepared to fix this issue in both
drivers. I want to send them as soon as I am at home (a few hours).

Regards,
Lino
diff mbox

Patch

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 982c952..5df9bb3 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1380,14 +1380,9 @@  static void stmmac_tx_clean(struct stmmac_priv *priv)
 
 	if (unlikely(netif_queue_stopped(priv->dev) &&
 		     stmmac_tx_avail(priv) > STMMAC_TX_THRESH)) {
-		netif_tx_lock(priv->dev);
-		if (netif_queue_stopped(priv->dev) &&
-		    stmmac_tx_avail(priv) > STMMAC_TX_THRESH) {
-			netif_dbg(priv, tx_done, priv->dev,
-				  "%s: restart transmit\n", __func__);
-			netif_wake_queue(priv->dev);
-		}
-		netif_tx_unlock(priv->dev);
+		netif_dbg(priv, tx_done, priv->dev,
+			  "%s: restart transmit\n", __func__);
+		netif_wake_queue(priv->dev);
 	}
 
 	if ((priv->eee_enabled) && (!priv->tx_path_in_lpi_mode)) {
@@ -1630,7 +1625,9 @@  static void stmmac_tx_timer(unsigned long data)
 {
 	struct stmmac_priv *priv = (struct stmmac_priv *)data;
 
+	netif_tx_lock_bh(priv->dev);
 	stmmac_tx_clean(priv);
+	netif_tx_unlock_bh(priv->dev);
 }
 
 /**
@@ -1994,7 +1991,8 @@  static void stmmac_tso_allocator(struct stmmac_priv *priv, unsigned int des,
  *   --------
  *
  * mss is fixed when enable tso, so w/o programming the TDES3 ctx field.
- */
+ *
+ *  Called with netif_tx_lock held.                                                                 */
 static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	u32 pay_len, mss;
@@ -2174,6 +2172,7 @@  static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
  *  Description : this is the tx entry point of the driver.
  *  It programs the chain or the ring and supports oversized frames
  *  and SG feature.
+ *  Called with netif_tx_lock held. 
  */
 static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 {
@@ -2684,7 +2683,9 @@  static int stmmac_poll(struct napi_struct *napi, int budget)
 	int work_done = 0;
 
 	priv->xstats.napi_poll++;
+	netif_tx_lock_bh(priv->dev);
 	stmmac_tx_clean(priv);
+	netif_tx_unlock_bh(priv->dev);
 
 	work_done = stmmac_rx(priv, budget);
 	if (work_done < budget) {
@@ -2701,6 +2702,7 @@  static int stmmac_poll(struct napi_struct *napi, int budget)
  *   complete within a reasonable time. The driver will mark the error in the
  *   netdev structure and arrange for the device to be reset to a sane state
  *   in order to transmit a new packet.
+ * Called with netif_tx_lock held.
  */
 static void stmmac_tx_timeout(struct net_device *dev)
 {