diff mbox

[v3] ucc_geth: Lockless xmit

Message ID 1348218675-3804-1-git-send-email-Joakim.Tjernlund@transmode.se
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Joakim Tjernlund Sept. 21, 2012, 9:11 a.m. UTC
Currently ucc_geth_start_xmit wraps IRQ off for the
whole body just to be safe. By rearranging the code a bit
one can avoid the lock completely.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---

 v2: Move assignment of ugeth->tx_skbuff[txQ][ugeth->skb_curtx[txQ]]
     inside IRQ off section to prevent racing against
     ucc_geth_tx(). Spotted by Francois Romieu <romieu@fr.zoreil.com>

 v3: Lockless xmit
     Here is my attemept to do lockless xmit. Thanks to
     Francois Romieu <romieu@fr.zoreil.com> for the idea.

 drivers/net/ethernet/freescale/ucc_geth.c |   17 ++++++++---------
 1 files changed, 8 insertions(+), 9 deletions(-)

Comments

Francois Romieu Sept. 21, 2012, 12:51 p.m. UTC | #1
Joakim Tjernlund <Joakim.Tjernlund@transmode.se> :
> Currently ucc_geth_start_xmit wraps IRQ off for the
> whole body just to be safe. By rearranging the code a bit
> one can avoid the lock completely.

Afaics you went a bit too lockless with the queueing disable / enable
logic. The hard_start_xmit handler is run in a locally softirq disabled
section but it will happily race with the napi handler on a different
CPU. Grep netif_tx_lock in tg3.c for it.

The Tx skb free logic probably requires some smp memory barriers as
well since the current skb is used by the ucc_geth driver to sync the
Tx xmit with the napi completion handler.
Joakim Tjernlund Sept. 23, 2012, 9:39 a.m. UTC | #2
Francois Romieu <romieu@fr.zoreil.com> wrote on 2012/09/21 19:35:35:
>
> Please keep netdev in the Cc:.

Sorry, wrong reply button. Now added.

>
> Joakim Tjernlund <joakim.tjernlund@transmode.se> :
> [...]
> > This is what I could come up with, what do you think?
>
> In its current state the driver implicitely relies on the Tx xmit vs Tx
> completion exclusion to work. As long as they exclude each other the
> ugeth->tx_skbuff[txQ][...] and the "bd_status & T_R" states are consistent.
>
> This scheme can not work without locking. The skb alone won't provide a
> synchronization point.

I don't get it. The skb test is there just for one special case, when
the BD ring is empty the (bd_status & T_R) == 0 will be true as well so
one need something more than the bd_status test.

>
> An usual solution would be some skb dirtytx vs curtx comparison + smp
> barriers (Documentation/memory-barriers.txt).

You mean adding an extra test in addition to !skb? Something like:
if (!skb && ugeth->skb_dirtytx[txQ] == ugeth->skb_curtx[txQ])
	break;

>
> Sidenote: is there some reason why modulo (%) operations should be
> avoided on this platform ?

Not that I know. The original driver author did it this way.

>
> [...]
> > @@ -3380,8 +3380,12 @@ static int ucc_geth_tx(struct net_device *dev, u8 txQ)
> >                      1) & TX_RING_MOD_MASK(ugeth->ug_info->bdRingLenTx[txQ]);
> >
> >                 /* We freed a buffer, so now we can restart transmission */
> > -               if (netif_queue_stopped(dev))
> > -                       netif_wake_queue(dev);
> > +               if (netif_queue_stopped(dev)) {
> > +                       netif_tx_lock(dev);
> > +                       if (netif_queue_stopped(dev))
> > +                               netif_wake_queue(dev);
> > +                       netif_tx_unlock(dev);
> > +               }
>
> Without exclusion we don't know if it was stopped before or after the packet
> was freed. There must be some "are there available Tx slots ?" test in the
> locked section.

This makes sense, but stopping relies on ugeth->confBd[txQ] which is updated
last in ucc_geth_tx(). Looks a bit suboptimal though so it should probably
be changed.

>
> Btw the second netif_queue_stopped test can be removed if tx queueing can
> only be stopped in a single place (I did not check).

There is a netif_stop_queue in xmit and one in ucc_geth_close() so I guess
one can say there is only one place.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Francois Romieu Sept. 24, 2012, 9:10 p.m. UTC | #3
Joakim Tjernlund <joakim.tjernlund@transmode.se> :
[...]
> I don't get it. The skb test is there just for one special case, when
> the BD ring is empty the (bd_status & T_R) == 0 will be true as well so
> one need something more than the bd_status test.

Sure but the converse is not true : (bd_status & T_R) == 0 && skb does not
mean that the skb has been sent. It happens when said skb is about to be
given to the hardware by hard_start_xmit as well.
Joakim Tjernlund Sept. 25, 2012, 2:09 p.m. UTC | #4
Francois Romieu <romieu@fr.zoreil.com> wrote on 2012/09/24 23:10:14:
>
> Joakim Tjernlund <joakim.tjernlund@transmode.se> :
> [...]
> > I don't get it. The skb test is there just for one special case, when
> > the BD ring is empty the (bd_status & T_R) == 0 will be true as well so
> > one need something more than the bd_status test.
>
> Sure but the converse is not true : (bd_status & T_R) == 0 && skb does not
> mean that the skb has been sent. It happens when said skb is about to be
> given to the hardware by hard_start_xmit as well.

duhh, I was too tired when trying to make sense of smp & racing in general, thanks.

Will probably be some time before I get to this again due to other stuff though.
The other patches are independent of this one, I hope they are good/accepted?

  Jocke

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/ethernet/freescale/ucc_geth.c b/drivers/net/ethernet/freescale/ucc_geth.c
index 9ac14f8..040aa70 100644
--- a/drivers/net/ethernet/freescale/ucc_geth.c
+++ b/drivers/net/ethernet/freescale/ucc_geth.c
@@ -3177,19 +3177,20 @@  static int ucc_geth_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	u8 __iomem *bd;			/* BD pointer */
 	u32 bd_status;
 	u8 txQ = 0;
-	unsigned long flags;
 
 	ugeth_vdbg("%s: IN", __func__);
 
-	spin_lock_irqsave(&ugeth->lock, flags);
-
 	dev->stats.tx_bytes += skb->len;
 
+	/* We are running in BH disabled context with netif_tx_lock
+	 * and TX reclaim runs via tp->napi.poll inside of a software
+	 * interrupt.  Furthermore, IRQ processing runs lockless so we have
+	 * no IRQ context deadlocks to worry about either.  Rejoice!
+	 */
+
 	/* Start from the next BD that should be filled */
 	bd = ugeth->txBd[txQ];
 	bd_status = in_be32((u32 __iomem *)bd);
-	/* Save the skb pointer so we can free it later */
-	ugeth->tx_skbuff[txQ][ugeth->skb_curtx[txQ]] = skb;
 
 	/* Update the current skb pointer (wrapping if this was the last) */
 	ugeth->skb_curtx[txQ] =
@@ -3207,6 +3208,8 @@  static int ucc_geth_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	/* set bd status and length */
 	out_be32((u32 __iomem *)bd, bd_status);
+	/* Save the skb pointer so we can free it later */
+	ugeth->tx_skbuff[txQ][ugeth->skb_curtx[txQ]] = skb;
 
 	/* Move to next BD in the ring */
 	if (!(bd_status & T_W))
@@ -3238,8 +3241,6 @@  static int ucc_geth_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	uccf = ugeth->uccf;
 	out_be16(uccf->p_utodr, UCC_FAST_TOD);
 #endif
-	spin_unlock_irqrestore(&ugeth->lock, flags);
-
 	return NETDEV_TX_OK;
 }
 
@@ -3387,10 +3388,8 @@  static int ucc_geth_poll(struct napi_struct *napi, int budget)
 	ug_info = ugeth->ug_info;
 
 	/* Tx event processing */
-	spin_lock(&ugeth->lock);
 	for (i = 0; i < ug_info->numQueuesTx; i++)
 		ucc_geth_tx(ugeth->ndev, i);
-	spin_unlock(&ugeth->lock);
 
 	howmany = 0;
 	for (i = 0; i < ug_info->numQueuesRx; i++)