Message ID: 1280784368-4226-9-git-send-email-mcarlson@broadcom.com
State: Accepted, archived
Delegated to: David Miller
Hi,

Just saw this go in:

> static inline u32 tg3_tx_avail(struct tg3_napi *tnapi)
> {
> -	smp_mb();
> +	/* Tell compiler to fetch tx indices from memory. */
> +	barrier();
> 	return tnapi->tx_pending -
> 	       ((tnapi->tx_prod - tnapi->tx_cons) & (TG3_TX_RING_SIZE - 1));
> }

Which worries me. Are we sure we don't need any ordering (eg smp_rmb)?
A compiler barrier does nothing to ensure two loads are ordered.

Anton
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Aug 04, 2010 at 03:27:41PM -0700, Anton Blanchard wrote:
>
> Hi,
>
> Just saw this go in:
>
> > static inline u32 tg3_tx_avail(struct tg3_napi *tnapi)
> > {
> > -	smp_mb();
> > +	/* Tell compiler to fetch tx indices from memory. */
> > +	barrier();
> > 	return tnapi->tx_pending -
> > 	       ((tnapi->tx_prod - tnapi->tx_cons) & (TG3_TX_RING_SIZE - 1));
> > }
>
> Which worries me. Are we sure we don't need any ordering (eg smp_rmb)?
> A compiler barrier does nothing to ensure two loads are ordered.
>
> Anton

The compiler barrier makes sure the loads stay roughly in the same
location. In the places where the ordering really matters, we have the
memory barrier inlined into the calling code. In the rest of the
places, we can relax the ordering requirement because the final value
is just used as an estimate.
On Wed, 2010-08-04 at 15:27 -0700, Anton Blanchard wrote:
> Hi,
>
> Just saw this go in:
>
> > static inline u32 tg3_tx_avail(struct tg3_napi *tnapi)
> > {
> > -	smp_mb();
> > +	/* Tell compiler to fetch tx indices from memory. */
> > +	barrier();
> > 	return tnapi->tx_pending -
> > 	       ((tnapi->tx_prod - tnapi->tx_cons) & (TG3_TX_RING_SIZE - 1));
> > }
>
> Which worries me. Are we sure we don't need any ordering (eg smp_rmb)?
> A compiler barrier does nothing to ensure two loads are ordered.

We generally only need an estimate of the available tx ring size when we
call tg3_tx_avail(), so memory barriers are not generally needed. We
put a compiler barrier there to make sure that the compiler will fetch
tx_prod and tx_cons from memory to give us a better estimate.

In the specific cases detailed in the patch description, we do need
memory barriers: when we call netif_tx_stop_queue() and then re-check
the tx ring. We decided to put memory barriers exactly where they're
needed instead of inside tg3_tx_avail(), which would be overkill.

Thanks.
Hi,

> We generally only need an estimate of the available tx ring size when we
> call tg3_tx_avail(), so memory barriers are not generally needed. We
> put a compiler barrier there to make sure that the compiler will fetch
> tx_prod and tx_cons from memory to give us a better estimate.
>
> In the specific cases detailed in the patch description, we do need
> memory barriers: when we call netif_tx_stop_queue() and then re-check
> the tx ring. We decided to put memory barriers exactly where they're
> needed instead of inside tg3_tx_avail(), which would be overkill.

Thanks Matt and Michael. I was pretty sure you were thinking about
ordering issues but wanted to double check.

Anton
diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 32e3a3d..820a7dd 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -4386,7 +4386,8 @@ static void tg3_tx_recover(struct tg3 *tp)
 
 static inline u32 tg3_tx_avail(struct tg3_napi *tnapi)
 {
-	smp_mb();
+	/* Tell compiler to fetch tx indices from memory. */
+	barrier();
 	return tnapi->tx_pending -
 	       ((tnapi->tx_prod - tnapi->tx_cons) & (TG3_TX_RING_SIZE - 1));
 }
@@ -5670,6 +5671,13 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb,
 	tnapi->tx_prod = entry;
 	if (unlikely(tg3_tx_avail(tnapi) <= (MAX_SKB_FRAGS + 1))) {
 		netif_tx_stop_queue(txq);
+
+		/* netif_tx_stop_queue() must be done before checking
+		 * tx index in tg3_tx_avail() below, because in
+		 * tg3_tx(), we update tx index before checking for
+		 * netif_tx_queue_stopped().
+		 */
+		smp_mb();
 		if (tg3_tx_avail(tnapi) > TG3_TX_WAKEUP_THRESH(tnapi))
 			netif_tx_wake_queue(txq);
 	}
@@ -5715,6 +5723,13 @@ static int tg3_tso_bug(struct tg3 *tp, struct sk_buff *skb)
 	/* Estimate the number of fragments in the worst case */
 	if (unlikely(tg3_tx_avail(&tp->napi[0]) <= frag_cnt_est)) {
 		netif_stop_queue(tp->dev);
+
+		/* netif_tx_stop_queue() must be done before checking
+		 * tx index in tg3_tx_avail() below, because in
+		 * tg3_tx(), we update tx index before checking for
+		 * netif_tx_queue_stopped().
+		 */
+		smp_mb();
 		if (tg3_tx_avail(&tp->napi[0]) <= frag_cnt_est)
 			return NETDEV_TX_BUSY;
@@ -5950,6 +5965,13 @@ static netdev_tx_t tg3_start_xmit_dma_bug(struct sk_buff *skb,
 	tnapi->tx_prod = entry;
 	if (unlikely(tg3_tx_avail(tnapi) <= (MAX_SKB_FRAGS + 1))) {
 		netif_tx_stop_queue(txq);
+
+		/* netif_tx_stop_queue() must be done before checking
+		 * tx index in tg3_tx_avail() below, because in
+		 * tg3_tx(), we update tx index before checking for
+		 * netif_tx_queue_stopped().
+		 */
+		smp_mb();
 		if (tg3_tx_avail(tnapi) > TG3_TX_WAKEUP_THRESH(tnapi))
 			netif_tx_wake_queue(txq);
 	}