Patchwork [net-next,2/2] gianfar: convert u64 status counters to atomic64_t

login
register
mail settings
Submitter Paul Gortmaker
Date Feb. 13, 2013, 12:24 a.m.
Message ID <1360715064-2689-3-git-send-email-paul.gortmaker@windriver.com>
Download mbox | patch
Permalink /patch/220016/
State Accepted
Delegated to: David Miller
Headers show

Comments

Paul Gortmaker - Feb. 13, 2013, 12:24 a.m.
While looking at some asm dump for an unrelated change, Eric
noticed in the following stats count increment code:

    50b8:       81 3c 01 f8     lwz     r9,504(r28)
    50bc:       81 5c 01 fc     lwz     r10,508(r28)
    50c0:       31 4a 00 01     addic   r10,r10,1
    50c4:       7d 29 01 94     addze   r9,r9
    50c8:       91 3c 01 f8     stw     r9,504(r28)
    50cc:       91 5c 01 fc     stw     r10,508(r28)

that a 64 bit counter was used on ppc-32 without sync
and hence the "ethtool -S" output was racy.

Here we convert all the values to use atomic64_t so that
the output will always be consistent.

Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 drivers/net/ethernet/freescale/gianfar.c         | 26 ++++++++++----------
 drivers/net/ethernet/freescale/gianfar.h         | 31 ++++++++++++------------
 drivers/net/ethernet/freescale/gianfar_ethtool.c |  4 +--
 3 files changed, 31 insertions(+), 30 deletions(-)
Claudiu Manoil - Feb. 13, 2013, 3:22 p.m.
On 2/13/2013 2:24 AM, Paul Gortmaker wrote:
> While looking at some asm dump for an unrelated change, Eric
> noticed in the following stats count increment code:
>
>      50b8:       81 3c 01 f8     lwz     r9,504(r28)
>      50bc:       81 5c 01 fc     lwz     r10,508(r28)
>      50c0:       31 4a 00 01     addic   r10,r10,1
>      50c4:       7d 29 01 94     addze   r9,r9
>      50c8:       91 3c 01 f8     stw     r9,504(r28)
>      50cc:       91 5c 01 fc     stw     r10,508(r28)
>
> that a 64 bit counter was used on ppc-32 without sync
> and hence the "ethtool -S" output was racy.
>
> Here we convert all the values to use atomic64_t so that
> the output will always be consistent.
>

At least it seems that this conversion results in fewer asm
instructions, as apparently addze and the double lwz/stw are
not generated anymore. Hopefully it's faster too :P

Reviewed-by: Claudiu Manoil <claudiu.manoil@freescale.com>

Thanks,
Claudiu


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - Feb. 13, 2013, 4:14 p.m.
On Wed, 2013-02-13 at 17:22 +0200, Claudiu Manoil wrote:

> At least it seems that this conversion results in fewer asm
> instructions, as apparently addze and the double lwz/stw are
> not generated anymore. Hopefully it's faster too :P

Strange, could you show us these asm instructions ?



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Claudiu Manoil - Feb. 13, 2013, 5:47 p.m.
On 2/13/2013 6:14 PM, Eric Dumazet wrote:
> On Wed, 2013-02-13 at 17:22 +0200, Claudiu Manoil wrote:
>
>> At least it seems that this conversion results in fewer asm
>> instructions, as apparently addze and the double lwz/stw are
>> not generated anymore. Hopefully it's faster too :P
>
> Strange, could you show us these asm instructions ?
>
>
>

Ok, I'm looking over gfar_clean_rx_ring's asm code, with and w/o this
patch. They are difficult to compare as asm code changed considerably,
the initial version having more lines.
The first thing I notice is that the initial ver has 13 'addic'
instructions, and the new version has 7.

Now taking the code around the last 'addic' instruction (from the
gfar_clean_rx_ring function):

Initial version looks like this:

     5024:	4b ff ff ac 	b       4fd0 <gfar_clean_rx_ring+0x450>
     5028:	81 5d 06 30 	lwz     r10,1584(r29)
     502c:	81 7d 06 34 	lwz     r11,1588(r29)
     5030:	31 6b 00 01 	addic   r11,r11,1
     5034:	7d 4a 01 94 	addze   r10,r10
     5038:	91 5d 06 30 	stw     r10,1584(r29)
     503c:	91 7d 06 34 	stw     r11,1588(r29)
     5040:	4b ff fe fc 	b       4f3c <gfar_clean_rx_ring+0x3bc>

New version looks like this:

     4ff8:	4b ff fd a8 	b       4da0 <gfar_clean_rx_ring+0x1ec>
     4ffc:	80 1c 00 a0 	lwz     r0,160(r28)
     5000:	38 60 00 00 	li      r3,0
     5004:	80 a1 00 18 	lwz     r5,24(r1)
     5008:	38 80 00 01 	li      r4,1
     500c:	30 00 00 01 	addic   r0,r0,1
     5010:	90 1c 00 a0 	stw     r0,160(r28)
     5014:	48 00 00 01 	bl      5014 <gfar_clean_rx_ring+0x460>
     5018:	4b ff ff 4c 	b       4f64 <gfar_clean_rx_ring+0x3b0>


I have the whole function's asm excepts, if needed.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - Feb. 13, 2013, 6:03 p.m.
On Wed, 2013-02-13 at 19:47 +0200, Claudiu Manoil wrote:
> On 2/13/2013 6:14 PM, Eric Dumazet wrote:
> > On Wed, 2013-02-13 at 17:22 +0200, Claudiu Manoil wrote:
> >
> >> At least it seems that this conversion results in fewer asm
> >> instructions, as apparently addze and the double lwz/stw are
> >> not generated anymore. Hopefully it's faster too :P
> >
> > Strange, could you show us these asm instructions ?
> >
> >
> >
> 
> Ok, I'm looking over gfar_clean_rx_ring's asm code, with and w/o this
> patch. They are difficult to compare as asm code changed considerably,
> the initial version having more lines.
> The first thing I notice is that the initial ver has 13 'addic'
> instructions, and the new version has 7.
> 
> Now taking the code around the last 'addic' instruction (from the
> gfar_clean_rx_ring function):
> 
> Initial version looks like this:
> 
>      5024:	4b ff ff ac 	b       4fd0 <gfar_clean_rx_ring+0x450>
>      5028:	81 5d 06 30 	lwz     r10,1584(r29)
>      502c:	81 7d 06 34 	lwz     r11,1588(r29)
>      5030:	31 6b 00 01 	addic   r11,r11,1
>      5034:	7d 4a 01 94 	addze   r10,r10
>      5038:	91 5d 06 30 	stw     r10,1584(r29)
>      503c:	91 7d 06 34 	stw     r11,1588(r29)
>      5040:	4b ff fe fc 	b       4f3c <gfar_clean_rx_ring+0x3bc>
> 
> New version looks like this:
> 
>      4ff8:	4b ff fd a8 	b       4da0 <gfar_clean_rx_ring+0x1ec>
>      4ffc:	80 1c 00 a0 	lwz     r0,160(r28)
>      5000:	38 60 00 00 	li      r3,0
>      5004:	80 a1 00 18 	lwz     r5,24(r1)
>      5008:	38 80 00 01 	li      r4,1
>      500c:	30 00 00 01 	addic   r0,r0,1
>      5010:	90 1c 00 a0 	stw     r0,160(r28)
>      5014:	48 00 00 01 	bl      5014 <gfar_clean_rx_ring+0x460>
>      5018:	4b ff ff 4c 	b       4f64 <gfar_clean_rx_ring+0x3b0>
> 
> 
> I have the whole function's asm excepts, if needed.
> 

I guess you are not looking at the right spot.

32bit powerpc probably use the generic atomic64

Your kernel should have an atomic64_add() function (in lib/atomic64.c)
and gianfar should call it.

This is certainly expensive, but these counters are not in fast path.

If they were in fast path, include/linux/u64_stats_sync.h would be a
better choice.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index ab32bd0..c82f677 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -2648,7 +2648,7 @@  static inline void count_errors(unsigned short status, struct net_device *dev)
 	if (status & RXBD_TRUNCATED) {
 		stats->rx_length_errors++;
 
-		estats->rx_trunc++;
+		atomic64_inc(&estats->rx_trunc);
 
 		return;
 	}
@@ -2657,20 +2657,20 @@  static inline void count_errors(unsigned short status, struct net_device *dev)
 		stats->rx_length_errors++;
 
 		if (status & RXBD_LARGE)
-			estats->rx_large++;
+			atomic64_inc(&estats->rx_large);
 		else
-			estats->rx_short++;
+			atomic64_inc(&estats->rx_short);
 	}
 	if (status & RXBD_NONOCTET) {
 		stats->rx_frame_errors++;
-		estats->rx_nonoctet++;
+		atomic64_inc(&estats->rx_nonoctet);
 	}
 	if (status & RXBD_CRCERR) {
-		estats->rx_crcerr++;
+		atomic64_inc(&estats->rx_crcerr);
 		stats->rx_crc_errors++;
 	}
 	if (status & RXBD_OVERRUN) {
-		estats->rx_overrun++;
+		atomic64_inc(&estats->rx_overrun);
 		stats->rx_crc_errors++;
 	}
 }
@@ -2744,7 +2744,7 @@  static int gfar_process_frame(struct net_device *dev, struct sk_buff *skb,
 	ret = napi_gro_receive(napi, skb);
 
 	if (GRO_DROP == ret)
-		priv->extra_stats.kernel_dropped++;
+		atomic64_inc(&priv->extra_stats.kernel_dropped);
 
 	return 0;
 }
@@ -2812,7 +2812,7 @@  int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, int rx_work_limit)
 			} else {
 				netif_warn(priv, rx_err, dev, "Missing skb!\n");
 				rx_queue->stats.rx_dropped++;
-				priv->extra_stats.rx_skbmissing++;
+				atomic64_inc(&priv->extra_stats.rx_skbmissing);
 			}
 
 		}
@@ -3245,7 +3245,7 @@  static irqreturn_t gfar_error(int irq, void *grp_id)
 			netif_dbg(priv, tx_err, dev,
 				  "TX FIFO underrun, packet dropped\n");
 			dev->stats.tx_dropped++;
-			priv->extra_stats.tx_underrun++;
+			atomic64_inc(&priv->extra_stats.tx_underrun);
 
 			local_irq_save(flags);
 			lock_tx_qs(priv);
@@ -3260,7 +3260,7 @@  static irqreturn_t gfar_error(int irq, void *grp_id)
 	}
 	if (events & IEVENT_BSY) {
 		dev->stats.rx_errors++;
-		priv->extra_stats.rx_bsy++;
+		atomic64_inc(&priv->extra_stats.rx_bsy);
 
 		gfar_receive(irq, grp_id);
 
@@ -3269,19 +3269,19 @@  static irqreturn_t gfar_error(int irq, void *grp_id)
 	}
 	if (events & IEVENT_BABR) {
 		dev->stats.rx_errors++;
-		priv->extra_stats.rx_babr++;
+		atomic64_inc(&priv->extra_stats.rx_babr);
 
 		netif_dbg(priv, rx_err, dev, "babbling RX error\n");
 	}
 	if (events & IEVENT_EBERR) {
-		priv->extra_stats.eberr++;
+		atomic64_inc(&priv->extra_stats.eberr);
 		netif_dbg(priv, rx_err, dev, "bus error\n");
 	}
 	if (events & IEVENT_RXC)
 		netif_dbg(priv, rx_status, dev, "control frame\n");
 
 	if (events & IEVENT_BABT) {
-		priv->extra_stats.tx_babt++;
+		atomic64_inc(&priv->extra_stats.tx_babt);
 		netif_dbg(priv, tx_err, dev, "babbling TX error\n");
 	}
 	return IRQ_HANDLED;
diff --git a/drivers/net/ethernet/freescale/gianfar.h b/drivers/net/ethernet/freescale/gianfar.h
index 61b1785..78125f1 100644
--- a/drivers/net/ethernet/freescale/gianfar.h
+++ b/drivers/net/ethernet/freescale/gianfar.h
@@ -627,24 +627,25 @@  struct rmon_mib
 };
 
 struct gfar_extra_stats {
-	u64 kernel_dropped;
-	u64 rx_large;
-	u64 rx_short;
-	u64 rx_nonoctet;
-	u64 rx_crcerr;
-	u64 rx_overrun;
-	u64 rx_bsy;
-	u64 rx_babr;
-	u64 rx_trunc;
-	u64 eberr;
-	u64 tx_babt;
-	u64 tx_underrun;
-	u64 rx_skbmissing;
-	u64 tx_timeout;
+	atomic64_t kernel_dropped;
+	atomic64_t rx_large;
+	atomic64_t rx_short;
+	atomic64_t rx_nonoctet;
+	atomic64_t rx_crcerr;
+	atomic64_t rx_overrun;
+	atomic64_t rx_bsy;
+	atomic64_t rx_babr;
+	atomic64_t rx_trunc;
+	atomic64_t eberr;
+	atomic64_t tx_babt;
+	atomic64_t tx_underrun;
+	atomic64_t rx_skbmissing;
+	atomic64_t tx_timeout;
 };
 
 #define GFAR_RMON_LEN ((sizeof(struct rmon_mib) - 16)/sizeof(u32))
-#define GFAR_EXTRA_STATS_LEN (sizeof(struct gfar_extra_stats)/sizeof(u64))
+#define GFAR_EXTRA_STATS_LEN \
+	(sizeof(struct gfar_extra_stats)/sizeof(atomic64_t))
 
 /* Number of stats exported via ethtool */
 #define GFAR_STATS_LEN (GFAR_RMON_LEN + GFAR_EXTRA_STATS_LEN)
diff --git a/drivers/net/ethernet/freescale/gianfar_ethtool.c b/drivers/net/ethernet/freescale/gianfar_ethtool.c
index 172acb9..75e89ac 100644
--- a/drivers/net/ethernet/freescale/gianfar_ethtool.c
+++ b/drivers/net/ethernet/freescale/gianfar_ethtool.c
@@ -149,10 +149,10 @@  static void gfar_fill_stats(struct net_device *dev, struct ethtool_stats *dummy,
 	int i;
 	struct gfar_private *priv = netdev_priv(dev);
 	struct gfar __iomem *regs = priv->gfargrp[0].regs;
-	u64 *extra = (u64 *) & priv->extra_stats;
+	atomic64_t *extra = (atomic64_t *)&priv->extra_stats;
 
 	for (i = 0; i < GFAR_EXTRA_STATS_LEN; i++)
-		buf[i] = extra[i];
+		buf[i] = atomic64_read(&extra[i]);
 
 	if (priv->device_flags & FSL_GIANFAR_DEV_HAS_RMON) {
 		u32 __iomem *rmon = (u32 __iomem *) &regs->rmon;