diff mbox series

[net,1/3] bnxt_en: Fix accumulation of bp->net_stats_prev.

Message ID 1590442879-18961-2-git-send-email-michael.chan@broadcom.com
State Accepted
Delegated to: David Miller
Headers show
Series bnxt_en: Bug fixes. | expand

Commit Message

Michael Chan May 25, 2020, 9:41 p.m. UTC
We have logic to maintain network counters across resets by storing
the counters in bp->net_stats_prev before reset.  But not all resets
will clear the counters.  Certain resets that don't need to change
the number of rings do not clear the counters.  The current logic
accumulates the counters before all resets, causing big jumps in
the counters after some resets, such as ethtool -G.

Fix it by only accumulating the counters during reset if the irq_re_init
parameter is set.  The parameter signifies that all rings and interrupts
will be reset and that means that the counters will also be reset.

Reported-by: Vijayendra Suman <vijayendra.suman@oracle.com>
Fixes: b8875ca356f1 ("bnxt_en: Save ring statistics before reset.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jakub Kicinski May 26, 2020, 10:04 p.m. UTC | #1
On Mon, 25 May 2020 17:41:17 -0400 Michael Chan wrote:
> We have logic to maintain network counters across resets by storing
> the counters in bp->net_stats_prev before reset.  But not all resets
> will clear the counters.  Certain resets that don't need to change
> the number of rings do not clear the counters.  The current logic
> accumulates the counters before all resets, causing big jumps in
> the counters after some resets, such as ethtool -G.
> 
> Fix it by only accumulating the counters during reset if the irq_re_init
> parameter is set.  The parameter signifies that all rings and interrupts
> will be reset and that means that the counters will also be reset.
> 
> Reported-by: Vijayendra Suman <vijayendra.suman@oracle.com>
> Fixes: b8875ca356f1 ("bnxt_en: Save ring statistics before reset.")
> Signed-off-by: Michael Chan <michael.chan@broadcom.com>

Hi Michael! 

Could you explain why accumulating counters causes a jump?
Michael Chan May 26, 2020, 10:43 p.m. UTC | #2
On Tue, May 26, 2020 at 3:04 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 25 May 2020 17:41:17 -0400 Michael Chan wrote:
> > We have logic to maintain network counters across resets by storing
> > the counters in bp->net_stats_prev before reset.  But not all resets
> > will clear the counters.  Certain resets that don't need to change
> > the number of rings do not clear the counters.  The current logic
> > accumulates the counters before all resets, causing big jumps in
> > the counters after some resets, such as ethtool -G.
> >
> > Fix it by only accumulating the counters during reset if the irq_re_init
> > parameter is set.  The parameter signifies that all rings and interrupts
> > will be reset and that means that the counters will also be reset.
> >
> > Reported-by: Vijayendra Suman <vijayendra.suman@oracle.com>
> > Fixes: b8875ca356f1 ("bnxt_en: Save ring statistics before reset.")
> > Signed-off-by: Michael Chan <michael.chan@broadcom.com>
>
> Hi Michael!
>
> Could you explain why accumulating counters causes a jump?

Yes, during chip reset, we free most hardware resources including
possibly hardware counter resources.  After freeing the hardware
counters, the counters will go to zero.  To preserve the counters, we
take a snapshot of the hardware counters and add them to the
bp->net_stats_prev.  The counters in bp->net_stats_prev are always
added to the current hardware counters to provide the true counters.

The problem is that not all resets will free the hardware counters.
The old code is unconditionally taking the snapshot during reset.  So
when we don't free the hardware counters, the snapshot will cause us
to effectively double the hardware counters after the reset.
Jakub Kicinski May 26, 2020, 10:47 p.m. UTC | #3
On Tue, 26 May 2020 15:43:52 -0700 Michael Chan wrote:
> On Tue, May 26, 2020 at 3:04 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Mon, 25 May 2020 17:41:17 -0400 Michael Chan wrote:  
> > > We have logic to maintain network counters across resets by storing
> > > the counters in bp->net_stats_prev before reset.  But not all resets
> > > will clear the counters.  Certain resets that don't need to change
> > > the number of rings do not clear the counters.  The current logic
> > > accumulates the counters before all resets, causing big jumps in
> > > the counters after some resets, such as ethtool -G.
> > >
> > > Fix it by only accumulating the counters during reset if the irq_re_init
> > > parameter is set.  The parameter signifies that all rings and interrupts
> > > will be reset and that means that the counters will also be reset.
> > >
> > > Reported-by: Vijayendra Suman <vijayendra.suman@oracle.com>
> > > Fixes: b8875ca356f1 ("bnxt_en: Save ring statistics before reset.")
> > > Signed-off-by: Michael Chan <michael.chan@broadcom.com>  
> >
> > Hi Michael!
> >
> > Could you explain why accumulating counters causes a jump?  
> 
> Yes, during chip reset, we free most hardware resources including
> possibly hardware counter resources.  After freeing the hardware
> counters, the counters will go to zero.  To preserve the counters, we
> take a snapshot of the hardware counters and add them to the
> bp->net_stats_prev.  The counters in bp->net_stats_prev are always
> added to the current hardware counters to provide the true counters.
> 
> The problem is that not all resets will free the hardware counters.
> The old code is unconditionally taking the snapshot during reset.  So
> when we don't free the hardware counters, the snapshot will cause us
> to effectively double the hardware counters after the reset.

Aw! I see what you mean now, thanks for the explanation!
diff mbox series

Patch

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index d1a8371..abb203c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -9310,7 +9310,7 @@  static void __bnxt_close_nic(struct bnxt *bp, bool irq_re_init,
 	bnxt_free_skbs(bp);
 
 	/* Save ring stats before shutdown */
-	if (bp->bnapi)
+	if (bp->bnapi && irq_re_init)
 		bnxt_get_ring_stats(bp, &bp->net_stats_prev);
 	if (irq_re_init) {
 		bnxt_free_irq(bp);