Patchwork bnx2x: fix occasional statistics off-by-4GB error

login
register
mail settings
Submitter Maciej Żenczykowski
Date March 15, 2013, 9:56 p.m.
Message ID <1363384577-21287-1-git-send-email-zenczykowski@gmail.com>
Download mbox | patch
Permalink /patch/228189/
State Accepted
Delegated to: David Miller
Headers show

Comments

Maciej Żenczykowski - March 15, 2013, 9:56 p.m.
From: Maciej Żenczykowski <maze@google.com>

The UPDATE_QSTAT function introduced on February 15, 2012
in commit 1355b704b9ba "bnx2x: consistent statistics after
internal driver reload" incorrectly fails to handle overflow
during addition of the lower 32-bit field of a stat.

This bug is present since 3.4-rc1 and should thus be considered
a candidate for stable 3.4+ releases.

Google-Bug-Id: 8374428
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Cc: Mintz Yuval <yuvalmin@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
Dmitry Kravkov - March 17, 2013, 1:10 p.m.
> -----Original Message-----

> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Maciej ?enczykowski

> Sent: Friday, March 15, 2013 11:56 PM

> To: Maciej Żenczykowski; David S. Miller

> Cc: netdev@vger.kernel.org; Yuval Mintz

> Subject: [PATCH] bnx2x: fix occasional statistics off-by-4GB error

> 

> From: Maciej Żenczykowski <maze@google.com>

> 

> The UPDATE_QSTAT function introduced on February 15, 2012

> in commit 1355b704b9ba "bnx2x: consistent statistics after

> internal driver reload" incorrectly fails to handle overflow

> during addition of the lower 32-bit field of a stat.

> 

> This bug is present since 3.4-rc1 and should thus be considered

> a candidate for stable 3.4+ releases.

> 

> Google-Bug-Id: 8374428

> Signed-off-by: Maciej Żenczykowski <maze@google.com>

> Cc: Mintz Yuval <yuvalmin@broadcom.com>

> ---

>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h | 3 ++-

>  1 file changed, 2 insertions(+), 1 deletion(-)

> 

> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h

> index 364e37ecbc5c..198f6f1c9ad5 100644

> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h

> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h

> @@ -459,8 +459,9 @@ struct bnx2x_fw_port_stats_old {

> 

>  #define UPDATE_QSTAT(s, t) \

>  	do { \

> -		qstats->t##_hi = qstats_old->t##_hi + le32_to_cpu(s.hi); \

>  		qstats->t##_lo = qstats_old->t##_lo + le32_to_cpu(s.lo); \

> +		qstats->t##_hi = qstats_old->t##_hi + le32_to_cpu(s.hi) \

> +			+ ((qstats->t##_lo < qstats_old->t##_lo) ? 1 : 0); \

>  	} while (0)

> 

>  #define UPDATE_QSTAT_OLD(f) \

> --

> 1.8.1.3

> 


Probably this commit resolved the issue:

commit bef05406ac0ea6f468e1e25e9934f3011ea9259b
Author: Dmitry Kravkov <dmitry@broadcom.com>
Date:   Tue Sep 11 04:34:08 2012 +0000

    bnx2x: Avoid sending multiple statistics queries

Can you try it pls?

Thanks
David Miller - March 17, 2013, 6:24 p.m.
From: "Dmitry Kravkov" <dmitry@broadcom.com>
Date: Sun, 17 Mar 2013 13:10:37 +0000

> Probably this commit resolved the issue:
> 
> commit bef05406ac0ea6f468e1e25e9934f3011ea9259b
> Author: Dmitry Kravkov <dmitry@broadcom.com>
> Date:   Tue Sep 11 04:34:08 2012 +0000
> 
>     bnx2x: Avoid sending multiple statistics queries
> 
> Can you try it pls?

These are completely seperate bugs.

The macro in question does not handle rollover of the lower 32-bits of
the statistic properly at all, and therefore Maciej's patch should be
applied and queud up for -stable.

Please give it your ACK unless you can find a bug in his change.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eilon Greenstein - March 17, 2013, 7:37 p.m.
On Sun, 2013-03-17 at 14:24 -0400, David Miller wrote:
> From: "Dmitry Kravkov" <dmitry@broadcom.com>
> Date: Sun, 17 Mar 2013 13:10:37 +0000
> 
> > Probably this commit resolved the issue:
> > 
> > commit bef05406ac0ea6f468e1e25e9934f3011ea9259b
> > Author: Dmitry Kravkov <dmitry@broadcom.com>
> > Date:   Tue Sep 11 04:34:08 2012 +0000
> > 
> >     bnx2x: Avoid sending multiple statistics queries
> > 
> > Can you try it pls?
> 
> These are completely seperate bugs.
> 
> The macro in question does not handle rollover of the lower 32-bits of
> the statistic properly at all, and therefore Maciej's patch should be
> applied and queud up for -stable.
> 
> Please give it your ACK unless you can find a bug in his change.

Both the high value and the low value are read from the chip - so this
patch will increment the higher 32 bits twice (well, more than once).
Not taking the HW/FW high counter at all is also not acceptable since
the reading frequency is not high enough without those.

So the original patch is nacked, but we are trying to figure out what is
causing the statistics to misbehave and it might be related to sending
statistics query twice.

Thanks,
Eilon


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - March 17, 2013, 8:25 p.m.
On Sun, 2013-03-17 at 21:37 +0200, Eilon Greenstein wrote:
> On Sun, 2013-03-17 at 14:24 -0400, David Miller wrote:
> > From: "Dmitry Kravkov" <dmitry@broadcom.com>
> > Date: Sun, 17 Mar 2013 13:10:37 +0000
> > 
> > > Probably this commit resolved the issue:
> > > 
> > > commit bef05406ac0ea6f468e1e25e9934f3011ea9259b
> > > Author: Dmitry Kravkov <dmitry@broadcom.com>
> > > Date:   Tue Sep 11 04:34:08 2012 +0000
> > > 
> > >     bnx2x: Avoid sending multiple statistics queries
> > > 
> > > Can you try it pls?
> > 
> > These are completely seperate bugs.
> > 
> > The macro in question does not handle rollover of the lower 32-bits of
> > the statistic properly at all, and therefore Maciej's patch should be
> > applied and queud up for -stable.
> > 
> > Please give it your ACK unless you can find a bug in his change.
> 
> Both the high value and the low value are read from the chip - so this
> patch will increment the higher 32 bits twice (well, more than once).
> Not taking the HW/FW high counter at all is also not acceptable since
> the reading frequency is not high enough without those.
> 
> So the original patch is nacked, but we are trying to figure out what is
> causing the statistics to misbehave and it might be related to sending
> statistics query twice.

This looks like the typical problem of updating a 64bit value in non
atomic way.

Its guaranteed to happen on 32bit hosts.
We had to introduce include/linux/u64_stats_sync.h to help to solve this
without adding extra cost on 64bit arches.

In bnx2x case, we perform 32bit operations even on 64bit host, so we
probably need to add a seqcount_t, so that a stat consumer can detect it
read a stale value.





--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eilon Greenstein - March 17, 2013, 8:53 p.m.
On Sun, 2013-03-17 at 13:25 -0700, Eric Dumazet wrote:
> On Sun, 2013-03-17 at 21:37 +0200, Eilon Greenstein wrote:
> > On Sun, 2013-03-17 at 14:24 -0400, David Miller wrote:
> > > From: "Dmitry Kravkov" <dmitry@broadcom.com>
> > > Date: Sun, 17 Mar 2013 13:10:37 +0000
> > > 
> > > > Probably this commit resolved the issue:
> > > > 
> > > > commit bef05406ac0ea6f468e1e25e9934f3011ea9259b
> > > > Author: Dmitry Kravkov <dmitry@broadcom.com>
> > > > Date:   Tue Sep 11 04:34:08 2012 +0000
> > > > 
> > > >     bnx2x: Avoid sending multiple statistics queries
> > > > 
> > > > Can you try it pls?
> > > 
> > > These are completely seperate bugs.
> > > 
> > > The macro in question does not handle rollover of the lower 32-bits of
> > > the statistic properly at all, and therefore Maciej's patch should be
> > > applied and queud up for -stable.
> > > 
> > > Please give it your ACK unless you can find a bug in his change.
> > 
> > Both the high value and the low value are read from the chip - so this
> > patch will increment the higher 32 bits twice (well, more than once).
> > Not taking the HW/FW high counter at all is also not acceptable since
> > the reading frequency is not high enough without those.
> > 
> > So the original patch is nacked, but we are trying to figure out what is
> > causing the statistics to misbehave and it might be related to sending
> > statistics query twice.
> 
> This looks like the typical problem of updating a 64bit value in non
> atomic way.
> 
> Its guaranteed to happen on 32bit hosts.
> We had to introduce include/linux/u64_stats_sync.h to help to solve this
> without adding extra cost on 64bit arches.
> 
> In bnx2x case, we perform 32bit operations even on 64bit host, so we
> probably need to add a seqcount_t, so that a stat consumer can detect it
> read a stale value.

This is not such a trivial issue - the HW/FW is guaranteeing the atomic
read and this is why we can always use 32b variables.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - March 17, 2013, 11:17 p.m.
On Sun, 2013-03-17 at 22:53 +0200, Eilon Greenstein wrote:

> This is not such a trivial issue - the HW/FW is guaranteeing the atomic
> read and this is why we can always use 32b variables.
> 

I am glad you qualify the following code as 'trivial'

/* difference = minuend - subtrahend */
#define DIFF_64(d_hi, m_hi, s_hi, d_lo, m_lo, s_lo) \
        do { \
                if (m_lo < s_lo) { \
                        /* underflow */ \
                        d_hi = m_hi - s_hi; \
                        if (d_hi > 0) { \
                                /* we can 'loan' 1 */ \
                                d_hi--; \
                                d_lo = m_lo + (UINT_MAX - s_lo) + 1; \
                        } else { \
                                /* m_hi <= s_hi */ \
                                d_hi = 0; \
                                d_lo = 0; \
                        } \
                } else { \
                        /* m_lo >= s_lo */ \
                        if (m_hi < s_hi) { \
                                d_hi = 0; \
                                d_lo = 0; \
                        } else { \
                                /* m_hi >= s_hi */ \
                                d_hi = m_hi - s_hi; \
                                d_lo = m_lo - s_lo; \
                        } \
                } \
        } while (0)


All the macros and _hi/_ho fields found in
drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h
are really badly designed.

It hurts so much the common sense its not even funny.

All you really needed is to have normal host structures containing u64
fields, and you only needed one single helper to convert from hw u32:u32
to host u64.

Because gcc handles u64 operations very well, even on 32bit hosts, and
it knows about the carry flag.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Maciej Żenczykowski - March 18, 2013, 6:18 a.m.
This has nothing to do with atomicity or races or anything.

Look at the ADD_64 macro (which implements a += b) and compare it to
this one (which implents a = b + c).

It simply doesn't know how to correctly add in any situation in which
"qstats_old->t..._lo != 0".
By this point all the stats are already in software and both firmware
and hardware is not relevant.

Normally qstats_old starts off at 0 and the bug isn't visible (since
with a value of 0, there can't be overflow and carry propagation is
not needed).

The reason qstats_old was introduced was to allow for a partial nic
reset [or whatever the proper term is] which resets fw/hw provided
stats to not reset sw exported stats.
During the reset old stats are copied into qstats_old, and in the
future stats are returned as "qstats_old + current stats from fw/hw"
except this addition is buggy and only works if qstats_old happens to
have lower 32-bits zero (ie. basically only before first partial nic
reset).

Imagine:
(a) driver initializes, nic gets brought up, qstats_old = 0, stats
from device = 0
(b) 2GB of xfer happens
(c) qstats_old still = 0, stats from device = 2GB
(d) addition happens qstats = qstats_old + stats_from_dev = 0 + 2GB =
2GB   --- we report 2GB xfer
(e) something partially resets nic (say mtu change), qstats_old = 2GB,
stats from device = 0
(f) 2GB-1 of xfer happens
(g) qstats_old = 2GB, stats from device = 2GB-1
(h) addition happens qstats = qstats_old + stats_from_dev = 2GB +
2GB-1 = 4GB-1   --- we report 4GB-1 xfer
(i) 1 byte of xfer happens
(j) qstats_old = 2GB, stats from device = 2GB
(k) addition happens qstats = qstats_old + stats_from_dev = 2GB + 2GB
= 0   --- we report 0 xfer
(l) 2GB of xfer happens
(m) qstats_old = 2GB, stats from device = 4GB
(n) addition happens qstats = qstats_old + stats_from_dev = 2GB + 4GB
= 6GB   --- we report 6GB xfer

And yes the macro in question when asked to do 2GB+2GB will return 0.

(btw. at least on the driver I tested there's still some weirdness
with mtu change not-resetting per-queue stats but resetting
coalesced-from-all-queues stats)

Now, to be fair, I don't yet have confirmation that my patch fixes the
problem our SREs saw in deployment, but I expect to have that soon.
(and even if it doesn't fix the problem, this code is still buggy and
should be fixed)

Maciej

On Sun, Mar 17, 2013 at 1:53 PM, Eilon Greenstein <eilong@broadcom.com> wrote:
> On Sun, 2013-03-17 at 13:25 -0700, Eric Dumazet wrote:
>> On Sun, 2013-03-17 at 21:37 +0200, Eilon Greenstein wrote:
>> > On Sun, 2013-03-17 at 14:24 -0400, David Miller wrote:
>> > > From: "Dmitry Kravkov" <dmitry@broadcom.com>
>> > > Date: Sun, 17 Mar 2013 13:10:37 +0000
>> > >
>> > > > Probably this commit resolved the issue:
>> > > >
>> > > > commit bef05406ac0ea6f468e1e25e9934f3011ea9259b
>> > > > Author: Dmitry Kravkov <dmitry@broadcom.com>
>> > > > Date:   Tue Sep 11 04:34:08 2012 +0000
>> > > >
>> > > >     bnx2x: Avoid sending multiple statistics queries
>> > > >
>> > > > Can you try it pls?
>> > >
>> > > These are completely seperate bugs.
>> > >
>> > > The macro in question does not handle rollover of the lower 32-bits of
>> > > the statistic properly at all, and therefore Maciej's patch should be
>> > > applied and queud up for -stable.
>> > >
>> > > Please give it your ACK unless you can find a bug in his change.
>> >
>> > Both the high value and the low value are read from the chip - so this
>> > patch will increment the higher 32 bits twice (well, more than once).
>> > Not taking the HW/FW high counter at all is also not acceptable since
>> > the reading frequency is not high enough without those.
>> >
>> > So the original patch is nacked, but we are trying to figure out what is
>> > causing the statistics to misbehave and it might be related to sending
>> > statistics query twice.
>>
>> This looks like the typical problem of updating a 64bit value in non
>> atomic way.
>>
>> Its guaranteed to happen on 32bit hosts.
>> We had to introduce include/linux/u64_stats_sync.h to help to solve this
>> without adding extra cost on 64bit arches.
>>
>> In bnx2x case, we perform 32bit operations even on 64bit host, so we
>> probably need to add a seqcount_t, so that a stat consumer can detect it
>> read a stale value.
>
> This is not such a trivial issue - the HW/FW is guaranteeing the atomic
> read and this is why we can always use 32b variables.
>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eilon Greenstein - March 18, 2013, 7:27 a.m.
On Sun, 2013-03-17 at 16:17 -0700, Eric Dumazet wrote:
> On Sun, 2013-03-17 at 22:53 +0200, Eilon Greenstein wrote:
> 
> > This is not such a trivial issue - the HW/FW is guaranteeing the atomic
> > read and this is why we can always use 32b variables.
> > 
> 
> I am glad you qualify the following code as 'trivial'
> 

Eric - I did not mean that the solution is trivial, I meant that the
problem is trivial. Like the trivial statement that the code should not
contain any bugs :)


> All you really needed is to have normal host structures containing u64
> fields, and you only needed one single helper to convert from hw u32:u32
> to host u64.
> 
> Because gcc handles u64 operations very well, even on 32bit hosts, and
> it knows about the carry flag.

I was not clear when I said size-extension: most of the HW counters are
46bits and we are trying to keep 64bits counters in the SW, so simply
copying the data from the HW is not enough.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eilon Greenstein - March 18, 2013, 10:06 a.m.
On Sun, 2013-03-17 at 23:18 -0700, Maciej Żenczykowski wrote:
> This has nothing to do with atomicity or races or anything.
> 
> Look at the ADD_64 macro (which implements a += b) and compare it to
> this one (which implents a = b + c).
> 
> It simply doesn't know how to correctly add in any situation in which
> "qstats_old->t..._lo != 0".
> By this point all the stats are already in software and both firmware
> and hardware is not relevant.
> 
> Normally qstats_old starts off at 0 and the bug isn't visible (since
> with a value of 0, there can't be overflow and carry propagation is
> not needed).
> 
> The reason qstats_old was introduced was to allow for a partial nic
> reset [or whatever the proper term is] which resets fw/hw provided
> stats to not reset sw exported stats.
> During the reset old stats are copied into qstats_old, and in the
> future stats are returned as "qstats_old + current stats from fw/hw"
> except this addition is buggy and only works if qstats_old happens to
> have lower 32-bits zero (ie. basically only before first partial nic
> reset).
> 
> Imagine:
> (a) driver initializes, nic gets brought up, qstats_old = 0, stats
> from device = 0
> (b) 2GB of xfer happens
> (c) qstats_old still = 0, stats from device = 2GB
> (d) addition happens qstats = qstats_old + stats_from_dev = 0 + 2GB =
> 2GB   --- we report 2GB xfer
> (e) something partially resets nic (say mtu change), qstats_old = 2GB,
> stats from device = 0
> (f) 2GB-1 of xfer happens
> (g) qstats_old = 2GB, stats from device = 2GB-1
> (h) addition happens qstats = qstats_old + stats_from_dev = 2GB +
> 2GB-1 = 4GB-1   --- we report 4GB-1 xfer
> (i) 1 byte of xfer happens
> (j) qstats_old = 2GB, stats from device = 2GB
> (k) addition happens qstats = qstats_old + stats_from_dev = 2GB + 2GB
> = 0   --- we report 0 xfer
> (l) 2GB of xfer happens
> (m) qstats_old = 2GB, stats from device = 4GB
> (n) addition happens qstats = qstats_old + stats_from_dev = 2GB + 4GB
> = 6GB   --- we report 6GB xfer
> 
> And yes the macro in question when asked to do 2GB+2GB will return 0.
> 
> (btw. at least on the driver I tested there's still some weirdness
> with mtu change not-resetting per-queue stats but resetting
> coalesced-from-all-queues stats)
> 
> Now, to be fair, I don't yet have confirmation that my patch fixes the
> problem our SREs saw in deployment, but I expect to have that soon.
> (and even if it doesn't fix the problem, this code is still buggy and
> should be fixed)
> 
> Maciej

Maciej - thanks for the detailed information. You are right - it has
nothing to do with the HW/FW and it is simply a bug that needs to be
fixed. I withdraw my objections and add my ACK.

Acked-by: Eilon Greenstein <eilong@broadcom.com>





--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - March 18, 2013, 5:13 p.m.
From: "Eilon Greenstein" <eilong@broadcom.com>
Date: Mon, 18 Mar 2013 12:06:22 +0200

> Maciej - thanks for the detailed information. You are right - it has
> nothing to do with the HW/FW and it is simply a bug that needs to be
> fixed. I withdraw my objections and add my ACK.
> 
> Acked-by: Eilon Greenstein <eilong@broadcom.com>

Applied and queued up for -stable.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Maciej Żenczykowski - March 18, 2013, 7:54 p.m.
I've reproduced this in the lab.

I can confirm this patch fixes per-queue statistics (ie. ethtool -S
"[0]: tx_bytes") - which previously were not monotonically increasing.
However there is still a bug wrt. global device stats (ie. ethtool -S
"tx_bytes" and /proc/net/dev transmit bytes).
I'm guessing there's another similar bug later on during aggregation.

I expect to find the offending code and send out a patch soon.

On Mon, Mar 18, 2013 at 10:13 AM, David Miller <davem@davemloft.net> wrote:
> From: "Eilon Greenstein" <eilong@broadcom.com>
> Date: Mon, 18 Mar 2013 12:06:22 +0200
>
>> Maciej - thanks for the detailed information. You are right - it has
>> nothing to do with the HW/FW and it is simply a bug that needs to be
>> fixed. I withdraw my objections and add my ACK.
>>
>> Acked-by: Eilon Greenstein <eilong@broadcom.com>
>
> Applied and queued up for -stable.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - March 18, 2013, 8:05 p.m.
From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Mon, 18 Mar 2013 12:54:49 -0700

> I've reproduced this in the lab.
> 
> I can confirm this patch fixes per-queue statistics (ie. ethtool -S
> "[0]: tx_bytes") - which previously were not monotonically increasing.
> However there is still a bug wrt. global device stats (ie. ethtool -S
> "tx_bytes" and /proc/net/dev transmit bytes).
> I'm guessing there's another similar bug later on during aggregation.
> 
> I expect to find the offending code and send out a patch soon.

Thanks Maciej.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Maciej Żenczykowski - March 18, 2013, 8:06 p.m.
Current best guess is UPDATE_ESTAT_QSTAT_64 which uses SUB_64
implemented via DIFF_64 which is crazy.

On Mon, Mar 18, 2013 at 12:54 PM, Maciej Żenczykowski
<zenczykowski@gmail.com> wrote:
> I've reproduced this in the lab.
>
> I can confirm this patch fixes per-queue statistics (ie. ethtool -S
> "[0]: tx_bytes") - which previously were not monotonically increasing.
> However there is still a bug wrt. global device stats (ie. ethtool -S
> "tx_bytes" and /proc/net/dev transmit bytes).
> I'm guessing there's another similar bug later on during aggregation.
>
> I expect to find the offending code and send out a patch soon.
>
> On Mon, Mar 18, 2013 at 10:13 AM, David Miller <davem@davemloft.net> wrote:
>> From: "Eilon Greenstein" <eilong@broadcom.com>
>> Date: Mon, 18 Mar 2013 12:06:22 +0200
>>
>>> Maciej - thanks for the detailed information. You are right - it has
>>> nothing to do with the HW/FW and it is simply a bug that needs to be
>>> fixed. I withdraw my objections and add my ACK.
>>>
>>> Acked-by: Eilon Greenstein <eilong@broadcom.com>
>>
>> Applied and queued up for -stable.
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - March 18, 2013, 8:35 p.m.
On Mon, 2013-03-18 at 13:06 -0700, Maciej Żenczykowski wrote:
> Current best guess is UPDATE_ESTAT_QSTAT_64 which uses SUB_64
> implemented via DIFF_64 which is crazy.

All these macros are really ugly, I wish someone could cleanup this
code.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Maciej Żenczykowski - March 18, 2013, 8:36 p.m.
Not quite the right call sequence above.
  UPDATE_FSTAT_QSTAT(total_bytes_received); --> SUB_64 --> DIFF_64
is probably more relevant.

Regardless:

/* difference = minuend - subtrahend */
#define DIFF_64(d_hi, m_hi, s_hi, d_lo, m_lo, s_lo) \
<------>do { \
<------><------>if (m_lo < s_lo) { \
<------><------><------>/* underflow */ \
<------><------><------>d_hi = m_hi - s_hi; \
<------><------><------>if (d_hi > 0) { \
<------><------><------><------>/* we can 'loan' 1 */ \
<------><------><------><------>d_hi--; \
<------><------><------><------>d_lo = m_lo + (UINT_MAX - s_lo) + 1; \
<------><------><------>} else { \
<------><------><------><------>/* m_hi <= s_hi */ \
<------><------><------><------>d_hi = 0; \
<------><------><------><------>d_lo = 0; \
<------><------><------>} \
<------><------>} else { \
...
I believe this fails.  All parameters are most likely going to be u32,
since that's used for stats pretty much everywhere.
As such after d_hi = m_hi - s_hi;  d_hi will be >= 0 since it's u32.

As such if "m_hi == s_hi" && "m_lo < s_lo" we will return (0,0)
however if "m_hi < s_hi" && "m_lo < s_lo" we will not return (0,0)

I'm not sure which behaviour is desired, but either way this is obviously wrong.

0 - 1 returns 0
0 - (4GB+1) returns -4GB-1
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Maciej Żenczykowski - March 21, 2013, 9:23 p.m.
Just as an update on the state here.

Extensive testing both in lab and in deployment confirms the already
applied patch fixes the problem.

My concerns about aggregate counters being wrong (after taking another
long look at the data from the 'bad' run)
was simply a matter of me getting mixed up in what was the 'correct'
value I should be seeing.

ie. I did not actually see any bad behaviour with this patch.

That said.  I spent a fair bit of time debugging this before I
realized I was stupid, and found other problems
while doing that.  I cannot actually trigger the potential problems,
but I'll follow up to this with a couple patches
to fix potential problems (and export more stats).

(a) The DIFF_64 macro is buggy in the case of underflow due to unsigned-ness
(b) Any code which does SUB/ADD should instead do ADD/SUB, because
SUB(A,B) is implemented via DIFF(A,B) and thus doesn't actually do
subtraction, but instead does A := max(A-B, 0)
(c) there's a bunch of extra statistics it is trivial to export via
ethtool which make debugging stuff like this much easier
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - March 21, 2013, 9:25 p.m.
From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Thu, 21 Mar 2013 14:23:47 -0700

> (a) The DIFF_64 macro is buggy in the case of underflow due to unsigned-ness
> (b) Any code which does SUB/ADD should instead do ADD/SUB, because
> SUB(A,B) is implemented via DIFF(A,B) and thus doesn't actually do
> subtraction, but instead does A := max(A-B, 0)
> (c) there's a bunch of extra statistics it is trivial to export via
> ethtool which make debugging stuff like this much easier

Thanks for looking into this and the status update.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h
index 364e37ecbc5c..198f6f1c9ad5 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h
@@ -459,8 +459,9 @@  struct bnx2x_fw_port_stats_old {
 
 #define UPDATE_QSTAT(s, t) \
 	do { \
-		qstats->t##_hi = qstats_old->t##_hi + le32_to_cpu(s.hi); \
 		qstats->t##_lo = qstats_old->t##_lo + le32_to_cpu(s.lo); \
+		qstats->t##_hi = qstats_old->t##_hi + le32_to_cpu(s.hi) \
+			+ ((qstats->t##_lo < qstats_old->t##_lo) ? 1 : 0); \
 	} while (0)
 
 #define UPDATE_QSTAT_OLD(f) \