Message ID | 1314078115-8121-4-git-send-email-sathya.perla@emulex.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
Le mardi 23 août 2011 à 11:11 +0530, Sathya Perla a écrit : > The rx_drops_no_frags HW counter for RSS rings is 16bits in HW and can > wraparound often. Maintain a 32-bit accumulator in the driver to prevent > frequent wraparound. > > Also, incorporated Eric's feedback to use ACCESS_ONCE() for the accumulator > write. > > Signed-off-by: Sathya Perla <sathya.perla@emulex.com> > --- > drivers/net/ethernet/emulex/benet/be_main.c | 22 +++++++++++++++++++--- > 1 files changed, 19 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c > index 2375c0c..fb2eda0 100644 > --- a/drivers/net/ethernet/emulex/benet/be_main.c > +++ b/drivers/net/ethernet/emulex/benet/be_main.c > @@ -378,6 +378,18 @@ static void populate_lancer_stats(struct be_adapter *adapter) > pport_stats->rx_drops_too_many_frags_lo; > } > > +static void accumulate_16bit_val(u32 *acc, u16 val) > +{ > +#define lo(x) (x & 0xFFFF) > +#define hi(x) (x & 0xFFFF0000) > + bool wrapped = val < lo(*acc); > + u32 newacc = hi(*acc) + val; > + > + if (wrapped) > + newacc += 65536; > + ACCESS_ONCE(*acc) = newacc; > +} > + I still feel something is wrong here : > void be_parse_stats(struct be_adapter *adapter) > { > struct be_erx_stats_v1 *erx = be_erx_stats_from_cmd(adapter); > @@ -394,9 +406,13 @@ void be_parse_stats(struct be_adapter *adapter) > } > > /* as erx_v1 is longer than v0, ok to use v1 defn for v0 access */ > - for_all_rx_queues(adapter, rxo, i) > - rx_stats(rxo)->rx_drops_no_frags = > - erx->rx_drops_no_fragments[rxo->q.id]; previous code was not doing a sum_of_all_queues. It only gave the final erx->rx_drops_no_fragments[rxo->q.id], not taking into account previous rx_stats(rxo)->rx_drops_no_frags value. Your changelog is about wrap around, while the bug might have be different (No real sum) Now you say : previous value is meaningfull, and we add to it 16bits values. > + for_all_rx_queues(adapter, rxo, i) { > + /* below erx HW counter can actually wrap around after > + * 65535. Driver accumulates a 32-bit value > + */ > + accumulate_16bit_val(&rx_stats(rxo)->rx_drops_no_frags, > + (u16)erx->rx_drops_no_fragments[rxo->q.id]); > + } > } > Arent multiple calls to be_parse_stats() will have wrong final rx_drops_no_frags value ? Or are the rx_drops_no_fragments[rxo->q.id] cleared when read ? I am afraid that if HW maintains 16bit values, then the only way is to also have a 16bit accumulator. You cannot detect wraparounds without also keeping a copy of previous 16bit samples. u16 accum = 0; or_all_rx_queues(adapter, rxo, i) { accum += erx->rx_drops_no_fragments[rxo->q.id]; } rx_stats(rxo)->rx_drops_no_frags = accum; -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>-----Original Message----- >From: Eric Dumazet [mailto:eric.dumazet@gmail.com] >Sent: Tuesday, August 23, 2011 12:11 PM > >Le mardi 23 août 2011 à 11:11 +0530, Sathya Perla a écrit : >> The rx_drops_no_frags HW counter for RSS rings is 16bits in HW and can >> wraparound often. Maintain a 32-bit accumulator in the driver to prevent >> frequent wraparound. >> >> Also, incorporated Eric's feedback to use ACCESS_ONCE() for the >accumulator >> write. <...> >> >> +static void accumulate_16bit_val(u32 *acc, u16 val) >> +{ >> +#define lo(x) (x & 0xFFFF) >> +#define hi(x) (x & 0xFFFF0000) >> + bool wrapped = val < lo(*acc); >> + u32 newacc = hi(*acc) + val; >> + >> + if (wrapped) >> + newacc += 65536; >> + ACCESS_ONCE(*acc) = newacc; >> +} >> + > >I still feel something is wrong here : > >> void be_parse_stats(struct be_adapter *adapter) >> { >> struct be_erx_stats_v1 *erx = be_erx_stats_from_cmd(adapter); >> @@ -394,9 +406,13 @@ void be_parse_stats(struct be_adapter *adapter) >> } >> >> /* as erx_v1 is longer than v0, ok to use v1 defn for v0 access */ >> - for_all_rx_queues(adapter, rxo, i) >> - rx_stats(rxo)->rx_drops_no_frags = >> - erx->rx_drops_no_fragments[rxo->q.id]; > >previous code was not doing a sum_of_all_queues. Yes, the new code still *does not* do a sum of all queues. The code just parses per-queue stats. For each queue, the 16 bit HW stats value is taken and stored in a 32-bit *per-queue* variable. The comment that "driver accumulates a 32-bit val" may be misleading. The code here is not doing a sum of the per-queue stat. + for_all_rx_queues(adapter, rxo, i) { + /* below erx HW counter can actually wrap around after + * 65535. Driver accumulates a 32-bit value + */ + accumulate_16bit_val(&rx_stats(rxo)->rx_drops_no_frags, + (u16)erx->rx_drops_no_fragments[rxo->q.id]); + } > >It only gave the final erx->rx_drops_no_fragments[rxo->q.id], not taking >into account previous rx_stats(rxo)->rx_drops_no_frags value. > >Your changelog is about wrap around, while the bug might have be >different (No real sum) > >Now you say : previous value is meaningfull, and we add to it 16bits >values. > >> + for_all_rx_queues(adapter, rxo, i) { >> + /* below erx HW counter can actually wrap around after >> + * 65535. Driver accumulates a 32-bit value >> + */ >> + accumulate_16bit_val(&rx_stats(rxo)->rx_drops_no_frags, >> + (u16)erx->rx_drops_no_fragments[rxo->q.id]); >> + } >> } >> > >Arent multiple calls to be_parse_stats() will have wrong final >rx_drops_no_frags value ? > >Or are the rx_drops_no_fragments[rxo->q.id] cleared when read ? The following logic should take care of that: u32 newacc = hi(*acc) + val; Only the upper 16 bits of the accumulator are retained and the new-val of the 16-bit stat is used for the lower 16 bits. So, even if I call this routine 10 times with the same value for val, the accumulator value will not change. Say: accumulator is 0x00010001 Say: new 16-bit hw counter value is now 2 (it was previously 1 and had wraped-aroud before that) accumulate(acc, 2) ==> newacc = 0x00010000 + 2 = 0x00010002 any number of calls to accumulate(acc, 2) will still produce acc = 0x00010002 > >I am afraid that if HW maintains 16bit values, then the only way is to >also have a 16bit accumulator. You cannot detect wraparounds without >also keeping a copy of previous 16bit samples. I don't agree: Say: accumulator is 0x0000FFFF Say: new 16-bit hw counter value is now 0 (after a wrap-aroud) accumulate(acc, 0) ==> newacc = 0x00000000 + 0 = 0x00000000 After the wraparound check newacc = 0x00010000 > >u16 accum = 0; >or_all_rx_queues(adapter, rxo, i) { > accum += erx->rx_drops_no_fragments[rxo->q.id]; >} >rx_stats(rxo)->rx_drops_no_frags = accum; >
Le mardi 23 août 2011 à 00:06 -0700, Sathya.Perla@Emulex.Com a écrit : > >-----Original Message----- > >From: Eric Dumazet [mailto:eric.dumazet@gmail.com] > >Sent: Tuesday, August 23, 2011 12:11 PM > > > >Le mardi 23 août 2011 à 11:11 +0530, Sathya Perla a écrit : > >> The rx_drops_no_frags HW counter for RSS rings is 16bits in HW and can > >> wraparound often. Maintain a 32-bit accumulator in the driver to prevent > >> frequent wraparound. > >> > >> Also, incorporated Eric's feedback to use ACCESS_ONCE() for the > >accumulator > >> write. > <...> > >> > >> +static void accumulate_16bit_val(u32 *acc, u16 val) > >> +{ > >> +#define lo(x) (x & 0xFFFF) > >> +#define hi(x) (x & 0xFFFF0000) > >> + bool wrapped = val < lo(*acc); > >> + u32 newacc = hi(*acc) + val; > >> + > >> + if (wrapped) > >> + newacc += 65536; > >> + ACCESS_ONCE(*acc) = newacc; > >> +} > >> + > > > >I still feel something is wrong here : > > > >> void be_parse_stats(struct be_adapter *adapter) > >> { > >> struct be_erx_stats_v1 *erx = be_erx_stats_from_cmd(adapter); > >> @@ -394,9 +406,13 @@ void be_parse_stats(struct be_adapter *adapter) > >> } > >> > >> /* as erx_v1 is longer than v0, ok to use v1 defn for v0 access */ > >> - for_all_rx_queues(adapter, rxo, i) > >> - rx_stats(rxo)->rx_drops_no_frags = > >> - erx->rx_drops_no_fragments[rxo->q.id]; > > > >previous code was not doing a sum_of_all_queues. > Yes, the new code still *does not* do a sum of all queues. The code just > parses per-queue stats. For each queue, the 16 bit > HW stats value is taken and stored in a 32-bit *per-queue* variable. > The comment that "driver accumulates a 32-bit val" may be misleading. The > code here is not doing a sum of the per-queue stat. Ah ok, I now see the code intent, and everything seems fine now, thanks for explaining. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c index 2375c0c..fb2eda0 100644 --- a/drivers/net/ethernet/emulex/benet/be_main.c +++ b/drivers/net/ethernet/emulex/benet/be_main.c @@ -378,6 +378,18 @@ static void populate_lancer_stats(struct be_adapter *adapter) pport_stats->rx_drops_too_many_frags_lo; } +static void accumulate_16bit_val(u32 *acc, u16 val) +{ +#define lo(x) (x & 0xFFFF) +#define hi(x) (x & 0xFFFF0000) + bool wrapped = val < lo(*acc); + u32 newacc = hi(*acc) + val; + + if (wrapped) + newacc += 65536; + ACCESS_ONCE(*acc) = newacc; +} + void be_parse_stats(struct be_adapter *adapter) { struct be_erx_stats_v1 *erx = be_erx_stats_from_cmd(adapter); @@ -394,9 +406,13 @@ void be_parse_stats(struct be_adapter *adapter) } /* as erx_v1 is longer than v0, ok to use v1 defn for v0 access */ - for_all_rx_queues(adapter, rxo, i) - rx_stats(rxo)->rx_drops_no_frags = - erx->rx_drops_no_fragments[rxo->q.id]; + for_all_rx_queues(adapter, rxo, i) { + /* below erx HW counter can actually wrap around after + * 65535. Driver accumulates a 32-bit value + */ + accumulate_16bit_val(&rx_stats(rxo)->rx_drops_no_frags, + (u16)erx->rx_drops_no_fragments[rxo->q.id]); + } } static struct rtnl_link_stats64 *be_get_stats64(struct net_device *netdev,
The rx_drops_no_frags HW counter for RSS rings is 16bits in HW and can wraparound often. Maintain a 32-bit accumulator in the driver to prevent frequent wraparound. Also, incorporated Eric's feedback to use ACCESS_ONCE() for the accumulator write. Signed-off-by: Sathya Perla <sathya.perla@emulex.com> --- drivers/net/ethernet/emulex/benet/be_main.c | 22 +++++++++++++++++++--- 1 files changed, 19 insertions(+), 3 deletions(-)