[iwl-net] ice: fix stats being updated by way too large values

Message ID 20240227143124.21015-1-przemyslaw.kitszel@intel.com
State Accepted
Delegated to: Anthony Nguyen

Commit Message

Przemek Kitszel Feb. 27, 2024, 2:31 p.m. UTC
Simplify the stats accumulation logic to fix the case where we don't take
the previous stat value into account; we should always respect it.

Main netdev stats of our PF (Tx/Rx packets/bytes) were reported orders of
magnitude too big during OpenStack reconfiguration events, and possibly in
other reconfiguration cases too.

The regression was reported to be between 6.1 and 6.2, so I was almost
certain that one of the two "preserve stats over reset" commits was the
culprit. While reading the code, it was found that in some cases we will
increase the stats by an arbitrarily large number (thanks to ignoring the
"-prev" part of the condition, after zeroing it).
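
As a rough illustration of that failure mode, here is a minimal userspace
sketch, not the driver code itself. The names demo_stats,
demo_accumulate_old and demo_accumulate_new are hypothetical, only two of
the four counters are modelled, and the prev_loaded argument stands in for
the pf->stat_prev_loaded flag that the patch checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the counters touched by the patch. */
struct demo_stats {
	uint64_t tx_packets;
	uint64_t rx_packets;
};

/* Pre-patch logic: a backwards step in either counter zeroes every prev
 * value, so the full current count is then added instead of a delta.
 */
static void demo_accumulate_old(struct demo_stats *net,
				struct demo_stats *prev,
				const struct demo_stats *cur)
{
	if (cur->tx_packets < prev->tx_packets ||
	    cur->rx_packets < prev->rx_packets) {
		prev->tx_packets = 0;
		prev->rx_packets = 0;
	}

	net->tx_packets += cur->tx_packets - prev->tx_packets;
	net->rx_packets += cur->rx_packets - prev->rx_packets;
	*prev = *cur;
}

/* Post-patch logic: always add cur - prev, but only when prev is known to
 * be valid (assumption: prev_loaded mirrors pf->stat_prev_loaded).
 */
static void demo_accumulate_new(struct demo_stats *net,
				struct demo_stats *prev,
				const struct demo_stats *cur,
				bool prev_loaded)
{
	if (prev_loaded) {
		net->tx_packets += cur->tx_packets - prev->tx_packets;
		net->rx_packets += cur->rx_packets - prev->rx_packets;
	}
	*prev = *cur;
}
```

With prev = {1000000, 5000000} and cur = {999990, 5000100}, the old variant
adds the full 5000100 to rx_packets, because the backwards step in
tx_packets zeroed both prev values. The new variant adds only the real
delta of 100.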

Note that this also fixes the case where we were near the limits of u64,
but that was not the regression reported.
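
One possible reading of the u64 remark, sketched under the assumption that
the counters are free-running and wrap modulo 2^64: unsigned subtraction
already yields the right delta across a wrap, so the unconditional
"cur - prev" needs no special case. The helper demo_delta is hypothetical
and exists only for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper: delta between two samples of a free-running u64
 * counter. Unsigned arithmetic is modulo 2^64, so the result stays
 * correct across a single wrap of the counter.
 */
static uint64_t demo_delta(uint64_t prev, uint64_t cur)
{
	return cur - prev;
}
```

For example, demo_delta(UINT64_MAX - 5, 10) evaluates to 16, the true
number of increments, whereas the pre-patch check would have seen
cur < prev, zeroed prev, and added 10.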

Full disclosure: I remember suggesting this particular piece of code to
Ben a few years ago, so the blame is on me.

Fixes: 2fd5e433cd26 ("ice: Accumulate HW and Netdev statistics over reset")
Reported-by: Nebojsa Stevanovic <nebojsa.stevanovic@gcore.com>
Link: https://lore.kernel.org/intel-wired-lan/VI1PR02MB439744DEDAA7B59B9A2833FE912EA@VI1PR02MB4397.eurprd02.prod.outlook.com
Reported-by: Christian Rohmann <christian.rohmann@inovex.de>
Link: https://lore.kernel.org/intel-wired-lan/f38a6ca4-af05-48b1-a3e6-17ef2054e525@inovex.de
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_main.c | 24 +++++++++++------------
 1 file changed, 11 insertions(+), 13 deletions(-)


base-commit: 9b23fceb4158a3636ce4a2bda28ab03dcfa6a26f

Comments

Simon Horman Feb. 28, 2024, 10:12 a.m. UTC | #1
Reviewed-by: Simon Horman <horms@kernel.org>
Pucha, HimasekharX Reddy March 6, 2024, 12:42 p.m. UTC | #2
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Patch

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index dd4a9bc0dfdc..a7c7b1b633a5 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -6736,6 +6736,7 @@  static void ice_update_vsi_ring_stats(struct ice_vsi *vsi)
 {
 	struct rtnl_link_stats64 *net_stats, *stats_prev;
 	struct rtnl_link_stats64 *vsi_stats;
+	struct ice_pf *pf = vsi->back;
 	u64 pkts, bytes;
 	int i;
 
@@ -6781,21 +6782,18 @@  static void ice_update_vsi_ring_stats(struct ice_vsi *vsi)
 	net_stats = &vsi->net_stats;
 	stats_prev = &vsi->net_stats_prev;
 
-	/* clear prev counters after reset */
-	if (vsi_stats->tx_packets < stats_prev->tx_packets ||
-	    vsi_stats->rx_packets < stats_prev->rx_packets) {
-		stats_prev->tx_packets = 0;
-		stats_prev->tx_bytes = 0;
-		stats_prev->rx_packets = 0;
-		stats_prev->rx_bytes = 0;
+	/* Update netdev counters, but keep in mind that values could start at
+	 * random value after PF reset. And as we increase the reported stat by
+	 * diff of Cur-Prev, we need to be sure that Prev is valid. If it's not,
+	 * let's skip this round.
+	 */
+	if (likely(pf->stat_prev_loaded)) {
+		net_stats->tx_packets += vsi_stats->tx_packets - stats_prev->tx_packets;
+		net_stats->tx_bytes += vsi_stats->tx_bytes - stats_prev->tx_bytes;
+		net_stats->rx_packets += vsi_stats->rx_packets - stats_prev->rx_packets;
+		net_stats->rx_bytes += vsi_stats->rx_bytes - stats_prev->rx_bytes;
 	}
 
-	/* update netdev counters */
-	net_stats->tx_packets += vsi_stats->tx_packets - stats_prev->tx_packets;
-	net_stats->tx_bytes += vsi_stats->tx_bytes - stats_prev->tx_bytes;
-	net_stats->rx_packets += vsi_stats->rx_packets - stats_prev->rx_packets;
-	net_stats->rx_bytes += vsi_stats->rx_bytes - stats_prev->rx_bytes;
-
 	stats_prev->tx_packets = vsi_stats->tx_packets;
 	stats_prev->tx_bytes = vsi_stats->tx_bytes;
 	stats_prev->rx_packets = vsi_stats->rx_packets;