Message ID | 1480539652.18162.205.camel@edumazet-glaptop3.roam.corp.google.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Wed, 30 Nov 2016 13:00:52 -0800 Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Wed, 2016-11-30 at 22:42 +0200, Saeed Mahameed wrote: > > On Wed, Nov 30, 2016 at 7:35 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > On Wed, 2016-11-30 at 18:46 +0200, Saeed Mahameed wrote: [...] > > > > > > I am suggesting I will fix the bug I introduced. > > > > > > Do not panic. > > > > > > > > > > Not at all, I trust you are the only one who is capable of providing > > the best solution. > > I am just trying to read your mind :-). > > > > As i said i like the solution and i want to adapt it to mlx5, so I am > > a little bit enthusiastic :) > > What about the following fix guys ? Confirming this fixed the crash on shutdown for me, thanks! > As a bonus we update the stats right before they are sent to monitors > via rtnetlink ;) > > > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c > index 12ea3405f442717478bf0e8882edaf0de77986cb..091b904262bc7932d3edf99cf850affb23b9ce6e 100644 > --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c > +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c > @@ -1809,8 +1809,12 @@ void mlx4_en_stop_port(struct net_device *dev, int detach) > > netif_tx_disable(dev); > > + spin_lock_bh(&priv->stats_lock); > + mlx4_en_fold_software_stats(dev); > /* Set port as not active */ > priv->port_up = false; > + spin_unlock_bh(&priv->stats_lock); > + > priv->counter_index = MLX4_SINK_COUNTER_INDEX(mdev->dev); > > /* Promsicuous mode */ > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_port.c b/drivers/net/ethernet/mellanox/mlx4/en_port.c > index c6c4f1238923e09eced547454b86c68720292859..9166d90e732858610b1407fe85cbf6cbe27f5e0b 100644 > --- a/drivers/net/ethernet/mellanox/mlx4/en_port.c > +++ b/drivers/net/ethernet/mellanox/mlx4/en_port.c > @@ -154,7 +154,7 @@ void mlx4_en_fold_software_stats(struct net_device *dev) > unsigned long packets, bytes; > int i; > > - if (mlx4_is_master(mdev->dev)) > + if (!priv->port_up || mlx4_is_master(mdev->dev)) > return; > > packets = 0; > >
On Wed, Nov 30, 2016 at 11:00 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Wed, 2016-11-30 at 22:42 +0200, Saeed Mahameed wrote: >> On Wed, Nov 30, 2016 at 7:35 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >> > On Wed, 2016-11-30 at 18:46 +0200, Saeed Mahameed wrote: >> > >> >> we had/still have the proper stats they are the ones that >> >> mlx4_en_fold_software_stats is trying to cache into (they always >> >> exist), >> >> but the ones that you are trying to read from (the mlx4 rings) are gone ! >> >> >> >> This bug is totally new and as i warned, this is another symptom of >> >> the real root cause (can't sleep while reading stats). >> >> >> >> Eric what do you suggest ? Keep pre-allocated MAX_RINGS stats and >> >> always iterate over all of them to query stats ? >> >> what if you have one ring/none/1K ? how would you know how many to query ? >> > >> > I am suggesting I will fix the bug I introduced. >> > >> > Do not panic. >> > >> > >> >> Not at all, I trust you are the only one who is capable of providing >> the best solution. >> I am just trying to read your mind :-). >> >> As i said i like the solution and i want to adapt it to mlx5, so I am >> a little bit enthusiastic :) > > What about the following fix guys ? > > As a bonus we update the stats right before they are sent to monitors > via rtnetlink ;) Hi Eric, Thanks for the patch, I already acked it. I have one educational question (not related to this patch, but related to stats reading in general). I was wondering why do we need to disable bh every time we read stats "spin_lock_bh" ? is it essential ? I checked and in mlx4 we don't hold stats_lock in softirq (en_rx.c/en_tx.c), so I don't see any deadlock risk in here.. Thanks Saeed.
On Thu, 2016-12-01 at 17:38 +0200, Saeed Mahameed wrote: > > Hi Eric, Thanks for the patch, I already acked it. Thanks ! > > I have one educational question (not related to this patch, but > related to stats reading in general). > I was wondering why do we need to disable bh every time we read stats > "spin_lock_bh" ? is it essential ? > > I checked and in mlx4 we don't hold stats_lock in softirq > (en_rx.c/en_tx.c), so I don't see any deadlock risk in here.. Excellent question, and I chose to keep the spinlock. That would be doable, only if we do not overwrite dev->stats. Current code is : static struct rtnl_link_stats64 * mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) { struct mlx4_en_priv *priv = netdev_priv(dev); spin_lock_bh(&priv->stats_lock); mlx4_en_fold_software_stats(dev); netdev_stats_to_stats64(stats, &dev->stats); spin_unlock_bh(&priv->stats_lock); return stats; } If you remove the spin_lock_bh() : static struct rtnl_link_stats64 * mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) { struct mlx4_en_priv *priv = netdev_priv(dev); mlx4_en_fold_software_stats(dev); // possible races netdev_stats_to_stats64(stats, &dev->stats); return stats; } 1) one mlx4_en_fold_software_stats(dev) could be preempted on a CONFIG_PREEMPT kernel, or interrupted by long irqs. 2) Another cpu would also call mlx4_en_fold_software_stats(dev) while first cpu is busy. 3) Then when resuming first cpu/thread, part of the dev->stats fieds would be updated with 'old counters', while another thread might have updated them with newer values. 4) A SNMP reader could then get counters that are not monotonically increasing, which would be confusing/buggy. So removing the spinlock is doable, but needs to add a new parameter to mlx4_en_fold_software_stats() and call netdev_stats_to_stats64() before mlx4_en_fold_software_stats(dev) static struct rtnl_link_stats64 * mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) { struct mlx4_en_priv *priv = netdev_priv(dev); netdev_stats_to_stats64(stats, &dev->stats); // Passing a non NULL stats asks mlx4_en_fold_software_stats() // to not update dev->stats, but stats directly. mlx4_en_fold_software_stats(dev, stats) return stats; }
On Thu, Dec 1, 2016 at 5:55 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Thu, 2016-12-01 at 17:38 +0200, Saeed Mahameed wrote: > >> >> Hi Eric, Thanks for the patch, I already acked it. > > Thanks ! > >> >> I have one educational question (not related to this patch, but >> related to stats reading in general). >> I was wondering why do we need to disable bh every time we read stats >> "spin_lock_bh" ? is it essential ? >> >> I checked and in mlx4 we don't hold stats_lock in softirq >> (en_rx.c/en_tx.c), so I don't see any deadlock risk in here.. > > Excellent question, and I chose to keep the spinlock. > > That would be doable, only if we do not overwrite dev->stats. > > Current code is : > > static struct rtnl_link_stats64 * > mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) > { > struct mlx4_en_priv *priv = netdev_priv(dev); > > spin_lock_bh(&priv->stats_lock); > mlx4_en_fold_software_stats(dev); > netdev_stats_to_stats64(stats, &dev->stats); > spin_unlock_bh(&priv->stats_lock); > > return stats; > } > > If you remove the spin_lock_bh() : > > > static struct rtnl_link_stats64 * > mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) > { > struct mlx4_en_priv *priv = netdev_priv(dev); > > mlx4_en_fold_software_stats(dev); // possible races > > netdev_stats_to_stats64(stats, &dev->stats); > > return stats; > } > > 1) one mlx4_en_fold_software_stats(dev) could be preempted > on a CONFIG_PREEMPT kernel, or interrupted by long irqs. > > 2) Another cpu would also call mlx4_en_fold_software_stats(dev) while > first cpu is busy. > > 3) Then when resuming first cpu/thread, part of the dev->stats fieds > would be updated with 'old counters', > while another thread might have updated them with newer values. > > 4) A SNMP reader could then get counters that are not monotonically > increasing, > which would be confusing/buggy. > > So removing the spinlock is doable, but needs to add a new parameter > to mlx4_en_fold_software_stats() and call netdev_stats_to_stats64() > before mlx4_en_fold_software_stats(dev) > > static struct rtnl_link_stats64 * > mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) > { > struct mlx4_en_priv *priv = netdev_priv(dev); > > netdev_stats_to_stats64(stats, &dev->stats); > > // Passing a non NULL stats asks mlx4_en_fold_software_stats() > // to not update dev->stats, but stats directly. > > mlx4_en_fold_software_stats(dev, stats) > > > return stats; > } > > Thanks for the detailed answer !! BTW you went 5 steps ahead of my original question :)), so far you already have a patch without locking at all (really impressive). What i wanted to ask originally, was regarding the "_bh", i didn't mean to completely remove the "spin_lock_bh", I meant, what happens if we replace "spin_lock_bh" with "spin_lock", without disabling bh ? I gues raw "sping_lock" handles points (2 to 4) from above, but it won't handle long irqs.
On Thu, 2016-12-01 at 18:33 +0200, Saeed Mahameed wrote: > Thanks for the detailed answer !! You're welcome. > > BTW you went 5 steps ahead of my original question :)), so far you > already have a patch without locking at all (really impressive). > > What i wanted to ask originally, was regarding the "_bh", i didn't > mean to completely remove the "spin_lock_bh", > I meant, what happens if we replace "spin_lock_bh" with "spin_lock", > without disabling bh ? > I gues raw "sping_lock" handles points (2 to 4) from above, but it > won't handle long irqs. Thats a very good point, the _bh prefix can totally be removed, since stats_lock is only acquired from process context.
On Thu, Dec 1, 2016 at 7:08 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Thu, 2016-12-01 at 18:33 +0200, Saeed Mahameed wrote: > >> Thanks for the detailed answer !! > > You're welcome. > >> >> BTW you went 5 steps ahead of my original question :)), so far you >> already have a patch without locking at all (really impressive). >> >> What i wanted to ask originally, was regarding the "_bh", i didn't >> mean to completely remove the "spin_lock_bh", >> I meant, what happens if we replace "spin_lock_bh" with "spin_lock", >> without disabling bh ? >> I gues raw "sping_lock" handles points (2 to 4) from above, but it >> won't handle long irqs. > > Thats a very good point, the _bh prefix can totally be removed, since > stats_lock is only acquired from process context. > > That was my initial point, Thanks for the help. will provide a fix patch later once 4.9 is release.
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c index 12ea3405f442717478bf0e8882edaf0de77986cb..091b904262bc7932d3edf99cf850affb23b9ce6e 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c @@ -1809,8 +1809,12 @@ void mlx4_en_stop_port(struct net_device *dev, int detach) netif_tx_disable(dev); + spin_lock_bh(&priv->stats_lock); + mlx4_en_fold_software_stats(dev); /* Set port as not active */ priv->port_up = false; + spin_unlock_bh(&priv->stats_lock); + priv->counter_index = MLX4_SINK_COUNTER_INDEX(mdev->dev); /* Promsicuous mode */ diff --git a/drivers/net/ethernet/mellanox/mlx4/en_port.c b/drivers/net/ethernet/mellanox/mlx4/en_port.c index c6c4f1238923e09eced547454b86c68720292859..9166d90e732858610b1407fe85cbf6cbe27f5e0b 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_port.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_port.c @@ -154,7 +154,7 @@ void mlx4_en_fold_software_stats(struct net_device *dev) unsigned long packets, bytes; int i; - if (mlx4_is_master(mdev->dev)) + if (!priv->port_up || mlx4_is_master(mdev->dev)) return; packets = 0;