Message ID | 4AC4BD9A.2050101@mellanox.co.il |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
This feels like a pretty risky thing to do while the device might be handling all sorts of other traffic at the same time. Are you sure there are no races you expose here? Have you actually seen cases where the interrupt test during initialization works but then this test catches a problem? (My experience has been that if any MSI-X interrupts work from a device, then they'll all work) > +/* A test that verifies that we can accept interrupts on all > + * the irq vectors of the device. > + * Interrupts are checked using the NOP command. > + */ > +int mlx4_test_interrupts(struct mlx4_dev *dev) > +{ > + struct mlx4_priv *priv = mlx4_priv(dev); > + int i; > + int err; > + > + err = mlx4_NOP(dev); > + /* When not in MSI_X, there is only one irq to check */ > + if (!(dev->flags & MLX4_FLAG_MSI_X)) > + return err; > + > + /* A loop over all completion vectors, for each vector we will check > + * whether it works by mapping command completions to that vector > + * and performing a NOP command > + */ > + for (i = 0; !err && (i < dev->caps.num_comp_vectors); ++i) { > + /* Temporary use polling for command completions */ you want the adverb form here: "Temporarily" > + mlx4_cmd_use_polling(dev); > + > + /* Map the new eq to handle all asyncronous events */ "asynchronous" > + err = mlx4_MAP_EQ(dev, MLX4_ASYNC_EVENT_MASK, 0, > + priv->eq_table.eq[i].eqn); > + if (err) { > + mlx4_warn(dev, "Failed mapping eq for interrupt test\n"); > + mlx4_cmd_use_events(dev); > + break; > + } > + > + /* Go back to using events */ > + mlx4_cmd_use_events(dev); > + err = mlx4_NOP(dev); You could simplify the code a bit by moving the mlx4_cmd_use_events() to before where you test err, ie: err = mlx4_MAP_EQ(...); mlx4_cmd_user_events(dev); if (err) mlx4_warn(dev, ...) else err = mlx4_NOP(dev); > + } > + > + /* Return to default */ > + mlx4_MAP_EQ(dev, MLX4_ASYNC_EVENT_MASK, 0, > + priv->eq_table.eq[dev->caps.num_comp_vectors].eqn); > + return err; > +} -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Roland Dreier <rdreier@cisco.com> Date: Thu, 01 Oct 2009 20:32:08 -0700 > This feels like a pretty risky thing to do while the device might be > handling all sorts of other traffic at the same time. Are you sure > there are no races you expose here? Have you actually seen cases where > the interrupt test during initialization works but then this test > catches a problem? (My experience has been that if any MSI-X interrupts > work from a device, then they'll all work) I would suggest only allowing the test while the interface is down. That way the test has exclusive control of the IRQ. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Roland Dreier wrote: > This feels like a pretty risky thing to do while the device might be > handling all sorts of other traffic at the same time. Are you sure > there are no races you expose here? We did testing on this, during heavy traffic. A race can happen when there are two processes that execute that test simultaneously, but the tests always executed from Ethtool context. I will add additional protection in case somebody tries to execute this test from another context. > Have you actually seen cases where > the interrupt test during initialization works but then this test > catches a problem? (My experience has been that if any MSI-X interrupts > work from a device, then they'll all work) It also checks that all the EQs work properly. During initialization we only check the asynchronous EQ -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Have you actually seen cases where > > the interrupt test during initialization works but then this test > > catches a problem? (My experience has been that if any MSI-X interrupts > > work from a device, then they'll all work) > It also checks that all the EQs work properly. During initialization > we only check the asynchronous EQ Yes, I understand what the code does. My question was whether you have ever actually observed a case where the async EQ works but another EQ doesn't? In other words is this test useful in practice? - R. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/mlx4/eq.c b/drivers/net/mlx4/eq.c index bffb799..81619a1 100644 --- a/drivers/net/mlx4/eq.c +++ b/drivers/net/mlx4/eq.c @@ -698,3 +698,47 @@ void mlx4_cleanup_eq_table(struct mlx4_dev *dev) kfree(priv->eq_table.uar_map); } + +/* A test that verifies that we can accept interrupts on all + * the irq vectors of the device. + * Interrupts are checked using the NOP command. + */ +int mlx4_test_interrupts(struct mlx4_dev *dev) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + int i; + int err; + + err = mlx4_NOP(dev); + /* When not in MSI_X, there is only one irq to check */ + if (!(dev->flags & MLX4_FLAG_MSI_X)) + return err; + + /* A loop over all completion vectors, for each vector we will check + * whether it works by mapping command completions to that vector + * and performing a NOP command + */ + for (i = 0; !err && (i < dev->caps.num_comp_vectors); ++i) { + /* Temporary use polling for command completions */ + mlx4_cmd_use_polling(dev); + + /* Map the new eq to handle all asyncronous events */ + err = mlx4_MAP_EQ(dev, MLX4_ASYNC_EVENT_MASK, 0, + priv->eq_table.eq[i].eqn); + if (err) { + mlx4_warn(dev, "Failed mapping eq for interrupt test\n"); + mlx4_cmd_use_events(dev); + break; + } + + /* Go back to using events */ + mlx4_cmd_use_events(dev); + err = mlx4_NOP(dev); + } + + /* Return to default */ + mlx4_MAP_EQ(dev, MLX4_ASYNC_EVENT_MASK, 0, + priv->eq_table.eq[dev->caps.num_comp_vectors].eqn); + return err; +} +EXPORT_SYMBOL(mlx4_test_interrupts); diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index ce7cc6c..e27a68d 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -480,4 +480,5 @@ void mlx4_fmr_unmap(struct mlx4_dev *dev, struct mlx4_fmr *fmr, int mlx4_fmr_free(struct mlx4_dev *dev, struct mlx4_fmr *fmr); int mlx4_SYNC_TPT(struct mlx4_dev *dev); +int mlx4_test_interrupts(struct mlx4_dev *dev); #endif /* MLX4_DEVICE_H */
A test that verifies that we can accept interrupts on all the irq vectors of the device. Interrupts are checked using the NOP command. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> --- drivers/net/mlx4/eq.c | 44 +++++++++++++++++++++++++++++++++++++++++++ include/linux/mlx4/device.h | 1 + 2 files changed, 45 insertions(+), 0 deletions(-)