diff mbox

[1/7] mlx4: Added interrupts test support

Message ID 4AC4BD9A.2050101@mellanox.co.il
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Yevgeny Petrilin Oct. 1, 2009, 2:32 p.m. UTC
A test that verifies that we can accept interrupts on all
the irq vectors of the device.
Interrupts are checked using the NOP command.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/mlx4/eq.c       |   44 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/mlx4/device.h |    1 +
 2 files changed, 45 insertions(+), 0 deletions(-)

Comments

Roland Dreier Oct. 2, 2009, 3:32 a.m. UTC | #1
This feels like a pretty risky thing to do while the device might be
handling all sorts of other traffic at the same time.  Are you sure
there are no races you expose here?  Have you actually seen cases where
the interrupt test during initialization works but then this test
catches a problem?  (My experience has been that if any MSI-X interrupts
work from a device, then they'll all work)

 > +/* A test that verifies that we can accept interrupts on all
 > + * the irq vectors of the device.
 > + * Interrupts are checked using the NOP command.
 > + */
 > +int mlx4_test_interrupts(struct mlx4_dev *dev)
 > +{
 > +	struct mlx4_priv *priv = mlx4_priv(dev);
 > +	int i;
 > +	int err;
 > +
 > +	err = mlx4_NOP(dev);
 > +	/* When not in MSI_X, there is only one irq to check */
 > +	if (!(dev->flags & MLX4_FLAG_MSI_X))
 > +		return err;
 > +
 > +	/* A loop over all completion vectors, for each vector we will check
 > +	 * whether it works by mapping command completions to that vector
 > +	 * and performing a NOP command
 > +	 */
 > +	for (i = 0; !err && (i < dev->caps.num_comp_vectors); ++i) {
 > +		/* Temporary use polling for command completions */

you want the adverb form here: "Temporarily"

 > +		mlx4_cmd_use_polling(dev);
 > +
 > +		/* Map the new eq to handle all asyncronous events */

"asynchronous"

 > +		err = mlx4_MAP_EQ(dev, MLX4_ASYNC_EVENT_MASK, 0,
 > +				  priv->eq_table.eq[i].eqn);
 > +		if (err) {
 > +			mlx4_warn(dev, "Failed mapping eq for interrupt test\n");
 > +			mlx4_cmd_use_events(dev);
 > +			break;
 > +		}
 > +
 > +		/* Go back to using events */
 > +		mlx4_cmd_use_events(dev);
 > +		err = mlx4_NOP(dev);

You could simplify the code a bit by moving the mlx4_cmd_use_events() to
before where you test err, ie:

		err = mlx4_MAP_EQ(...);
		mlx4_cmd_user_events(dev);
		if (err)
			mlx4_warn(dev, ...)
		else
			err = mlx4_NOP(dev);

 > +	}
 > +
 > +	/* Return to default */
 > +	mlx4_MAP_EQ(dev, MLX4_ASYNC_EVENT_MASK, 0,
 > +		    priv->eq_table.eq[dev->caps.num_comp_vectors].eqn);
 > +	return err;
 > +}
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Oct. 2, 2009, 5:27 a.m. UTC | #2
From: Roland Dreier <rdreier@cisco.com>
Date: Thu, 01 Oct 2009 20:32:08 -0700

> This feels like a pretty risky thing to do while the device might be
> handling all sorts of other traffic at the same time.  Are you sure
> there are no races you expose here?  Have you actually seen cases where
> the interrupt test during initialization works but then this test
> catches a problem?  (My experience has been that if any MSI-X interrupts
> work from a device, then they'll all work)

I would suggest only allowing the test while the interface is down.
That way the test has exclusive control of the IRQ.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yevgeny Petrilin Oct. 5, 2009, 11:16 a.m. UTC | #3
Roland Dreier wrote:
> This feels like a pretty risky thing to do while the device might be
> handling all sorts of other traffic at the same time.  Are you sure
> there are no races you expose here? 
We did testing on this, during heavy traffic.
A race can happen when there are two processes that execute that test simultaneously,
but the tests always executed from Ethtool context.
I will add additional protection in case somebody tries to execute this test from another context.

> Have you actually seen cases where
> the interrupt test during initialization works but then this test
> catches a problem?  (My experience has been that if any MSI-X interrupts
> work from a device, then they'll all work)
It also checks that all the EQs work properly. During initialization we only check the asynchronous EQ

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Roland Dreier Oct. 5, 2009, 4:18 p.m. UTC | #4
> > Have you actually seen cases where
 > > the interrupt test during initialization works but then this test
 > > catches a problem?  (My experience has been that if any MSI-X interrupts
 > > work from a device, then they'll all work)

 > It also checks that all the EQs work properly. During initialization
 > we only check the asynchronous EQ

Yes, I understand what the code does.  My question was whether you have
ever actually observed a case where the async EQ works but another EQ
doesn't?  In other words is this test useful in practice?

 - R.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/mlx4/eq.c b/drivers/net/mlx4/eq.c
index bffb799..81619a1 100644
--- a/drivers/net/mlx4/eq.c
+++ b/drivers/net/mlx4/eq.c
@@ -698,3 +698,47 @@  void mlx4_cleanup_eq_table(struct mlx4_dev *dev)
 
 	kfree(priv->eq_table.uar_map);
 }
+
+/* A test that verifies that we can accept interrupts on all
+ * the irq vectors of the device.
+ * Interrupts are checked using the NOP command.
+ */
+int mlx4_test_interrupts(struct mlx4_dev *dev)
+{
+	struct mlx4_priv *priv = mlx4_priv(dev);
+	int i;
+	int err;
+
+	err = mlx4_NOP(dev);
+	/* When not in MSI_X, there is only one irq to check */
+	if (!(dev->flags & MLX4_FLAG_MSI_X))
+		return err;
+
+	/* A loop over all completion vectors, for each vector we will check
+	 * whether it works by mapping command completions to that vector
+	 * and performing a NOP command
+	 */
+	for (i = 0; !err && (i < dev->caps.num_comp_vectors); ++i) {
+		/* Temporary use polling for command completions */
+		mlx4_cmd_use_polling(dev);
+
+		/* Map the new eq to handle all asyncronous events */
+		err = mlx4_MAP_EQ(dev, MLX4_ASYNC_EVENT_MASK, 0,
+				  priv->eq_table.eq[i].eqn);
+		if (err) {
+			mlx4_warn(dev, "Failed mapping eq for interrupt test\n");
+			mlx4_cmd_use_events(dev);
+			break;
+		}
+
+		/* Go back to using events */
+		mlx4_cmd_use_events(dev);
+		err = mlx4_NOP(dev);
+	}
+
+	/* Return to default */
+	mlx4_MAP_EQ(dev, MLX4_ASYNC_EVENT_MASK, 0,
+		    priv->eq_table.eq[dev->caps.num_comp_vectors].eqn);
+	return err;
+}
+EXPORT_SYMBOL(mlx4_test_interrupts);
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index ce7cc6c..e27a68d 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -480,4 +480,5 @@  void mlx4_fmr_unmap(struct mlx4_dev *dev, struct mlx4_fmr *fmr,
 int mlx4_fmr_free(struct mlx4_dev *dev, struct mlx4_fmr *fmr);
 int mlx4_SYNC_TPT(struct mlx4_dev *dev);
 
+int mlx4_test_interrupts(struct mlx4_dev *dev);
 #endif /* MLX4_DEVICE_H */