[net-next] net/mlx4_core: Warn if device doesn't have enough PCI bandwidth

Message ID 1388863024-8718-1-git-send-email-amirv@mellanox.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Amir Vadai Jan. 4, 2014, 7:17 p.m. UTC
From: Eyal Perry <eyalpe@mellanox.com>

Check whether the device gets enough bandwidth from the entire PCI chain to
satisfy its capabilities. This patch determines the PCIe device's bandwidth
capabilities by reading its PCIe Link Capabilities registers and then calls the
pcie_get_minimum_link function to ensure that the adapter is plugged into a slot
that can provide the necessary bandwidth.

Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>

---
We're still fixing and discussing the other two patches for the
mlx4_core/mlx4_en drivers, which are under review on the list now.
Meanwhile, I would like to push this patch, which has no conflicts with those two.

Thanks,
Amir

 drivers/net/ethernet/mellanox/mlx4/main.c | 84 +++++++++++++++++++++++++++++++
 1 file changed, 84 insertions(+)

Comments

David Miller Jan. 5, 2014, 1:16 a.m. UTC | #1
From: Amir Vadai <amirv@mellanox.com>
Date: Sat,  4 Jan 2014 21:17:04 +0200

> +static int mlx4_get_pcie_dev_link_caps(struct mlx4_dev *dev,
 ...
> +	mlx4_check_pcie_caps(dev);

You don't care about the return value, make it return void.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Hutchings Jan. 6, 2014, 9:15 p.m. UTC | #2
On Sat, 2014-01-04 at 21:17 +0200, Amir Vadai wrote:
> From: Eyal Perry <eyalpe@mellanox.com>
> 
> Check if the device get enough bandwidth from the entire PCI chain to satisfy
> its capabilities. This patch determines the PCIe device's bandwidth capabilities
> by reading its PCIe Link Capabilities registers and then call the
> pcie_get_minimum_link function to ensure that the adapter is hooked into a slot
> which is capable of providing the necessary bandwidth capabilities.
[...]

This is essentially another duplicate of what ixgbe and i40e are
doing...  (And the out-of-tree version of sfc does it too, but I never
felt that was ready for in-tree.)

We ought to have a generic PCI layer function that warns when a PCIe
device is running below maximum link width/speed.  Maybe even run it as
soon as the device is enumerated, so that a driver doesn't need to do
anything.

Ben.
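
To illustrate the generic check suggested above, here is a minimal user-space sketch that compares the negotiated link (Link Status register) against the endpoint's maximum (Link Capabilities register). It is not code from this thread or from the kernel: the struct and function names are hypothetical, and the raw register values are assumed to have been read already. Only the field positions (speed in bits 3:0, width in bits 9:4) come from the PCIe register layout. Unlike the patch, which walks the whole chain via pcie_get_minimum_link, this sketch looks only at the endpoint's own registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field masks for the PCIe Link Capabilities and Link Status registers. */
#define LNKCAP_SPEED_MASK 0x0000000fu /* Max Link Speed, bits 3:0 */
#define LNKCAP_WIDTH_MASK 0x000003f0u /* Maximum Link Width, bits 9:4 */
#define LNKSTA_SPEED_MASK 0x000fu     /* Current Link Speed, bits 3:0 */
#define LNKSTA_WIDTH_MASK 0x03f0u     /* Negotiated Link Width, bits 9:4 */
#define LNK_WIDTH_SHIFT   4

/* Hypothetical helper type: a decoded speed code and lane count. */
struct link_params {
	unsigned int speed; /* 1 = 2.5GT/s, 2 = 5.0GT/s, 3 = 8.0GT/s */
	unsigned int width; /* number of lanes */
};

/* Decode the maximum speed and width the endpoint advertises. */
struct link_params decode_lnkcap(uint32_t lnkcap)
{
	struct link_params p;

	p.speed = lnkcap & LNKCAP_SPEED_MASK;
	p.width = (lnkcap & LNKCAP_WIDTH_MASK) >> LNK_WIDTH_SHIFT;
	return p;
}

/* Decode the speed and width the link actually trained to. */
struct link_params decode_lnksta(uint16_t lnksta)
{
	struct link_params p;

	p.speed = lnksta & LNKSTA_SPEED_MASK;
	p.width = (lnksta & LNKSTA_WIDTH_MASK) >> LNK_WIDTH_SHIFT;
	return p;
}

/* True when the link runs below the endpoint's maximum speed or width. */
bool link_is_degraded(uint32_t lnkcap, uint16_t lnksta)
{
	struct link_params cap = decode_lnkcap(lnkcap);
	struct link_params sta = decode_lnksta(lnksta);

	return sta.speed < cap.speed || sta.width < cap.width;
}
```

For example, link_is_degraded(0x83, 0x41) is true: the capabilities value decodes to 8.0GT/s x8, while the status value decodes to 2.5GT/s x4.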
David Miller Jan. 6, 2014, 9:21 p.m. UTC | #3
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Mon, 6 Jan 2014 21:15:55 +0000

> We ought to have a generic PCI layer function that warns when a PCIe
> device is running below maximum link width/speed.  Maybe even run it as
> soon as the device is enumerated, so that a driver doesn't need to do
> anything.

Agreed.
Keller, Jacob E Jan. 15, 2014, 11:15 p.m. UTC | #4
On Mon, 2014-01-06 at 21:15 +0000, Ben Hutchings wrote:
> On Sat, 2014-01-04 at 21:17 +0200, Amir Vadai wrote:
> > From: Eyal Perry <eyalpe@mellanox.com>
> > 
> > Check if the device get enough bandwidth from the entire PCI chain to satisfy
> > its capabilities. This patch determines the PCIe device's bandwidth capabilities
> > by reading its PCIe Link Capabilities registers and then call the
> > pcie_get_minimum_link function to ensure that the adapter is hooked into a slot
> > which is capable of providing the necessary bandwidth capabilities.
> [...]
> 
> This is essentially another duplicate of what ixgbe and i40e are
> doing...  (And the out-of-tree version of sfc does it too, but I never
> felt that was ready for in-tree.)
> 
> We ought to have a generic PCI layer function that warns when a PCIe
> device is running below maximum link width/speed.  Maybe even run it as
> soon as the device is enumerated, so that a driver doesn't need to do
> anything.
> 
> Ben.
> 

Hi,

I was thinking about this again, was wondering a few things. Is this
something you were already investigating?

On an implementation note, how would this function know how much
bandwidth a particular device (or function?) would require? I'm thinking
of something along the lines of a driver essentially saying how much the
devices it supports require?

Thanks,
Jake
Ben Hutchings Jan. 15, 2014, 11:33 p.m. UTC | #5
On Wed, 2014-01-15 at 23:15 +0000, Keller, Jacob E wrote:
> On Mon, 2014-01-06 at 21:15 +0000, Ben Hutchings wrote:
> > On Sat, 2014-01-04 at 21:17 +0200, Amir Vadai wrote:
> > > From: Eyal Perry <eyalpe@mellanox.com>
> > > 
> > > Check if the device get enough bandwidth from the entire PCI chain to satisfy
> > > its capabilities. This patch determines the PCIe device's bandwidth capabilities
> > > by reading its PCIe Link Capabilities registers and then call the
> > > pcie_get_minimum_link function to ensure that the adapter is hooked into a slot
> > > which is capable of providing the necessary bandwidth capabilities.
> > [...]
> > 
> > This is essentially another duplicate of what ixgbe and i40e are
> > doing...  (And the out-of-tree version of sfc does it too, but I never
> > felt that was ready for in-tree.)
> > 
> > We ought to have a generic PCI layer function that warns when a PCIe
> > device is running below maximum link width/speed.  Maybe even run it as
> > soon as the device is enumerated, so that a driver doesn't need to do
> > anything.
> > 
> > Ben.
> > 
> 
> Hi,
> 
> I was thinking about this again, was wondering a few things. Is this
> something you were already investigating?

No, I'm busy with other things.

> On an implementation note, how would this function know how much
> bandwidth a particular device (or function?) would require? I'm thinking
> of something along the lines of a driver essentially saying how much the
> devices it supports require?

I was thinking you could generically compare the link status with link
capabilities of the endpoint, i.e. actual versus maximum possible
bandwidth.

In some cases the link capabilities may be more than you really need.
For example, given a 10/40G controller capable of PCIe gen3 x8, on a
board that only has a single 10G port, you could put the board in a gen1
x8 slot and still have enough PCIe bandwidth to saturate the Ethernet
link.  However it will have higher latency compared to a gen3 x8 slot.
So I think the generic comparison would be OK as long as the log message
and severity is not too alarming.

Ben.
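
The gen1 x8 versus 10G arithmetic above can be checked with a short sketch. The function names are hypothetical, and the figures are a simplification: they account only for line encoding (8b/10b for 2.5 and 5.0GT/s, 128b/130b for 8.0GT/s) and ignore packet and protocol overhead, so real throughput is somewhat lower.

```c
#include <stdbool.h>

/* Usable bit rate of one lane in Gbit/s, after line-encoding overhead. */
double pcie_lane_gbps(int gen)
{
	switch (gen) {
	case 1:
		return 2.5 * 8.0 / 10.0;    /* 2.5GT/s, 8b/10b encoding */
	case 2:
		return 5.0 * 8.0 / 10.0;    /* 5.0GT/s, 8b/10b encoding */
	case 3:
		return 8.0 * 128.0 / 130.0; /* 8.0GT/s, 128b/130b encoding */
	default:
		return 0.0;                 /* unknown generation */
	}
}

/* Aggregate bit rate of a link with the given number of lanes. */
double pcie_link_gbps(int gen, int lanes)
{
	return pcie_lane_gbps(gen) * lanes;
}

/* True when the link's bit rate covers the given port rate in Gbit/s. */
bool link_can_saturate(int gen, int lanes, double port_gbps)
{
	return pcie_link_gbps(gen, lanes) >= port_gbps;
}
```

Under these assumptions a gen1 x8 link carries 16 Gbit/s, which covers a single 10G port, while a gen3 x8 link carries roughly 63 Gbit/s. That gap is why a plain "below maximum" comparison can fire on a setup that still has enough bandwidth.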
Patch

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index d2b8b39..417a595 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -388,6 +388,84 @@  static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 
 	return 0;
 }
+
+static int mlx4_get_pcie_dev_link_caps(struct mlx4_dev *dev,
+				       enum pci_bus_speed *speed,
+				       enum pcie_link_width *width)
+{
+	u32 lnkcap1, lnkcap2;
+	int err1, err2;
+
+#define  PCIE_MLW_CAP_SHIFT 4	/* start of MLW mask in link capabilities */
+
+	*speed = PCI_SPEED_UNKNOWN;
+	*width = PCIE_LNK_WIDTH_UNKNOWN;
+
+	err1 = pcie_capability_read_dword(dev->pdev, PCI_EXP_LNKCAP, &lnkcap1);
+	err2 = pcie_capability_read_dword(dev->pdev, PCI_EXP_LNKCAP2, &lnkcap2);
+	if (!err2 && lnkcap2) { /* PCIe r3.0-compliant */
+		if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_8_0GB)
+			*speed = PCIE_SPEED_8_0GT;
+		else if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_5_0GB)
+			*speed = PCIE_SPEED_5_0GT;
+		else if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_2_5GB)
+			*speed = PCIE_SPEED_2_5GT;
+	}
+	if (!err1) {
+		*width = (lnkcap1 & PCI_EXP_LNKCAP_MLW) >> PCIE_MLW_CAP_SHIFT;
+		if (!lnkcap2) { /* pre-r3.0 */
+			if (lnkcap1 & PCI_EXP_LNKCAP_SLS_5_0GB)
+				*speed = PCIE_SPEED_5_0GT;
+			else if (lnkcap1 & PCI_EXP_LNKCAP_SLS_2_5GB)
+				*speed = PCIE_SPEED_2_5GT;
+		}
+	}
+
+	if (*speed == PCI_SPEED_UNKNOWN || *width == PCIE_LNK_WIDTH_UNKNOWN) {
+		return err1 ? err1 :
+			err2 ? err2 : -EINVAL;
+	}
+	return 0;
+}
+
+static int mlx4_check_pcie_caps(struct mlx4_dev *dev)
+{
+	enum pcie_link_width width, width_cap;
+	enum pci_bus_speed speed, speed_cap;
+	int err;
+
+#define PCIE_SPEED_STR(speed) \
+	(speed == PCIE_SPEED_8_0GT ? "8.0GT/s" : \
+	 speed == PCIE_SPEED_5_0GT ? "5.0GT/s" : \
+	 speed == PCIE_SPEED_2_5GT ? "2.5GT/s" : \
+	 "Unknown")
+
+	err = mlx4_get_pcie_dev_link_caps(dev, &speed_cap, &width_cap);
+	if (err) {
+		mlx4_warn(dev,
+			  "Unable to determine PCIe device BW capabilities\n");
+		return err;
+	}
+
+	err = pcie_get_minimum_link(dev->pdev, &speed, &width);
+	if (err || speed == PCI_SPEED_UNKNOWN ||
+	    width == PCIE_LNK_WIDTH_UNKNOWN) {
+		mlx4_warn(dev,
+			  "Unable to determine PCI device chain minimum BW\n");
+		return err ? err : -EINVAL;
+	}
+
+	if (width != width_cap || speed != speed_cap)
+		mlx4_warn(dev,
+			  "PCIe BW is different than device's capability\n");
+
+	mlx4_info(dev, "PCIe link speed is %s, device supports %s\n",
+		  PCIE_SPEED_STR(speed), PCIE_SPEED_STR(speed_cap));
+	mlx4_info(dev, "PCIe link width is x%d, device supports x%d\n",
+		  width, width_cap);
+	return 0;
+}
+
 /*The function checks if there are live vf, return the num of them*/
 static int mlx4_how_many_lives_vf(struct mlx4_dev *dev)
 {
@@ -2306,6 +2384,12 @@  slave_start:
 			goto err_mfunc;
 	}
 
+	/* Check whether the device is running at its maximum possible speed.
+	 * The return code is deliberately ignored: the check only warns the
+	 * user when the bus under-satisfies the PCIe device's capabilities.
+	 */
+	mlx4_check_pcie_caps(dev);
+
 	/* In master functions, the communication channel must be initialized
 	 * after obtaining its address from fw */
 	if (mlx4_is_master(dev)) {