Message ID | 1388863024-8718-1-git-send-email-amirv@mellanox.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
From: Amir Vadai <amirv@mellanox.com>
Date: Sat, 4 Jan 2014 21:17:04 +0200

> +static int mlx4_get_pcie_dev_link_caps(struct mlx4_dev *dev,
 ...
> +	mlx4_check_pcie_caps(dev);

You don't care about the return value, make it return void.
On Sat, 2014-01-04 at 21:17 +0200, Amir Vadai wrote:
> From: Eyal Perry <eyalpe@mellanox.com>
>
> Check if the device get enough bandwidth from the entire PCI chain to satisfy
> its capabilities. This patch determines the PCIe device's bandwidth capabilities
> by reading its PCIe Link Capabilities registers and then call the
> pcie_get_minimum_link function to ensure that the adapter is hooked into a slot
> which is capable of providing the necessary bandwidth capabilities.
[...]

This is essentially another duplicate of what ixgbe and i40e are
doing... (And the out-of-tree version of sfc does it too, but I never
felt that was ready for in-tree.)

We ought to have a generic PCI layer function that warns when a PCIe
device is running below maximum link width/speed. Maybe even run it as
soon as the device is enumerated, so that a driver doesn't need to do
anything.

Ben.
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Mon, 6 Jan 2014 21:15:55 +0000

> We ought to have a generic PCI layer function that warns when a PCIe
> device is running below maximum link width/speed. Maybe even run it as
> soon as the device is enumerated, so that a driver doesn't need to do
> anything.

Agreed.
On Mon, 2014-01-06 at 21:15 +0000, Ben Hutchings wrote:
> On Sat, 2014-01-04 at 21:17 +0200, Amir Vadai wrote:
> [...]
>
> We ought to have a generic PCI layer function that warns when a PCIe
> device is running below maximum link width/speed. Maybe even run it as
> soon as the device is enumerated, so that a driver doesn't need to do
> anything.

Hi,

I was thinking about this again, was wondering a few things. Is this
something you were already investigating?

On an implementation note, how would this function know how much
bandwidth a particular device (or function?) would require? I'm thinking
of something along the lines of a driver essentially saying how much the
devices it supports require?

Thanks,
Jake
On Wed, 2014-01-15 at 23:15 +0000, Keller, Jacob E wrote:
> On Mon, 2014-01-06 at 21:15 +0000, Ben Hutchings wrote:
> [...]
> > We ought to have a generic PCI layer function that warns when a PCIe
> > device is running below maximum link width/speed. Maybe even run it as
> > soon as the device is enumerated, so that a driver doesn't need to do
> > anything.
>
> Hi,
>
> I was thinking about this again, was wondering a few things. Is this
> something you were already investigating?

No, I'm busy with other things.

> On an implementation note, how would this function know how much
> bandwidth a particular device (or function?) would require? I'm thinking
> of something along the lines of a driver essentially saying how much the
> devices it supports require?

I was thinking you could generically compare the link status with the
link capabilities of the endpoint, i.e. actual versus maximum possible
bandwidth.

In some cases the link capabilities may be more than you really need.
For example, given a 10/40G controller capable of PCIe gen3 x8, on a
board that only has a single 10G port, you could put the board in a gen1
x8 slot and still have enough PCIe bandwidth to saturate the Ethernet
link. However, it will have higher latency compared to a gen3 x8 slot.

So I think the generic comparison would be OK as long as the log message
and severity are not too alarming.

Ben.
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index d2b8b39..417a595 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -388,6 +388,84 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 	return 0;
 }
 
+
+static int mlx4_get_pcie_dev_link_caps(struct mlx4_dev *dev,
+				       enum pci_bus_speed *speed,
+				       enum pcie_link_width *width)
+{
+	u32 lnkcap1, lnkcap2;
+	int err1, err2;
+
+#define PCIE_MLW_CAP_SHIFT 4	/* start of MLW mask in link capabilities */
+
+	*speed = PCI_SPEED_UNKNOWN;
+	*width = PCIE_LNK_WIDTH_UNKNOWN;
+
+	err1 = pcie_capability_read_dword(dev->pdev, PCI_EXP_LNKCAP, &lnkcap1);
+	err2 = pcie_capability_read_dword(dev->pdev, PCI_EXP_LNKCAP2, &lnkcap2);
+	if (!err2 && lnkcap2) { /* PCIe r3.0-compliant */
+		if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_8_0GB)
+			*speed = PCIE_SPEED_8_0GT;
+		else if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_5_0GB)
+			*speed = PCIE_SPEED_5_0GT;
+		else if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_2_5GB)
+			*speed = PCIE_SPEED_2_5GT;
+	}
+	if (!err1) {
+		*width = (lnkcap1 & PCI_EXP_LNKCAP_MLW) >> PCIE_MLW_CAP_SHIFT;
+		if (!lnkcap2) { /* pre-r3.0 */
+			if (lnkcap1 & PCI_EXP_LNKCAP_SLS_5_0GB)
+				*speed = PCIE_SPEED_5_0GT;
+			else if (lnkcap1 & PCI_EXP_LNKCAP_SLS_2_5GB)
+				*speed = PCIE_SPEED_2_5GT;
+		}
+	}
+
+	if (*speed == PCI_SPEED_UNKNOWN || *width == PCIE_LNK_WIDTH_UNKNOWN) {
+		return err1 ? err1 :
+			err2 ? err2 : -EINVAL;
+	}
+	return 0;
+}
+
+static int mlx4_check_pcie_caps(struct mlx4_dev *dev)
+{
+	enum pcie_link_width width, width_cap;
+	enum pci_bus_speed speed, speed_cap;
+	int err;
+
+#define PCIE_SPEED_STR(speed) \
+	(speed == PCIE_SPEED_8_0GT ? "8.0GT/s" : \
+	 speed == PCIE_SPEED_5_0GT ? "5.0GT/s" : \
+	 speed == PCIE_SPEED_2_5GT ? "2.5GT/s" : \
+	 "Unknown")
+
+	err = mlx4_get_pcie_dev_link_caps(dev, &speed_cap, &width_cap);
+	if (err) {
+		mlx4_warn(dev,
+			  "Unable to determine PCIe device BW capabilities\n");
+		return err;
+	}
+
+	err = pcie_get_minimum_link(dev->pdev, &speed, &width);
+	if (err || speed == PCI_SPEED_UNKNOWN ||
+	    width == PCIE_LNK_WIDTH_UNKNOWN) {
+		mlx4_warn(dev,
+			  "Unable to determine PCI device chain minimum BW\n");
+		return err ? err : -EINVAL;
+	}
+
+	if (width != width_cap || speed != speed_cap)
+		mlx4_warn(dev,
+			  "PCIe BW is different than device's capability\n");
+
+	mlx4_info(dev, "PCIe link speed is %s, device supports %s\n",
+		  PCIE_SPEED_STR(speed), PCIE_SPEED_STR(speed_cap));
+	mlx4_info(dev, "PCIe link width is x%d, device supports x%d\n",
+		  width, width_cap);
+	return 0;
+}
+
 /*The function checks if there are live vf, return the num of them*/
 static int mlx4_how_many_lives_vf(struct mlx4_dev *dev)
 {
@@ -2306,6 +2384,12 @@ slave_start:
 		goto err_mfunc;
 	}
 
+	/* check if the device is functioning at its maximum possible speed
+	 * ignoring function return code, just warn the user in case of PCI
+	 * express device capabilities are under-satisfied by the bus.
+	 */
+	mlx4_check_pcie_caps(dev);
+
 	/* In master functions, the communication channel must be initialized
 	 * after obtaining its address from fw */
 	if (mlx4_is_master(dev)) {