[v6,8/9] net/mlx5: Do not call pcie_print_link_status()

Message ID 20180806232600.25694-8-mr.nuke.me@gmail.com
State Awaiting Upstream
Series [v6,1/9] PCI: Check for PCIe downtraining conditions

Commit Message

Alexandru Gagniuc Aug. 6, 2018, 11:25 p.m. UTC
This is now done by the PCI core to warn of sub-optimal bandwidth.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ----
 1 file changed, 4 deletions(-)

Comments

Leon Romanovsky Aug. 8, 2018, 6:08 a.m. UTC | #1
On Mon, Aug 06, 2018 at 06:25:42PM -0500, Alexandru Gagniuc wrote:
> This is now done by the PCI core to warn of sub-optimal bandwidth.
>
> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ----
>  1 file changed, 4 deletions(-)
>

Thanks,
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Tal Gilboa Aug. 8, 2018, 2:23 p.m. UTC | #2
On 8/8/2018 9:08 AM, Leon Romanovsky wrote:
> On Mon, Aug 06, 2018 at 06:25:42PM -0500, Alexandru Gagniuc wrote:
>> This is now done by the PCI core to warn of sub-optimal bandwidth.
>>
>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
>> ---
>>   drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ----
>>   1 file changed, 4 deletions(-)
>>
> 
> Thanks,
> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
> 

Alex,
I loaded the mlx5 driver with and without this series. The report in
dmesg is now missing. From what I understood, the status should be
reported at least once, even if everything is in order. We need this
functionality to stay.

net-next (dmesg output for 07:00.0):
[270498.625351] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
[270498.632130] mlx5_core 0000:07:00.0: 63.008 Gb/s available PCIe 
bandwidth (8 GT/s x8 link)
[270499.169533] (0000:07:00.0): E-Switch: Total vports 9, per vport: max 
uc(1024) max mc(16384)
[270499.182358] mlx5_core 0000:07:00.0: Port module event: module 0, 
Cable plugged

net-next + patches (dmesg output for 07:00.0):
[  331.608472] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
[  332.564938] (0000:07:00.0): E-Switch: Total vports 9, per vport: max 
uc(1024) max mc(16384)
[  332.616271] mlx5_core 0000:07:00.0: Port module event: module 0, 
Cable plugged
Leon Romanovsky Aug. 8, 2018, 3:41 p.m. UTC | #3
On Wed, Aug 08, 2018 at 05:23:12PM +0300, Tal Gilboa wrote:
> On 8/8/2018 9:08 AM, Leon Romanovsky wrote:
> > On Mon, Aug 06, 2018 at 06:25:42PM -0500, Alexandru Gagniuc wrote:
> > > This is now done by the PCI core to warn of sub-optimal bandwidth.
> > >
> > > Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> > > ---
> > >   drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ----
> > >   1 file changed, 4 deletions(-)
> > >
> >
> > Thanks,
> > Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
> >
>
> Alex,
> I loaded the mlx5 driver with and without this series. The report in dmesg
> is now missing. From what I understood, the status should be reported at
> least once, even if everything is in order.

That is not what this series does; it removes the print completely if
the fabric can deliver more bandwidth than the card is capable of.

> We need this functionality to stay.

I'm not sure you need this information in the driver's dmesg output;
it should probably be something globally visible and accessible per
PCI device.

>
> net-next (dmesg output for 07:00.0):
> [270498.625351] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
> [270498.632130] mlx5_core 0000:07:00.0: 63.008 Gb/s available PCIe bandwidth
> (8 GT/s x8 link)
> [270499.169533] (0000:07:00.0): E-Switch: Total vports 9, per vport: max
> uc(1024) max mc(16384)
> [270499.182358] mlx5_core 0000:07:00.0: Port module event: module 0, Cable
> plugged
>
> net-next + patches (dmesg output for 07:00.0):
> [  331.608472] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
> [  332.564938] (0000:07:00.0): E-Switch: Total vports 9, per vport: max
> uc(1024) max mc(16384)
> [  332.616271] mlx5_core 0000:07:00.0: Port module event: module 0, Cable
> plugged
>
>
Tal Gilboa Aug. 8, 2018, 3:56 p.m. UTC | #4
On 8/8/2018 6:41 PM, Leon Romanovsky wrote:
> On Wed, Aug 08, 2018 at 05:23:12PM +0300, Tal Gilboa wrote:
>> On 8/8/2018 9:08 AM, Leon Romanovsky wrote:
>>> On Mon, Aug 06, 2018 at 06:25:42PM -0500, Alexandru Gagniuc wrote:
>>>> This is now done by the PCI core to warn of sub-optimal bandwidth.
>>>>
>>>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
>>>> ---
>>>>    drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ----
>>>>    1 file changed, 4 deletions(-)
>>>>
>>>
>>> Thanks,
>>> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
>>>
>>
>> Alex,
>> I loaded the mlx5 driver with and without this series. The report in dmesg
>> is now missing. From what I understood, the status should be reported at
>> least once, even if everything is in order.
> 
> That is not what this series does; it removes the print completely if
> the fabric can deliver more bandwidth than the card is capable of.
> 
>> We need this functionality to stay.
> 
> I'm not sure you need this information in the driver's dmesg output;
> it should probably be something globally visible and accessible per
> PCI device.

Currently we have users that look for it. If we remove the dmesg print 
we need this to be reported elsewhere. Adding it to sysfs for example 
should be a valid solution for our case.
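
[For illustration only -- not something this series implements -- a
per-device sysfs attribute along these lines could expose the same
information the print is based on.  The sketch reuses
pcie_bandwidth_available() and PCIE_SPEED2STR() as the PCI core patch
further down does; the attribute name and its placement in
drivers/pci/pci-sysfs.c are hypothetical.]

/* Hypothetical sketch: report available PCIe bandwidth via sysfs */
static ssize_t available_bandwidth_show(struct device *dev,
					struct device_attribute *attr,
					char *buf)
{
	struct pci_dev *pdev = to_pci_dev(dev);
	struct pci_dev *limiting_dev = NULL;
	enum pci_bus_speed speed;
	enum pcie_link_width width;
	u32 bw_avail;

	/* Same helper the dmesg print uses; also finds the limiting link */
	bw_avail = pcie_bandwidth_available(pdev, &limiting_dev, &speed, &width);

	return sprintf(buf, "%u.%03u Gb/s (%s x%d link)\n",
		       bw_avail / 1000, bw_avail % 1000,
		       PCIE_SPEED2STR(speed), width);
}
static DEVICE_ATTR_RO(available_bandwidth);
/* ...and the attribute would still have to be added to the existing
 * PCI device attribute group.
 */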

> 
>>
>> net-next (dmesg output for 07:00.0):
>> [270498.625351] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
>> [270498.632130] mlx5_core 0000:07:00.0: 63.008 Gb/s available PCIe bandwidth
>> (8 GT/s x8 link)
>> [270499.169533] (0000:07:00.0): E-Switch: Total vports 9, per vport: max
>> uc(1024) max mc(16384)
>> [270499.182358] mlx5_core 0000:07:00.0: Port module event: module 0, Cable
>> plugged
>>
>> net-next + patches (dmesg output for 07:00.0):
>> [  331.608472] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
>> [  332.564938] (0000:07:00.0): E-Switch: Total vports 9, per vport: max
>> uc(1024) max mc(16384)
>> [  332.616271] mlx5_core 0000:07:00.0: Port module event: module 0, Cable
>> plugged
>>
>>
Alexandru Gagniuc Aug. 8, 2018, 4:33 p.m. UTC | #5
On 08/08/2018 10:56 AM, Tal Gilboa wrote:
> On 8/8/2018 6:41 PM, Leon Romanovsky wrote:
>> On Wed, Aug 08, 2018 at 05:23:12PM +0300, Tal Gilboa wrote:
>>> On 8/8/2018 9:08 AM, Leon Romanovsky wrote:
>>>> On Mon, Aug 06, 2018 at 06:25:42PM -0500, Alexandru Gagniuc wrote:
>>>>> This is now done by the PCI core to warn of sub-optimal bandwidth.
>>>>>
>>>>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
>>>>> ---
>>>>>    drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ----
>>>>>    1 file changed, 4 deletions(-)
>>>>>
>>>>
>>>> Thanks,
>>>> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
>>>>
>>>
>>> Alex,
>>> I loaded the mlx5 driver with and without this series. The report in
>>> dmesg is now missing. From what I understood, the status should be
>>> reported at least once, even if everything is in order.
>>
>> That is not what this series does; it removes the print completely if
>> the fabric can deliver more bandwidth than the card is capable of.
>>
>>> We need this functionality to stay.
>>
>> I'm not sure you need this information in the driver's dmesg output;
>> it should probably be something globally visible and accessible per
>> PCI device.
> 
> Currently we have users that look for it. If we remove the dmesg print 
> we need this to be reported elsewhere. Adding it to sysfs for example 
> should be a valid solution for our case.

I think a stop-gap measure is to leave the pcie_print_link_status() call 
in drivers that really need it for whatever reason. Implementing
reliable reporting through sysfs might take some tinkering, and I don't
think it's a sufficient reason to block the heart of this series -- 
being able to detect bottlenecks and link downtraining.
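
[Concretely, the stop-gap just means keeping the hunk this patch
deletes (see the mlx5 diff at the bottom of this page), so the probe
path keeps its unconditional print:]

	/* In mlx5_load_one(): only PFs hold the relevant PCIe
	 * information for this query.
	 */
	if (mlx5_core_is_pf(dev))
		pcie_print_link_status(dev->pdev);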

Alex

>>
>>>
>>> net-next (dmesg output for 07:00.0):
>>> [270498.625351] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
>>> [270498.632130] mlx5_core 0000:07:00.0: 63.008 Gb/s available PCIe 
>>> bandwidth
>>> (8 GT/s x8 link)
>>> [270499.169533] (0000:07:00.0): E-Switch: Total vports 9, per vport: max
>>> uc(1024) max mc(16384)
>>> [270499.182358] mlx5_core 0000:07:00.0: Port module event: module 0, 
>>> Cable
>>> plugged
>>>
>>> net-next + patches (dmesg output for 07:00.0):
>>> [  331.608472] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
>>> [  332.564938] (0000:07:00.0): E-Switch: Total vports 9, per vport: max
>>> uc(1024) max mc(16384)
>>> [  332.616271] mlx5_core 0000:07:00.0: Port module event: module 0, 
>>> Cable
>>> plugged
>>>
>>>
Leon Romanovsky Aug. 8, 2018, 5:27 p.m. UTC | #6
On Wed, Aug 08, 2018 at 11:33:51AM -0500, Alex G. wrote:
>
>
> On 08/08/2018 10:56 AM, Tal Gilboa wrote:
> > On 8/8/2018 6:41 PM, Leon Romanovsky wrote:
> > > On Wed, Aug 08, 2018 at 05:23:12PM +0300, Tal Gilboa wrote:
> > > > On 8/8/2018 9:08 AM, Leon Romanovsky wrote:
> > > > > On Mon, Aug 06, 2018 at 06:25:42PM -0500, Alexandru Gagniuc wrote:
> > > > > > This is now done by the PCI core to warn of sub-optimal bandwidth.
> > > > > >
> > > > > > Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> > > > > > ---
> > > > > >    drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ----
> > > > > >    1 file changed, 4 deletions(-)
> > > > > >
> > > > >
> > > > > Thanks,
> > > > > Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
> > > > >
> > > >
> > > > Alex,
> > > > I loaded the mlx5 driver with and without this series. The report
> > > > in dmesg is now missing. From what I understood, the status should
> > > > be reported at least once, even if everything is in order.
> > >
> > > That is not what this series does; it removes the print completely if
> > > the fabric can deliver more bandwidth than the card is capable of.
> > >
> > > > We need this functionality to stay.
> > >
> > > I'm not sure you need this information in the driver's dmesg output;
> > > it should probably be something globally visible and accessible per
> > > PCI device.
> >
> > Currently we have users that look for it. If we remove the dmesg print
> > we need this to be reported elsewhere. Adding it to sysfs for example
> > should be a valid solution for our case.
>
> I think a stop-gap measure is to leave the pcie_print_link_status() call in
> drivers that really need it for whatever reason. Implementing reliable
> reporting through sysfs might take some tinkering, and I don't think it's a
> sufficient reason to block the heart of this series -- being able to detect
> bottlenecks and link downtraining.

IMHO, you made the right change, and it is better to replace this print
with a more generic solution now, while you are at it, rather than leave
leftovers.

Thanks

>
> Alex
>
> > >
> > > >
> > > > net-next (dmesg output for 07:00.0):
> > > > [270498.625351] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
> > > > [270498.632130] mlx5_core 0000:07:00.0: 63.008 Gb/s available
> > > > PCIe bandwidth
> > > > (8 GT/s x8 link)
> > > > [270499.169533] (0000:07:00.0): E-Switch: Total vports 9, per vport: max
> > > > uc(1024) max mc(16384)
> > > > [270499.182358] mlx5_core 0000:07:00.0: Port module event:
> > > > module 0, Cable
> > > > plugged
> > > >
> > > > net-next + patches (dmesg output for 07:00.0):
> > > > [  331.608472] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
> > > > [  332.564938] (0000:07:00.0): E-Switch: Total vports 9, per vport: max
> > > > uc(1024) max mc(16384)
> > > > [  332.616271] mlx5_core 0000:07:00.0: Port module event: module
> > > > 0, Cable
> > > > plugged
> > > >
> > > >
Bjorn Helgaas Aug. 9, 2018, 2:02 p.m. UTC | #7
On Wed, Aug 08, 2018 at 08:27:36PM +0300, Leon Romanovsky wrote:
> On Wed, Aug 08, 2018 at 11:33:51AM -0500, Alex G. wrote:
> >
> >
> > On 08/08/2018 10:56 AM, Tal Gilboa wrote:
> > > On 8/8/2018 6:41 PM, Leon Romanovsky wrote:
> > > > On Wed, Aug 08, 2018 at 05:23:12PM +0300, Tal Gilboa wrote:
> > > > > On 8/8/2018 9:08 AM, Leon Romanovsky wrote:
> > > > > > On Mon, Aug 06, 2018 at 06:25:42PM -0500, Alexandru Gagniuc wrote:
> > > > > > > This is now done by the PCI core to warn of sub-optimal bandwidth.
> > > > > > >
> > > > > > > Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> > > > > > > ---
> > > > > > >    drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ----
> > > > > > >    1 file changed, 4 deletions(-)
> > > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
> > > > > >
> > > > >
> > > > > Alex,
> > > > > I loaded the mlx5 driver with and without this series. The report
> > > > > in dmesg is now missing. From what I understood, the status should
> > > > > be reported at least once, even if everything is in order.
> > > >
> > > > That is not what this series does; it removes the print completely if
> > > > the fabric can deliver more bandwidth than the card is capable of.
> > > >
> > > > > We need this functionality to stay.
> > > >
> > > > I'm not sure you need this information in the driver's dmesg output;
> > > > it should probably be something globally visible and accessible per
> > > > PCI device.
> > >
> > > Currently we have users that look for it. If we remove the dmesg print
> > > we need this to be reported elsewhere. Adding it to sysfs for example
> > > should be a valid solution for our case.
> >
> > I think a stop-gap measure is to leave the pcie_print_link_status() call in
> > drivers that really need it for whatever reason. Implementing reliable
> > reporting through sysfs might take some tinkering, and I don't think it's a
> > sufficient reason to block the heart of this series -- being able to detect
> > bottlenecks and link downtraining.
> 
> IMHO, you made the right change, and it is better to replace this print
> with a more generic solution now, while you are at it, rather than leave
> leftovers.

I'd like to make forward progress on this, so I propose we merge only
the PCI core change (patch 1/9) and drop the individual driver
changes.  That would mean:

  - We'll get a message from every NIC driver that calls
    pcie_print_link_status() as before.

  - We'll get a new message from the core for every downtrained link.

  - If a link leading to the NIC is downtrained, there will be
    duplicate messages.  Maybe that's overkill but it's not terrible.

I provisionally put the patch below on my pci/enumeration branch.
Objections?


commit c870cc8cbc4d79014f3daa74d1e412f32e42bf1b
Author: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Date:   Mon Aug 6 18:25:35 2018 -0500

    PCI: Check for PCIe Link downtraining
    
    When both ends of a PCIe Link are capable of a higher bandwidth than is
    currently in use, the Link is said to be "downtrained".  A downtrained Link
    may indicate hardware or configuration problems in the system, but it's
    hard to identify such Links from userspace.
    
    Refactor pcie_print_link_status() so it continues to always print PCIe
    bandwidth information, as several NIC drivers desire.
    
    Add a new internal __pcie_print_link_status() to emit a message only when a
    device's bandwidth is constrained by the fabric and call it from the PCI
    core for all devices, which identifies all downtrained Links.  It also
    emits messages for a few cases that are technically not downtrained, such
    as a x4 device in an open-ended x1 slot.
    
    Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
    [bhelgaas: changelog, move __pcie_print_link_status() declaration to
    drivers/pci/, rename pcie_check_upstream_link() to
    pcie_report_downtraining()]
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 97acba712e4e..a84d341504a5 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5264,14 +5264,16 @@ u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed *speed,
 }
 
 /**
- * pcie_print_link_status - Report the PCI device's link speed and width
+ * __pcie_print_link_status - Report the PCI device's link speed and width
  * @dev: PCI device to query
+ * @verbose: Print info even when enough bandwidth is available
  *
- * Report the available bandwidth at the device.  If this is less than the
- * device is capable of, report the device's maximum possible bandwidth and
- * the upstream link that limits its performance to less than that.
+ * If the available bandwidth at the device is less than the device is
+ * capable of, report the device's maximum possible bandwidth and the
+ * upstream link that limits its performance.  If @verbose, always print
+ * the available bandwidth, even if the device isn't constrained.
  */
-void pcie_print_link_status(struct pci_dev *dev)
+void __pcie_print_link_status(struct pci_dev *dev, bool verbose)
 {
 	enum pcie_link_width width, width_cap;
 	enum pci_bus_speed speed, speed_cap;
@@ -5281,11 +5283,11 @@ void pcie_print_link_status(struct pci_dev *dev)
 	bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap);
 	bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed, &width);
 
-	if (bw_avail >= bw_cap)
+	if (bw_avail >= bw_cap && verbose)
 		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth (%s x%d link)\n",
 			 bw_cap / 1000, bw_cap % 1000,
 			 PCIE_SPEED2STR(speed_cap), width_cap);
-	else
+	else if (bw_avail < bw_cap)
 		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth, limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)\n",
 			 bw_avail / 1000, bw_avail % 1000,
 			 PCIE_SPEED2STR(speed), width,
@@ -5293,6 +5295,17 @@ void pcie_print_link_status(struct pci_dev *dev)
 			 bw_cap / 1000, bw_cap % 1000,
 			 PCIE_SPEED2STR(speed_cap), width_cap);
 }
+
+/**
+ * pcie_print_link_status - Report the PCI device's link speed and width
+ * @dev: PCI device to query
+ *
+ * Report the available bandwidth at the device.
+ */
+void pcie_print_link_status(struct pci_dev *dev)
+{
+	__pcie_print_link_status(dev, true);
+}
 EXPORT_SYMBOL(pcie_print_link_status);
 
 /**
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 70808c168fb9..ce880dab5bc8 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -263,6 +263,7 @@ enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev);
 enum pcie_link_width pcie_get_width_cap(struct pci_dev *dev);
 u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed *speed,
 			   enum pcie_link_width *width);
+void __pcie_print_link_status(struct pci_dev *dev, bool verbose);
 
 /* Single Root I/O Virtualization */
 struct pci_sriov {
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index bc147c586643..387fc8ac54ec 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2231,6 +2231,25 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
 	return dev;
 }
 
+static void pcie_report_downtraining(struct pci_dev *dev)
+{
+	if (!pci_is_pcie(dev))
+		return;
+
+	/* Look from the device up to avoid downstream ports with no devices */
+	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
+	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
+	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
+		return;
+
+	/* Multi-function PCIe devices share the same link/status */
+	if (PCI_FUNC(dev->devfn) != 0 || dev->is_virtfn)
+		return;
+
+	/* Print link status only if the device is constrained by the fabric */
+	__pcie_print_link_status(dev, false);
+}
+
 static void pci_init_capabilities(struct pci_dev *dev)
 {
 	/* Enhanced Allocation */
@@ -2266,6 +2285,8 @@ static void pci_init_capabilities(struct pci_dev *dev)
 	/* Advanced Error Reporting */
 	pci_aer_init(dev);
 
+	pcie_report_downtraining(dev);
+
 	if (pci_probe_reset_function(dev) == 0)
 		dev->reset_fn = 1;
 }

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 615005e63819..e10f9c2bea3b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1045,10 +1045,6 @@  static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 	dev_info(&pdev->dev, "firmware version: %d.%d.%d\n", fw_rev_maj(dev),
 		 fw_rev_min(dev), fw_rev_sub(dev));
 
-	/* Only PFs hold the relevant PCIe information for this query */
-	if (mlx5_core_is_pf(dev))
-		pcie_print_link_status(dev->pdev);
-
 	/* on load removing any previous indication of internal error, device is
 	 * up
 	 */