Message ID | alpine.LRH.2.02.1810301232310.23106@file01.intranet.prod.int.rdu2.redhat.com |
---|---|
State | Accepted |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | pci: fix incorrect value returned from pcie_get_speed_cap | expand |
On Tue, Oct 30, 2018 at 12:36 PM Mikulas Patocka <mpatocka@redhat.com> wrote: > > The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks. We must mask > the register and compare it against them. > > This patch fixes errors "amdgpu: [powerplay] failed to send message 261 > ret is 0" errors when PCIe-v3 card is plugged into PCIe-v1 slot, because > the slot is being incorrectly reported as PCIe-v3 capable. > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed") > Cc: stable@vger.kernel.org # v4.17+ Acked-by: Alex Deucher <alexander.deucher@amd.com> > > --- > drivers/pci/pci.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > Index: linux-4.19/drivers/pci/pci.c > =================================================================== > --- linux-4.19.orig/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100 > +++ linux-4.19/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100 > @@ -5492,13 +5492,13 @@ enum pci_bus_speed pcie_get_speed_cap(st > > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap); > if (lnkcap) { > - if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB) > + if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_16_0GB) > return PCIE_SPEED_16_0GT; > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB) > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_8_0GB) > return PCIE_SPEED_8_0GT; > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB) > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) ==PCI_EXP_LNKCAP_SLS_5_0GB) > return PCIE_SPEED_5_0GT; > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB) > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB) > return PCIE_SPEED_2_5GT; > } >
On Tue, Oct 30, 2018 at 2:00 PM Alex Deucher <alexdeucher@gmail.com> wrote: > > On Tue, Oct 30, 2018 at 12:36 PM Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks. We must mask > > the register and compare it against them. > > > > This patch fixes errors "amdgpu: [powerplay] failed to send message 261 > > ret is 0" errors when PCIe-v3 card is plugged into PCIe-v1 slot, because > > the slot is being incorrectly reported as PCIe-v3 capable. > > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > > Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed") > > Cc: stable@vger.kernel.org # v4.17+ > > Acked-by: Alex Deucher <alexander.deucher@amd.com> Has this patch been picked up yet? It fixes a ton of bugs. Please apply! Alex > > > > > --- > > drivers/pci/pci.c | 8 ++++---- > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > Index: linux-4.19/drivers/pci/pci.c > > =================================================================== > > --- linux-4.19.orig/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100 > > +++ linux-4.19/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100 > > @@ -5492,13 +5492,13 @@ enum pci_bus_speed pcie_get_speed_cap(st > > > > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap); > > if (lnkcap) { > > - if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB) > > + if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_16_0GB) > > return PCIE_SPEED_16_0GT; > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB) > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_8_0GB) > > return PCIE_SPEED_8_0GT; > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB) > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) ==PCI_EXP_LNKCAP_SLS_5_0GB) > > return PCIE_SPEED_5_0GT; > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB) > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB) > > return PCIE_SPEED_2_5GT; > > } > >
On Wed, Nov 14, 2018 at 10:44 AM Alex Deucher <alexdeucher@gmail.com> wrote: > > On Tue, Oct 30, 2018 at 2:00 PM Alex Deucher <alexdeucher@gmail.com> wrote: > > > > On Tue, Oct 30, 2018 at 12:36 PM Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks. We must mask > > > the register and compare it against them. > > > > > > This patch fixes errors "amdgpu: [powerplay] failed to send message 261 > > > ret is 0" errors when PCIe-v3 card is plugged into PCIe-v1 slot, because > > > the slot is being incorrectly reported as PCIe-v3 capable. > > > > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > > > Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed") > > > Cc: stable@vger.kernel.org # v4.17+ > > > > Acked-by: Alex Deucher <alexander.deucher@amd.com> > > > Has this patch been picked up yet? It fixes a ton of bugs. Please apply! Any word on this? Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=108704 https://bugs.freedesktop.org/show_bug.cgi?id=108778 and several others. Alex > > Alex > > > > > > > > > --- > > > drivers/pci/pci.c | 8 ++++---- > > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > > > Index: linux-4.19/drivers/pci/pci.c > > > =================================================================== > > > --- linux-4.19.orig/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100 > > > +++ linux-4.19/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100 > > > @@ -5492,13 +5492,13 @@ enum pci_bus_speed pcie_get_speed_cap(st > > > > > > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap); > > > if (lnkcap) { > > > - if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB) > > > + if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_16_0GB) > > > return PCIE_SPEED_16_0GT; > > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB) > > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_8_0GB) > > > return PCIE_SPEED_8_0GT; > > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB) > > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) ==PCI_EXP_LNKCAP_SLS_5_0GB) > > > return PCIE_SPEED_5_0GT; > > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB) > > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB) > > > return PCIE_SPEED_2_5GT; > > > } > > >
On Tue, Oct 30, 2018 at 12:36:08PM -0400, Mikulas Patocka wrote: > The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks. We must mask > the register and compare it against them. > > This patch fixes errors "amdgpu: [powerplay] failed to send message 261 > ret is 0" errors when PCIe-v3 card is plugged into PCIe-v1 slot, because > the slot is being incorrectly reported as PCIe-v3 capable. > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed") > Cc: stable@vger.kernel.org # v4.17+ > > --- > drivers/pci/pci.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > Index: linux-4.19/drivers/pci/pci.c > =================================================================== > --- linux-4.19.orig/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100 > +++ linux-4.19/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100 > @@ -5492,13 +5492,13 @@ enum pci_bus_speed pcie_get_speed_cap(st > > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap); > if (lnkcap) { > - if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB) > + if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_16_0GB) > return PCIE_SPEED_16_0GT; > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB) > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_8_0GB) > return PCIE_SPEED_8_0GT; > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB) > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) ==PCI_EXP_LNKCAP_SLS_5_0GB) > return PCIE_SPEED_5_0GT; > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB) > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB) > return PCIE_SPEED_2_5GT; > } > I'd like to apply this as below, where I removed the 8_0GB and 16_0GB cases as recommended by the spec. I can't test it myself, and the bugzillas don't contain enough information for me to confirm that the patch below is enough (the "lspci -vv" output of the root port and GPU is what I would need). I'm confused about the fact that 6cf57be0f78e appeared in v4.17, but v4.18 works fine according to both bugzillas. I also don't have a good feel for whether this is urgent enough to be a v4.20 fix or whether it can wait for v4.21. Evidence either way would help. We also need similar fixes in pci_set_bus_speed(), pcie_speeds() (hfi1), cobalt_pcie_status_show(), hba_ioctl_callback(), qla24xx_pci_info_str(), and maybe a couple other places. Bjorn commit 871f73abf4b8e6aee8a206775f944ede7c7d7250 Author: Mikulas Patocka <mpatocka@redhat.com> Date: Tue Oct 30 12:36:08 2018 -0400 PCI: Fix incorrect value returned from pcie_get_speed_cap() The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks. We must mask the register and compare it against them. This patch fixes errors "amdgpu: [powerplay] failed to send message 261 ret is 0" errors when PCIe-v3 card is plugged into PCIe-v1 slot, because the slot is being incorrectly reported as PCIe-v3 capable. Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed") Link: https://bugs.freedesktop.org/show_bug.cgi?id=108704 Link: https://bugs.freedesktop.org/show_bug.cgi?id=108778 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> [bhelgaas: update comment, remove use of PCI_EXP_LNKCAP_SLS_8_0GB and PCI_EXP_LNKCAP_SLS_16_0GB since those should be covered by PCI_EXP_LNKCAP2] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # v4.17+ diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index d068f11d08a7..8563d1d9b102 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5556,9 +5556,13 @@ enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev) u32 lnkcap2, lnkcap; /* - * PCIe r4.0 sec 7.5.3.18 recommends using the Supported Link - * Speeds Vector in Link Capabilities 2 when supported, falling - * back to Max Link Speed in Link Capabilities otherwise. + * Link Capabilities 2 was added in PCIe r3.0, sec 7.8.18. The + * implementation note there recommends using the Supported Link + * Speeds Vector in Link Capabilities 2 when supported. + * + * Without Link Capabilities 2, i.e., prior to PCIe r3.0, software + * should use the Supported Link Speeds field in Link Capabilities, + * where only 2.5 GT/s and 5.0 GT/s speeds were defined. */ pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2); if (lnkcap2) { /* PCIe r3.0-compliant */ @@ -5575,13 +5579,9 @@ enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev) pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap); if (lnkcap) { - if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB) - return PCIE_SPEED_16_0GT; - else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB) - return PCIE_SPEED_8_0GT; - else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB) + if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_5_0GB) return PCIE_SPEED_5_0GT; - else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB) + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB) return PCIE_SPEED_2_5GT; }
On Mon, Nov 19, 2018 at 7:47 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Tue, Oct 30, 2018 at 12:36:08PM -0400, Mikulas Patocka wrote: > > The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks. We must mask > > the register and compare it against them. > > > > This patch fixes errors "amdgpu: [powerplay] failed to send message 261 > > ret is 0" errors when PCIe-v3 card is plugged into PCIe-v1 slot, because > > the slot is being incorrectly reported as PCIe-v3 capable. > > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > > Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed") > > Cc: stable@vger.kernel.org # v4.17+ > > > > --- > > drivers/pci/pci.c | 8 ++++---- > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > Index: linux-4.19/drivers/pci/pci.c > > =================================================================== > > --- linux-4.19.orig/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100 > > +++ linux-4.19/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100 > > @@ -5492,13 +5492,13 @@ enum pci_bus_speed pcie_get_speed_cap(st > > > > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap); > > if (lnkcap) { > > - if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB) > > + if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_16_0GB) > > return PCIE_SPEED_16_0GT; > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB) > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_8_0GB) > > return PCIE_SPEED_8_0GT; > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB) > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) ==PCI_EXP_LNKCAP_SLS_5_0GB) > > return PCIE_SPEED_5_0GT; > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB) > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB) > > return PCIE_SPEED_2_5GT; > > } > > > > I'd like to apply this as below, where I removed the 8_0GB and 16_0GB > cases as recommended by the spec. I can't test it myself, and the > bugzillas don't contain enough information for me to confirm that the > patch below is enough (the "lspci -vv" output of the root port and GPU > is what I would need). > > I'm confused about the fact that 6cf57be0f78e appeared in v4.17, but > v4.18 works fine according to both bugzillas. This issue affects AMD GPUs because we switched from using an open coded check for pcie link speeds in the driver to using the common pcie variants in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.19&id=5d9a6330403271fbb1244f14380a7cc44662796f The patch below would regress performance, at least on AMD GPUs, since we'd end up reporting a max speed of gen 2 (5 GT/s) which would cause the driver to limit the speed to gen2 even if gen3 or 4 are available. > > I also don't have a good feel for whether this is urgent enough to be > a v4.20 fix or whether it can wait for v4.21. Evidence either way > would help. I'd like it to land for 4.19 and 4.20 at least. Alternatively, we could revert all of the drm patches to and bring back all the open coded implementations, but it's a fairly large number of patches to revert. Alex > > We also need similar fixes in pci_set_bus_speed(), pcie_speeds() > (hfi1), cobalt_pcie_status_show(), hba_ioctl_callback(), > qla24xx_pci_info_str(), and maybe a couple other places. > > Bjorn > > > commit 871f73abf4b8e6aee8a206775f944ede7c7d7250 > Author: Mikulas Patocka <mpatocka@redhat.com> > Date: Tue Oct 30 12:36:08 2018 -0400 > > PCI: Fix incorrect value returned from pcie_get_speed_cap() > > The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks. We must mask > the register and compare it against them. > > This patch fixes errors "amdgpu: [powerplay] failed to send message 261 ret > is 0" errors when PCIe-v3 card is plugged into PCIe-v1 slot, because the > slot is being incorrectly reported as PCIe-v3 capable. > > Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed") > Link: https://bugs.freedesktop.org/show_bug.cgi?id=108704 > Link: https://bugs.freedesktop.org/show_bug.cgi?id=108778 > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > [bhelgaas: update comment, remove use of PCI_EXP_LNKCAP_SLS_8_0GB and > PCI_EXP_LNKCAP_SLS_16_0GB since those should be covered by PCI_EXP_LNKCAP2] > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> > Acked-by: Alex Deucher <alexander.deucher@amd.com> > Cc: stable@vger.kernel.org # v4.17+ > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index d068f11d08a7..8563d1d9b102 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -5556,9 +5556,13 @@ enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev) > u32 lnkcap2, lnkcap; > > /* > - * PCIe r4.0 sec 7.5.3.18 recommends using the Supported Link > - * Speeds Vector in Link Capabilities 2 when supported, falling > - * back to Max Link Speed in Link Capabilities otherwise. > + * Link Capabilities 2 was added in PCIe r3.0, sec 7.8.18. The > + * implementation note there recommends using the Supported Link > + * Speeds Vector in Link Capabilities 2 when supported. > + * > + * Without Link Capabilities 2, i.e., prior to PCIe r3.0, software > + * should use the Supported Link Speeds field in Link Capabilities, > + * where only 2.5 GT/s and 5.0 GT/s speeds were defined. > */ > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2); > if (lnkcap2) { /* PCIe r3.0-compliant */ > @@ -5575,13 +5579,9 @@ enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev) > > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap); > if (lnkcap) { > - if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB) > - return PCIE_SPEED_16_0GT; > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB) > - return PCIE_SPEED_8_0GT; > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB) > + if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_5_0GB) > return PCIE_SPEED_5_0GT; > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB) > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB) > return PCIE_SPEED_2_5GT; > } >
On Tue, Nov 20, 2018 at 09:17:52AM -0500, Alex Deucher wrote: > On Mon, Nov 19, 2018 at 7:47 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Tue, Oct 30, 2018 at 12:36:08PM -0400, Mikulas Patocka wrote: > > > The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks. We must mask > > > the register and compare it against them. > > > > > > This patch fixes errors "amdgpu: [powerplay] failed to send message 261 > > > ret is 0" errors when PCIe-v3 card is plugged into PCIe-v1 slot, because > > > the slot is being incorrectly reported as PCIe-v3 capable. > > > > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > > > Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed") > > > Cc: stable@vger.kernel.org # v4.17+ > > > > > > --- > > > drivers/pci/pci.c | 8 ++++---- > > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > > > Index: linux-4.19/drivers/pci/pci.c > > > =================================================================== > > > --- linux-4.19.orig/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100 > > > +++ linux-4.19/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100 > > > @@ -5492,13 +5492,13 @@ enum pci_bus_speed pcie_get_speed_cap(st > > > > > > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap); > > > if (lnkcap) { > > > - if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB) > > > + if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_16_0GB) > > > return PCIE_SPEED_16_0GT; > > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB) > > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_8_0GB) > > > return PCIE_SPEED_8_0GT; > > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB) > > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) ==PCI_EXP_LNKCAP_SLS_5_0GB) > > > return PCIE_SPEED_5_0GT; > > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB) > > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB) > > > return PCIE_SPEED_2_5GT; > > > } > > > > > > > I'd like to apply this as below, where I removed the 8_0GB and 16_0GB > > cases as recommended by the spec. I can't test it myself, and the > > bugzillas don't contain enough information for me to confirm that the > > patch below is enough (the "lspci -vv" output of the root port and GPU > > is what I would need). > > > > I'm confused about the fact that 6cf57be0f78e appeared in v4.17, but > > v4.18 works fine according to both bugzillas. > > This issue affects AMD GPUs because we switched from using an open > coded check for pcie link speeds in the driver to using the common > pcie variants in > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.19&id=5d9a6330403271fbb1244f14380a7cc44662796f OK, thanks. I added that SHA1 and a note to explain the connection. > The patch below would regress performance, at least on AMD GPUs, since > we'd end up reporting a max speed of gen 2 (5 GT/s) which would cause > the driver to limit the speed to gen2 even if gen3 or 4 are available. I guess this means these parts are broken with respect to the spec, since they support gen3 speeds but don't implement LnkCap2? Gen2, i.e., PCIe r2.1, only defined 2.5 GT/s and 5 GT/s. Devices capable of the higher speeds added by PCIe r3.0 are supposed to implement LnkCap2, but if we're even getting to this code, it means LnkCap2 was zero. If we confirm that this is a device defect, the question is what the best way to work around it is. Probably the original patch is easier than some sort of quirk, but we'd need to expand the comment a little bit to explain why we're not following the spec recommendation. It looks like lspci probably also needs some updating here -- it currently doesn't do anything at all with LnkCap2. > > I also don't have a good feel for whether this is urgent enough to be > > a v4.20 fix or whether it can wait for v4.21. Evidence either way > > would help. > > I'd like it to land for 4.19 and 4.20 at least. Alternatively, we > could revert all of the drm patches to and bring back all the open > coded implementations, but it's a fairly large number of patches to > revert. OK, sounds like it makes sense to do this for v4.20 and backport it at least to v4.19 stable. I do want to get the places below fixed also. They may not be as urgent, but we might as well try and make everything consistent while we're looking at it. > > We also need similar fixes in pci_set_bus_speed(), pcie_speeds() > > (hfi1), cobalt_pcie_status_show(), hba_ioctl_callback(), > > qla24xx_pci_info_str(), and maybe a couple other places. > > > > Bjorn > > > > > > commit 871f73abf4b8e6aee8a206775f944ede7c7d7250 > > Author: Mikulas Patocka <mpatocka@redhat.com> > > Date: Tue Oct 30 12:36:08 2018 -0400 > > > > PCI: Fix incorrect value returned from pcie_get_speed_cap() > > > > The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks. We must mask > > the register and compare it against them. > > > > This patch fixes errors "amdgpu: [powerplay] failed to send message 261 ret > > is 0" errors when PCIe-v3 card is plugged into PCIe-v1 slot, because the > > slot is being incorrectly reported as PCIe-v3 capable. > > > > Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed") > > Link: https://bugs.freedesktop.org/show_bug.cgi?id=108704 > > Link: https://bugs.freedesktop.org/show_bug.cgi?id=108778 > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > > [bhelgaas: update comment, remove use of PCI_EXP_LNKCAP_SLS_8_0GB and > > PCI_EXP_LNKCAP_SLS_16_0GB since those should be covered by PCI_EXP_LNKCAP2] > > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> > > Acked-by: Alex Deucher <alexander.deucher@amd.com> > > Cc: stable@vger.kernel.org # v4.17+ > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > index d068f11d08a7..8563d1d9b102 100644 > > --- a/drivers/pci/pci.c > > +++ b/drivers/pci/pci.c > > @@ -5556,9 +5556,13 @@ enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev) > > u32 lnkcap2, lnkcap; > > > > /* > > - * PCIe r4.0 sec 7.5.3.18 recommends using the Supported Link > > - * Speeds Vector in Link Capabilities 2 when supported, falling > > - * back to Max Link Speed in Link Capabilities otherwise. > > + * Link Capabilities 2 was added in PCIe r3.0, sec 7.8.18. The > > + * implementation note there recommends using the Supported Link > > + * Speeds Vector in Link Capabilities 2 when supported. > > + * > > + * Without Link Capabilities 2, i.e., prior to PCIe r3.0, software > > + * should use the Supported Link Speeds field in Link Capabilities, > > + * where only 2.5 GT/s and 5.0 GT/s speeds were defined. > > */ > > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2); > > if (lnkcap2) { /* PCIe r3.0-compliant */ > > @@ -5575,13 +5579,9 @@ enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev) > > > > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap); > > if (lnkcap) { > > - if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB) > > - return PCIE_SPEED_16_0GT; > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB) > > - return PCIE_SPEED_8_0GT; > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB) > > + if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_5_0GB) > > return PCIE_SPEED_5_0GT; > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB) > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB) > > return PCIE_SPEED_2_5GT; > > } > >
On Tue, Nov 20, 2018 at 10:13 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Tue, Nov 20, 2018 at 09:17:52AM -0500, Alex Deucher wrote: > > On Mon, Nov 19, 2018 at 7:47 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > On Tue, Oct 30, 2018 at 12:36:08PM -0400, Mikulas Patocka wrote: > > > > The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks. We must mask > > > > the register and compare it against them. > > > > > > > > This patch fixes errors "amdgpu: [powerplay] failed to send message 261 > > > > ret is 0" errors when PCIe-v3 card is plugged into PCIe-v1 slot, because > > > > the slot is being incorrectly reported as PCIe-v3 capable. > > > > > > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > > > > Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed") > > > > Cc: stable@vger.kernel.org # v4.17+ > > > > > > > > --- > > > > drivers/pci/pci.c | 8 ++++---- > > > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > > > > > Index: linux-4.19/drivers/pci/pci.c > > > > =================================================================== > > > > --- linux-4.19.orig/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100 > > > > +++ linux-4.19/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100 > > > > @@ -5492,13 +5492,13 @@ enum pci_bus_speed pcie_get_speed_cap(st > > > > > > > > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap); > > > > if (lnkcap) { > > > > - if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB) > > > > + if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_16_0GB) > > > > return PCIE_SPEED_16_0GT; > > > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB) > > > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_8_0GB) > > > > return PCIE_SPEED_8_0GT; > > > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB) > > > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) ==PCI_EXP_LNKCAP_SLS_5_0GB) > > > > return PCIE_SPEED_5_0GT; > > > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB) > > > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB) > > > > return PCIE_SPEED_2_5GT; > > > > } > > > > > > > > > > I'd like to apply this as below, where I removed the 8_0GB and 16_0GB > > > cases as recommended by the spec. I can't test it myself, and the > > > bugzillas don't contain enough information for me to confirm that the > > > patch below is enough (the "lspci -vv" output of the root port and GPU > > > is what I would need). > > > > > > I'm confused about the fact that 6cf57be0f78e appeared in v4.17, but > > > v4.18 works fine according to both bugzillas. > > > > This issue affects AMD GPUs because we switched from using an open > > coded check for pcie link speeds in the driver to using the common > > pcie variants in > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.19&id=5d9a6330403271fbb1244f14380a7cc44662796f > > OK, thanks. I added that SHA1 and a note to explain the connection. > > > The patch below would regress performance, at least on AMD GPUs, since > > we'd end up reporting a max speed of gen 2 (5 GT/s) which would cause > > the driver to limit the speed to gen2 even if gen3 or 4 are available. > > I guess this means these parts are broken with respect to the spec, > since they support gen3 speeds but don't implement LnkCap2? Sorry, I mis-read the patch. The patch is correct and our parts are compliant. We are only hitting this issue on non-gen3 platforms which would end up in the second case. Sorry for the confusion. You new patch is: Acked-by: Alex Deucher <alexander.deucher@amd.com> Alex > > Gen2, i.e., PCIe r2.1, only defined 2.5 GT/s and 5 GT/s. Devices > capable of the higher speeds added by PCIe r3.0 are supposed to > implement LnkCap2, but if we're even getting to this code, it means > LnkCap2 was zero. > > If we confirm that this is a device defect, the question is what the > best way to work around it is. Probably the original patch is easier > than some sort of quirk, but we'd need to expand the comment a little > bit to explain why we're not following the spec recommendation. > > It looks like lspci probably also needs some updating here -- it > currently doesn't do anything at all with LnkCap2. > > > > I also don't have a good feel for whether this is urgent enough to be > > > a v4.20 fix or whether it can wait for v4.21. Evidence either way > > > would help. > > > > I'd like it to land for 4.19 and 4.20 at least. Alternatively, we > > could revert all of the drm patches to and bring back all the open > > coded implementations, but it's a fairly large number of patches to > > revert. > > OK, sounds like it makes sense to do this for v4.20 and backport it at > least to v4.19 stable. I do want to get the places below fixed also. > They may not be as urgent, but we might as well try and make > everything consistent while we're looking at it. > > > > We also need similar fixes in pci_set_bus_speed(), pcie_speeds() > > > (hfi1), cobalt_pcie_status_show(), hba_ioctl_callback(), > > > qla24xx_pci_info_str(), and maybe a couple other places. > > > > > > Bjorn > > > > > > > > > commit 871f73abf4b8e6aee8a206775f944ede7c7d7250 > > > Author: Mikulas Patocka <mpatocka@redhat.com> > > > Date: Tue Oct 30 12:36:08 2018 -0400 > > > > > > PCI: Fix incorrect value returned from pcie_get_speed_cap() > > > > > > The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks. We must mask > > > the register and compare it against them. > > > > > > This patch fixes errors "amdgpu: [powerplay] failed to send message 261 ret > > > is 0" errors when PCIe-v3 card is plugged into PCIe-v1 slot, because the > > > slot is being incorrectly reported as PCIe-v3 capable. > > > > > > Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed") > > > Link: https://bugs.freedesktop.org/show_bug.cgi?id=108704 > > > Link: https://bugs.freedesktop.org/show_bug.cgi?id=108778 > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > > > [bhelgaas: update comment, remove use of PCI_EXP_LNKCAP_SLS_8_0GB and > > > PCI_EXP_LNKCAP_SLS_16_0GB since those should be covered by PCI_EXP_LNKCAP2] > > > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> > > > Acked-by: Alex Deucher <alexander.deucher@amd.com> > > > Cc: stable@vger.kernel.org # v4.17+ > > > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > > index d068f11d08a7..8563d1d9b102 100644 > > > --- a/drivers/pci/pci.c > > > +++ b/drivers/pci/pci.c > > > @@ -5556,9 +5556,13 @@ enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev) > > > u32 lnkcap2, lnkcap; > > > > > > /* > > > - * PCIe r4.0 sec 7.5.3.18 recommends using the Supported Link > > > - * Speeds Vector in Link Capabilities 2 when supported, falling > > > - * back to Max Link Speed in Link Capabilities otherwise. > > > + * Link Capabilities 2 was added in PCIe r3.0, sec 7.8.18. The > > > + * implementation note there recommends using the Supported Link > > > + * Speeds Vector in Link Capabilities 2 when supported. > > > + * > > > + * Without Link Capabilities 2, i.e., prior to PCIe r3.0, software > > > + * should use the Supported Link Speeds field in Link Capabilities, > > > + * where only 2.5 GT/s and 5.0 GT/s speeds were defined. > > > */ > > > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2); > > > if (lnkcap2) { /* PCIe r3.0-compliant */ > > > @@ -5575,13 +5579,9 @@ enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev) > > > > > > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap); > > > if (lnkcap) { > > > - if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB) > > > - return PCIE_SPEED_16_0GT; > > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB) > > > - return PCIE_SPEED_8_0GT; > > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB) > > > + if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_5_0GB) > > > return PCIE_SPEED_5_0GT; > > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB) > > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB) > > > return PCIE_SPEED_2_5GT; > > > } > > >
On Tue, Oct 30, 2018 at 12:36:08PM -0400, Mikulas Patocka wrote: > The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks. We must mask > the register and compare it against them. > > This patch fixes errors "amdgpu: [powerplay] failed to send message 261 > ret is 0" errors when PCIe-v3 card is plugged into PCIe-v1 slot, because > the slot is being incorrectly reported as PCIe-v3 capable. > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Applied the patch below to for-linus for v4.20 and backport to v4.17+ stable kernels, thanks! I removed the "if (lnkcap)" test because the LNKCAP case is not parallel to the LNKCAP2 case. LNKCAP2 is optional so we have to test for lnkcap2 being non-zero. But LNKCAP is required for all devices. If the config read of it fails, we should get either 0 or ~0, and neither one will satisfy the 5.0 or 2.5 checks. > Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed") > Cc: stable@vger.kernel.org # v4.17+ > > --- > drivers/pci/pci.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > Index: linux-4.19/drivers/pci/pci.c > =================================================================== > --- linux-4.19.orig/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100 > +++ linux-4.19/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100 > @@ -5492,13 +5492,13 @@ enum pci_bus_speed pcie_get_speed_cap(st > > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap); > if (lnkcap) { > - if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB) > + if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_16_0GB) > return PCIE_SPEED_16_0GT; > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB) > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_8_0GB) > return PCIE_SPEED_8_0GT; > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB) > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) ==PCI_EXP_LNKCAP_SLS_5_0GB) > return PCIE_SPEED_5_0GT; > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB) > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB) > return PCIE_SPEED_2_5GT; > } > commit 94ea01a6d9a6 Author: Mikulas Patocka <mpatocka@redhat.com> Date: Mon Nov 26 10:37:13 2018 -0600 PCI: Fix incorrect value returned from pcie_get_speed_cap() The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks. We must mask the register and compare it against them. This fixes errors like this: amdgpu: [powerplay] failed to send message 261 ret is 0 when PCIe-v3 card is plugged into a PCIe-v1 slot, because the slot is being incorrectly reported as PCIe-v3 capable. 6cf57be0f78e, which appeared in v4.17, added pcie_get_speed_cap() with the incorrect test of PCI_EXP_LNKCAP_SLS as a bitmask. 5d9a63304032, which appeared in v4.19, changed amdgpu to use pcie_get_speed_cap(), so the amdgpu bug reports below are regressions in v4.19. Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed") Fixes: 5d9a63304032 ("drm/amdgpu: use pcie functions for link width and speed") Link: https://bugs.freedesktop.org/show_bug.cgi?id=108704 Link: https://bugs.freedesktop.org/show_bug.cgi?id=108778 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> [bhelgaas: update comment, remove use of PCI_EXP_LNKCAP_SLS_8_0GB and PCI_EXP_LNKCAP_SLS_16_0GB since those should be covered by PCI_EXP_LNKCAP2, remove test of PCI_EXP_LNKCAP for zero, since that register is required] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # v4.17+ diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index d068f11d08a7..c9d8e3c837de 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5556,9 +5556,13 @@ enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev) u32 lnkcap2, lnkcap; /* - * PCIe r4.0 sec 7.5.3.18 recommends using the Supported Link - * Speeds Vector in Link Capabilities 2 when supported, falling - * back to Max Link Speed in Link Capabilities otherwise. + * Link Capabilities 2 was added in PCIe r3.0, sec 7.8.18. The + * implementation note there recommends using the Supported Link + * Speeds Vector in Link Capabilities 2 when supported. + * + * Without Link Capabilities 2, i.e., prior to PCIe r3.0, software + * should use the Supported Link Speeds field in Link Capabilities, + * where only 2.5 GT/s and 5.0 GT/s speeds were defined. */ pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2); if (lnkcap2) { /* PCIe r3.0-compliant */ @@ -5574,16 +5578,10 @@ enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev) } pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap); - if (lnkcap) { - if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB) - return PCIE_SPEED_16_0GT; - else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB) - return PCIE_SPEED_8_0GT; - else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB) - return PCIE_SPEED_5_0GT; - else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB) - return PCIE_SPEED_2_5GT; - } + if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_5_0GB) + return PCIE_SPEED_5_0GT; + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB) + return PCIE_SPEED_2_5GT; return PCI_SPEED_UNKNOWN; }
On Mon, Nov 19, 2018 at 06:47:04PM -0600, Bjorn Helgaas wrote: > On Tue, Oct 30, 2018 at 12:36:08PM -0400, Mikulas Patocka wrote: > > The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks. We must mask > > the register and compare it against them. > > > > This patch fixes errors "amdgpu: [powerplay] failed to send message 261 > > ret is 0" errors when PCIe-v3 card is plugged into PCIe-v1 slot, because > > the slot is being incorrectly reported as PCIe-v3 capable. > > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > > Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed") > > Cc: stable@vger.kernel.org # v4.17+ > > > > --- > > drivers/pci/pci.c | 8 ++++---- > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > Index: linux-4.19/drivers/pci/pci.c > > =================================================================== > > --- linux-4.19.orig/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100 > > +++ linux-4.19/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100 > > @@ -5492,13 +5492,13 @@ enum pci_bus_speed pcie_get_speed_cap(st > > > > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap); > > if (lnkcap) { > > - if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB) > > + if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_16_0GB) > > return PCIE_SPEED_16_0GT; > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB) > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_8_0GB) > > return PCIE_SPEED_8_0GT; > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB) > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) ==PCI_EXP_LNKCAP_SLS_5_0GB) > > return PCIE_SPEED_5_0GT; > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB) > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB) > > return PCIE_SPEED_2_5GT; > > } > We also need similar fixes in pci_set_bus_speed(), pcie_speeds() > (hfi1), cobalt_pcie_status_show(), hba_ioctl_callback(), > qla24xx_pci_info_str(), and maybe a couple other places. Does anybody want to volunteer to fix the places above as well? I found them by grepping for PCI_EXP_LNKCAP, and they're all broken in ways similar to pcie_get_speed_cap(). Possibly some of these places could use pcie_get_speed_cap() directly. Bjorn
On Mon, 26 Nov 2018, Bjorn Helgaas wrote: > On Mon, Nov 19, 2018 at 06:47:04PM -0600, Bjorn Helgaas wrote: > > On Tue, Oct 30, 2018 at 12:36:08PM -0400, Mikulas Patocka wrote: > > > The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks. We must mask > > > the register and compare it against them. > > > > > > This patch fixes errors "amdgpu: [powerplay] failed to send message 261 > > > ret is 0" errors when PCIe-v3 card is plugged into PCIe-v1 slot, because > > > the slot is being incorrectly reported as PCIe-v3 capable. > > > > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > > > Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed") > > > Cc: stable@vger.kernel.org # v4.17+ > > > > > > --- > > > drivers/pci/pci.c | 8 ++++---- > > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > > > Index: linux-4.19/drivers/pci/pci.c > > > =================================================================== > > > --- linux-4.19.orig/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100 > > > +++ linux-4.19/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100 > > > @@ -5492,13 +5492,13 @@ enum pci_bus_speed pcie_get_speed_cap(st > > > > > > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap); > > > if (lnkcap) { > > > - if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB) > > > + if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_16_0GB) > > > return PCIE_SPEED_16_0GT; > > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB) > > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_8_0GB) > > > return PCIE_SPEED_8_0GT; > > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB) > > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) ==PCI_EXP_LNKCAP_SLS_5_0GB) > > > return PCIE_SPEED_5_0GT; > > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB) > > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB) > > > return PCIE_SPEED_2_5GT; > > > } > > > We also need similar fixes in pci_set_bus_speed(), pcie_speeds() > > (hfi1), cobalt_pcie_status_show(), hba_ioctl_callback(), > > qla24xx_pci_info_str(), and maybe a couple other places. > > Does anybody want to volunteer to fix the places above as well? I > found them by grepping for PCI_EXP_LNKCAP, and they're all broken in > ways similar to pcie_get_speed_cap(). Possibly some of these places > could use pcie_get_speed_cap() directly. > > Bjorn > They are not broken, they are masking the value with PCI_EXP_LNKCAP_SLS - that is correct. pci_set_bus_speed: pcie_capability_read_dword(bridge, PCI_EXP_LNKCAP, &linkcap); bus->max_bus_speed = pcie_link_speed[linkcap & PCI_EXP_LNKCAP_SLS]; pcie_speeds: if ((linkcap & PCI_EXP_LNKCAP_SLS) != PCI_EXP_LNKCAP_SLS_8_0GB) cobalt_pcie_status_show: just prints the values without doing anything with them hba_ioctl_callback: gai->pci.link_speed_max = (u8)(caps & PCI_EXP_LNKCAP_SLS); gai->pci.link_width_max = (u8)((caps & PCI_EXP_LNKCAP_MLW) >> 4); qla24xx_pci_info_str: lspeed = lstat & PCI_EXP_LNKCAP_SLS; lwidth = (lstat & PCI_EXP_LNKCAP_MLW) >> 4; Mikulas
On Tue, Nov 27, 2018 at 11:49:43AM -0500, Mikulas Patocka wrote: > On Mon, 26 Nov 2018, Bjorn Helgaas wrote: > > On Mon, Nov 19, 2018 at 06:47:04PM -0600, Bjorn Helgaas wrote: > > > On Tue, Oct 30, 2018 at 12:36:08PM -0400, Mikulas Patocka wrote: > > > > The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks. We must mask > > > > the register and compare it against them. > > > > > > > > This patch fixes errors "amdgpu: [powerplay] failed to send message 261 > > > > ret is 0" errors when PCIe-v3 card is plugged into PCIe-v1 slot, because > > > > the slot is being incorrectly reported as PCIe-v3 capable. > > > > > > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > > > > Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed") > > > > Cc: stable@vger.kernel.org # v4.17+ > > > > > > > > --- > > > > drivers/pci/pci.c | 8 ++++---- > > > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > > > > > Index: linux-4.19/drivers/pci/pci.c > > > > =================================================================== > > > > --- linux-4.19.orig/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100 > > > > +++ linux-4.19/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100 > > > > @@ -5492,13 +5492,13 @@ enum pci_bus_speed pcie_get_speed_cap(st > > > > > > > > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap); > > > > if (lnkcap) { > > > > - if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB) > > > > + if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_16_0GB) > > > > return PCIE_SPEED_16_0GT; > > > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB) > > > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_8_0GB) > > > > return PCIE_SPEED_8_0GT; > > > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB) > > > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) ==PCI_EXP_LNKCAP_SLS_5_0GB) > > > > return PCIE_SPEED_5_0GT; > > > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB) > > > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB) > > > > return PCIE_SPEED_2_5GT; > > > > } > > > > > We also need similar fixes in pci_set_bus_speed(), pcie_speeds() > > > (hfi1), cobalt_pcie_status_show(), hba_ioctl_callback(), > > > qla24xx_pci_info_str(), and maybe a couple other places. > > > > Does anybody want to volunteer to fix the places above as well? I > > found them by grepping for PCI_EXP_LNKCAP, and they're all broken in > > ways similar to pcie_get_speed_cap(). Possibly some of these places > > could use pcie_get_speed_cap() directly. > > They are not broken, they are masking the value with PCI_EXP_LNKCAP_SLS - > that is correct. You're right, "broken" is an overstatement except in terms of maintainance: they do happen to work correctly, but there's no reason to reimplement the same functionality several places in slightly different ways. pcie_get_speed_cap() looks at LNKCAP2, then LNKCAP. The other places look only at LNKCAP. Since these are all finding only the max link speed (which can be determined using only LKNCAP), not the set of supported speeds (which requires both LNKCAP2 and LNKCAP), they should all do it the same way. > pci_set_bus_speed: > pcie_capability_read_dword(bridge, PCI_EXP_LNKCAP, &linkcap); > bus->max_bus_speed = pcie_link_speed[linkcap & PCI_EXP_LNKCAP_SLS]; > > pcie_speeds: > if ((linkcap & PCI_EXP_LNKCAP_SLS) != PCI_EXP_LNKCAP_SLS_8_0GB) > > cobalt_pcie_status_show: > just prints the values without doing anything with them > > hba_ioctl_callback: > gai->pci.link_speed_max = (u8)(caps & PCI_EXP_LNKCAP_SLS); > > gai->pci.link_width_max = (u8)((caps & PCI_EXP_LNKCAP_MLW) >> 4); > > qla24xx_pci_info_str: > lspeed = lstat & PCI_EXP_LNKCAP_SLS; > lwidth = (lstat & PCI_EXP_LNKCAP_MLW) >> 4; > > Mikulas
Index: linux-4.19/drivers/pci/pci.c =================================================================== --- linux-4.19.orig/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100 +++ linux-4.19/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100 @@ -5492,13 +5492,13 @@ enum pci_bus_speed pcie_get_speed_cap(st pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap); if (lnkcap) { - if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB) + if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_16_0GB) return PCIE_SPEED_16_0GT; - else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB) + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_8_0GB) return PCIE_SPEED_8_0GT; - else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB) + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) ==PCI_EXP_LNKCAP_SLS_5_0GB) return PCIE_SPEED_5_0GT; - else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB) + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB) return PCIE_SPEED_2_5GT; }
The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks. We must mask the register and compare it against them. This patch fixes errors "amdgpu: [powerplay] failed to send message 261 ret is 0" errors when PCIe-v3 card is plugged into PCIe-v1 slot, because the slot is being incorrectly reported as PCIe-v3 capable. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed") Cc: stable@vger.kernel.org # v4.17+ --- drivers/pci/pci.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)