diff mbox series

[V3] PCI: Add quirk for AMD Navi14 to disable ATS support

Message ID 20210602021255.939090-1-evan.quan@amd.com
State New
Series [V3] PCI: Add quirk for AMD Navi14 to disable ATS support

Commit Message

Quan, Evan June 2, 2021, 2:12 a.m. UTC
An unexpected GPU hang was observed during a runpm stress test
on device 0x7341 rev 0x00. Further debugging showed that broken
ATS is involved. As a follow-up to commit 5e89cd303e3a ("PCI:
Mark AMD Navi14 GPU rev 0xc5 ATS as broken"), disable ATS for
this SKU as well.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Krzysztof Wilczyński <kw@linux.com>
---
ChangeLog v2->v3:
- further update to the description (suggested by Krzysztof)
ChangeLog v1->v2:
- cosmetic fix to the description (suggested by Krzysztof)
---
 drivers/pci/quirks.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Bjorn Helgaas June 4, 2021, 8:59 p.m. UTC | #1
On Wed, Jun 02, 2021 at 10:12:55AM +0800, Evan Quan wrote:
> Unexpected GPU hang was observed during runpm stress test
> on 0x7341 rev 0x00. Further debugging shows broken ATS is
> related. Thus as a followup of commit 5e89cd303e3a ("PCI:
> Mark AMD Navi14 GPU rev 0xc5 ATS as broken"), we disable
> the ATS for the specific SKU also.
> 
> Signed-off-by: Evan Quan <evan.quan@amd.com>
> Suggested-by: Alex Deucher <alexander.deucher@amd.com>
> Reviewed-by: Krzysztof Wilczyński <kw@linux.com>

Applied to pci/virtualization for v5.14, thanks.

I updated the commit log like this:

    PCI: Mark AMD Navi14 GPU ATS as broken

    Observed unexpected GPU hang during runpm stress test on 0x7341 rev 0x00.
    Further debugging shows broken ATS is related.

    Disable ATS on this part.  Similar issues on other devices:

      a2da5d8cc0b0 ("PCI: Mark AMD Raven iGPU ATS as broken in some platforms")
      45beb31d3afb ("PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken")
      5e89cd303e3a ("PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken")

    Suggested-by: Alex Deucher <alexander.deucher@amd.com>
    Link: https://lore.kernel.org/r/20210602021255.939090-1-evan.quan@amd.com
    Signed-off-by: Evan Quan <evan.quan@amd.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Reviewed-by: Krzysztof Wilczyński <kw@linux.com>

> ---
> ChangeLog v2->v3:
> - further update for description part(suggested by Krzysztof)
> ChangeLog v1->v2:
> - cosmetic fix for description part(suggested by Krzysztof)
> ---
>  drivers/pci/quirks.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index b7e19bbb901a..70803ad6d2ac 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5176,7 +5176,8 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SERVERWORKS, 0x0422, quirk_no_ext_tags);
>  static void quirk_amd_harvest_no_ats(struct pci_dev *pdev)
>  {
>  	if ((pdev->device == 0x7312 && pdev->revision != 0x00) ||
> -	    (pdev->device == 0x7340 && pdev->revision != 0xc5))
> +	    (pdev->device == 0x7340 && pdev->revision != 0xc5) ||
> +	    (pdev->device == 0x7341 && pdev->revision != 0x00))
>  		return;
>  
>  	if (pdev->device == 0x15d8) {
> @@ -5203,6 +5204,7 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_amd_harvest_no_ats);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7312, quirk_amd_harvest_no_ats);
>  /* AMD Navi14 dGPU */
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7340, quirk_amd_harvest_no_ats);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7341, quirk_amd_harvest_no_ats);
>  /* AMD Raven platform iGPU */
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x15d8, quirk_amd_harvest_no_ats);
>  #endif /* CONFIG_PCI_ATS */
> -- 
> 2.29.0
>
Quan, Evan June 7, 2021, 3:41 a.m. UTC | #2
[AMD Official Use Only]

Thanks Bjorn.
@Deucher, Alexander, can you advise whether this is needed for the stable kernel branches and, if so, which ones?

BR
Evan
Bjorn Helgaas June 7, 2021, 1:54 p.m. UTC | #3
On Mon, Jun 07, 2021 at 03:41:35AM +0000, Quan, Evan wrote:
> [AMD Official Use Only]
> 
> Thanks Bjorn.
> @Deucher, Alexander can you advise whether this is needed for stable kernel branches and which branches if yes?

Sorry, I should have done this already.  I went ahead and marked it
for stable.


Patch

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index b7e19bbb901a..70803ad6d2ac 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5176,7 +5176,8 @@  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SERVERWORKS, 0x0422, quirk_no_ext_tags);
 static void quirk_amd_harvest_no_ats(struct pci_dev *pdev)
 {
 	if ((pdev->device == 0x7312 && pdev->revision != 0x00) ||
-	    (pdev->device == 0x7340 && pdev->revision != 0xc5))
+	    (pdev->device == 0x7340 && pdev->revision != 0xc5) ||
+	    (pdev->device == 0x7341 && pdev->revision != 0x00))
 		return;
 
 	if (pdev->device == 0x15d8) {
@@ -5203,6 +5204,7 @@  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_amd_harvest_no_ats);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7312, quirk_amd_harvest_no_ats);
 /* AMD Navi14 dGPU */
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7340, quirk_amd_harvest_no_ats);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7341, quirk_amd_harvest_no_ats);
 /* AMD Raven platform iGPU */
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x15d8, quirk_amd_harvest_no_ats);
 #endif /* CONFIG_PCI_ATS */
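
The early-return condition in the hunk above can be read as a filter: for the listed device IDs, the quirk proceeds only on the one revision known to be affected and returns early on every other revision. The following is a minimal userspace sketch of that filter only, not kernel code. `struct fake_pci_dev` and `quirk_skips_device()` are hypothetical names introduced for illustration, and the 0x15d8 (Raven) branch that follows the check in quirks.c is elided in the diff and is not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two struct pci_dev fields the check reads. */
struct fake_pci_dev {
	uint16_t device;
	uint8_t revision;
};

/*
 * Mirrors the early-return condition from the patch. A true result means
 * the quirk would return without disabling ATS for this device/revision
 * pair. A false result means execution continues past the check.
 */
bool quirk_skips_device(const struct fake_pci_dev *pdev)
{
	return (pdev->device == 0x7312 && pdev->revision != 0x00) ||
	       (pdev->device == 0x7340 && pdev->revision != 0xc5) ||
	       (pdev->device == 0x7341 && pdev->revision != 0x00);
}
```

Under this sketch, 0x7341 rev 0x00 is not skipped, so the quirk goes on to disable ATS for it, while any other revision of 0x7341 is left alone. A device ID that has no revision clause in the condition, such as 0x6900, always falls through the check.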