Message ID | 20210602021255.939090-1-evan.quan@amd.com |
---|---|
State | New |
Series | [V3] PCI: Add quirk for AMD Navi14 to disable ATS support |
On Wed, Jun 02, 2021 at 10:12:55AM +0800, Evan Quan wrote:
> Unexpected GPU hang was observed during runpm stress test
> on 0x7341 rev 0x00. Further debugging shows broken ATS is
> related. Thus as a followup of commit 5e89cd303e3a ("PCI:
> Mark AMD Navi14 GPU rev 0xc5 ATS as broken"), we disable
> the ATS for the specific SKU also.
>
> Signed-off-by: Evan Quan <evan.quan@amd.com>
> Suggested-by: Alex Deucher <alexander.deucher@amd.com>
> Reviewed-by: Krzysztof Wilczyński <kw@linux.com>

Applied to pci/virtualization for v5.14, thanks.

I updated the commit log like this:

    PCI: Mark AMD Navi14 GPU ATS as broken

    Observed unexpected GPU hang during runpm stress test on 0x7341 rev 0x00.
    Further debugging shows broken ATS is related.

    Disable ATS on this part.  Similar issues on other devices:

      a2da5d8cc0b0 ("PCI: Mark AMD Raven iGPU ATS as broken in some platforms")
      45beb31d3afb ("PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken")
      5e89cd303e3a ("PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken")

    Suggested-by: Alex Deucher <alexander.deucher@amd.com>
    Link: https://lore.kernel.org/r/20210602021255.939090-1-evan.quan@amd.com
    Signed-off-by: Evan Quan <evan.quan@amd.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Reviewed-by: Krzysztof Wilczyński <kw@linux.com>

> ---
> ChangeLog v2->v3:
> - further update for description part(suggested by Krzysztof)
> ChangeLog v1->v2:
> - cosmetic fix for description part(suggested by Krzysztof)
> ---
>  drivers/pci/quirks.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index b7e19bbb901a..70803ad6d2ac 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5176,7 +5176,8 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SERVERWORKS, 0x0422, quirk_no_ext_tags);
>  static void quirk_amd_harvest_no_ats(struct pci_dev *pdev)
>  {
>  	if ((pdev->device == 0x7312 && pdev->revision != 0x00) ||
> -	    (pdev->device == 0x7340 && pdev->revision != 0xc5))
> +	    (pdev->device == 0x7340 && pdev->revision != 0xc5) ||
> +	    (pdev->device == 0x7341 && pdev->revision != 0x00))
>  		return;
>
>  	if (pdev->device == 0x15d8) {
> @@ -5203,6 +5204,7 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_amd_harvest_no_ats);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7312, quirk_amd_harvest_no_ats);
>  /* AMD Navi14 dGPU */
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7340, quirk_amd_harvest_no_ats);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7341, quirk_amd_harvest_no_ats);
>  /* AMD Raven platform iGPU */
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x15d8, quirk_amd_harvest_no_ats);
>  #endif /* CONFIG_PCI_ATS */
> --
> 2.29.0
>
[AMD Official Use Only]

Thanks Bjorn.
@Deucher, Alexander can you advise whether this is needed for stable kernel branches and which branches if yes?

BR
Evan

> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Saturday, June 5, 2021 4:59 AM
> To: Quan, Evan <Evan.Quan@amd.com>
> Cc: linux-pci@vger.kernel.org; kw@linux.com; Deucher, Alexander <Alexander.Deucher@amd.com>
> Subject: Re: [PATCH V3] PCI: Add quirk for AMD Navi14 to disable ATS support
>
> Applied to pci/virtualization for v5.14, thanks.
On Mon, Jun 07, 2021 at 03:41:35AM +0000, Quan, Evan wrote:
> [AMD Official Use Only]
>
> Thanks Bjorn.
> @Deucher, Alexander can you advise whether this is needed for stable kernel branches and which branches if yes?

Sorry, I should have done this already.  I went ahead and marked it for stable.
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index b7e19bbb901a..70803ad6d2ac 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5176,7 +5176,8 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SERVERWORKS, 0x0422, quirk_no_ext_tags);
 static void quirk_amd_harvest_no_ats(struct pci_dev *pdev)
 {
 	if ((pdev->device == 0x7312 && pdev->revision != 0x00) ||
-	    (pdev->device == 0x7340 && pdev->revision != 0xc5))
+	    (pdev->device == 0x7340 && pdev->revision != 0xc5) ||
+	    (pdev->device == 0x7341 && pdev->revision != 0x00))
 		return;
 
 	if (pdev->device == 0x15d8) {
@@ -5203,6 +5204,7 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_amd_harvest_no_ats);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7312, quirk_amd_harvest_no_ats);
 /* AMD Navi14 dGPU */
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7340, quirk_amd_harvest_no_ats);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7341, quirk_amd_harvest_no_ats);
 /* AMD Raven platform iGPU */
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x15d8, quirk_amd_harvest_no_ats);
 #endif /* CONFIG_PCI_ATS */
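
For readers checking which parts the patch actually affects, the revision filter at the top of quirk_amd_harvest_no_ats() can be modeled in user space. This is only a sketch of the early-return condition as it reads after the patch: `struct fake_pci_dev` and `quirk_skips_device()` are hypothetical names invented here, not kernel symbols, and the 0x15d8 handling that follows in the real function is not shown in the diff and is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the two struct pci_dev fields that the
 * revision filter reads. This is not the kernel's definition.
 */
struct fake_pci_dev {
	uint16_t device;
	uint8_t revision;
};

/*
 * Mirrors the early-return condition in quirk_amd_harvest_no_ats()
 * after this patch. A true result means the quirk returns at once
 * and leaves the device alone; false means it continues past the
 * filter.
 */
static bool quirk_skips_device(const struct fake_pci_dev *pdev)
{
	return (pdev->device == 0x7312 && pdev->revision != 0x00) ||
	       (pdev->device == 0x7340 && pdev->revision != 0xc5) ||
	       (pdev->device == 0x7341 && pdev->revision != 0x00);
}
```

With this model, device 0x7341 rev 0x00 is not skipped, which matches the part named in the commit log, while any other revision of 0x7341 takes the early return. Device IDs outside the three listed (for example 0x6900 or 0x15d8) are never skipped by this filter, whatever their revision.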