mbox series

[0/2] Introduce PCI_FIXUP_IOMMU

Message ID 1590493749-13823-1-git-send-email-zhangfei.gao@linaro.org
Headers show
Series Introduce PCI_FIXUP_IOMMU | expand

Message

Zhangfei Gao May 26, 2020, 11:49 a.m. UTC
Some platform devices appear as PCI but are actually on the AMBA bus,
and they need fixup in drivers/pci/quirks.c handling iommu_fwnode.
Here introducing PCI_FIXUP_IOMMU, which is called after iommu_fwnode
is allocated, instead of reusing PCI_FIXUP_FINAL since it will slow
down iommu probing as all devices in fixup final list will be
reprocessed, suggested by Joerg, [1]

For example:
Hisilicon platform device need fixup in
drivers/pci/quirks.c handling fwspec->can_stall, which is introduced in [2]

+static void quirk_huawei_pcie_sva(struct pci_dev *pdev)
+{
+    struct iommu_fwspec *fwspec;
+
+    pdev->eetlp_prefix_path = 1;
+    fwspec = dev_iommu_fwspec_get(&pdev->dev);
+    if (fwspec)
+        fwspec->can_stall = 1;
+}
+
+DECLARE_PCI_FIXUP_IOMMU(PCI_VENDOR_ID_HUAWEI, 0xa250, quirk_huawei_pcie_sva);
+DECLARE_PCI_iFIXUP_IOMMU(PCI_VENDOR_ID_HUAWEI, 0xa251, quirk_huawei_pcie_sva); 

[1] https://www.spinics.net/lists/iommu/msg44591.html
[2] https://www.spinics.net/lists/linux-pci/msg94559.html

Zhangfei Gao (2):
  PCI: Introduce PCI_FIXUP_IOMMU
  iommu: calling pci_fixup_iommu in iommu_fwspec_init

 drivers/iommu/iommu.c             | 4 ++++
 drivers/pci/quirks.c              | 7 +++++++
 include/asm-generic/vmlinux.lds.h | 3 +++
 include/linux/pci.h               | 8 ++++++++
 4 files changed, 22 insertions(+)

Comments

Greg Kroah-Hartman May 27, 2020, 9 a.m. UTC | #1
On Tue, May 26, 2020 at 07:49:07PM +0800, Zhangfei Gao wrote:
> Some platform devices appear as PCI but are actually on the AMBA bus,

Why would these devices not just show up on the AMBA bus and use all of
that logic instead of being a PCI device and having to go through odd
fixes like this?

thanks,

greg k-h
Arnd Bergmann May 27, 2020, 9:53 a.m. UTC | #2
On Wed, May 27, 2020 at 11:00 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Tue, May 26, 2020 at 07:49:07PM +0800, Zhangfei Gao wrote:
> > Some platform devices appear as PCI but are actually on the AMBA bus,
>
> Why would these devices not just show up on the AMBA bus and use all of
> that logic instead of being a PCI device and having to go through odd
> fixes like this?

There is a general move to having hardware be discoverable even with
ARM processors. Having on-chip devices be discoverable using PCI config
space is how x86 SoCs usually do it, and that is generally a good thing
as it means we don't need to describe them in DT

I guess as the hardware designers are still learning about it, this is not
always done correctly. In general, we can also describe PCI devices on
DT and do fixups during the probing there, but I suspect that won't work
as easily using ACPI probing, so the fixup is keyed off the hardware ID,
again as is common for x86 on-chip devices.

      Arnd
Zhangfei Gao May 27, 2020, 1:51 p.m. UTC | #3
On 2020/5/27 下午5:53, Arnd Bergmann wrote:
> On Wed, May 27, 2020 at 11:00 AM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
>> On Tue, May 26, 2020 at 07:49:07PM +0800, Zhangfei Gao wrote:
>>> Some platform devices appear as PCI but are actually on the AMBA bus,
>> Why would these devices not just show up on the AMBA bus and use all of
>> that logic instead of being a PCI device and having to go through odd
>> fixes like this?
> There is a general move to having hardware be discoverable even with
> ARM processors. Having on-chip devices be discoverable using PCI config
> space is how x86 SoCs usually do it, and that is generally a good thing
> as it means we don't need to describe them in DT
>
> I guess as the hardware designers are still learning about it, this is not
> always done correctly. In general, we can also describe PCI devices on
> DT and do fixups during the probing there, but I suspect that won't work
> as easily using ACPI probing, so the fixup is keyed off the hardware ID,
> again as is common for x86 on-chip devices.
>
>   
Yes, thanks Arnd :)

In order to use pasid, io page fault has to be supported,
either by PCI PRI feature (from pci device) or stall mode from smmu 
(platform device).
Here is letting system know the platform device can support smmu stall 
mode, as a result support pasid.
While stall is not a pci capability, so we use a fixup here.

Thanks
Bjorn Helgaas May 27, 2020, 6:18 p.m. UTC | #4
On Tue, May 26, 2020 at 07:49:07PM +0800, Zhangfei Gao wrote:
> Some platform devices appear as PCI but are actually on the AMBA bus,
> and they need fixup in drivers/pci/quirks.c handling iommu_fwnode.
> Here introducing PCI_FIXUP_IOMMU, which is called after iommu_fwnode
> is allocated, instead of reusing PCI_FIXUP_FINAL since it will slow
> down iommu probing as all devices in fixup final list will be
> reprocessed, suggested by Joerg, [1]

Is this slowdown significant?  We already iterate over every device
when applying PCI_FIXUP_FINAL quirks, so if we used the existing
PCI_FIXUP_FINAL, we wouldn't be adding a new loop.  We would only be
adding two more iterations to the loop in pci_do_fixups() that tries
to match quirks against the current device.  I doubt that would be a
measurable slowdown.

> For example:
> Hisilicon platform device need fixup in
> drivers/pci/quirks.c handling fwspec->can_stall, which is introduced in [2]
> 
> +static void quirk_huawei_pcie_sva(struct pci_dev *pdev)
> +{
> +    struct iommu_fwspec *fwspec;
> +
> +    pdev->eetlp_prefix_path = 1;
> +    fwspec = dev_iommu_fwspec_get(&pdev->dev);
> +    if (fwspec)
> +        fwspec->can_stall = 1;
> +}
> +
> +DECLARE_PCI_FIXUP_IOMMU(PCI_VENDOR_ID_HUAWEI, 0xa250, quirk_huawei_pcie_sva);
> +DECLARE_PCI_iFIXUP_IOMMU(PCI_VENDOR_ID_HUAWEI, 0xa251, quirk_huawei_pcie_sva); 
> 
> [1] https://www.spinics.net/lists/iommu/msg44591.html
> [2] https://www.spinics.net/lists/linux-pci/msg94559.html

If you reference these in the commit logs, please use lore.kernel.org
links instead of spinics.

> Zhangfei Gao (2):
>   PCI: Introduce PCI_FIXUP_IOMMU
>   iommu: calling pci_fixup_iommu in iommu_fwspec_init
> 
>  drivers/iommu/iommu.c             | 4 ++++
>  drivers/pci/quirks.c              | 7 +++++++
>  include/asm-generic/vmlinux.lds.h | 3 +++
>  include/linux/pci.h               | 8 ++++++++
>  4 files changed, 22 insertions(+)
> 
> -- 
> 2.7.4
>
Zhangfei Gao May 28, 2020, 6:46 a.m. UTC | #5
Hi, Bjorn

On 2020/5/28 上午2:18, Bjorn Helgaas wrote:
> On Tue, May 26, 2020 at 07:49:07PM +0800, Zhangfei Gao wrote:
>> Some platform devices appear as PCI but are actually on the AMBA bus,
>> and they need fixup in drivers/pci/quirks.c handling iommu_fwnode.
>> Here introducing PCI_FIXUP_IOMMU, which is called after iommu_fwnode
>> is allocated, instead of reusing PCI_FIXUP_FINAL since it will slow
>> down iommu probing as all devices in fixup final list will be
>> reprocessed, suggested by Joerg, [1]
> Is this slowdown significant?  We already iterate over every device
> when applying PCI_FIXUP_FINAL quirks, so if we used the existing
> PCI_FIXUP_FINAL, we wouldn't be adding a new loop.  We would only be
> adding two more iterations to the loop in pci_do_fixups() that tries
> to match quirks against the current device.  I doubt that would be a
> measurable slowdown.
I do not notice the difference when compared fixup_iommu and fixup_final 
via get_jiffies_64,
since in our platform no other pci fixup is registered.

Here the plan is adding pci_fixup_device in iommu_fwspec_init,
so if using fixup_final the iteration will be done again here.

>
>> For example:
>> Hisilicon platform device need fixup in
>> drivers/pci/quirks.c handling fwspec->can_stall, which is introduced in [2]
>>
>> +static void quirk_huawei_pcie_sva(struct pci_dev *pdev)
>> +{
>> +    struct iommu_fwspec *fwspec;
>> +
>> +    pdev->eetlp_prefix_path = 1;
>> +    fwspec = dev_iommu_fwspec_get(&pdev->dev);
>> +    if (fwspec)
>> +        fwspec->can_stall = 1;
>> +}
>> +
>> +DECLARE_PCI_FIXUP_IOMMU(PCI_VENDOR_ID_HUAWEI, 0xa250, quirk_huawei_pcie_sva);
>> +DECLARE_PCI_iFIXUP_IOMMU(PCI_VENDOR_ID_HUAWEI, 0xa251, quirk_huawei_pcie_sva);
>>
>> [1] https://www.spinics.net/lists/iommu/msg44591.html
>> [2] https://www.spinics.net/lists/linux-pci/msg94559.html
> If you reference these in the commit logs, please use lore.kernel.org
> links instead of spinics.
Got it, thanks Bjorn.
Joerg Roedel May 28, 2020, 7:33 a.m. UTC | #6
On Wed, May 27, 2020 at 01:18:42PM -0500, Bjorn Helgaas wrote:
> Is this slowdown significant?  We already iterate over every device
> when applying PCI_FIXUP_FINAL quirks, so if we used the existing
> PCI_FIXUP_FINAL, we wouldn't be adding a new loop.  We would only be
> adding two more iterations to the loop in pci_do_fixups() that tries
> to match quirks against the current device.  I doubt that would be a
> measurable slowdown.

I don't know how significant it is, but I remember people complaining
about adding new PCI quirks because it takes too long for them to run
them all. That was in the discussion about the quirk disabling ATS on
AMD Stoney systems.

So it probably depends on how many PCI devices are in the system whether
it causes any measureable slowdown.

Regards,

	Joerg
Bjorn Helgaas June 1, 2020, 5:41 p.m. UTC | #7
On Thu, May 28, 2020 at 09:33:44AM +0200, Joerg Roedel wrote:
> On Wed, May 27, 2020 at 01:18:42PM -0500, Bjorn Helgaas wrote:
> > Is this slowdown significant?  We already iterate over every device
> > when applying PCI_FIXUP_FINAL quirks, so if we used the existing
> > PCI_FIXUP_FINAL, we wouldn't be adding a new loop.  We would only be
> > adding two more iterations to the loop in pci_do_fixups() that tries
> > to match quirks against the current device.  I doubt that would be a
> > measurable slowdown.
> 
> I don't know how significant it is, but I remember people complaining
> about adding new PCI quirks because it takes too long for them to run
> them all. That was in the discussion about the quirk disabling ATS on
> AMD Stoney systems.
> 
> So it probably depends on how many PCI devices are in the system whether
> it causes any measureable slowdown.

I found this [1] from Paul Menzel, which was a slowdown caused by
quirk_usb_early_handoff().  I think the real problem is individual
quirks that take a long time.

The PCI_FIXUP_IOMMU things we're talking about should be fast, and of
course, they're only run for matching devices anyway.  So I'd rather
keep them as PCI_FIXUP_FINAL than add a whole new phase.

Bjorn

[1] https://lore.kernel.org/linux-pci/b1533fd5-1fae-7256-9597-36d3d5de9d2a@molgen.mpg.de/
Zhangfei Gao June 4, 2020, 1:33 p.m. UTC | #8
On 2020/6/2 上午1:41, Bjorn Helgaas wrote:
> On Thu, May 28, 2020 at 09:33:44AM +0200, Joerg Roedel wrote:
>> On Wed, May 27, 2020 at 01:18:42PM -0500, Bjorn Helgaas wrote:
>>> Is this slowdown significant?  We already iterate over every device
>>> when applying PCI_FIXUP_FINAL quirks, so if we used the existing
>>> PCI_FIXUP_FINAL, we wouldn't be adding a new loop.  We would only be
>>> adding two more iterations to the loop in pci_do_fixups() that tries
>>> to match quirks against the current device.  I doubt that would be a
>>> measurable slowdown.
>> I don't know how significant it is, but I remember people complaining
>> about adding new PCI quirks because it takes too long for them to run
>> them all. That was in the discussion about the quirk disabling ATS on
>> AMD Stoney systems.
>>
>> So it probably depends on how many PCI devices are in the system whether
>> it causes any measureable slowdown.
> I found this [1] from Paul Menzel, which was a slowdown caused by
> quirk_usb_early_handoff().  I think the real problem is individual
> quirks that take a long time.
>
> The PCI_FIXUP_IOMMU things we're talking about should be fast, and of
> course, they're only run for matching devices anyway.  So I'd rather
> keep them as PCI_FIXUP_FINAL than add a whole new phase.
>
Thanks Bjorn for taking time for this.
If so, it would be much simpler.

+++ b/drivers/iommu/iommu.c
@@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct 
fwnode_handle *iommu_fwnode,
         fwspec->iommu_fwnode = iommu_fwnode;
         fwspec->ops = ops;
         dev_iommu_fwspec_set(dev, fwspec);
+
+       if (dev_is_pci(dev))
+               pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
+

Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
Will send this when 5.8-rc1 is open.

Thanks
Bjorn Helgaas June 5, 2020, 11:19 p.m. UTC | #9
On Thu, Jun 04, 2020 at 09:33:07PM +0800, Zhangfei Gao wrote:
> On 2020/6/2 上午1:41, Bjorn Helgaas wrote:
> > On Thu, May 28, 2020 at 09:33:44AM +0200, Joerg Roedel wrote:
> > > On Wed, May 27, 2020 at 01:18:42PM -0500, Bjorn Helgaas wrote:
> > > > Is this slowdown significant?  We already iterate over every device
> > > > when applying PCI_FIXUP_FINAL quirks, so if we used the existing
> > > > PCI_FIXUP_FINAL, we wouldn't be adding a new loop.  We would only be
> > > > adding two more iterations to the loop in pci_do_fixups() that tries
> > > > to match quirks against the current device.  I doubt that would be a
> > > > measurable slowdown.
> > > I don't know how significant it is, but I remember people complaining
> > > about adding new PCI quirks because it takes too long for them to run
> > > them all. That was in the discussion about the quirk disabling ATS on
> > > AMD Stoney systems.
> > > 
> > > So it probably depends on how many PCI devices are in the system whether
> > > it causes any measureable slowdown.
> > I found this [1] from Paul Menzel, which was a slowdown caused by
> > quirk_usb_early_handoff().  I think the real problem is individual
> > quirks that take a long time.
> > 
> > The PCI_FIXUP_IOMMU things we're talking about should be fast, and of
> > course, they're only run for matching devices anyway.  So I'd rather
> > keep them as PCI_FIXUP_FINAL than add a whole new phase.
> > 
> Thanks Bjorn for taking time for this.
> If so, it would be much simpler.
> 
> +++ b/drivers/iommu/iommu.c
> @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct
> fwnode_handle *iommu_fwnode,
>         fwspec->iommu_fwnode = iommu_fwnode;
>         fwspec->ops = ops;
>         dev_iommu_fwspec_set(dev, fwspec);
> +
> +       if (dev_is_pci(dev))
> +               pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
> +
> 
> Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
> Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
> Will send this when 5.8-rc1 is open.

Wait, this whole fixup approach seems wrong to me.  No matter how you
do the fixup, it's still a fixup, which means it requires ongoing
maintenance.  Surely we don't want to have to add the Vendor/Device ID
for every new AMBA device that comes along, do we?

Bjorn
Zhangfei Gao June 8, 2020, 2:54 a.m. UTC | #10
Hi, Bjorn

On 2020/6/6 上午7:19, Bjorn Helgaas wrote:
> On Thu, Jun 04, 2020 at 09:33:07PM +0800, Zhangfei Gao wrote:
>> On 2020/6/2 上午1:41, Bjorn Helgaas wrote:
>>> On Thu, May 28, 2020 at 09:33:44AM +0200, Joerg Roedel wrote:
>>>> On Wed, May 27, 2020 at 01:18:42PM -0500, Bjorn Helgaas wrote:
>>>>> Is this slowdown significant?  We already iterate over every device
>>>>> when applying PCI_FIXUP_FINAL quirks, so if we used the existing
>>>>> PCI_FIXUP_FINAL, we wouldn't be adding a new loop.  We would only be
>>>>> adding two more iterations to the loop in pci_do_fixups() that tries
>>>>> to match quirks against the current device.  I doubt that would be a
>>>>> measurable slowdown.
>>>> I don't know how significant it is, but I remember people complaining
>>>> about adding new PCI quirks because it takes too long for them to run
>>>> them all. That was in the discussion about the quirk disabling ATS on
>>>> AMD Stoney systems.
>>>>
>>>> So it probably depends on how many PCI devices are in the system whether
>>>> it causes any measureable slowdown.
>>> I found this [1] from Paul Menzel, which was a slowdown caused by
>>> quirk_usb_early_handoff().  I think the real problem is individual
>>> quirks that take a long time.
>>>
>>> The PCI_FIXUP_IOMMU things we're talking about should be fast, and of
>>> course, they're only run for matching devices anyway.  So I'd rather
>>> keep them as PCI_FIXUP_FINAL than add a whole new phase.
>>>
>> Thanks Bjorn for taking time for this.
>> If so, it would be much simpler.
>>
>> +++ b/drivers/iommu/iommu.c
>> @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct
>> fwnode_handle *iommu_fwnode,
>>          fwspec->iommu_fwnode = iommu_fwnode;
>>          fwspec->ops = ops;
>>          dev_iommu_fwspec_set(dev, fwspec);
>> +
>> +       if (dev_is_pci(dev))
>> +               pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
>> +
>>
>> Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
>> Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
>> Will send this when 5.8-rc1 is open.
> Wait, this whole fixup approach seems wrong to me.  No matter how you
> do the fixup, it's still a fixup, which means it requires ongoing
> maintenance.  Surely we don't want to have to add the Vendor/Device ID
> for every new AMBA device that comes along, do we?
>
>
Here the fake pci device has standard PCI cfg space, but physical 
implementation is base on AMBA
They can provide pasid feature.
However,
1, does not support tlp since they are not real pci devices.
2. does not support pri, instead support stall (provided by smmu)
And stall is not a pci feature, so it is not described in struct 
pci_dev, but in struct iommu_fwspec.
So we use this fixup to tell pci system that the devices can support 
stall, and hereby support pasid.

Thanks
Bjorn Helgaas June 8, 2020, 4:41 p.m. UTC | #11
On Mon, Jun 08, 2020 at 10:54:15AM +0800, Zhangfei Gao wrote:
> On 2020/6/6 上午7:19, Bjorn Helgaas wrote:
> > On Thu, Jun 04, 2020 at 09:33:07PM +0800, Zhangfei Gao wrote:
> > > On 2020/6/2 上午1:41, Bjorn Helgaas wrote:
> > > > On Thu, May 28, 2020 at 09:33:44AM +0200, Joerg Roedel wrote:
> > > > > On Wed, May 27, 2020 at 01:18:42PM -0500, Bjorn Helgaas wrote:
> > > > > > Is this slowdown significant?  We already iterate over every device
> > > > > > when applying PCI_FIXUP_FINAL quirks, so if we used the existing
> > > > > > PCI_FIXUP_FINAL, we wouldn't be adding a new loop.  We would only be
> > > > > > adding two more iterations to the loop in pci_do_fixups() that tries
> > > > > > to match quirks against the current device.  I doubt that would be a
> > > > > > measurable slowdown.
> > > > > I don't know how significant it is, but I remember people complaining
> > > > > about adding new PCI quirks because it takes too long for them to run
> > > > > them all. That was in the discussion about the quirk disabling ATS on
> > > > > AMD Stoney systems.
> > > > > 
> > > > > So it probably depends on how many PCI devices are in the system whether
> > > > > it causes any measureable slowdown.
> > > > I found this [1] from Paul Menzel, which was a slowdown caused by
> > > > quirk_usb_early_handoff().  I think the real problem is individual
> > > > quirks that take a long time.
> > > > 
> > > > The PCI_FIXUP_IOMMU things we're talking about should be fast, and of
> > > > course, they're only run for matching devices anyway.  So I'd rather
> > > > keep them as PCI_FIXUP_FINAL than add a whole new phase.
> > > > 
> > > Thanks Bjorn for taking time for this.
> > > If so, it would be much simpler.
> > > 
> > > +++ b/drivers/iommu/iommu.c
> > > @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct
> > > fwnode_handle *iommu_fwnode,
> > >          fwspec->iommu_fwnode = iommu_fwnode;
> > >          fwspec->ops = ops;
> > >          dev_iommu_fwspec_set(dev, fwspec);
> > > +
> > > +       if (dev_is_pci(dev))
> > > +               pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
> > > +
> > > 
> > > Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
> > > Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
> > > Will send this when 5.8-rc1 is open.
> >
> > Wait, this whole fixup approach seems wrong to me.  No matter how you
> > do the fixup, it's still a fixup, which means it requires ongoing
> > maintenance.  Surely we don't want to have to add the Vendor/Device ID
> > for every new AMBA device that comes along, do we?
> > 
> Here the fake pci device has standard PCI cfg space, but physical
> implementation is base on AMBA
> They can provide pasid feature.
> However,
> 1, does not support tlp since they are not real pci devices.
> 2. does not support pri, instead support stall (provided by smmu)
> And stall is not a pci feature, so it is not described in struct pci_dev,
> but in struct iommu_fwspec.
> So we use this fixup to tell pci system that the devices can support stall,
> and hereby support pasid.

This did not answer my question.  Are you proposing that we update a
quirk every time a new AMBA device is released?  I don't think that
would be a good model.
Zhangfei Gao June 9, 2020, 4:01 a.m. UTC | #12
Hi, Bjorn

On 2020/6/9 上午12:41, Bjorn Helgaas wrote:
> On Mon, Jun 08, 2020 at 10:54:15AM +0800, Zhangfei Gao wrote:
>> On 2020/6/6 上午7:19, Bjorn Helgaas wrote:
>>> On Thu, Jun 04, 2020 at 09:33:07PM +0800, Zhangfei Gao wrote:
>>>> On 2020/6/2 上午1:41, Bjorn Helgaas wrote:
>>>>> On Thu, May 28, 2020 at 09:33:44AM +0200, Joerg Roedel wrote:
>>>>>> On Wed, May 27, 2020 at 01:18:42PM -0500, Bjorn Helgaas wrote:
>>>>>>> Is this slowdown significant?  We already iterate over every device
>>>>>>> when applying PCI_FIXUP_FINAL quirks, so if we used the existing
>>>>>>> PCI_FIXUP_FINAL, we wouldn't be adding a new loop.  We would only be
>>>>>>> adding two more iterations to the loop in pci_do_fixups() that tries
>>>>>>> to match quirks against the current device.  I doubt that would be a
>>>>>>> measurable slowdown.
>>>>>> I don't know how significant it is, but I remember people complaining
>>>>>> about adding new PCI quirks because it takes too long for them to run
>>>>>> them all. That was in the discussion about the quirk disabling ATS on
>>>>>> AMD Stoney systems.
>>>>>>
>>>>>> So it probably depends on how many PCI devices are in the system whether
>>>>>> it causes any measureable slowdown.
>>>>> I found this [1] from Paul Menzel, which was a slowdown caused by
>>>>> quirk_usb_early_handoff().  I think the real problem is individual
>>>>> quirks that take a long time.
>>>>>
>>>>> The PCI_FIXUP_IOMMU things we're talking about should be fast, and of
>>>>> course, they're only run for matching devices anyway.  So I'd rather
>>>>> keep them as PCI_FIXUP_FINAL than add a whole new phase.
>>>>>
>>>> Thanks Bjorn for taking time for this.
>>>> If so, it would be much simpler.
>>>>
>>>> +++ b/drivers/iommu/iommu.c
>>>> @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct
>>>> fwnode_handle *iommu_fwnode,
>>>>           fwspec->iommu_fwnode = iommu_fwnode;
>>>>           fwspec->ops = ops;
>>>>           dev_iommu_fwspec_set(dev, fwspec);
>>>> +
>>>> +       if (dev_is_pci(dev))
>>>> +               pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
>>>> +
>>>>
>>>> Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
>>>> Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
>>>> Will send this when 5.8-rc1 is open.
>>> Wait, this whole fixup approach seems wrong to me.  No matter how you
>>> do the fixup, it's still a fixup, which means it requires ongoing
>>> maintenance.  Surely we don't want to have to add the Vendor/Device ID
>>> for every new AMBA device that comes along, do we?
>>>
>> Here the fake pci device has standard PCI cfg space, but physical
>> implementation is base on AMBA
>> They can provide pasid feature.
>> However,
>> 1, does not support tlp since they are not real pci devices.
>> 2. does not support pri, instead support stall (provided by smmu)
>> And stall is not a pci feature, so it is not described in struct pci_dev,
>> but in struct iommu_fwspec.
>> So we use this fixup to tell pci system that the devices can support stall,
>> and hereby support pasid.
> This did not answer my question.  Are you proposing that we update a
> quirk every time a new AMBA device is released?  I don't think that
> would be a good model.
Yes, you are right, but we do not have any better idea yet.
Currently we have three fake pci devices, which support stall and pasid.
We have to let pci system know the device can support pasid, because of 
stall feature, though not support pri.
Do you have any other ideas?

Thanks
Arnd Bergmann June 9, 2020, 9:15 a.m. UTC | #13
On Tue, Jun 9, 2020 at 6:02 AM Zhangfei Gao <zhangfei.gao@linaro.org> wrote:
> On 2020/6/9 上午12:41, Bjorn Helgaas wrote:
> > On Mon, Jun 08, 2020 at 10:54:15AM +0800, Zhangfei Gao wrote:
> >> On 2020/6/6 上午7:19, Bjorn Helgaas wrote:
> >>>> +++ b/drivers/iommu/iommu.c
> >>>> @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct
> >>>> fwnode_handle *iommu_fwnode,
> >>>>           fwspec->iommu_fwnode = iommu_fwnode;
> >>>>           fwspec->ops = ops;
> >>>>           dev_iommu_fwspec_set(dev, fwspec);
> >>>> +
> >>>> +       if (dev_is_pci(dev))
> >>>> +               pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
> >>>> +
> >>>>
> >>>> Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
> >>>> Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
> >>>> Will send this when 5.8-rc1 is open.
> >>> Wait, this whole fixup approach seems wrong to me.  No matter how you
> >>> do the fixup, it's still a fixup, which means it requires ongoing
> >>> maintenance.  Surely we don't want to have to add the Vendor/Device ID
> >>> for every new AMBA device that comes along, do we?
> >>>
> >> Here the fake pci device has standard PCI cfg space, but physical
> >> implementation is base on AMBA
> >> They can provide pasid feature.
> >> However,
> >> 1, does not support tlp since they are not real pci devices.
> >> 2. does not support pri, instead support stall (provided by smmu)
> >> And stall is not a pci feature, so it is not described in struct pci_dev,
> >> but in struct iommu_fwspec.
> >> So we use this fixup to tell pci system that the devices can support stall,
> >> and hereby support pasid.
> > This did not answer my question.  Are you proposing that we update a
> > quirk every time a new AMBA device is released?  I don't think that
> > would be a good model.
>
> Yes, you are right, but we do not have any better idea yet.
> Currently we have three fake pci devices, which support stall and pasid.
> We have to let pci system know the device can support pasid, because of
> stall feature, though not support pri.
> Do you have any other ideas?

It sounds like the best way would be to allocate a PCI capability for it, so
detection can be done through config space, at least in future devices,
or possibly after a firmware update if the config space in your system
is controlled by firmware somewhere.  Once there is a proper mechanism
to do this, using fixups to detect the early devices that don't use that
should be uncontroversial. I have no idea what the process or timeline
is to add new capabilities into the PCIe specification, or if this one
would be acceptable to the PCI SIG at all.

If detection cannot be done through PCI config space, the next best
alternative is to pass auxiliary data through firmware. On DT based
machines, you can list non-hotpluggable PCIe devices and add custom
properties that could be read during device enumeration. I assume
ACPI has something similar, but I have not done that.

      Arnd
Bjorn Helgaas June 9, 2020, 4:49 p.m. UTC | #14
On Tue, Jun 09, 2020 at 11:15:06AM +0200, Arnd Bergmann wrote:
> On Tue, Jun 9, 2020 at 6:02 AM Zhangfei Gao <zhangfei.gao@linaro.org> wrote:
> > On 2020/6/9 上午12:41, Bjorn Helgaas wrote:
> > > On Mon, Jun 08, 2020 at 10:54:15AM +0800, Zhangfei Gao wrote:
> > >> On 2020/6/6 上午7:19, Bjorn Helgaas wrote:
> > >>>> +++ b/drivers/iommu/iommu.c
> > >>>> @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct
> > >>>> fwnode_handle *iommu_fwnode,
> > >>>>           fwspec->iommu_fwnode = iommu_fwnode;
> > >>>>           fwspec->ops = ops;
> > >>>>           dev_iommu_fwspec_set(dev, fwspec);
> > >>>> +
> > >>>> +       if (dev_is_pci(dev))
> > >>>> +               pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
> > >>>> +
> > >>>>
> > >>>> Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
> > >>>> Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
> > >>>> Will send this when 5.8-rc1 is open.
> > >>> Wait, this whole fixup approach seems wrong to me.  No matter how you
> > >>> do the fixup, it's still a fixup, which means it requires ongoing
> > >>> maintenance.  Surely we don't want to have to add the Vendor/Device ID
> > >>> for every new AMBA device that comes along, do we?
> > >>>
> > >> Here the fake pci device has standard PCI cfg space, but physical
> > >> implementation is base on AMBA
> > >> They can provide pasid feature.
> > >> However,
> > >> 1, does not support tlp since they are not real pci devices.
> > >> 2. does not support pri, instead support stall (provided by smmu)
> > >> And stall is not a pci feature, so it is not described in struct pci_dev,
> > >> but in struct iommu_fwspec.
> > >> So we use this fixup to tell pci system that the devices can support stall,
> > >> and hereby support pasid.
> > > This did not answer my question.  Are you proposing that we update a
> > > quirk every time a new AMBA device is released?  I don't think that
> > > would be a good model.
> >
> > Yes, you are right, but we do not have any better idea yet.
> > Currently we have three fake pci devices, which support stall and pasid.
> > We have to let pci system know the device can support pasid, because of
> > stall feature, though not support pri.
> > Do you have any other ideas?
> 
> It sounds like the best way would be to allocate a PCI capability for it, so
> detection can be done through config space, at least in future devices,
> or possibly after a firmware update if the config space in your system
> is controlled by firmware somewhere.  Once there is a proper mechanism
> to do this, using fixups to detect the early devices that don't use that
> should be uncontroversial. I have no idea what the process or timeline
> is to add new capabilities into the PCIe specification, or if this one
> would be acceptable to the PCI SIG at all.

That sounds like a possibility.  The spec already defines a
Vendor-Specific Extended Capability (PCIe r5.0, sec 7.9.5) that might
be a candidate.

> If detection cannot be done through PCI config space, the next best
> alternative is to pass auxiliary data through firmware. On DT based
> machines, you can list non-hotpluggable PCIe devices and add custom
> properties that could be read during device enumeration. I assume
> ACPI has something similar, but I have not done that.

ACPI has _DSM (ACPI v6.3, sec 9.1.1), which might be a candidate.  I
like this better than a PCI capability because the property you need
to expose is not a PCI property.
Zhangfei Gao June 11, 2020, 2:54 a.m. UTC | #15
On 2020/6/10 上午12:49, Bjorn Helgaas wrote:
> On Tue, Jun 09, 2020 at 11:15:06AM +0200, Arnd Bergmann wrote:
>> On Tue, Jun 9, 2020 at 6:02 AM Zhangfei Gao <zhangfei.gao@linaro.org> wrote:
>>> On 2020/6/9 上午12:41, Bjorn Helgaas wrote:
>>>> On Mon, Jun 08, 2020 at 10:54:15AM +0800, Zhangfei Gao wrote:
>>>>> On 2020/6/6 上午7:19, Bjorn Helgaas wrote:
>>>>>>> +++ b/drivers/iommu/iommu.c
>>>>>>> @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct
>>>>>>> fwnode_handle *iommu_fwnode,
>>>>>>>            fwspec->iommu_fwnode = iommu_fwnode;
>>>>>>>            fwspec->ops = ops;
>>>>>>>            dev_iommu_fwspec_set(dev, fwspec);
>>>>>>> +
>>>>>>> +       if (dev_is_pci(dev))
>>>>>>> +               pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
>>>>>>> +
>>>>>>>
>>>>>>> Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
>>>>>>> Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
>>>>>>> Will send this when 5.8-rc1 is open.
>>>>>> Wait, this whole fixup approach seems wrong to me.  No matter how you
>>>>>> do the fixup, it's still a fixup, which means it requires ongoing
>>>>>> maintenance.  Surely we don't want to have to add the Vendor/Device ID
>>>>>> for every new AMBA device that comes along, do we?
>>>>>>
>>>>> Here the fake pci device has standard PCI cfg space, but physical
>>>>> implementation is base on AMBA
>>>>> They can provide pasid feature.
>>>>> However,
>>>>> 1, does not support tlp since they are not real pci devices.
>>>>> 2. does not support pri, instead support stall (provided by smmu)
>>>>> And stall is not a pci feature, so it is not described in struct pci_dev,
>>>>> but in struct iommu_fwspec.
>>>>> So we use this fixup to tell pci system that the devices can support stall,
>>>>> and hereby support pasid.
>>>> This did not answer my question.  Are you proposing that we update a
>>>> quirk every time a new AMBA device is released?  I don't think that
>>>> would be a good model.
>>> Yes, you are right, but we do not have any better idea yet.
>>> Currently we have three fake pci devices, which support stall and pasid.
>>> We have to let pci system know the device can support pasid, because of
>>> stall feature, though not support pri.
>>> Do you have any other ideas?
>> It sounds like the best way would be to allocate a PCI capability for it, so
>> detection can be done through config space, at least in future devices,
>> or possibly after a firmware update if the config space in your system
>> is controlled by firmware somewhere.  Once there is a proper mechanism
>> to do this, using fixups to detect the early devices that don't use that
>> should be uncontroversial. I have no idea what the process or timeline
>> is to add new capabilities into the PCIe specification, or if this one
>> would be acceptable to the PCI SIG at all.
> That sounds like a possibility.  The spec already defines a
> Vendor-Specific Extended Capability (PCIe r5.0, sec 7.9.5) that might
> be a candidate.
Will investigate this, thanks Bjorn
>
>> If detection cannot be done through PCI config space, the next best
>> alternative is to pass auxiliary data through firmware. On DT based
>> machines, you can list non-hotpluggable PCIe devices and add custom
>> properties that could be read during device enumeration. I assume
>> ACPI has something similar, but I have not done that.
Yes, thanks Arnd
> ACPI has _DSM (ACPI v6.3, sec 9.1.1), which might be a candidate.  I
> like this better than a PCI capability because the property you need
> to expose is not a PCI property.
_DSM may not workable, since it is working in runtime.
We need stall information in init stage, neither too early (after 
allocation of iommu_fwspec)
nor too late (before arm_smmu_add_device ).

By the way,
It would be a long time if we need modify either pcie spec or acpi spec.
Can we use pci_fixup_device in iommu_fwspec_init first, it is relatively 
simple
and meet the requirement of platform device using pasid, and they are 
already in product.

Thanks
Bjorn Helgaas June 11, 2020, 1:44 p.m. UTC | #16
On Thu, Jun 11, 2020 at 10:54:45AM +0800, Zhangfei Gao wrote:
> On 2020/6/10 上午12:49, Bjorn Helgaas wrote:
> > On Tue, Jun 09, 2020 at 11:15:06AM +0200, Arnd Bergmann wrote:
> > > On Tue, Jun 9, 2020 at 6:02 AM Zhangfei Gao <zhangfei.gao@linaro.org> wrote:
> > > > On 2020/6/9 上午12:41, Bjorn Helgaas wrote:
> > > > > On Mon, Jun 08, 2020 at 10:54:15AM +0800, Zhangfei Gao wrote:
> > > > > > On 2020/6/6 上午7:19, Bjorn Helgaas wrote:
> > > > > > > > +++ b/drivers/iommu/iommu.c
> > > > > > > > @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct
> > > > > > > > fwnode_handle *iommu_fwnode,
> > > > > > > >            fwspec->iommu_fwnode = iommu_fwnode;
> > > > > > > >            fwspec->ops = ops;
> > > > > > > >            dev_iommu_fwspec_set(dev, fwspec);
> > > > > > > > +
> > > > > > > > +       if (dev_is_pci(dev))
> > > > > > > > +               pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
> > > > > > > > +
> > > > > > > > 
> > > > > > > > Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
> > > > > > > > Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
> > > > > > > > Will send this when 5.8-rc1 is open.
> > > > > > > Wait, this whole fixup approach seems wrong to me.  No matter how you
> > > > > > > do the fixup, it's still a fixup, which means it requires ongoing
> > > > > > > maintenance.  Surely we don't want to have to add the Vendor/Device ID
> > > > > > > for every new AMBA device that comes along, do we?
> > > > > > > 
> > > > > > Here the fake pci device has standard PCI cfg space, but physical
> > > > > > implementation is base on AMBA
> > > > > > They can provide pasid feature.
> > > > > > However,
> > > > > > 1, does not support tlp since they are not real pci devices.
> > > > > > 2. does not support pri, instead support stall (provided by smmu)
> > > > > > And stall is not a pci feature, so it is not described in struct pci_dev,
> > > > > > but in struct iommu_fwspec.
> > > > > > So we use this fixup to tell pci system that the devices can support stall,
> > > > > > and hereby support pasid.
> > > > > This did not answer my question.  Are you proposing that we update a
> > > > > quirk every time a new AMBA device is released?  I don't think that
> > > > > would be a good model.
> > > > Yes, you are right, but we do not have any better idea yet.
> > > > Currently we have three fake pci devices, which support stall and pasid.
> > > > We have to let pci system know the device can support pasid, because of
> > > > stall feature, though not support pri.
> > > > Do you have any other ideas?
> > > It sounds like the best way would be to allocate a PCI capability for it, so
> > > detection can be done through config space, at least in future devices,
> > > or possibly after a firmware update if the config space in your system
> > > is controlled by firmware somewhere.  Once there is a proper mechanism
> > > to do this, using fixups to detect the early devices that don't use that
> > > should be uncontroversial. I have no idea what the process or timeline
> > > is to add new capabilities into the PCIe specification, or if this one
> > > would be acceptable to the PCI SIG at all.
> > That sounds like a possibility.  The spec already defines a
> > Vendor-Specific Extended Capability (PCIe r5.0, sec 7.9.5) that might
> > be a candidate.
> Will investigate this, thanks Bjorn

FWIW, there's also a Vendor-Specific Capability that can appear in the
first 256 bytes of config space (the Vendor-Specific Extended
Capability must appear in the "Extended Configuration Space" from
0x100-0xfff).

> > > If detection cannot be done through PCI config space, the next best
> > > alternative is to pass auxiliary data through firmware. On DT based
> > > machines, you can list non-hotpluggable PCIe devices and add custom
> > > properties that could be read during device enumeration. I assume
> > > ACPI has something similar, but I have not done that.
> Yes, thanks Arnd
> > ACPI has _DSM (ACPI v6.3, sec 9.1.1), which might be a candidate.  I
> > like this better than a PCI capability because the property you need
> > to expose is not a PCI property.
> _DSM may not workable, since it is working in runtime.
> We need stall information in init stage, neither too early (after allocation
> of iommu_fwspec)
> nor too late (before arm_smmu_add_device ).

I'm not aware of a restriction on when _DSM can be evaluated.  I'm
looking at ACPI v6.3, sec 9.1.1.  Are you seeing something different?

> By the way, It would be a long time if we need modify either pcie
> spec or acpi spec.  Can we use pci_fixup_device in iommu_fwspec_init
> first, it is relatively simple and meet the requirement of platform
> device using pasid, and they are already in product.

Neither the PCI Vendor-Specific Capability nor the ACPI _DSM requires
a spec change.  Both can be completely vendor-defined.
Zhangfei Gao June 13, 2020, 2:30 p.m. UTC | #17
On 2020/6/11 下午9:44, Bjorn Helgaas wrote:
> +++ b/drivers/iommu/iommu.c
>>>>>>>>> @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct
>>>>>>>>> fwnode_handle *iommu_fwnode,
>>>>>>>>>             fwspec->iommu_fwnode = iommu_fwnode;
>>>>>>>>>             fwspec->ops = ops;
>>>>>>>>>             dev_iommu_fwspec_set(dev, fwspec);
>>>>>>>>> +
>>>>>>>>> +       if (dev_is_pci(dev))
>>>>>>>>> +               pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
>>>>>>>>> +
>>>>>>>>>
>>>>>>>>> Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
>>>>>>>>> Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
>>>>>>>>> Will send this when 5.8-rc1 is open.
>>>>>>>> Wait, this whole fixup approach seems wrong to me.  No matter how you
>>>>>>>> do the fixup, it's still a fixup, which means it requires ongoing
>>>>>>>> maintenance.  Surely we don't want to have to add the Vendor/Device ID
>>>>>>>> for every new AMBA device that comes along, do we?
>>>>>>>>
>>>>>>> Here the fake pci device has standard PCI cfg space, but physical
>>>>>>> implementation is base on AMBA
>>>>>>> They can provide pasid feature.
>>>>>>> However,
>>>>>>> 1, does not support tlp since they are not real pci devices.
>>>>>>> 2. does not support pri, instead support stall (provided by smmu)
>>>>>>> And stall is not a pci feature, so it is not described in struct pci_dev,
>>>>>>> but in struct iommu_fwspec.
>>>>>>> So we use this fixup to tell pci system that the devices can support stall,
>>>>>>> and hereby support pasid.
>>>>>> This did not answer my question.  Are you proposing that we update a
>>>>>> quirk every time a new AMBA device is released?  I don't think that
>>>>>> would be a good model.
>>>>> Yes, you are right, but we do not have any better idea yet.
>>>>> Currently we have three fake pci devices, which support stall and pasid.
>>>>> We have to let pci system know the device can support pasid, because of
>>>>> stall feature, though not support pri.
>>>>> Do you have any other ideas?
>>>> It sounds like the best way would be to allocate a PCI capability for it, so
>>>> detection can be done through config space, at least in future devices,
>>>> or possibly after a firmware update if the config space in your system
>>>> is controlled by firmware somewhere.  Once there is a proper mechanism
>>>> to do this, using fixups to detect the early devices that don't use that
>>>> should be uncontroversial. I have no idea what the process or timeline
>>>> is to add new capabilities into the PCIe specification, or if this one
>>>> would be acceptable to the PCI SIG at all.
>>> That sounds like a possibility.  The spec already defines a
>>> Vendor-Specific Extended Capability (PCIe r5.0, sec 7.9.5) that might
>>> be a candidate.
>> Will investigate this, thanks Bjorn
> FWIW, there's also a Vendor-Specific Capability that can appear in the
> first 256 bytes of config space (the Vendor-Specific Extended
> Capability must appear in the "Extended Configuration Space" from
> 0x100-0xfff).
Unfortunately our silicon does not have either Vendor-Specific Capability or
Vendor-Specific Extended Capability.

Studied commit 8531e283bee66050734fb0e89d53e85fd5ce24a4
Looks this method requires adding member (like can_stall) to struct 
pci_dev, looks difficult.

>
>>>> If detection cannot be done through PCI config space, the next best
>>>> alternative is to pass auxiliary data through firmware. On DT based
>>>> machines, you can list non-hotpluggable PCIe devices and add custom
>>>> properties that could be read during device enumeration. I assume
>>>> ACPI has something similar, but I have not done that.
>> Yes, thanks Arnd
>>> ACPI has _DSM (ACPI v6.3, sec 9.1.1), which might be a candidate.  I
>>> like this better than a PCI capability because the property you need
>>> to expose is not a PCI property.
>> _DSM may not workable, since it is working in runtime.
>> We need stall information in init stage, neither too early (after allocation
>> of iommu_fwspec)
>> nor too late (before arm_smmu_add_device ).
> I'm not aware of a restriction on when _DSM can be evaluated.  I'm
> looking at ACPI v6.3, sec 9.1.1.  Are you seeing something different?
DSM method seems requires vendor specific guid, and code would be vendor 
specific.
Except adding uuid to some spec like pci_acpi_dsm_guid.
obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), &pci_acpi_dsm_guid, 1,
IGNORE_PCI_BOOT_CONFIG_DSM, NULL);

>> By the way, It would be a long time if we need modify either pcie
>> spec or acpi spec.  Can we use pci_fixup_device in iommu_fwspec_init
>> first, it is relatively simple and meet the requirement of platform
>> device using pasid, and they are already in product.
> Neither the PCI Vendor-Specific Capability nor the ACPI _DSM requires
> a spec change.  Both can be completely vendor-defined.
Adding vendor-specific code to common files looks a bit ugly.

Thanks
Bjorn Helgaas June 15, 2020, 11:52 p.m. UTC | #18
On Sat, Jun 13, 2020 at 10:30:56PM +0800, Zhangfei Gao wrote:
> On 2020/6/11 下午9:44, Bjorn Helgaas wrote:
> > +++ b/drivers/iommu/iommu.c
> > > > > > > > > > @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct
> > > > > > > > > > fwnode_handle *iommu_fwnode,
> > > > > > > > > >             fwspec->iommu_fwnode = iommu_fwnode;
> > > > > > > > > >             fwspec->ops = ops;
> > > > > > > > > >             dev_iommu_fwspec_set(dev, fwspec);
> > > > > > > > > > +
> > > > > > > > > > +       if (dev_is_pci(dev))
> > > > > > > > > > +               pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
> > > > > > > > > > +
> > > > > > > > > > 
> > > > > > > > > > Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
> > > > > > > > > > Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
> > > > > > > > > > Will send this when 5.8-rc1 is open.
> > > > > > > > > Wait, this whole fixup approach seems wrong to me.  No matter how you
> > > > > > > > > do the fixup, it's still a fixup, which means it requires ongoing
> > > > > > > > > maintenance.  Surely we don't want to have to add the Vendor/Device ID
> > > > > > > > > for every new AMBA device that comes along, do we?
> > > > > > > > > 
> > > > > > > > Here the fake pci device has standard PCI cfg space, but physical
> > > > > > > > implementation is base on AMBA
> > > > > > > > They can provide pasid feature.
> > > > > > > > However,
> > > > > > > > 1, does not support tlp since they are not real pci devices.
> > > > > > > > 2. does not support pri, instead support stall (provided by smmu)
> > > > > > > > And stall is not a pci feature, so it is not described in struct pci_dev,
> > > > > > > > but in struct iommu_fwspec.
> > > > > > > > So we use this fixup to tell pci system that the devices can support stall,
> > > > > > > > and hereby support pasid.
> > > > > > > This did not answer my question.  Are you proposing that we update a
> > > > > > > quirk every time a new AMBA device is released?  I don't think that
> > > > > > > would be a good model.
> > > > > > Yes, you are right, but we do not have any better idea yet.
> > > > > > Currently we have three fake pci devices, which support stall and pasid.
> > > > > > We have to let pci system know the device can support pasid, because of
> > > > > > stall feature, though not support pri.
> > > > > > Do you have any other ideas?
> > > > > It sounds like the best way would be to allocate a PCI capability for it, so
> > > > > detection can be done through config space, at least in future devices,
> > > > > or possibly after a firmware update if the config space in your system
> > > > > is controlled by firmware somewhere.  Once there is a proper mechanism
> > > > > to do this, using fixups to detect the early devices that don't use that
> > > > > should be uncontroversial. I have no idea what the process or timeline
> > > > > is to add new capabilities into the PCIe specification, or if this one
> > > > > would be acceptable to the PCI SIG at all.
> > > > That sounds like a possibility.  The spec already defines a
> > > > Vendor-Specific Extended Capability (PCIe r5.0, sec 7.9.5) that might
> > > > be a candidate.
> > > Will investigate this, thanks Bjorn
> > FWIW, there's also a Vendor-Specific Capability that can appear in the
> > first 256 bytes of config space (the Vendor-Specific Extended
> > Capability must appear in the "Extended Configuration Space" from
> > 0x100-0xfff).
> Unfortunately our silicon does not have either Vendor-Specific Capability or
> Vendor-Specific Extended Capability.
> 
> Studied commit 8531e283bee66050734fb0e89d53e85fd5ce24a4
> Looks this method requires adding member (like can_stall) to struct pci_dev,
> looks difficult.

The problem is that we don't want to add device IDs every time a new
chip comes out.  Adding one or two device IDs for silicon that's
already released is not a problem as long as you have a strategy for
*future* devices so they don't require a quirk.

> > > > > If detection cannot be done through PCI config space, the next best
> > > > > alternative is to pass auxiliary data through firmware. On DT based
> > > > > machines, you can list non-hotpluggable PCIe devices and add custom
> > > > > properties that could be read during device enumeration. I assume
> > > > > ACPI has something similar, but I have not done that.
> > > Yes, thanks Arnd
> > > > ACPI has _DSM (ACPI v6.3, sec 9.1.1), which might be a candidate.  I
> > > > like this better than a PCI capability because the property you need
> > > > to expose is not a PCI property.
> > > _DSM may not workable, since it is working in runtime.
> > > We need stall information in init stage, neither too early (after allocation
> > > of iommu_fwspec)
> > > nor too late (before arm_smmu_add_device ).
> > I'm not aware of a restriction on when _DSM can be evaluated.  I'm
> > looking at ACPI v6.3, sec 9.1.1.  Are you seeing something different?
> DSM method seems requires vendor specific guid, and code would be vendor
> specific.

_DSM indeed requires a vendor-specific UUID, precisely *because*
vendors are free to define their own functionality without requiring
changes to the ACPI spec.  From the spec (ACPI v6.3, sec 9.1.1):

  New UUIDs may also be created by OEMs and IHVs for custom devices
  and other interface or device governing bodies (e.g. the PCI SIG),
  as long as the UUID is different from other published UUIDs.
Zhangfei Gao June 19, 2020, 2:26 a.m. UTC | #19
Hi, Bjorn

On 2020/6/16 上午7:52, Bjorn Helgaas wrote:
> On Sat, Jun 13, 2020 at 10:30:56PM +0800, Zhangfei Gao wrote:
>> On 2020/6/11 下午9:44, Bjorn Helgaas wrote:
>>> +++ b/drivers/iommu/iommu.c
>>>>>>>>>>> @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct
>>>>>>>>>>> fwnode_handle *iommu_fwnode,
>>>>>>>>>>>              fwspec->iommu_fwnode = iommu_fwnode;
>>>>>>>>>>>              fwspec->ops = ops;
>>>>>>>>>>>              dev_iommu_fwspec_set(dev, fwspec);
>>>>>>>>>>> +
>>>>>>>>>>> +       if (dev_is_pci(dev))
>>>>>>>>>>> +               pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
>>>>>>>>>>> +
>>>>>>>>>>>
>>>>>>>>>>> Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
>>>>>>>>>>> Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
>>>>>>>>>>> Will send this when 5.8-rc1 is open.
>>>>>>>>>> Wait, this whole fixup approach seems wrong to me.  No matter how you
>>>>>>>>>> do the fixup, it's still a fixup, which means it requires ongoing
>>>>>>>>>> maintenance.  Surely we don't want to have to add the Vendor/Device ID
>>>>>>>>>> for every new AMBA device that comes along, do we?
>>>>>>>>>>
>>>>>>>>> Here the fake pci device has standard PCI cfg space, but physical
>>>>>>>>> implementation is base on AMBA
>>>>>>>>> They can provide pasid feature.
>>>>>>>>> However,
>>>>>>>>> 1, does not support tlp since they are not real pci devices.
>>>>>>>>> 2. does not support pri, instead support stall (provided by smmu)
>>>>>>>>> And stall is not a pci feature, so it is not described in struct pci_dev,
>>>>>>>>> but in struct iommu_fwspec.
>>>>>>>>> So we use this fixup to tell pci system that the devices can support stall,
>>>>>>>>> and hereby support pasid.
>>>>>>>> This did not answer my question.  Are you proposing that we update a
>>>>>>>> quirk every time a new AMBA device is released?  I don't think that
>>>>>>>> would be a good model.
>>>>>>> Yes, you are right, but we do not have any better idea yet.
>>>>>>> Currently we have three fake pci devices, which support stall and pasid.
>>>>>>> We have to let pci system know the device can support pasid, because of
>>>>>>> stall feature, though not support pri.
>>>>>>> Do you have any other ideas?
>>>>>> It sounds like the best way would be to allocate a PCI capability for it, so
>>>>>> detection can be done through config space, at least in future devices,
>>>>>> or possibly after a firmware update if the config space in your system
>>>>>> is controlled by firmware somewhere.  Once there is a proper mechanism
>>>>>> to do this, using fixups to detect the early devices that don't use that
>>>>>> should be uncontroversial. I have no idea what the process or timeline
>>>>>> is to add new capabilities into the PCIe specification, or if this one
>>>>>> would be acceptable to the PCI SIG at all.
>>>>> That sounds like a possibility.  The spec already defines a
>>>>> Vendor-Specific Extended Capability (PCIe r5.0, sec 7.9.5) that might
>>>>> be a candidate.
>>>> Will investigate this, thanks Bjorn
>>> FWIW, there's also a Vendor-Specific Capability that can appear in the
>>> first 256 bytes of config space (the Vendor-Specific Extended
>>> Capability must appear in the "Extended Configuration Space" from
>>> 0x100-0xfff).
>> Unfortunately our silicon does not have either Vendor-Specific Capability or
>> Vendor-Specific Extended Capability.
>>
>> Studied commit 8531e283bee66050734fb0e89d53e85fd5ce24a4
>> Looks this method requires adding member (like can_stall) to struct pci_dev,
>> looks difficult.
> The problem is that we don't want to add device IDs every time a new
> chip comes out.  Adding one or two device IDs for silicon that's
> already released is not a problem as long as you have a strategy for
> *future* devices so they don't require a quirk.
>
>>>>>> If detection cannot be done through PCI config space, the next best
>>>>>> alternative is to pass auxiliary data through firmware. On DT based
>>>>>> machines, you can list non-hotpluggable PCIe devices and add custom
>>>>>> properties that could be read during device enumeration. I assume
>>>>>> ACPI has something similar, but I have not done that.
>>>> Yes, thanks Arnd
>>>>> ACPI has _DSM (ACPI v6.3, sec 9.1.1), which might be a candidate.  I
>>>>> like this better than a PCI capability because the property you need
>>>>> to expose is not a PCI property.
>>>> _DSM may not workable, since it is working in runtime.
>>>> We need stall information in init stage, neither too early (after allocation
>>>> of iommu_fwspec)
>>>> nor too late (before arm_smmu_add_device ).
>>> I'm not aware of a restriction on when _DSM can be evaluated.  I'm
>>> looking at ACPI v6.3, sec 9.1.1.  Are you seeing something different?
>> DSM method seems requires vendor specific guid, and code would be vendor
>> specific.
> _DSM indeed requires a vendor-specific UUID, precisely *because*
> vendors are free to define their own functionality without requiring
> changes to the ACPI spec.  From the spec (ACPI v6.3, sec 9.1.1):
>
>    New UUIDs may also be created by OEMs and IHVs for custom devices
>    and other interface or device governing bodies (e.g. the PCI SIG),
>    as long as the UUID is different from other published UUIDs.
Have studied _DSM method, two issues we met comparing using quirk.

1. Need change definition of either pci_host_bridge or pci_dev, like 
adding member can_stall,
while pci system does not know stall now.

a, pci devices do not have uuid: uuid need be described in dsdt, while 
pci devices are not defined in dsdt.
     so we have to use host bridge.

b,  Parsing dsdt is in in pci subsystem.
Like drivers/acpi/pci_root.c:
        obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), 
&pci_acpi_dsm_guid, 1,
                                 IGNORE_PCI_BOOT_CONFIG_DSM, NULL);

After parsing DSM in pci, we need record this info.
Currently, can_stall info is recorded in iommu_fwspec,
which is allocated in iommu_fwspec_init and called by 
iort_iommu_configure for uefi.

2. Guest kernel also need support sva.
Using quirk, the guest can boot with sva enabled, since quirk is 
self-contained by kernel.
If using  _DSM, a specific uefi or dtb has to be provided,
currently we can useQEMU_EFI.fd from apt install qemu-efi


Thanks
Joerg Roedel June 22, 2020, 11:53 a.m. UTC | #20
On Mon, Jun 01, 2020 at 12:41:04PM -0500, Bjorn Helgaas wrote:
> I found this [1] from Paul Menzel, which was a slowdown caused by
> quirk_usb_early_handoff().  I think the real problem is individual
> quirks that take a long time.
> 
> The PCI_FIXUP_IOMMU things we're talking about should be fast, and of
> course, they're only run for matching devices anyway.  So I'd rather
> keep them as PCI_FIXUP_FINAL than add a whole new phase.

Okay, so if it is not a performance problem, then I am fine with using
PCI_FIXUP_FINAL. But I dislike calling the fixups from IOMMU code, there
must be a better solution.


Regards,

	Joerg

> [1] https://lore.kernel.org/linux-pci/b1533fd5-1fae-7256-9597-36d3d5de9d2a@molgen.mpg.de/
Joerg Roedel June 22, 2020, 11:55 a.m. UTC | #21
On Thu, Jun 04, 2020 at 09:33:07PM +0800, Zhangfei Gao wrote:
> +++ b/drivers/iommu/iommu.c
> @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct
> fwnode_handle *iommu_fwnode,
>         fwspec->iommu_fwnode = iommu_fwnode;
>         fwspec->ops = ops;
>         dev_iommu_fwspec_set(dev, fwspec);
> +
> +       if (dev_is_pci(dev))
> +               pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
> +

That's not going to fly, I don't think we should run the fixups twice,
and they should not be run from IOMMU code. Is the only reason for this
second pass that iommu_fwspec is not yet allocated when it runs the
first time? I ask because it might be easier to just allocate the struct
earlier then.

Regards,

	Joerg
Zhangfei Gao June 23, 2020, 7:48 a.m. UTC | #22
Hi, Joerg

On 2020/6/22 下午7:55, Joerg Roedel wrote:
> On Thu, Jun 04, 2020 at 09:33:07PM +0800, Zhangfei Gao wrote:
>> +++ b/drivers/iommu/iommu.c
>> @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct
>> fwnode_handle *iommu_fwnode,
>>          fwspec->iommu_fwnode = iommu_fwnode;
>>          fwspec->ops = ops;
>>          dev_iommu_fwspec_set(dev, fwspec);
>> +
>> +       if (dev_is_pci(dev))
>> +               pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
>> +
> That's not going to fly, I don't think we should run the fixups twice,
> and they should not be run from IOMMU code. Is the only reason for this
> second pass that iommu_fwspec is not yet allocated when it runs the
> first time? I ask because it might be easier to just allocate the struct
> earlier then.
Thanks for looking this.

Yes, it is the only reason calling fixup secondly after iommu_fwspec is 
allocated.

The first time fixup final is very early in pci_bus_add_device.
If allocating iommu_fwspec earlier, it maybe in pci_alloc_dev.
And assigning ops still in iommu_fwspec_init.
Have tested it works.
Not sure is it acceptable?

Alternatively, adding can_stall to struct pci_dev is simple but ugly too,
since pci does not know stall now.


Thanks
Bjorn Helgaas June 23, 2020, 3:04 p.m. UTC | #23
On Fri, Jun 19, 2020 at 10:26:54AM +0800, Zhangfei Gao wrote:
> Have studied _DSM method, two issues we met comparing using quirk.
> 
> 1. Need change definition of either pci_host_bridge or pci_dev, like adding
> member can_stall,
> while pci system does not know stall now.
> 
> a, pci devices do not have uuid: uuid need be described in dsdt, while pci
> devices are not defined in dsdt.
>     so we have to use host bridge.

PCI devices *can* be described in the DSDT.  IIUC these particular
devices are hardwired (not plug-in cards), so platform firmware can
know about them and could describe them in the DSDT.

> b,  Parsing dsdt is in in pci subsystem.
> Like drivers/acpi/pci_root.c:
>        obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), &pci_acpi_dsm_guid,
> 1,
>                                 IGNORE_PCI_BOOT_CONFIG_DSM, NULL);
> 
> After parsing DSM in pci, we need record this info.
> Currently, can_stall info is recorded in iommu_fwspec,
> which is allocated in iommu_fwspec_init and called by iort_iommu_configure
> for uefi.

You can look for a _DSM wherever it is convenient for you.  It could
be in an AMBA shim layer.

> 2. Guest kernel also need support sva.
> Using quirk, the guest can boot with sva enabled, since quirk is
> self-contained by kernel.
> If using  _DSM, a specific uefi or dtb has to be provided,
> currently we can useQEMU_EFI.fd from apt install qemu-efi

I don't quite understand what this means, but as I mentioned before, a
quirk for a *limited* number of devices is OK, as long as there is a
plan that removes the need for a quirk for future devices.

E.g., if the next platform version ships with a DTB or firmware with a
_DSM or other mechanism that enables the kernel to discover this
information without a kernel change, it's fine to use a quirk to cover
the early platform.

The principles are:

  - I don't want to have to update a quirk for every new Device ID
    that needs this.

  - I don't really want to have to manage non-PCI information in the
    struct pci_dev.  If this is AMBA- or IOMMU-related, it should be
    stored in a structure related to AMBA or the IOMMU.
Zhou Wang Dec. 16, 2020, 11:24 a.m. UTC | #24
On 2020/6/23 23:04, Bjorn Helgaas wrote:
> On Fri, Jun 19, 2020 at 10:26:54AM +0800, Zhangfei Gao wrote:
>> Have studied _DSM method, two issues we met comparing using quirk.
>>
>> 1. Need change definition of either pci_host_bridge or pci_dev, like adding
>> member can_stall,
>> while pci system does not know stall now.
>>
>> a, pci devices do not have uuid: uuid need be described in dsdt, while pci
>> devices are not defined in dsdt.
>>     so we have to use host bridge.
> 
> PCI devices *can* be described in the DSDT.  IIUC these particular
> devices are hardwired (not plug-in cards), so platform firmware can
> know about them and could describe them in the DSDT.
> 
>> b,  Parsing dsdt is in in pci subsystem.
>> Like drivers/acpi/pci_root.c:
>>        obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), &pci_acpi_dsm_guid,
>> 1,
>>                                 IGNORE_PCI_BOOT_CONFIG_DSM, NULL);
>>
>> After parsing DSM in pci, we need record this info.
>> Currently, can_stall info is recorded in iommu_fwspec,
>> which is allocated in iommu_fwspec_init and called by iort_iommu_configure
>> for uefi.
> 
> You can look for a _DSM wherever it is convenient for you.  It could
> be in an AMBA shim layer.
> 
>> 2. Guest kernel also need support sva.
>> Using quirk, the guest can boot with sva enabled, since quirk is
>> self-contained by kernel.
>> If using  _DSM, a specific uefi or dtb has to be provided,
>> currently we can useQEMU_EFI.fd from apt install qemu-efi
> 
> I don't quite understand what this means, but as I mentioned before, a
> quirk for a *limited* number of devices is OK, as long as there is a
> plan that removes the need for a quirk for future devices.
> 
> E.g., if the next platform version ships with a DTB or firmware with a
> _DSM or other mechanism that enables the kernel to discover this
> information without a kernel change, it's fine to use a quirk to cover
> the early platform.
> 
> The principles are:
> 
>   - I don't want to have to update a quirk for every new Device ID
>     that needs this.

Hi Bjorn and Zhangfei,

We plan to use ATS/PRI to support SVA in future PCI devices. However, for
current devices, we need to add limited number of quirk to let them
work. The device IDs of current quirk needed devices are ZIP engine(0xa250, 0xa251),
SEC engine(0xa255, 0xa256), HPRE engine(0xa258, 0xa259), revision id are
0x21 and 0x30.

Let's continue to upstream these quirks!

Best,
Zhou

> 
>   - I don't really want to have to manage non-PCI information in the
>     struct pci_dev.  If this is AMBA- or IOMMU-related, it should be
>     stored in a structure related to AMBA or the IOMMU.
> .
>
Bjorn Helgaas Dec. 17, 2020, 8:38 p.m. UTC | #25
On Wed, Dec 16, 2020 at 07:24:30PM +0800, Zhou Wang wrote:
> On 2020/6/23 23:04, Bjorn Helgaas wrote:
> > On Fri, Jun 19, 2020 at 10:26:54AM +0800, Zhangfei Gao wrote:
> >> Have studied _DSM method, two issues we met comparing using quirk.
> >>
> >> 1. Need change definition of either pci_host_bridge or pci_dev, like adding
> >> member can_stall,
> >> while pci system does not know stall now.
> >>
> >> a, pci devices do not have uuid: uuid need be described in dsdt, while pci
> >> devices are not defined in dsdt.
> >>     so we have to use host bridge.
> > 
> > PCI devices *can* be described in the DSDT.  IIUC these particular
> > devices are hardwired (not plug-in cards), so platform firmware can
> > know about them and could describe them in the DSDT.
> > 
> >> b,  Parsing dsdt is in in pci subsystem.
> >> Like drivers/acpi/pci_root.c:
> >>        obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), &pci_acpi_dsm_guid,
> >> 1,
> >>                                 IGNORE_PCI_BOOT_CONFIG_DSM, NULL);
> >>
> >> After parsing DSM in pci, we need record this info.
> >> Currently, can_stall info is recorded in iommu_fwspec,
> >> which is allocated in iommu_fwspec_init and called by iort_iommu_configure
> >> for uefi.
> > 
> > You can look for a _DSM wherever it is convenient for you.  It could
> > be in an AMBA shim layer.
> > 
> >> 2. Guest kernel also need support sva.
> >> Using quirk, the guest can boot with sva enabled, since quirk is
> >> self-contained by kernel.
> >> If using  _DSM, a specific uefi or dtb has to be provided,
> >> currently we can useQEMU_EFI.fd from apt install qemu-efi
> > 
> > I don't quite understand what this means, but as I mentioned before, a
> > quirk for a *limited* number of devices is OK, as long as there is a
> > plan that removes the need for a quirk for future devices.
> > 
> > E.g., if the next platform version ships with a DTB or firmware with a
> > _DSM or other mechanism that enables the kernel to discover this
> > information without a kernel change, it's fine to use a quirk to cover
> > the early platform.
> > 
> > The principles are:
> > 
> >   - I don't want to have to update a quirk for every new Device ID
> >     that needs this.
> 
> Hi Bjorn and Zhangfei,
> 
> We plan to use ATS/PRI to support SVA in future PCI devices. However, for
> current devices, we need to add limited number of quirk to let them
> work. The device IDs of current quirk needed devices are ZIP engine(0xa250, 0xa251),
> SEC engine(0xa255, 0xa256), HPRE engine(0xa258, 0xa259), revision id are
> 0x21 and 0x30.
> 
> Let's continue to upstream these quirks!

Please post the patches you propose.  I don't think the previous ones
are in my queue.  Please include the lore URL for the previous
posting(s) in the cover letter so we can connect the discussion.

> >   - I don't really want to have to manage non-PCI information in the
> >     struct pci_dev.  If this is AMBA- or IOMMU-related, it should be
> >     stored in a structure related to AMBA or the IOMMU.
> > .
> >