mbox series

[0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC

Message ID 20210621235248.2521620-1-dianders@chromium.org
Headers show
Series iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC | expand

Message

Doug Anderson June 21, 2021, 11:52 p.m. UTC
This patch attempts to put forward a proposal for enabling non-strict
DMA on a device-by-device basis. The patch series requests non-strict
DMA for the Qualcomm SDHCI controller as a first device to enable,
getting a nice bump in performance with what's believed to be a very
small drop in security / safety (see the patch for the full argument).

As part of this patch series I am end up slightly cleaning up some of
the interactions between the PCI subsystem and the IOMMU subsystem but
I don't go all the way to fully remove all the tentacles. Specifically
this patch series only concerns itself with a single aspect: strict
vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
to talk about / reason about for more subsystems compared to overall
deciding what it means for a device to be "external" or "untrusted".

If something like this patch series ends up being landable, it will
undoubtedly need coordination between many maintainers to land. I
believe it's fully bisectable but later patches in the series
definitely depend on earlier ones. Sorry for the long CC list. :(


Douglas Anderson (6):
  drivers: base: Add the concept of "pre_probe" to drivers
  drivers: base: Add bits to struct device to control iommu strictness
  PCI: Indicate that we want to force strict DMA for untrusted devices
  iommu: Combine device strictness requests with the global default
  iommu: Stop reaching into PCIe devices to decide strict vs. non-strict
  mmc: sdhci-msm: Request non-strict IOMMU mode

 drivers/base/dd.c             | 10 +++++--
 drivers/iommu/dma-iommu.c     |  2 +-
 drivers/iommu/iommu.c         | 56 +++++++++++++++++++++++++++--------
 drivers/mmc/host/sdhci-msm.c  |  8 +++++
 drivers/pci/probe.c           |  4 ++-
 include/linux/device.h        | 11 +++++++
 include/linux/device/driver.h |  9 ++++++
 include/linux/iommu.h         |  2 ++
 8 files changed, 85 insertions(+), 17 deletions(-)

Comments

Robin Murphy June 22, 2021, 11:35 a.m. UTC | #1
Hi Doug,

On 2021-06-22 00:52, Douglas Anderson wrote:
> 
> This patch attempts to put forward a proposal for enabling non-strict
> DMA on a device-by-device basis. The patch series requests non-strict
> DMA for the Qualcomm SDHCI controller as a first device to enable,
> getting a nice bump in performance with what's believed to be a very
> small drop in security / safety (see the patch for the full argument).
> 
> As part of this patch series I am end up slightly cleaning up some of
> the interactions between the PCI subsystem and the IOMMU subsystem but
> I don't go all the way to fully remove all the tentacles. Specifically
> this patch series only concerns itself with a single aspect: strict
> vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
> to talk about / reason about for more subsystems compared to overall
> deciding what it means for a device to be "external" or "untrusted".
> 
> If something like this patch series ends up being landable, it will
> undoubtedly need coordination between many maintainers to land. I
> believe it's fully bisectable but later patches in the series
> definitely depend on earlier ones. Sorry for the long CC list. :(

Unfortunately, this doesn't work. In normal operation, the default 
domains should be established long before individual drivers are even 
loaded (if they are modules), let alone anywhere near probing. The fact 
that iommu_probe_device() sometimes gets called far too late off the 
back of driver probe is an unfortunate artefact of the original 
probe-deferral scheme, and causes other problems like potentially 
malformed groups - I've been forming a plan to fix that for a while now, 
so I for one really can't condone anything trying to rely on it. 
Non-deterministic behaviour based on driver probe order for multi-device 
groups is part of the existing problem, and your proposal seems equally 
vulnerable to that too.

FWIW we already have a go-faster knob for people who want to tweak the 
security/performance compromise for specific devices, namely the sysfs 
interface for changing a group's domain type before binding the relevant 
driver(s). Is that something you could use in your application, say from 
an initramfs script?

Thanks,
Robin.

> Douglas Anderson (6):
>    drivers: base: Add the concept of "pre_probe" to drivers
>    drivers: base: Add bits to struct device to control iommu strictness
>    PCI: Indicate that we want to force strict DMA for untrusted devices
>    iommu: Combine device strictness requests with the global default
>    iommu: Stop reaching into PCIe devices to decide strict vs. non-strict
>    mmc: sdhci-msm: Request non-strict IOMMU mode
> 
>   drivers/base/dd.c             | 10 +++++--
>   drivers/iommu/dma-iommu.c     |  2 +-
>   drivers/iommu/iommu.c         | 56 +++++++++++++++++++++++++++--------
>   drivers/mmc/host/sdhci-msm.c  |  8 +++++
>   drivers/pci/probe.c           |  4 ++-
>   include/linux/device.h        | 11 +++++++
>   include/linux/device/driver.h |  9 ++++++
>   include/linux/iommu.h         |  2 ++
>   8 files changed, 85 insertions(+), 17 deletions(-)
>
Doug Anderson June 22, 2021, 4:06 p.m. UTC | #2
Hi,

On Tue, Jun 22, 2021 at 4:35 AM Robin Murphy <robin.murphy@arm.com> wrote:
>
> Hi Doug,
>
> On 2021-06-22 00:52, Douglas Anderson wrote:
> >
> > This patch attempts to put forward a proposal for enabling non-strict
> > DMA on a device-by-device basis. The patch series requests non-strict
> > DMA for the Qualcomm SDHCI controller as a first device to enable,
> > getting a nice bump in performance with what's believed to be a very
> > small drop in security / safety (see the patch for the full argument).
> >
> > As part of this patch series I am end up slightly cleaning up some of
> > the interactions between the PCI subsystem and the IOMMU subsystem but
> > I don't go all the way to fully remove all the tentacles. Specifically
> > this patch series only concerns itself with a single aspect: strict
> > vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
> > to talk about / reason about for more subsystems compared to overall
> > deciding what it means for a device to be "external" or "untrusted".
> >
> > If something like this patch series ends up being landable, it will
> > undoubtedly need coordination between many maintainers to land. I
> > believe it's fully bisectable but later patches in the series
> > definitely depend on earlier ones. Sorry for the long CC list. :(
>
> Unfortunately, this doesn't work. In normal operation, the default
> domains should be established long before individual drivers are even
> loaded (if they are modules), let alone anywhere near probing. The fact
> that iommu_probe_device() sometimes gets called far too late off the
> back of driver probe is an unfortunate artefact of the original
> probe-deferral scheme, and causes other problems like potentially
> malformed groups - I've been forming a plan to fix that for a while now,
> so I for one really can't condone anything trying to rely on it.
> Non-deterministic behaviour based on driver probe order for multi-device
> groups is part of the existing problem, and your proposal seems equally
> vulnerable to that too.

Doh! :( I definitely can't say I understand the iommu subsystem
amazingly well. It was working for me, but I could believe that I was
somehow violating a rule somewhere.

I'm having a bit of a hard time understanding where the problem is
though. Is there any chance that you missed the part of my series
where I introduced a "pre_probe" step? Specifically, I see this:

* really_probe() is called w/ a driver and a device.
* -> calls dev->bus->dma_configure() w/ a "struct device *"
* -> eventually calls iommu_probe_device() w/ the device.
* -> calls iommu_alloc_default_domain() w/ the device
* -> calls iommu_group_alloc_default_domain()
* -> always allocates a new domain

...so we always have a "struct device" when a domain is allocated if
that domain is going to be associated with a device.

I will agree that iommu_probe_device() is called before the driver
probe, but unless I missed something it's after the device driver is
loaded.  ...and assuming something like patch #1 in this series looks
OK then iommu_probe_device() will be called after "pre_probe".

So assuming I'm not missing something, I'm not actually relying the
IOMMU getting init off the back of driver probe.


> FWIW we already have a go-faster knob for people who want to tweak the
> security/performance compromise for specific devices, namely the sysfs
> interface for changing a group's domain type before binding the relevant
> driver(s). Is that something you could use in your application, say from
> an initramfs script?

We've never had an initramfs script in Chrome OS. I don't know all the
history of why (I'm trying to check), but I'm nearly certain it was a
conscious decision. Probably it has to do with the fact that we're not
trying to build a generic distribution where a single boot source can
boot a huge variety of hardware. We generally have one kernel for a
class of devices. I believe avoiding the initramfs just keeps things
simpler.

I think trying to revamp Chrome OS to switch to an initramfs type
system would be a pretty big undertaking since (as I understand it)
you can't just run a little command and then return to the normal boot
flow. Once you switch to initramfs you're committing to finding /
setting up the rootfs yourself and on Chrome OS I believe that means a
whole bunch of dm-verity work.


...so probably the initramfs is a no-go for me, but I'm still crossing
my fingers that the pre_probe() might be legit if you take a second
look at it?

-Doug
John Garry June 22, 2021, 5:39 p.m. UTC | #3
On 22/06/2021 00:52, Douglas Anderson wrote:
> 
> This patch attempts to put forward a proposal for enabling non-strict
> DMA on a device-by-device basis. The patch series requests non-strict
> DMA for the Qualcomm SDHCI controller as a first device to enable,
> getting a nice bump in performance with what's believed to be a very
> small drop in security / safety (see the patch for the full argument).
> 
> As part of this patch series I am end up slightly cleaning up some of
> the interactions between the PCI subsystem and the IOMMU subsystem but
> I don't go all the way to fully remove all the tentacles. Specifically
> this patch series only concerns itself with a single aspect: strict
> vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
> to talk about / reason about for more subsystems compared to overall
> deciding what it means for a device to be "external" or "untrusted".
> 
> If something like this patch series ends up being landable, it will
> undoubtedly need coordination between many maintainers to land. I
> believe it's fully bisectable but later patches in the series
> definitely depend on earlier ones. Sorry for the long CC list. :(
> 

JFYI, In case to missed it, and I know it's not the same thing as you 
want, above, but the following series will allow you to build the kernel 
to default to lazy mode:

https://lore.kernel.org/linux-iommu/1624016058-189713-1-git-send-email-john.garry@huawei.com/T/#m21bc07b9353b3ba85f2a40557645c2bcc13cbb3e

So iommu.strict=0 would be no longer always required for arm64.

Thanks,
John


> 
> Douglas Anderson (6):
>    drivers: base: Add the concept of "pre_probe" to drivers
>    drivers: base: Add bits to struct device to control iommu strictness
>    PCI: Indicate that we want to force strict DMA for untrusted devices
>    iommu: Combine device strictness requests with the global default
>    iommu: Stop reaching into PCIe devices to decide strict vs. non-strict
>    mmc: sdhci-msm: Request non-strict IOMMU mode
> 
>   drivers/base/dd.c             | 10 +++++--
>   drivers/iommu/dma-iommu.c     |  2 +-
>   drivers/iommu/iommu.c         | 56 +++++++++++++++++++++++++++--------
>   drivers/mmc/host/sdhci-msm.c  |  8 +++++
>   drivers/pci/probe.c           |  4 ++-
>   include/linux/device.h        | 11 +++++++
>   include/linux/device/driver.h |  9 ++++++
>   include/linux/iommu.h         |  2 ++
>   8 files changed, 85 insertions(+), 17 deletions(-)
>
Doug Anderson June 22, 2021, 7:50 p.m. UTC | #4
Hi,

On Tue, Jun 22, 2021 at 10:46 AM John Garry <john.garry@huawei.com> wrote:
>
> On 22/06/2021 00:52, Douglas Anderson wrote:
> >
> > This patch attempts to put forward a proposal for enabling non-strict
> > DMA on a device-by-device basis. The patch series requests non-strict
> > DMA for the Qualcomm SDHCI controller as a first device to enable,
> > getting a nice bump in performance with what's believed to be a very
> > small drop in security / safety (see the patch for the full argument).
> >
> > As part of this patch series I am end up slightly cleaning up some of
> > the interactions between the PCI subsystem and the IOMMU subsystem but
> > I don't go all the way to fully remove all the tentacles. Specifically
> > this patch series only concerns itself with a single aspect: strict
> > vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
> > to talk about / reason about for more subsystems compared to overall
> > deciding what it means for a device to be "external" or "untrusted".
> >
> > If something like this patch series ends up being landable, it will
> > undoubtedly need coordination between many maintainers to land. I
> > believe it's fully bisectable but later patches in the series
> > definitely depend on earlier ones. Sorry for the long CC list. :(
> >
>
> JFYI, In case to missed it, and I know it's not the same thing as you
> want, above, but the following series will allow you to build the kernel
> to default to lazy mode:
>
> https://lore.kernel.org/linux-iommu/1624016058-189713-1-git-send-email-john.garry@huawei.com/T/#m21bc07b9353b3ba85f2a40557645c2bcc13cbb3e
>
> So iommu.strict=0 would be no longer always required for arm64.

Excellent! I'm never a fan of command line parameters as a replacement
for Kconfig. They are great for debugging or for cases where the user
of the kernel and the person compiling the kernel are not the same
(like with off-the-shelf Linux distros) but aren't great for setting a
default for embedded environments.

I actually think that something like my patchset may be even more
important atop yours. Do you agree? If the default is non-strict it
could be extra important to be able to mark a certain device to be in
"strict" mode.

...also, unfortunately I probably won't be able to use your patchest
for my use case. I think we want the extra level of paranoia by
default and really only want to allow non-strict mode for devices that
we're really convinced are safe.

-Doug
Rob Herring June 22, 2021, 8:02 p.m. UTC | #5
On Tue, Jun 22, 2021 at 09:06:02AM -0700, Doug Anderson wrote:
> Hi,
> 
> On Tue, Jun 22, 2021 at 4:35 AM Robin Murphy <robin.murphy@arm.com> wrote:
> >
> > Hi Doug,
> >
> > On 2021-06-22 00:52, Douglas Anderson wrote:
> > >
> > > This patch attempts to put forward a proposal for enabling non-strict
> > > DMA on a device-by-device basis. The patch series requests non-strict
> > > DMA for the Qualcomm SDHCI controller as a first device to enable,
> > > getting a nice bump in performance with what's believed to be a very
> > > small drop in security / safety (see the patch for the full argument).
> > >
> > > As part of this patch series I am end up slightly cleaning up some of
> > > the interactions between the PCI subsystem and the IOMMU subsystem but
> > > I don't go all the way to fully remove all the tentacles. Specifically
> > > this patch series only concerns itself with a single aspect: strict
> > > vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
> > > to talk about / reason about for more subsystems compared to overall
> > > deciding what it means for a device to be "external" or "untrusted".
> > >
> > > If something like this patch series ends up being landable, it will
> > > undoubtedly need coordination between many maintainers to land. I
> > > believe it's fully bisectable but later patches in the series
> > > definitely depend on earlier ones. Sorry for the long CC list. :(
> >
> > Unfortunately, this doesn't work. In normal operation, the default
> > domains should be established long before individual drivers are even
> > loaded (if they are modules), let alone anywhere near probing. The fact
> > that iommu_probe_device() sometimes gets called far too late off the
> > back of driver probe is an unfortunate artefact of the original
> > probe-deferral scheme, and causes other problems like potentially
> > malformed groups - I've been forming a plan to fix that for a while now,
> > so I for one really can't condone anything trying to rely on it.
> > Non-deterministic behaviour based on driver probe order for multi-device
> > groups is part of the existing problem, and your proposal seems equally
> > vulnerable to that too.
> 
> Doh! :( I definitely can't say I understand the iommu subsystem
> amazingly well. It was working for me, but I could believe that I was
> somehow violating a rule somewhere.
> 
> I'm having a bit of a hard time understanding where the problem is
> though. Is there any chance that you missed the part of my series
> where I introduced a "pre_probe" step? Specifically, I see this:
> 
> * really_probe() is called w/ a driver and a device.
> * -> calls dev->bus->dma_configure() w/ a "struct device *"
> * -> eventually calls iommu_probe_device() w/ the device.
> * -> calls iommu_alloc_default_domain() w/ the device
> * -> calls iommu_group_alloc_default_domain()
> * -> always allocates a new domain
> 
> ...so we always have a "struct device" when a domain is allocated if
> that domain is going to be associated with a device.
> 
> I will agree that iommu_probe_device() is called before the driver
> probe, but unless I missed something it's after the device driver is
> loaded.  ...and assuming something like patch #1 in this series looks
> OK then iommu_probe_device() will be called after "pre_probe".
> 
> So assuming I'm not missing something, I'm not actually relying the
> IOMMU getting init off the back of driver probe.
> 
> 
> > FWIW we already have a go-faster knob for people who want to tweak the
> > security/performance compromise for specific devices, namely the sysfs
> > interface for changing a group's domain type before binding the relevant
> > driver(s). Is that something you could use in your application, say from
> > an initramfs script?
> 
> We've never had an initramfs script in Chrome OS. I don't know all the
> history of why (I'm trying to check), but I'm nearly certain it was a
> conscious decision. Probably it has to do with the fact that we're not
> trying to build a generic distribution where a single boot source can
> boot a huge variety of hardware. We generally have one kernel for a
> class of devices. I believe avoiding the initramfs just keeps things
> simpler.
> 
> I think trying to revamp Chrome OS to switch to an initramfs type
> system would be a pretty big undertaking since (as I understand it)
> you can't just run a little command and then return to the normal boot
> flow. Once you switch to initramfs you're committing to finding /
> setting up the rootfs yourself and on Chrome OS I believe that means a
> whole bunch of dm-verity work.
> 
> 
> ...so probably the initramfs is a no-go for me, but I'm still crossing
> my fingers that the pre_probe() might be legit if you take a second
> look at it?

Couldn't you have a driver flag that has the same effect as twiddling 
sysfs? At the being of probe, check the flag and go set the underlying 
sysfs setting in the device.

Though you may want this to be per device, not per driver. To do that 
early, I think you'd need a DT property. I wouldn't be totally opposed 
to that and I appreciate you not starting there. :)

Rob
Saravana Kannan June 22, 2021, 8:05 p.m. UTC | #6
On Tue, Jun 22, 2021 at 1:02 PM Rob Herring <robh@kernel.org> wrote:
>
> On Tue, Jun 22, 2021 at 09:06:02AM -0700, Doug Anderson wrote:
> > Hi,
> >
> > On Tue, Jun 22, 2021 at 4:35 AM Robin Murphy <robin.murphy@arm.com> wrote:
> > >
> > > Hi Doug,
> > >
> > > On 2021-06-22 00:52, Douglas Anderson wrote:
> > > >
> > > > This patch attempts to put forward a proposal for enabling non-strict
> > > > DMA on a device-by-device basis. The patch series requests non-strict
> > > > DMA for the Qualcomm SDHCI controller as a first device to enable,
> > > > getting a nice bump in performance with what's believed to be a very
> > > > small drop in security / safety (see the patch for the full argument).
> > > >
> > > > As part of this patch series I am end up slightly cleaning up some of
> > > > the interactions between the PCI subsystem and the IOMMU subsystem but
> > > > I don't go all the way to fully remove all the tentacles. Specifically
> > > > this patch series only concerns itself with a single aspect: strict
> > > > vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
> > > > to talk about / reason about for more subsystems compared to overall
> > > > deciding what it means for a device to be "external" or "untrusted".
> > > >
> > > > If something like this patch series ends up being landable, it will
> > > > undoubtedly need coordination between many maintainers to land. I
> > > > believe it's fully bisectable but later patches in the series
> > > > definitely depend on earlier ones. Sorry for the long CC list. :(
> > >
> > > Unfortunately, this doesn't work. In normal operation, the default
> > > domains should be established long before individual drivers are even
> > > loaded (if they are modules), let alone anywhere near probing. The fact
> > > that iommu_probe_device() sometimes gets called far too late off the
> > > back of driver probe is an unfortunate artefact of the original
> > > probe-deferral scheme, and causes other problems like potentially
> > > malformed groups - I've been forming a plan to fix that for a while now,
> > > so I for one really can't condone anything trying to rely on it.
> > > Non-deterministic behaviour based on driver probe order for multi-device
> > > groups is part of the existing problem, and your proposal seems equally
> > > vulnerable to that too.
> >
> > Doh! :( I definitely can't say I understand the iommu subsystem
> > amazingly well. It was working for me, but I could believe that I was
> > somehow violating a rule somewhere.
> >
> > I'm having a bit of a hard time understanding where the problem is
> > though. Is there any chance that you missed the part of my series
> > where I introduced a "pre_probe" step? Specifically, I see this:
> >
> > * really_probe() is called w/ a driver and a device.
> > * -> calls dev->bus->dma_configure() w/ a "struct device *"
> > * -> eventually calls iommu_probe_device() w/ the device.
> > * -> calls iommu_alloc_default_domain() w/ the device
> > * -> calls iommu_group_alloc_default_domain()
> > * -> always allocates a new domain
> >
> > ...so we always have a "struct device" when a domain is allocated if
> > that domain is going to be associated with a device.
> >
> > I will agree that iommu_probe_device() is called before the driver
> > probe, but unless I missed something it's after the device driver is
> > loaded.  ...and assuming something like patch #1 in this series looks
> > OK then iommu_probe_device() will be called after "pre_probe".
> >
> > So assuming I'm not missing something, I'm not actually relying the
> > IOMMU getting init off the back of driver probe.
> >
> >
> > > FWIW we already have a go-faster knob for people who want to tweak the
> > > security/performance compromise for specific devices, namely the sysfs
> > > interface for changing a group's domain type before binding the relevant
> > > driver(s). Is that something you could use in your application, say from
> > > an initramfs script?
> >
> > We've never had an initramfs script in Chrome OS. I don't know all the
> > history of why (I'm trying to check), but I'm nearly certain it was a
> > conscious decision. Probably it has to do with the fact that we're not
> > trying to build a generic distribution where a single boot source can
> > boot a huge variety of hardware. We generally have one kernel for a
> > class of devices. I believe avoiding the initramfs just keeps things
> > simpler.
> >
> > I think trying to revamp Chrome OS to switch to an initramfs type
> > system would be a pretty big undertaking since (as I understand it)
> > you can't just run a little command and then return to the normal boot
> > flow. Once you switch to initramfs you're committing to finding /
> > setting up the rootfs yourself and on Chrome OS I believe that means a
> > whole bunch of dm-verity work.
> >
> >
> > ...so probably the initramfs is a no-go for me, but I'm still crossing
> > my fingers that the pre_probe() might be legit if you take a second
> > look at it?
>
> Couldn't you have a driver flag that has the same effect as twiddling
> sysfs? At the being of probe, check the flag and go set the underlying
> sysfs setting in the device.

My understanding of what Robin is saying is that we'd need this info
well before the driver is even available. The pre_probe() is
effectively doing the same thing you are suggesting.

> Though you may want this to be per device, not per driver. To do that
> early, I think you'd need a DT property. I wouldn't be totally opposed
> to that and I appreciate you not starting there. :)

Which is what I'm suggest elsewhere in the thread:

https://lore.kernel.org/lkml/CAGETcx83qCZF5JN5cqXxdSFiEgfc4jYESJg-RepL2wJXJv0Eww@mail.gmail.com/

-Saravana
Doug Anderson June 22, 2021, 8:10 p.m. UTC | #7
Hi,

On Tue, Jun 22, 2021 at 1:06 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Tue, Jun 22, 2021 at 1:02 PM Rob Herring <robh@kernel.org> wrote:
> >
> > On Tue, Jun 22, 2021 at 09:06:02AM -0700, Doug Anderson wrote:
> > > Hi,
> > >
> > > On Tue, Jun 22, 2021 at 4:35 AM Robin Murphy <robin.murphy@arm.com> wrote:
> > > >
> > > > Hi Doug,
> > > >
> > > > On 2021-06-22 00:52, Douglas Anderson wrote:
> > > > >
> > > > > This patch attempts to put forward a proposal for enabling non-strict
> > > > > DMA on a device-by-device basis. The patch series requests non-strict
> > > > > DMA for the Qualcomm SDHCI controller as a first device to enable,
> > > > > getting a nice bump in performance with what's believed to be a very
> > > > > small drop in security / safety (see the patch for the full argument).
> > > > >
> > > > > As part of this patch series I am end up slightly cleaning up some of
> > > > > the interactions between the PCI subsystem and the IOMMU subsystem but
> > > > > I don't go all the way to fully remove all the tentacles. Specifically
> > > > > this patch series only concerns itself with a single aspect: strict
> > > > > vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
> > > > > to talk about / reason about for more subsystems compared to overall
> > > > > deciding what it means for a device to be "external" or "untrusted".
> > > > >
> > > > > If something like this patch series ends up being landable, it will
> > > > > undoubtedly need coordination between many maintainers to land. I
> > > > > believe it's fully bisectable but later patches in the series
> > > > > definitely depend on earlier ones. Sorry for the long CC list. :(
> > > >
> > > > Unfortunately, this doesn't work. In normal operation, the default
> > > > domains should be established long before individual drivers are even
> > > > loaded (if they are modules), let alone anywhere near probing. The fact
> > > > that iommu_probe_device() sometimes gets called far too late off the
> > > > back of driver probe is an unfortunate artefact of the original
> > > > probe-deferral scheme, and causes other problems like potentially
> > > > malformed groups - I've been forming a plan to fix that for a while now,
> > > > so I for one really can't condone anything trying to rely on it.
> > > > Non-deterministic behaviour based on driver probe order for multi-device
> > > > groups is part of the existing problem, and your proposal seems equally
> > > > vulnerable to that too.
> > >
> > > Doh! :( I definitely can't say I understand the iommu subsystem
> > > amazingly well. It was working for me, but I could believe that I was
> > > somehow violating a rule somewhere.
> > >
> > > I'm having a bit of a hard time understanding where the problem is
> > > though. Is there any chance that you missed the part of my series
> > > where I introduced a "pre_probe" step? Specifically, I see this:
> > >
> > > * really_probe() is called w/ a driver and a device.
> > > * -> calls dev->bus->dma_configure() w/ a "struct device *"
> > > * -> eventually calls iommu_probe_device() w/ the device.
> > > * -> calls iommu_alloc_default_domain() w/ the device
> > > * -> calls iommu_group_alloc_default_domain()
> > > * -> always allocates a new domain
> > >
> > > ...so we always have a "struct device" when a domain is allocated if
> > > that domain is going to be associated with a device.
> > >
> > > I will agree that iommu_probe_device() is called before the driver
> > > probe, but unless I missed something it's after the device driver is
> > > loaded.  ...and assuming something like patch #1 in this series looks
> > > OK then iommu_probe_device() will be called after "pre_probe".
> > >
> > > So assuming I'm not missing something, I'm not actually relying the
> > > IOMMU getting init off the back of driver probe.
> > >
> > >
> > > > FWIW we already have a go-faster knob for people who want to tweak the
> > > > security/performance compromise for specific devices, namely the sysfs
> > > > interface for changing a group's domain type before binding the relevant
> > > > driver(s). Is that something you could use in your application, say from
> > > > an initramfs script?
> > >
> > > We've never had an initramfs script in Chrome OS. I don't know all the
> > > history of why (I'm trying to check), but I'm nearly certain it was a
> > > conscious decision. Probably it has to do with the fact that we're not
> > > trying to build a generic distribution where a single boot source can
> > > boot a huge variety of hardware. We generally have one kernel for a
> > > class of devices. I believe avoiding the initramfs just keeps things
> > > simpler.
> > >
> > > I think trying to revamp Chrome OS to switch to an initramfs type
> > > system would be a pretty big undertaking since (as I understand it)
> > > you can't just run a little command and then return to the normal boot
> > > flow. Once you switch to initramfs you're committing to finding /
> > > setting up the rootfs yourself and on Chrome OS I believe that means a
> > > whole bunch of dm-verity work.
> > >
> > >
> > > ...so probably the initramfs is a no-go for me, but I'm still crossing
> > > my fingers that the pre_probe() might be legit if you take a second
> > > look at it?
> >
> > Couldn't you have a driver flag that has the same effect as twiddling
> > sysfs? At the being of probe, check the flag and go set the underlying
> > sysfs setting in the device.
>
> My understanding of what Robin is saying is that we'd need this info
> well before the driver is even available. The pre_probe() is
> effectively doing the same thing you are suggesting.

Right, I was just about to respond with the same. ;-) So overall right
now we're blocked waiting for someone to point out the error in my
logic. ;-)


> > Though you may want this to be per device, not per driver. To do that
> > early, I think you'd need a DT property. I wouldn't be totally opposed
> > to that and I appreciate you not starting there. :)
>
> Which is what I'm suggest elsewhere in the thread:
>
> https://lore.kernel.org/lkml/CAGETcx83qCZF5JN5cqXxdSFiEgfc4jYESJg-RepL2wJXJv0Eww@mail.gmail.com/

Rob: I'd be happy if you wanted to comment on that thread. If you say
that it's fine to add a generic device tree property to control
strictness then I'm more than happy to add support for it. I've been
going on the theory that you'd NAK such a property but I'm totally
good with being wrong. ;-)

I'd be more than happy if you could suggest what you'd envision such a
property to be named.


-Doug
Robin Murphy June 22, 2021, 10:10 p.m. UTC | #8
On 2021-06-22 17:06, Doug Anderson wrote:
> Hi,
> 
> On Tue, Jun 22, 2021 at 4:35 AM Robin Murphy <robin.murphy@arm.com> wrote:
>>
>> Hi Doug,
>>
>> On 2021-06-22 00:52, Douglas Anderson wrote:
>>>
>>> This patch attempts to put forward a proposal for enabling non-strict
>>> DMA on a device-by-device basis. The patch series requests non-strict
>>> DMA for the Qualcomm SDHCI controller as a first device to enable,
>>> getting a nice bump in performance with what's believed to be a very
>>> small drop in security / safety (see the patch for the full argument).
>>>
>>> As part of this patch series I am end up slightly cleaning up some of
>>> the interactions between the PCI subsystem and the IOMMU subsystem but
>>> I don't go all the way to fully remove all the tentacles. Specifically
>>> this patch series only concerns itself with a single aspect: strict
>>> vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
>>> to talk about / reason about for more subsystems compared to overall
>>> deciding what it means for a device to be "external" or "untrusted".
>>>
>>> If something like this patch series ends up being landable, it will
>>> undoubtedly need coordination between many maintainers to land. I
>>> believe it's fully bisectable but later patches in the series
>>> definitely depend on earlier ones. Sorry for the long CC list. :(
>>
>> Unfortunately, this doesn't work. In normal operation, the default
>> domains should be established long before individual drivers are even
>> loaded (if they are modules), let alone anywhere near probing. The fact
>> that iommu_probe_device() sometimes gets called far too late off the
>> back of driver probe is an unfortunate artefact of the original
>> probe-deferral scheme, and causes other problems like potentially
>> malformed groups - I've been forming a plan to fix that for a while now,
>> so I for one really can't condone anything trying to rely on it.
>> Non-deterministic behaviour based on driver probe order for multi-device
>> groups is part of the existing problem, and your proposal seems equally
>> vulnerable to that too.
> 
> Doh! :( I definitely can't say I understand the iommu subsystem
> amazingly well. It was working for me, but I could believe that I was
> somehow violating a rule somewhere.
> 
> I'm having a bit of a hard time understanding where the problem is
> though. Is there any chance that you missed the part of my series
> where I introduced a "pre_probe" step? Specifically, I see this:
> 
> * really_probe() is called w/ a driver and a device.
> * -> calls dev->bus->dma_configure() w/ a "struct device *"
> * -> eventually calls iommu_probe_device() w/ the device.

This...

> * -> calls iommu_alloc_default_domain() w/ the device
> * -> calls iommu_group_alloc_default_domain()
> * -> always allocates a new domain
> 
> ...so we always have a "struct device" when a domain is allocated if
> that domain is going to be associated with a device.
> 
> I will agree that iommu_probe_device() is called before the driver
> probe, but unless I missed something it's after the device driver is
> loaded.  ...and assuming something like patch #1 in this series looks
> OK then iommu_probe_device() will be called after "pre_probe".
> 
> So assuming I'm not missing something, I'm not actually relying the
> IOMMU getting init off the back of driver probe.

...is implicitly that. Sorry that it's not obvious.

The "proper" flow is that iommu_probe_device() is called for everything 
which already exists during the IOMMU driver's own probe when it calls 
bus_set_iommu(), then at BUS_NOTIFY_ADD_DEVICE time for everything which 
appears subsequently. The only trouble is, to observe it in action on a 
DT-based system you'd currently have to go back at least 4 years, before 
09515ef5ddad...

Basically there were two issues: firstly we need the of_xlate step 
before add_device (now probe_device) for a DT-based IOMMU driver to know 
whether it should care about the given device or not. When -EPROBE_DEFER 
was the only tool we had to ensure probe ordering, and resolving the 
"iommus" DT property the only place to decide that, delaying it all 
until driver probe time was the only reasonable option, however ugly. 
The iommu_probe_device() "replay" in {of,acpi}_iommu_configure() is 
merely doing its best to fake up the previous behaviour. Try binding a 
dummy driver to your device first, then unbind it and bind the real one, 
and you'll see that iommu_probe_device() doesn't run the second or 
subsequent times. Now that we have fw_devlink to take care of ordering, 
the main reason for this weirdness is largely gone, so I'm keen to start 
getting rid of the divergence again as far as possible. Fundamentally, 
IOMMU drivers are supposed to be aware of all devices which the kernel 
knows about, regardless of whether they have a driver available or not.

The second issue is that when we have multiple IOMMU instances, the 
initial bus_set_iommu() "replay" is only useful for the first instance, 
so devices managed by other instances which aren't up and running yet 
will be glossed over. Currently this ends up being papered over by the 
solution to the first point on DT systems, while the x86 drivers hide 
their individual IOMMU units behind a single IOMMU API "instance", so 
it's been having little impact in practice. However, improving the core 
API's multi-instance support is an increasingly pressing issue now that 
new more varied systems are showing up, and it's that which is really 
going to be driving the aforementioned changes. FWIW the plan I 
currently have is to hang things off iommu_device_register() instead.

>> FWIW we already have a go-faster knob for people who want to tweak the
>> security/performance compromise for specific devices, namely the sysfs
>> interface for changing a group's domain type before binding the relevant
>> driver(s). Is that something you could use in your application, say from
>> an initramfs script?
> 
> We've never had an initramfs script in Chrome OS. I don't know all the
> history of why (I'm trying to check), but I'm nearly certain it was a
> conscious decision. Probably it has to do with the fact that we're not
> trying to build a generic distribution where a single boot source can
> boot a huge variety of hardware. We generally have one kernel for a
> class of devices. I believe avoiding the initramfs just keeps things
> simpler.
> 
> I think trying to revamp Chrome OS to switch to an initramfs type
> system would be a pretty big undertaking since (as I understand it)
> you can't just run a little command and then return to the normal boot
> flow. Once you switch to initramfs you're committing to finding /
> setting up the rootfs yourself and on Chrome OS I believe that means a
> whole bunch of dm-verity work.
> 
> 
> ...so probably the initramfs is a no-go for me, but I'm still crossing
> my fingers that the pre_probe() might be legit if you take a second
> look at it?

That's fair enough - TBH the current sysfs interface is a pretty 
specialist sport primarily for datacentre folks who can afford to take 
down their 40GBE NIC or whatever momentarily for a longer-term payoff, 
but it was worth exploring - I'm assuming the SDHCI holds your root 
filesystem so you wouldn't be able to do the unbinding dance from real 
userspace. As I said, the idea of embedding any sort of data in 
individual client drivers is a non-starter in general since it only has 
any hope of working on DT platforms (maybe arm64 ACPI too?), and only 
for very much the wrong reasons.

If this is something primarily demanded by QCom platforms in the short 
term, I'm tempted to say just try it with more device-matching magic in 
arm-smmu-qcom. Otherwise, the idea of growing the sysfs interface to 
allow switching a DMA domain from default-strict to non-strict is 
certainly an interesting prospect. Going a step beyond that to bring up 
a flush queue 'live' without rebuilding the whole group and domain could 
get ugly when it comes to drivers' interaction with io-pgtable, but I 
think it might be *technically* feasible...

Robin.
Rob Herring June 23, 2021, 1:54 p.m. UTC | #9
On Tue, Jun 22, 2021 at 2:10 PM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Tue, Jun 22, 2021 at 1:06 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > On Tue, Jun 22, 2021 at 1:02 PM Rob Herring <robh@kernel.org> wrote:
> > >
> > > On Tue, Jun 22, 2021 at 09:06:02AM -0700, Doug Anderson wrote:
> > > > Hi,
> > > >
> > > > On Tue, Jun 22, 2021 at 4:35 AM Robin Murphy <robin.murphy@arm.com> wrote:
> > > > >
> > > > > Hi Doug,
> > > > >
> > > > > On 2021-06-22 00:52, Douglas Anderson wrote:
> > > > > >
> > > > > > This patch attempts to put forward a proposal for enabling non-strict
> > > > > > DMA on a device-by-device basis. The patch series requests non-strict
> > > > > > DMA for the Qualcomm SDHCI controller as a first device to enable,
> > > > > > getting a nice bump in performance with what's believed to be a very
> > > > > > small drop in security / safety (see the patch for the full argument).
> > > > > >
> > > > > > As part of this patch series I am end up slightly cleaning up some of
> > > > > > the interactions between the PCI subsystem and the IOMMU subsystem but
> > > > > > I don't go all the way to fully remove all the tentacles. Specifically
> > > > > > this patch series only concerns itself with a single aspect: strict
> > > > > > vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
> > > > > > to talk about / reason about for more subsystems compared to overall
> > > > > > deciding what it means for a device to be "external" or "untrusted".
> > > > > >
> > > > > > If something like this patch series ends up being landable, it will
> > > > > > undoubtedly need coordination between many maintainers to land. I
> > > > > > believe it's fully bisectable but later patches in the series
> > > > > > definitely depend on earlier ones. Sorry for the long CC list. :(
> > > > >
> > > > > Unfortunately, this doesn't work. In normal operation, the default
> > > > > domains should be established long before individual drivers are even
> > > > > loaded (if they are modules), let alone anywhere near probing. The fact
> > > > > that iommu_probe_device() sometimes gets called far too late off the
> > > > > back of driver probe is an unfortunate artefact of the original
> > > > > probe-deferral scheme, and causes other problems like potentially
> > > > > malformed groups - I've been forming a plan to fix that for a while now,
> > > > > so I for one really can't condone anything trying to rely on it.
> > > > > Non-deterministic behaviour based on driver probe order for multi-device
> > > > > groups is part of the existing problem, and your proposal seems equally
> > > > > vulnerable to that too.
> > > >
> > > > Doh! :( I definitely can't say I understand the iommu subsystem
> > > > amazingly well. It was working for me, but I could believe that I was
> > > > somehow violating a rule somewhere.
> > > >
> > > > I'm having a bit of a hard time understanding where the problem is
> > > > though. Is there any chance that you missed the part of my series
> > > > where I introduced a "pre_probe" step? Specifically, I see this:
> > > >
> > > > * really_probe() is called w/ a driver and a device.
> > > > * -> calls dev->bus->dma_configure() w/ a "struct device *"
> > > > * -> eventually calls iommu_probe_device() w/ the device.
> > > > * -> calls iommu_alloc_default_domain() w/ the device
> > > > * -> calls iommu_group_alloc_default_domain()
> > > > * -> always allocates a new domain
> > > >
> > > > ...so we always have a "struct device" when a domain is allocated if
> > > > that domain is going to be associated with a device.
> > > >
> > > > I will agree that iommu_probe_device() is called before the driver
> > > > probe, but unless I missed something it's after the device driver is
> > > > loaded.  ...and assuming something like patch #1 in this series looks
> > > > OK then iommu_probe_device() will be called after "pre_probe".
> > > >
> > > > So assuming I'm not missing something, I'm not actually relying the
> > > > IOMMU getting init off the back of driver probe.
> > > >
> > > >
> > > > > FWIW we already have a go-faster knob for people who want to tweak the
> > > > > security/performance compromise for specific devices, namely the sysfs
> > > > > interface for changing a group's domain type before binding the relevant
> > > > > driver(s). Is that something you could use in your application, say from
> > > > > an initramfs script?
> > > >
> > > > We've never had an initramfs script in Chrome OS. I don't know all the
> > > > history of why (I'm trying to check), but I'm nearly certain it was a
> > > > conscious decision. Probably it has to do with the fact that we're not
> > > > trying to build a generic distribution where a single boot source can
> > > > boot a huge variety of hardware. We generally have one kernel for a
> > > > class of devices. I believe avoiding the initramfs just keeps things
> > > > simpler.
> > > >
> > > > I think trying to revamp Chrome OS to switch to an initramfs type
> > > > system would be a pretty big undertaking since (as I understand it)
> > > > you can't just run a little command and then return to the normal boot
> > > > flow. Once you switch to initramfs you're committing to finding /
> > > > setting up the rootfs yourself and on Chrome OS I believe that means a
> > > > whole bunch of dm-verity work.
> > > >
> > > >
> > > > ...so probably the initramfs is a no-go for me, but I'm still crossing
> > > > my fingers that the pre_probe() might be legit if you take a second
> > > > look at it?
> > >
> > > Couldn't you have a driver flag that has the same effect as twiddling
> > > sysfs? At the being of probe, check the flag and go set the underlying
> > > sysfs setting in the device.
> >
> > My understanding of what Robin is saying is that we'd need this info
> > well before the driver is even available. The pre_probe() is
> > effectively doing the same thing you are suggesting.
>
> Right, I was just about to respond with the same. ;-) So overall right
> now we're blocked waiting for someone to point out the error in my
> logic. ;-)

Okay, I don't see how sysfs would work in that case either. You can't
assume the driver is not available until after sysfs. But I'll defer
to others...

> > > Though you may want this to be per device, not per driver. To do that
> > > early, I think you'd need a DT property. I wouldn't be totally opposed
> > > to that and I appreciate you not starting there. :)
> >
> > Which is what I'm suggest elsewhere in the thread:
> >
> > https://lore.kernel.org/lkml/CAGETcx83qCZF5JN5cqXxdSFiEgfc4jYESJg-RepL2wJXJv0Eww@mail.gmail.com/
>
> Rob: I'd be happy if you wanted to comment on that thread. If you say
> that it's fine to add a generic device tree property to control
> strictness then I'm more than happy to add support for it. I've been
> going on the theory that you'd NAK such a property but I'm totally
> good with being wrong. ;-)
>
> I'd be more than happy if you could suggest what you'd envision such a
> property to be named.

You want me to do the hard part? ;)

Would this work as a flag in iommus cell (either another cell or bit
in the existing cell)?

You could go the compatible match list route as well. At least until
you work out the kernel implementation.

Rob
Doug Anderson June 23, 2021, 5:29 p.m. UTC | #10
Hi,

On Tue, Jun 22, 2021 at 3:10 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2021-06-22 17:06, Doug Anderson wrote:
> > Hi,
> >
> > On Tue, Jun 22, 2021 at 4:35 AM Robin Murphy <robin.murphy@arm.com> wrote:
> >>
> >> Hi Doug,
> >>
> >> On 2021-06-22 00:52, Douglas Anderson wrote:
> >>>
> >>> This patch attempts to put forward a proposal for enabling non-strict
> >>> DMA on a device-by-device basis. The patch series requests non-strict
> >>> DMA for the Qualcomm SDHCI controller as a first device to enable,
> >>> getting a nice bump in performance with what's believed to be a very
> >>> small drop in security / safety (see the patch for the full argument).
> >>>
> >>> As part of this patch series I am end up slightly cleaning up some of
> >>> the interactions between the PCI subsystem and the IOMMU subsystem but
> >>> I don't go all the way to fully remove all the tentacles. Specifically
> >>> this patch series only concerns itself with a single aspect: strict
> >>> vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
> >>> to talk about / reason about for more subsystems compared to overall
> >>> deciding what it means for a device to be "external" or "untrusted".
> >>>
> >>> If something like this patch series ends up being landable, it will
> >>> undoubtedly need coordination between many maintainers to land. I
> >>> believe it's fully bisectable but later patches in the series
> >>> definitely depend on earlier ones. Sorry for the long CC list. :(
> >>
> >> Unfortunately, this doesn't work. In normal operation, the default
> >> domains should be established long before individual drivers are even
> >> loaded (if they are modules), let alone anywhere near probing. The fact
> >> that iommu_probe_device() sometimes gets called far too late off the
> >> back of driver probe is an unfortunate artefact of the original
> >> probe-deferral scheme, and causes other problems like potentially
> >> malformed groups - I've been forming a plan to fix that for a while now,
> >> so I for one really can't condone anything trying to rely on it.
> >> Non-deterministic behaviour based on driver probe order for multi-device
> >> groups is part of the existing problem, and your proposal seems equally
> >> vulnerable to that too.
> >
> > Doh! :( I definitely can't say I understand the iommu subsystem
> > amazingly well. It was working for me, but I could believe that I was
> > somehow violating a rule somewhere.
> >
> > I'm having a bit of a hard time understanding where the problem is
> > though. Is there any chance that you missed the part of my series
> > where I introduced a "pre_probe" step? Specifically, I see this:
> >
> > * really_probe() is called w/ a driver and a device.
> > * -> calls dev->bus->dma_configure() w/ a "struct device *"
> > * -> eventually calls iommu_probe_device() w/ the device.
>
> This...
>
> > * -> calls iommu_alloc_default_domain() w/ the device
> > * -> calls iommu_group_alloc_default_domain()
> > * -> always allocates a new domain
> >
> > ...so we always have a "struct device" when a domain is allocated if
> > that domain is going to be associated with a device.
> >
> > I will agree that iommu_probe_device() is called before the driver
> > probe, but unless I missed something it's after the device driver is
> > loaded.  ...and assuming something like patch #1 in this series looks
> > OK then iommu_probe_device() will be called after "pre_probe".
> >
> > So assuming I'm not missing something, I'm not actually relying the
> > IOMMU getting init off the back of driver probe.
>
> ...is implicitly that. Sorry that it's not obvious.
>
> The "proper" flow is that iommu_probe_device() is called for everything
> which already exists during the IOMMU driver's own probe when it calls
> bus_set_iommu(), then at BUS_NOTIFY_ADD_DEVICE time for everything which
> appears subsequently. The only trouble is, to observe it in action on a
> DT-based system you'd currently have to go back at least 4 years, before
> 09515ef5ddad...
>
> Basically there were two issues: firstly we need the of_xlate step
> before add_device (now probe_device) for a DT-based IOMMU driver to know
> whether it should care about the given device or not. When -EPROBE_DEFER
> was the only tool we had to ensure probe ordering, and resolving the
> "iommus" DT property the only place to decide that, delaying it all
> until driver probe time was the only reasonable option, however ugly.
> The iommu_probe_device() "replay" in {of,acpi}_iommu_configure() is
> merely doing its best to fake up the previous behaviour. Try binding a
> dummy driver to your device first, then unbind it and bind the real one,
> and you'll see that iommu_probe_device() doesn't run the second or
> subsequent times. Now that we have fw_devlink to take care of ordering,

I probably haven't been following fw_devlink as closely as I should,
but does it completely replace -EPROBE_DEFER? From what I can tell
right now it helps optimize the boot ordering but it's not mandatory
(AKA Linux should boot/work fine with fw_devlink=off)?


> the main reason for this weirdness is largely gone, so I'm keen to start
> getting rid of the divergence again as far as possible. Fundamentally,
> IOMMU drivers are supposed to be aware of all devices which the kernel
> knows about, regardless of whether they have a driver available or not.
>
> The second issue is that when we have multiple IOMMU instances, the
> initial bus_set_iommu() "replay" is only useful for the first instance,
> so devices managed by other instances which aren't up and running yet
> will be glossed over. Currently this ends up being papered over by the
> solution to the first point on DT systems, while the x86 drivers hide
> their individual IOMMU units behind a single IOMMU API "instance", so
> it's been having little impact in practice. However, improving the core
> API's multi-instance support is an increasingly pressing issue now that
> new more varied systems are showing up, and it's that which is really
> going to be driving the aforementioned changes. FWIW the plan I
> currently have is to hang things off iommu_device_register() instead.

I tried to follow all the above the best I can and hopefully
understood some of it. ;-)

OK, so what do you think about these points?

* I think you are arguing that IOMMU strictness requirements needs to
be known regardless of whether we have a driver for a given device or
not. So the whole pre_probe stuff is just not right.

* One thing that's been proposed is putting strictness info in the
device tree. In theory on device-tree systems you could still put
strictness info in the per-device nodes assuming that all devices are
actually described in the device tree. That sounds nice, but I can
believe it would be pretty hard to parse backward like this. AKA when
the IOMMU probes it would need to find all the device tree nodes that
would end up grouped together parse their details to find out if they
should be non-strict.

* Instead of putting the details in per-device nodes, maybe it makes
sense to accept that we should be specifying these things at the IOMMU
level? If specifying things at the device tree level then the
device-tree node of the IOMMU itself would just have a list of things
that should be strict/non-strict. ...this could potentially be merged
with a hardcoded list of things in the IOMMU driver based on the IOMMU
compatible string.

Do those sound right?

I still haven't totally grokked the ideal way to identify devices. I
guess on Qualcomm systems each device is in its own group and so could
have its own strictness levels? ...or would it be better to list by
"stream ID" or something like that?

If we do something like this then maybe that's a solution that could
land short-ish term? We would know right at init time whether a given
domain should be strict or non-strict and there'd be no requirements
to transition it.


> >> FWIW we already have a go-faster knob for people who want to tweak the
> >> security/performance compromise for specific devices, namely the sysfs
> >> interface for changing a group's domain type before binding the relevant
> >> driver(s). Is that something you could use in your application, say from
> >> an initramfs script?
> >
> > We've never had an initramfs script in Chrome OS. I don't know all the
> > history of why (I'm trying to check), but I'm nearly certain it was a
> > conscious decision. Probably it has to do with the fact that we're not
> > trying to build a generic distribution where a single boot source can
> > boot a huge variety of hardware. We generally have one kernel for a
> > class of devices. I believe avoiding the initramfs just keeps things
> > simpler.
> >
> > I think trying to revamp Chrome OS to switch to an initramfs type
> > system would be a pretty big undertaking since (as I understand it)
> > you can't just run a little command and then return to the normal boot
> > flow. Once you switch to initramfs you're committing to finding /
> > setting up the rootfs yourself and on Chrome OS I believe that means a
> > whole bunch of dm-verity work.
> >
> >
> > ...so probably the initramfs is a no-go for me, but I'm still crossing
> > my fingers that the pre_probe() might be legit if you take a second
> > look at it?
>
> That's fair enough - TBH the current sysfs interface is a pretty
> specialist sport primarily for datacentre folks who can afford to take
> down their 40GBE NIC or whatever momentarily for a longer-term payoff,
> but it was worth exploring - I'm assuming the SDHCI holds your root
> filesystem so you wouldn't be able to do the unbinding dance from real
> userspace. As I said, the idea of embedding any sort of data in
> individual client drivers is a non-starter in general since it only has
> any hope of working on DT platforms (maybe arm64 ACPI too?), and only
> for very much the wrong reasons.
>
> If this is something primarily demanded by QCom platforms in the short
> term,

At the moment I'm mostly focused on QCom platforms, but that's mostly
because they're the first board that I've dealt with that has iommus
on pretty much every peripheral under the sun. That's a _good_ thing,
but it also means it's where we're hitting all these performance
issues. I believe that the usage of "iommus everywhere" is about to
become a lot more widespread across the SoC ecosystem, especially as
they are important for virtualization and that seems to be a hot-topic
these days.

Basically: I could land a short-term hack a solution for me (and
probably this is the right thing for me to do?), but I'm very much
interested in finding a better solution, ideally something that we
could achieve in something less than a year? Am I dreaming?


> I'm tempted to say just try it with more device-matching magic in
> arm-smmu-qcom.

Yeah, I had this working (at least as far as my testing went)
downstream in the chromeos-5.4 tree. I basically just implemented
"init_context" for the arm-smmu-qcom driver and then if the "struct
device *" passed in matched the SDMMC compatible string I just ORed in
"IO_PGTABLE_QUIRK_NON_STRICT". That actually seemed to work fine in
the downstream kernel tree but not upstream. I believe it broke in
commit a250c23f15c2 ("iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE")
because the flush queue stopped getting initted.

I think I could revamp my patch series to continue to have the
strictness attributes in the "struct iommu_domain" but just get rid of
the bits in the "struct device" and obviously get rid of the
"pre_probe" patch. Would something like that be acceptable?


> Otherwise, the idea of growing the sysfs interface to
> allow switching a DMA domain from default-strict to non-strict is
> certainly an interesting prospect. Going a step beyond that to bring up
> a flush queue 'live' without rebuilding the whole group and domain could
> get ugly when it comes to drivers' interaction with io-pgtable, but I
> think it might be *technically* feasible...

It seems like it would be nice, but maybe with the above we could get
by without having to do this?

-Doug
Doug Anderson June 24, 2021, 5:23 p.m. UTC | #11
Hi,

On Wed, Jun 23, 2021 at 10:29 AM Doug Anderson <dianders@chromium.org> wrote:
>
> * Instead of putting the details in per-device nodes, maybe it makes
> sense to accept that we should be specifying these things at the IOMMU
> level? If specifying things at the device tree level then the
> device-tree node of the IOMMU itself would just have a list of things
> that should be strict/non-strict. ...this could potentially be merged
> with a hardcoded list of things in the IOMMU driver based on the IOMMU
> compatible string.
>
> Do those sound right?
>
> I still haven't totally grokked the ideal way to identify devices. I
> guess on Qualcomm systems each device is in its own group and so could
> have its own strictness levels? ...or would it be better to list by
> "stream ID" or something like that?
>
> If we do something like this then maybe that's a solution that could
> land short-ish term? We would know right at init time whether a given
> domain should be strict or non-strict and there'd be no requirements
> to transition it.

OK, so I have attempted to implement this in the Qualcomm IOMMU driver
in v2 of this series:

https://lore.kernel.org/r/20210624171759.4125094-1-dianders@chromium.org/

Hopefully that doesn't fragment the discussion too much, but it seemed
like it might help move us forward to see what this would look like in
code.

I'll also note that I removed a few people from the CC list on v2 of
the series because I'm no longer touching any code outside of the
IOMMU subsystem and I thought folks would appreciate less noise in
their inboxes. I've CCed a boatload of mailing lists though so it
should be easy to find. If I dropped you from the CC list of v2 and
you really want back on then I'm more than happy to re-add you.

-Doug