diff mbox series

[1/1] iommu/vt-d: use DMA domain for real DMA devices and subdevices

Message ID 20200409191736.6233-2-jonathan.derrick@intel.com
State New
Headers show
Series Real DMA dev DMA domain patch | expand

Commit Message

Jon Derrick April 9, 2020, 7:17 p.m. UTC
The PCI devices handled by intel-iommu may have a DMA requester on another bus,
such as VMD subdevices needing to use the VMD endpoint.

The real DMA device is now used for the DMA mapping, but one case was missed
earlier: if the VMD device (and hence subdevices too) are under
IOMMU_DOMAIN_IDENTITY, mappings do not work.

Codepaths like intel_map_page() handle the IOMMU_DOMAIN_DMA case by creating an
iommu DMA mapping, and fall back on dma_direct_map_page() for the
IOMMU_DOMAIN_IDENTITY case. However, handling of the IDENTITY case is broken
when intel_page_page() handles a subdevice.

We observe that at iommu attach time, dmar_insert_one_dev_info() for the
subdevices will never set dev->archdata.iommu. This is because that function
uses find_domain() to check if there is already an IOMMU for the device, and
find_domain() then defers to the real DMA device which does have one. Thus
dmar_insert_one_dev_info() returns without assigning dev->archdata.iommu.

Then, later:

1. intel_map_page() checks if an IOMMU mapping is needed by calling
   iommu_need_mapping() on the subdevice. identity_mapping() returns
   false because dev->archdata.iommu is NULL, so this function
   returns false indicating that mapping is needed.
2. __intel_map_single() is called to create the mapping.
3. __intel_map_single() calls find_domain(). This function now returns
   the IDENTITY domain corresponding to the real DMA device.
4. __intel_map_single() calls domain_get_iommu() on this "real" domain.
   A failure is hit and the entire operation is aborted, because this
   codepath is not intended to handle IDENTITY mappings:
       if (WARN_ON(domain->domain.type != IOMMU_DOMAIN_DMA))
                   return NULL;

This becomes problematic if the real DMA device and the subdevices have
different addressing capabilities and some require translation. Instead we can
put the real DMA dev and any subdevices on the DMA domain. This change assigns
subdevices to the DMA domain, and moves the real DMA device to the DMA domain
if necessary.

Reported-by: Daniel Drake <drake@endlessm.com>
Fixes: b0140c69637e ("iommu/vt-d: Use pci_real_dma_dev() for mapping")
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Daniel Drake <drake@endlessm.com>
---
 drivers/iommu/intel-iommu.c | 56 ++++++++++++++++++++++++++++---------
 1 file changed, 43 insertions(+), 13 deletions(-)

Comments

Baolu Lu April 10, 2020, 1:22 a.m. UTC | #1
Hi,

On 2020/4/10 3:17, Jon Derrick wrote:
> The PCI devices handled by intel-iommu may have a DMA requester on another bus,
> such as VMD subdevices needing to use the VMD endpoint.
> 
> The real DMA device is now used for the DMA mapping, but one case was missed
> earlier: if the VMD device (and hence subdevices too) are under
> IOMMU_DOMAIN_IDENTITY, mappings do not work.
> 
> Codepaths like intel_map_page() handle the IOMMU_DOMAIN_DMA case by creating an
> iommu DMA mapping, and fall back on dma_direct_map_page() for the
> IOMMU_DOMAIN_IDENTITY case. However, handling of the IDENTITY case is broken
> when intel_page_page() handles a subdevice.
> 
> We observe that at iommu attach time, dmar_insert_one_dev_info() for the
> subdevices will never set dev->archdata.iommu. This is because that function

Do you mind telling why not setting this?

> uses find_domain() to check if there is already an IOMMU for the device, and
> find_domain() then defers to the real DMA device which does have one. Thus
> dmar_insert_one_dev_info() returns without assigning dev->archdata.iommu.
> 
> Then, later:
> 
> 1. intel_map_page() checks if an IOMMU mapping is needed by calling
>     iommu_need_mapping() on the subdevice. identity_mapping() returns
>     false because dev->archdata.iommu is NULL, so this function
>     returns false indicating that mapping is needed.
> 2. __intel_map_single() is called to create the mapping.
> 3. __intel_map_single() calls find_domain(). This function now returns
>     the IDENTITY domain corresponding to the real DMA device.
> 4. __intel_map_single() calls domain_get_iommu() on this "real" domain.
>     A failure is hit and the entire operation is aborted, because this
>     codepath is not intended to handle IDENTITY mappings:
>         if (WARN_ON(domain->domain.type != IOMMU_DOMAIN_DMA))
>                     return NULL;

This is caused by the fragile private domain implementation. We are in
process of removing it by enhancing the iommu subsystem with per-group
default domain.

https://www.spinics.net/lists/iommu/msg42976.html

So ultimately VMD subdevices should have their own per-device iommu data
and support per-device dma ops.

Best regards,
baolu
Daniel Drake April 12, 2020, 3:50 a.m. UTC | #2
Hi Jon,

Thanks for picking this up. Apologies for my absence here - I wasn't
able to work on this recently, but I'm back again now.

On Fri, Apr 10, 2020 at 3:32 AM Jon Derrick <jonathan.derrick@intel.com> wrote:
> This becomes problematic if the real DMA device and the subdevices have
> different addressing capabilities and some require translation. Instead we can
> put the real DMA dev and any subdevices on the DMA domain. This change assigns
> subdevices to the DMA domain, and moves the real DMA device to the DMA domain
> if necessary.

Have you tested this with the real DMA device in identity mode?
It is not quite working for me. (Again, I'm not using VMD here, but
have looked closely and believe we're working under the same
constraints)

First, the real DMA device gets added to the group:
 pci 0000:00:17.0: Adding to iommu group 9
(it's in IDENTITY mode here)

Then later, the first subdevice comes along, and these are the results:
 pci 10000:00:00.0: [8086:02d7] type 00 class 0x010601
 pci 10000:00:00.0: reg 0x10: [mem 0xae1a0000-0xae1a7fff]
 pci 10000:00:00.0: reg 0x14: [mem 0xae1a8000-0xae1a80ff]
 pci 10000:00:00.0: reg 0x18: [io  0x3090-0x3097]
 pci 10000:00:00.0: reg 0x1c: [io  0x3080-0x3083]
 pci 10000:00:00.0: reg 0x20: [io  0x3060-0x307f]
 pci 10000:00:00.0: reg 0x24: [mem 0xae100000-0xae103fff]
 pci 10000:00:00.0: PME# supported from D3hot
 pci 10000:00:00.0: Adding to iommu group 9
 pci 10000:00:00.0: DMAR: Failed to get a private domain.

That final message is added by your patch and indicates that it's not working.

This is because the subdevice got added to the iommu group before the
code you added tried to change to the DMA domain.

It first gets added to the group through this call path:
    intel_iommu_add_device
-> iommu_group_get_for_dev
-> iommu_group_add_device

Then, continuing within intel_iommu_add_device we get to the code you
added, which tries to move the real DMA dev to DMA mode instead. It
calls:

   intel_iommu_request_dma_domain_for_dev
-> iommu_request_dma_domain_for_dev
-> request_default_domain_for_dev

Which fails here:
    /* Don't change mappings of existing devices */
    ret = -EBUSY;
    if (iommu_group_device_count(group) != 1)
        goto out;

because we already have 2 devices in the group (the real DMA dev, plus
the subdevice we're in the process of handling now).

Next I'll look into the iommu group rework that Baolu mentioned.

Thanks,
Daniel
Christoph Hellwig April 12, 2020, 7:35 a.m. UTC | #3
On Sun, Apr 12, 2020 at 11:50:09AM +0800, Daniel Drake wrote:
> > different addressing capabilities and some require translation. Instead we can
> > put the real DMA dev and any subdevices on the DMA domain. This change assigns
> > subdevices to the DMA domain, and moves the real DMA device to the DMA domain
> > if necessary.
> 
> Have you tested this with the real DMA device in identity mode?
> It is not quite working for me. (Again, I'm not using VMD here, but
> have looked closely and believe we're working under the same
> constraints)

So if you are not using VMD how does this matter for upstream?
Daniel Drake April 13, 2020, 2:25 a.m. UTC | #4
On Fri, Apr 10, 2020 at 9:22 AM Lu Baolu <baolu.lu@linux.intel.com> wrote:
> This is caused by the fragile private domain implementation. We are in
> process of removing it by enhancing the iommu subsystem with per-group
> default domain.
>
> https://www.spinics.net/lists/iommu/msg42976.html
>
> So ultimately VMD subdevices should have their own per-device iommu data
> and support per-device dma ops.

Interesting. There's also this patchset you posted:
[PATCH 00/19] [PULL REQUEST] iommu/vt-d: patches for v5.7
https://lists.linuxfoundation.org/pipermail/iommu/2020-April/042967.html
(to be pushed out to 5.8)

In there you have:
> iommu/vt-d: Don't force 32bit devices to uses DMA domain
which seems to clash with the approach being explored in this thread.

And:
> iommu/vt-d: Apply per-device dma_ops
This effectively solves the trip point that caused me to open these
discussions, where intel_map_page() -> iommu_need_mapping() would
incorrectly determine that a intel-iommu DMA mapping was needed for a
PCI subdevice running in identity mode. After this patch, a PCI
subdevice in identity mode uses the default system dma_ops and
completely avoids intel-iommu.

So that solves the issues I was looking at. Jon, you might want to
check if the problems you see are likewise solved for you by these
patches.

I didn't try Joerg's iommu group rework yet as it conflicts with those
patches above.

Daniel
Baolu Lu April 13, 2020, 2:48 a.m. UTC | #5
Hi Daniel,

On 2020/4/13 10:25, Daniel Drake wrote:
> On Fri, Apr 10, 2020 at 9:22 AM Lu Baolu <baolu.lu@linux.intel.com> wrote:
>> This is caused by the fragile private domain implementation. We are in
>> process of removing it by enhancing the iommu subsystem with per-group
>> default domain.
>>
>> https://www.spinics.net/lists/iommu/msg42976.html
>>
>> So ultimately VMD subdevices should have their own per-device iommu data
>> and support per-device dma ops.
> 
> Interesting. There's also this patchset you posted:
> [PATCH 00/19] [PULL REQUEST] iommu/vt-d: patches for v5.7
> https://lists.linuxfoundation.org/pipermail/iommu/2020-April/042967.html
> (to be pushed out to 5.8)

Both are trying to solve a same problem.

I have sync'ed with Joerg. This patch set will be replaced with Joerg's
proposal due to a race concern between domain switching and driver
binding. I will rebase all vt-d patches in this set on top of Joerg's
change.

Best regards,
baolu

> 
> In there you have:
>> iommu/vt-d: Don't force 32bit devices to uses DMA domain
> which seems to clash with the approach being explored in this thread.
> 
> And:
>> iommu/vt-d: Apply per-device dma_ops
> This effectively solves the trip point that caused me to open these
> discussions, where intel_map_page() -> iommu_need_mapping() would
> incorrectly determine that a intel-iommu DMA mapping was needed for a
> PCI subdevice running in identity mode. After this patch, a PCI
> subdevice in identity mode uses the default system dma_ops and
> completely avoids intel-iommu.
> 
> So that solves the issues I was looking at. Jon, you might want to
> check if the problems you see are likewise solved for you by these
> patches.
> 
> I didn't try Joerg's iommu group rework yet as it conflicts with those
> patches above.
> 
> Daniel
>
Jon Derrick April 13, 2020, 4:08 p.m. UTC | #6
On Sun, 2020-04-12 at 11:50 +0800, Daniel Drake wrote:
> Hi Jon,
> 
> Thanks for picking this up. Apologies for my absence here - I wasn't
> able to work on this recently, but I'm back again now.
> 
> On Fri, Apr 10, 2020 at 3:32 AM Jon Derrick <jonathan.derrick@intel.com> wrote:
> > This becomes problematic if the real DMA device and the subdevices have
> > different addressing capabilities and some require translation. Instead we can
> > put the real DMA dev and any subdevices on the DMA domain. This change assigns
> > subdevices to the DMA domain, and moves the real DMA device to the DMA domain
> > if necessary.
> 
> Have you tested this with the real DMA device in identity mode?
> It is not quite working for me. (Again, I'm not using VMD here, but
> have looked closely and believe we're working under the same
> constraints)

It does work for me when real DMA device starts in Identity, but my
'real DMA device' doesn't do the DMA. It just provides the source-id.

Does your 'real DMA device' do DMA?
I suppose that could be the reason. You wouldn't want to change the
domain on the live device using the method I proposed.

> 
> First, the real DMA device gets added to the group:
>  pci 0000:00:17.0: Adding to iommu group 9
> (it's in IDENTITY mode here)
> 
> Then later, the first subdevice comes along, and these are the results:
>  pci 10000:00:00.0: [8086:02d7] type 00 class 0x010601
>  pci 10000:00:00.0: reg 0x10: [mem 0xae1a0000-0xae1a7fff]
>  pci 10000:00:00.0: reg 0x14: [mem 0xae1a8000-0xae1a80ff]
>  pci 10000:00:00.0: reg 0x18: [io  0x3090-0x3097]
>  pci 10000:00:00.0: reg 0x1c: [io  0x3080-0x3083]
>  pci 10000:00:00.0: reg 0x20: [io  0x3060-0x307f]
>  pci 10000:00:00.0: reg 0x24: [mem 0xae100000-0xae103fff]
>  pci 10000:00:00.0: PME# supported from D3hot
>  pci 10000:00:00.0: Adding to iommu group 9
>  pci 10000:00:00.0: DMAR: Failed to get a private domain.
> 
> That final message is added by your patch and indicates that it's not working.
> 
> This is because the subdevice got added to the iommu group before the
> code you added tried to change to the DMA domain.
> 
> It first gets added to the group through this call path:
>     intel_iommu_add_device
> -> iommu_group_get_for_dev
> -> iommu_group_add_device
> 
> Then, continuing within intel_iommu_add_device we get to the code you
> added, which tries to move the real DMA dev to DMA mode instead. It
> calls:
> 
>    intel_iommu_request_dma_domain_for_dev
> -> iommu_request_dma_domain_for_dev
> -> request_default_domain_for_dev
> 
> Which fails here:
>     /* Don't change mappings of existing devices */
>     ret = -EBUSY;
>     if (iommu_group_device_count(group) != 1)
>         goto out;
> 
> because we already have 2 devices in the group (the real DMA dev, plus
> the subdevice we're in the process of handling now).
> 

You're right. I see the message too, but it still works for me.

> Next I'll look into the iommu group rework that Baolu mentioned.
> 
> Thanks,
> Daniel
Jon Derrick April 13, 2020, 4:08 p.m. UTC | #7
On Mon, 2020-04-13 at 10:48 +0800, Lu Baolu wrote:
> Hi Daniel,
> 
> On 2020/4/13 10:25, Daniel Drake wrote:
> > On Fri, Apr 10, 2020 at 9:22 AM Lu Baolu <baolu.lu@linux.intel.com> wrote:
> > > This is caused by the fragile private domain implementation. We are in
> > > process of removing it by enhancing the iommu subsystem with per-group
> > > default domain.
> > > 
> > > https://www.spinics.net/lists/iommu/msg42976.html
> > > 
> > > So ultimately VMD subdevices should have their own per-device iommu data
> > > and support per-device dma ops.
> > 
> > Interesting. There's also this patchset you posted:
> > [PATCH 00/19] [PULL REQUEST] iommu/vt-d: patches for v5.7
> > https://lists.linuxfoundation.org/pipermail/iommu/2020-April/042967.html
> > (to be pushed out to 5.8)
> 
> Both are trying to solve a same problem.
> 
> I have sync'ed with Joerg. This patch set will be replaced with Joerg's
> proposal due to a race concern between domain switching and driver
> binding. I will rebase all vt-d patches in this set on top of Joerg's
> change.
> 
> Best regards,
> baolu
> 
Thanks Baolu. I'll pick this back up on top of the for-5.8 changes.


> > In there you have:
> > > iommu/vt-d: Don't force 32bit devices to uses DMA domain
> > which seems to clash with the approach being explored in this thread.
> > 
> > And:
> > > iommu/vt-d: Apply per-device dma_ops
> > This effectively solves the trip point that caused me to open these
> > discussions, where intel_map_page() -> iommu_need_mapping() would
> > incorrectly determine that a intel-iommu DMA mapping was needed for a
> > PCI subdevice running in identity mode. After this patch, a PCI
> > subdevice in identity mode uses the default system dma_ops and
> > completely avoids intel-iommu.
> > 
> > So that solves the issues I was looking at. Jon, you might want to
> > check if the problems you see are likewise solved for you by these
> > patches.
> > 
> > I didn't try Joerg's iommu group rework yet as it conflicts with those
> > patches above.
> > 
> > Daniel
> >
Joerg Roedel April 18, 2020, 12:13 p.m. UTC | #8
On Mon, Apr 13, 2020 at 10:48:55AM +0800, Lu Baolu wrote:
> I have sync'ed with Joerg. This patch set will be replaced with Joerg's
> proposal due to a race concern between domain switching and driver
> binding. I will rebase all vt-d patches in this set on top of Joerg's
> change.

Okay, but is this patch relevant for v5.7? The other changes we are
working on will not land before v5.8.

Regards,

	Joerg
diff mbox series

Patch

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index ef0a5246700e..b4844a502499 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3049,6 +3049,9 @@  static int device_def_domain_type(struct device *dev)
 		if ((iommu_identity_mapping & IDENTMAP_GFX) && IS_GFX_DEVICE(pdev))
 			return IOMMU_DOMAIN_IDENTITY;
 
+		if (pci_real_dma_dev(pdev) != pdev)
+			return IOMMU_DOMAIN_DMA;
+
 		/*
 		 * We want to start off with all devices in the 1:1 domain, and
 		 * take them out later if we find they can't access all of memory.
@@ -5781,12 +5784,32 @@  static bool intel_iommu_capable(enum iommu_cap cap)
 	return false;
 }
 
+static int intel_iommu_request_dma_domain_for_dev(struct device *dev,
+                                                  struct dmar_domain *domain)
+{
+	int ret;
+
+	ret = iommu_request_dma_domain_for_dev(dev);
+	if (ret) {
+		dmar_remove_one_dev_info(dev);
+		domain->flags |= DOMAIN_FLAG_LOSE_CHILDREN;
+		if (!get_private_domain_for_dev(dev)) {
+			dev_warn(dev,
+				 "Failed to get a private domain.\n");
+				return -ENOMEM;
+		}
+	}
+
+	return 0;
+}
+
 static int intel_iommu_add_device(struct device *dev)
 {
 	struct dmar_domain *dmar_domain;
 	struct iommu_domain *domain;
 	struct intel_iommu *iommu;
 	struct iommu_group *group;
+	struct device *real_dev = dev;
 	u8 bus, devfn;
 	int ret;
 
@@ -5810,6 +5833,21 @@  static int intel_iommu_add_device(struct device *dev)
 
 	domain = iommu_get_domain_for_dev(dev);
 	dmar_domain = to_dmar_domain(domain);
+
+	if (dev_is_pci(dev))
+		real_dev = &pci_real_dma_dev(to_pci_dev(dev))->dev;
+
+	if (real_dev != dev) {
+		domain = iommu_get_domain_for_dev(real_dev);
+		if (domain->type != IOMMU_DOMAIN_DMA) {
+			dmar_remove_one_dev_info(real_dev);
+
+			ret = intel_iommu_request_dma_domain_for_dev(real_dev, dmar_domain);
+			if (ret)
+				goto unlink;
+		}
+	}
+
 	if (domain->type == IOMMU_DOMAIN_DMA) {
 		if (device_def_domain_type(dev) == IOMMU_DOMAIN_IDENTITY) {
 			ret = iommu_request_dm_for_dev(dev);
@@ -5823,20 +5861,12 @@  static int intel_iommu_add_device(struct device *dev)
 		}
 	} else {
 		if (device_def_domain_type(dev) == IOMMU_DOMAIN_DMA) {
-			ret = iommu_request_dma_domain_for_dev(dev);
-			if (ret) {
-				dmar_remove_one_dev_info(dev);
-				dmar_domain->flags |= DOMAIN_FLAG_LOSE_CHILDREN;
-				if (!get_private_domain_for_dev(dev)) {
-					dev_warn(dev,
-						 "Failed to get a private domain.\n");
-					ret = -ENOMEM;
-					goto unlink;
-				}
+			ret = intel_iommu_request_dma_domain_for_dev(dev, dmar_domain);
+			if (ret)
+				goto unlink;
 
-				dev_info(dev,
-					 "Device uses a private dma domain.\n");
-			}
+			dev_info(dev,
+				 "Device uses a private dma domain.\n");
 		}
 	}