
[RFC,v3,02/25] hw/iommu: introduce DualStageIOMMUObject

Message ID 1580300216-86172-3-git-send-email-yi.l.liu@intel.com
State New
Series intel_iommu: expose Shared Virtual Addressing to VMs

Commit Message

Yi Liu Jan. 29, 2020, 12:16 p.m. UTC
From: Liu Yi L <yi.l.liu@intel.com>

Many platforms now provide dual-stage DMA address translation in
hardware, for example nested translation on Intel VT-d scalable mode
and nested stage translation on ARM SMMUv3. In dual-stage DMA address
translation there are two sets of translation structures: stage-1
(a.k.a. first-level) and stage-2 (a.k.a. second-level). Stage-1
translation results are further translated by the stage-2 structures.
Take vSVA (virtual Shared Virtual Addressing) as an example: the guest
IOMMU driver owns the stage-1 translation structures (covering the
GVA->GPA translation), and the host IOMMU driver owns the stage-2
translation structures (covering the GPA->HPA translation). The VMM is
responsible for binding the stage-1 translation structures to the
host, so that hardware can perform the GVA->GPA and then the GPA->HPA
translation. For more background on SVA, refer to the links below.
 - https://www.youtube.com/watch?v=Kq_nfGK5MwQ
 - https://events19.lfasiallc.com/wp-content/uploads/2017/11/\
Shared-Virtual-Memory-in-KVM_Yi-Liu.pdf

As described above, dual-stage DMA translation provides two levels of
address mapping, which enables better DMA address translation support
for passthrough devices. This is also the direction current vIOMMU
work is taking; efforts include vSVA enabling from Yi Liu and SMMUv3
Nested Stage Setup from Eric Auger.
https://www.spinics.net/lists/kvm/msg198556.html
https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg02842.html

Both efforts aim to expose a vIOMMU backed by dual-stage hardware.
For that, QEMU needs an explicit object to represent the dual-stage
capability of the hardware. Such an object provides an abstraction for
the dual-stage DMA translation related operations, such as:

 1) PASID allocation (allowing the host to intercept PASID allocation)
 2) binding stage-1 translation structures to the host
 3) propagating stage-1 cache invalidations to the host
 4) servicing DMA address translation faults (I/O page faults), etc.

This patch introduces DualStageIOMMUObject to represent the hardware
dual-stage DMA translation capability. PASID allocation/free are the
first operations included in it; in the future there will be more,
such as bind_stage1_pgtbl and invalidate_stage1_cache.
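
For reference, here is a minimal sketch of how a vIOMMU emulator might
consume this API. The example_* names and the trivial counter-based
allocator are illustrative assumptions only, not part of this patch; a
real backend would forward the requests to the host IOMMU (e.g. via
VFIO).

    #include "qemu/osdep.h"
    #include "hw/iommu/dual_stage_iommu.h"

    /* Toy backend: hands out PASIDs from a plain counter. */
    static uint32_t next_pasid = 1;

    static int example_pasid_alloc(DualStageIOMMUObject *dsi_obj,
                                   uint32_t min, uint32_t max,
                                   uint32_t *pasid)
    {
        if (next_pasid < min || next_pasid > max) {
            return -ENOSPC;
        }
        *pasid = next_pasid++;
        return 0;
    }

    static int example_pasid_free(DualStageIOMMUObject *dsi_obj,
                                  uint32_t pasid)
    {
        return 0; /* nothing to reclaim in this toy allocator */
    }

    static DualStageIOMMUOps example_ops = {
        .pasid_alloc = example_pasid_alloc,
        .pasid_free  = example_pasid_free,
    };

    static void example_use(void)
    {
        DualStageIOMMUObject dsi_obj;
        uint32_t pasid;

        ds_iommu_object_init(&dsi_obj, &example_ops);
        if (!ds_iommu_pasid_alloc(&dsi_obj, 1, 0xfffff, &pasid)) {
            /* pasid can now be used for a later stage-1 bind */
            ds_iommu_pasid_free(&dsi_obj, pasid);
        }
        ds_iommu_object_destroy(&dsi_obj);
    }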

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/Makefile.objs                    |  1 +
 hw/iommu/Makefile.objs              |  1 +
 hw/iommu/dual_stage_iommu.c         | 59 +++++++++++++++++++++++++++++++++++++
 include/hw/iommu/dual_stage_iommu.h | 59 +++++++++++++++++++++++++++++++++++++
 4 files changed, 120 insertions(+)
 create mode 100644 hw/iommu/Makefile.objs
 create mode 100644 hw/iommu/dual_stage_iommu.c
 create mode 100644 include/hw/iommu/dual_stage_iommu.h

Comments

David Gibson Jan. 31, 2020, 3:59 a.m. UTC | #1
On Wed, Jan 29, 2020 at 04:16:33AM -0800, Liu, Yi L wrote:
> From: Liu Yi L <yi.l.liu@intel.com>
> 
> Currently, many platform vendors provide the capability of dual stage
> DMA address translation in hardware. For example, nested translation
> on Intel VT-d scalable mode, nested stage translation on ARM SMMUv3,
> and etc. In dual stage DMA address translation, there are two stages
> address translation, stage-1 (a.k.a first-level) and stage-2 (a.k.a
> second-level) translation structures. Stage-1 translation results are
> also subjected to stage-2 translation structures. Take vSVA (Virtual
> Shared Virtual Addressing) as an example, guest IOMMU driver owns
> stage-1 translation structures (covers GVA->GPA translation), and host
> IOMMU driver owns stage-2 translation structures (covers GPA->HPA
> translation). VMM is responsible to bind stage-1 translation structures
> to host, thus hardware could achieve GVA->GPA and then GPA->HPA
> translation. For more background on SVA, refer the below links.
>  - https://www.youtube.com/watch?v=Kq_nfGK5MwQ
>  - https://events19.lfasiallc.com/wp-content/uploads/2017/11/\
> Shared-Virtual-Memory-in-KVM_Yi-Liu.pdf
> 
> As above, dual stage DMA translation offers two stage address mappings,
> which could have better DMA address translation support for passthru
> devices. This is also what vIOMMU developers are doing so far. Efforts
> includes vSVA enabling from Yi Liu and SMMUv3 Nested Stage Setup from
> Eric Auger.
> https://www.spinics.net/lists/kvm/msg198556.html
> https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg02842.html
> 
> Both efforts are aiming to expose a vIOMMU with dual stage hardware
> backed. As so, QEMU needs to have an explicit object to stand for
> the dual stage capability from hardware. Such object offers abstract
> for the dual stage DMA translation related operations, like:
> 
>  1) PASID allocation (allow host to intercept in PASID allocation)
>  2) bind stage-1 translation structures to host
>  3) propagate stage-1 cache invalidation to host
>  4) DMA address translation fault (I/O page fault) servicing etc.
> 
> This patch introduces DualStageIOMMUObject to stand for the hardware
> dual stage DMA translation capability. PASID allocation/free are the
> first operation included in it, in future, there will be more operations
> like bind_stage1_pgtbl and invalidate_stage1_cache and etc.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Yi Sun <yi.y.sun@linux.intel.com>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>

Several overall queries about this:

1) Since it's explicitly handling PASIDs, this seems a lot more
   specific to SVM than the name suggests.  I'd suggest a rename.

2) Why are you hand rolling structures of pointers, rather than making
   this a QOM class or interface and putting those things into methods?

3) It's not really clear to me if this is for the case where both
   stages of translation are visible to the guest, or only one of
   them.
Yi Liu Jan. 31, 2020, 11:42 a.m. UTC | #2
Hi David,

> From: David Gibson [mailto:david@gibson.dropbear.id.au]
> Sent: Friday, January 31, 2020 11:59 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v3 02/25] hw/iommu: introduce DualStageIOMMUObject
> 
> On Wed, Jan 29, 2020 at 04:16:33AM -0800, Liu, Yi L wrote:
> > From: Liu Yi L <yi.l.liu@intel.com>
> >
> > Currently, many platform vendors provide the capability of dual stage
> > DMA address translation in hardware. For example, nested translation
> > on Intel VT-d scalable mode, nested stage translation on ARM SMMUv3,
> > and etc. In dual stage DMA address translation, there are two stages
> > address translation, stage-1 (a.k.a first-level) and stage-2 (a.k.a
> > second-level) translation structures. Stage-1 translation results are
> > also subjected to stage-2 translation structures. Take vSVA (Virtual
> > Shared Virtual Addressing) as an example, guest IOMMU driver owns
> > stage-1 translation structures (covers GVA->GPA translation), and host
> > IOMMU driver owns stage-2 translation structures (covers GPA->HPA
> > translation). VMM is responsible to bind stage-1 translation structures
> > to host, thus hardware could achieve GVA->GPA and then GPA->HPA
> > translation. For more background on SVA, refer the below links.
> >  - https://www.youtube.com/watch?v=Kq_nfGK5MwQ
> >  - https://events19.lfasiallc.com/wp-content/uploads/2017/11/\
> > Shared-Virtual-Memory-in-KVM_Yi-Liu.pdf
> >
> > As above, dual stage DMA translation offers two stage address mappings,
> > which could have better DMA address translation support for passthru
> > devices. This is also what vIOMMU developers are doing so far. Efforts
> > includes vSVA enabling from Yi Liu and SMMUv3 Nested Stage Setup from
> > Eric Auger.
> > https://www.spinics.net/lists/kvm/msg198556.html
> > https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg02842.html
> >
> > Both efforts are aiming to expose a vIOMMU with dual stage hardware
> > backed. As so, QEMU needs to have an explicit object to stand for
> > the dual stage capability from hardware. Such object offers abstract
> > for the dual stage DMA translation related operations, like:
> >
> >  1) PASID allocation (allow host to intercept in PASID allocation)
> >  2) bind stage-1 translation structures to host
> >  3) propagate stage-1 cache invalidation to host
> >  4) DMA address translation fault (I/O page fault) servicing etc.
> >
> > This patch introduces DualStageIOMMUObject to stand for the hardware
> > dual stage DMA translation capability. PASID allocation/free are the
> > first operation included in it, in future, there will be more operations
> > like bind_stage1_pgtbl and invalidate_stage1_cache and etc.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > Cc: David Gibson <david@gibson.dropbear.id.au>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> 
> Several overall queries about this:
> 
> 1) Since it's explicitly handling PASIDs, this seems a lot more
>    specific to SVM than the name suggests.  I'd suggest a rename.

It is not specific to SVM going forward. There is ongoing work to base
guest IOVA support on the host IOMMU's dual-stage DMA translation
capability as well. Guest IOVA support would then also reuse the
methods provided by this abstraction layer, e.g. bind_guest_pgtbl()
and flush_iommu_iotlb().

For the naming, how about HostIOMMUContext? This layer provides
explicit methods for setting up dual-stage DMA translation in the host.
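
To illustrate the direction, the ops table could grow roughly as shown
below. These signatures are hypothetical and only sketch the idea; the
final names and arguments would follow the host kernel UAPI.

    /* Possible future additions to DualStageIOMMUOps (sketch only): */
    struct DualStageIOMMUOps {
        /* ... existing pasid_alloc / pasid_free ... */

        /* bind a guest stage-1 page table for @pasid to the host IOMMU */
        int (*bind_stage1_pgtbl)(DualStageIOMMUObject *dsi_obj,
                                 uint32_t pasid, hwaddr pgtbl_base);
        /* propagate a guest stage-1 cache invalidation to the host */
        int (*invalidate_stage1_cache)(DualStageIOMMUObject *dsi_obj,
                                       uint32_t pasid);
    };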

> 
> 2) Why are you hand rolling structures of pointers, rather than making
>    this a QOM class or interface and putting those things into methods?

Maybe the name is not ideal. Although I named it DualStageIOMMUObject,
it is really the kind of abstraction layer we discussed in a previous
email. I think it is similar to VFIO_MAP/UNMAP. The difference is that
VFIO_MAP/UNMAP programs mappings into the host IOMMU domain, while the
newly added explicit method links the guest page table to the host
IOMMU domain. VFIO_MAP/UNMAP is exposed to vIOMMU emulators via the
MemoryRegion layer, right? Maybe adding a similar abstraction layer is
enough. Is QOM really necessary for this case?
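
For comparison, the QOM-interface variant being suggested might look
roughly like the sketch below (hypothetical, not part of this series);
the callbacks would move from a hand-rolled ops struct into an
InterfaceClass:

    #include "qemu/osdep.h"
    #include "qemu/module.h"
    #include "qom/object.h"

    #define TYPE_DUAL_STAGE_IOMMU "dual-stage-iommu"
    #define DUAL_STAGE_IOMMU_GET_CLASS(obj) \
        OBJECT_GET_CLASS(DualStageIOMMUClass, (obj), TYPE_DUAL_STAGE_IOMMU)

    typedef struct DualStageIOMMUClass {
        InterfaceClass parent_class;

        int (*pasid_alloc)(Object *obj, uint32_t min, uint32_t max,
                           uint32_t *pasid);
        int (*pasid_free)(Object *obj, uint32_t pasid);
    } DualStageIOMMUClass;

    static const TypeInfo dual_stage_iommu_info = {
        .name       = TYPE_DUAL_STAGE_IOMMU,
        .parent     = TYPE_INTERFACE,
        .class_size = sizeof(DualStageIOMMUClass),
    };

    static void dual_stage_iommu_register_types(void)
    {
        type_register_static(&dual_stage_iommu_info);
    }
    type_init(dual_stage_iommu_register_types);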

> 3) It's not really clear to me if this is for the case where both
>    stages of translation are visible to the guest, or only one of
>    them.

In this case the vIOMMU only exposes a single-stage translation to the
VM; e.g. on Intel VT-d the vIOMMU exposes first-level translation to
the guest. A hardware IOMMU with the dual-stage translation capability
lets the guest own the stage-1 translation structures while the host
owns the stage-2 translation structures. The VMM is responsible for
binding the guest's translation structures to the host and enabling
dual-stage translation, e.g. on Intel VT-d by configuring the
translation type to be NESTED.

Take guest SVM as an example: the guest IOMMU driver owns the gVA->gPA
mappings, which the host treats as the stage-1 translation. The host
itself owns the gPA->hPA translation, which serves as the stage-2
translation when dual-stage translation is configured.

Guest IOVA is similar to guest SVM: the guest IOMMU driver owns the
gIOVA->gPA mappings, which are treated as the stage-1 translation, and
the host owns the gPA->hPA translation.
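
A quick summary of the two cases discussed above:

    Use case     stage-1 (guest-owned)    stage-2 (host-owned)
    guest SVM    gVA   -> gPA             gPA -> hPA
    guest IOVA   gIOVA -> gPA             gPA -> hPA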

Regards,
Yi Liu
David Gibson Feb. 12, 2020, 6:32 a.m. UTC | #3
On Fri, Jan 31, 2020 at 11:42:06AM +0000, Liu, Yi L wrote:
> Hi David,
> 
> > From: David Gibson [mailto:david@gibson.dropbear.id.au]
> > Sent: Friday, January 31, 2020 11:59 AM
> > To: Liu, Yi L <yi.l.liu@intel.com>
> > Subject: Re: [RFC v3 02/25] hw/iommu: introduce DualStageIOMMUObject
> > 
> > On Wed, Jan 29, 2020 at 04:16:33AM -0800, Liu, Yi L wrote:
> > > From: Liu Yi L <yi.l.liu@intel.com>
> > >
> > > Currently, many platform vendors provide the capability of dual stage
> > > DMA address translation in hardware. For example, nested translation
> > > on Intel VT-d scalable mode, nested stage translation on ARM SMMUv3,
> > > and etc. In dual stage DMA address translation, there are two stages
> > > address translation, stage-1 (a.k.a first-level) and stage-2 (a.k.a
> > > second-level) translation structures. Stage-1 translation results are
> > > also subjected to stage-2 translation structures. Take vSVA (Virtual
> > > Shared Virtual Addressing) as an example, guest IOMMU driver owns
> > > stage-1 translation structures (covers GVA->GPA translation), and host
> > > IOMMU driver owns stage-2 translation structures (covers GPA->HPA
> > > translation). VMM is responsible to bind stage-1 translation structures
> > > to host, thus hardware could achieve GVA->GPA and then GPA->HPA
> > > translation. For more background on SVA, refer the below links.
> > >  - https://www.youtube.com/watch?v=Kq_nfGK5MwQ
> > >  - https://events19.lfasiallc.com/wp-content/uploads/2017/11/\
> > > Shared-Virtual-Memory-in-KVM_Yi-Liu.pdf
> > >
> > > As above, dual stage DMA translation offers two stage address mappings,
> > > which could have better DMA address translation support for passthru
> > > devices. This is also what vIOMMU developers are doing so far. Efforts
> > > includes vSVA enabling from Yi Liu and SMMUv3 Nested Stage Setup from
> > > Eric Auger.
> > > https://www.spinics.net/lists/kvm/msg198556.html
> > > https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg02842.html
> > >
> > > Both efforts are aiming to expose a vIOMMU with dual stage hardware
> > > backed. As so, QEMU needs to have an explicit object to stand for
> > > the dual stage capability from hardware. Such object offers abstract
> > > for the dual stage DMA translation related operations, like:
> > >
> > >  1) PASID allocation (allow host to intercept in PASID allocation)
> > >  2) bind stage-1 translation structures to host
> > >  3) propagate stage-1 cache invalidation to host
> > >  4) DMA address translation fault (I/O page fault) servicing etc.
> > >
> > > This patch introduces DualStageIOMMUObject to stand for the hardware
> > > dual stage DMA translation capability. PASID allocation/free are the
> > > first operation included in it, in future, there will be more operations
> > > like bind_stage1_pgtbl and invalidate_stage1_cache and etc.
> > >
> > > Cc: Kevin Tian <kevin.tian@intel.com>
> > > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > Cc: Peter Xu <peterx@redhat.com>
> > > Cc: Eric Auger <eric.auger@redhat.com>
> > > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > > Cc: David Gibson <david@gibson.dropbear.id.au>
> > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > 
> > Several overall queries about this:
> > 
> > 1) Since it's explicitly handling PASIDs, this seems a lot more
> >    specific to SVM than the name suggests.  I'd suggest a rename.
> 
> It is not specific to SVM in future. We have efforts to move guest
> IOVA support based on host IOMMU's dual-stage DMA translation
> capability.

It's assuming the existence of pasids though, which is a rather more
specific model than simply having two translation stages.

> Then, guest IOVA support will also re-use the methods
> provided by this abstract layer. e.g. the bind_guest_pgtbl() and
> flush_iommu_iotlb().
> 
> For the naming, how about HostIOMMUContext? This layer is to provide
> explicit methods for setting up dual-stage DMA translation in host.

Uh.. maybe?  I'm still having trouble figuring out what this object
really represents.

> > 2) Why are you hand rolling structures of pointers, rather than making
> >    this a QOM class or interface and putting those things into methods?
> 
> Maybe the name is not proper. Although I named it as DualStageIOMMUObject,
> it is actually a kind of abstract layer we discussed in previous email. I
> think this is similar with VFIO_MAP/UNMAP. The difference is that VFIO_MAP/
> UNMAP programs mappings to host iommu domain. While the newly added explicit
> method is to link guest page table to host iommu domain. VFIO_MAP/UNMAP
> is exposed to vIOMMU emulators via MemoryRegion layer. right? Maybe adding a
> similar abstract layer is enough. Is adding QOM really necessary for this
> case?

Um... sorry, I'm having a lot of trouble making any sense of that.

> > 3) It's not really clear to me if this is for the case where both
> >    stages of translation are visible to the guest, or only one of
> >    them.
> 
> For this case, vIOMMU will only expose a single stage translation to VM.
> e.g. Intel VT-d, vIOMMU exposes first-level translation to guest. Hardware
> IOMMUs with the dual-stage translation capability lets guest own stage-1
> translation structures and host owns the stage-2 translation structures.
> VMM is responsible to bind guest's translation structures to host and
> enable dual-stage translation. e.g. on Intel VT-d, config translation type
> to be NESTED.

Ok, understood.

> Take guest SVM as an example, guest iommu driver owns the gVA->gPA mappings,
> which is treated as stage-1 translation from host point of view. Host itself
> owns the gPA->hPPA translation and called stage-2 translation when dual-stage
> translation is configured.
> 
> For guest IOVA, it is similar with guest SVM. Guest iommu driver owns the
> gIOVA->gPA mappings, which is treated as stage-1 translation. Host owns the
> gPA->hPA translation.

Ok, that makes sense.  It's still not really clear to me which part of
this setup this object represents.

Patch

diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 660e2b4..cab83fe 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -40,6 +40,7 @@  devices-dirs-$(CONFIG_MEM_DEVICE) += mem/
 devices-dirs-$(CONFIG_NUBUS) += nubus/
 devices-dirs-y += semihosting/
 devices-dirs-y += smbios/
+devices-dirs-y += iommu/
 endif
 
 common-obj-y += $(devices-dirs-y)
diff --git a/hw/iommu/Makefile.objs b/hw/iommu/Makefile.objs
new file mode 100644
index 0000000..d4f3b39
--- /dev/null
+++ b/hw/iommu/Makefile.objs
@@ -0,0 +1 @@ 
+obj-y += dual_stage_iommu.o
diff --git a/hw/iommu/dual_stage_iommu.c b/hw/iommu/dual_stage_iommu.c
new file mode 100644
index 0000000..be4179d
--- /dev/null
+++ b/hw/iommu/dual_stage_iommu.c
@@ -0,0 +1,59 @@ 
+/*
+ * QEMU abstraction of the hardware dual-stage DMA translation capability
+ *
+ * Copyright (C) 2020 Intel Corporation.
+ *
+ * Authors: Liu Yi L <yi.l.liu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/iommu/dual_stage_iommu.h"
+
+int ds_iommu_pasid_alloc(DualStageIOMMUObject *dsi_obj, uint32_t min,
+                         uint32_t max, uint32_t *pasid)
+{
+    if (!dsi_obj) {
+        return -ENOENT;
+    }
+
+    if (dsi_obj->ops && dsi_obj->ops->pasid_alloc) {
+        return dsi_obj->ops->pasid_alloc(dsi_obj, min, max, pasid);
+    }
+    return -ENOENT;
+}
+
+int ds_iommu_pasid_free(DualStageIOMMUObject *dsi_obj, uint32_t pasid)
+{
+    if (!dsi_obj) {
+        return -ENOENT;
+    }
+
+    if (dsi_obj->ops && dsi_obj->ops->pasid_free) {
+        return dsi_obj->ops->pasid_free(dsi_obj, pasid);
+    }
+    return -ENOENT;
+}
+
+void ds_iommu_object_init(DualStageIOMMUObject *dsi_obj,
+                          DualStageIOMMUOps *ops)
+{
+    dsi_obj->ops = ops;
+}
+
+void ds_iommu_object_destroy(DualStageIOMMUObject *dsi_obj)
+{
+    dsi_obj->ops = NULL;
+}
diff --git a/include/hw/iommu/dual_stage_iommu.h b/include/hw/iommu/dual_stage_iommu.h
new file mode 100644
index 0000000..e9891e3
--- /dev/null
+++ b/include/hw/iommu/dual_stage_iommu.h
@@ -0,0 +1,59 @@ 
+/*
+ * QEMU abstraction of the dual-stage IOMMU
+ *
+ * Copyright (C) 2020 Red Hat Inc.
+ *
+ * Authors: Liu, Yi L <yi.l.liu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_DS_IOMMU_H
+#define HW_DS_IOMMU_H
+
+#include "qemu/queue.h"
+#ifndef CONFIG_USER_ONLY
+#include "exec/hwaddr.h"
+#endif
+
+typedef struct DualStageIOMMUObject DualStageIOMMUObject;
+typedef struct DualStageIOMMUOps DualStageIOMMUOps;
+
+struct DualStageIOMMUOps {
+    /* Allocate pasid from DualStageIOMMU (a.k.a. host IOMMU) */
+    int (*pasid_alloc)(DualStageIOMMUObject *dsi_obj,
+                       uint32_t min,
+                       uint32_t max,
+                       uint32_t *pasid);
+    /* Reclaim a pasid from DualStageIOMMU (a.k.a. host IOMMU) */
+    int (*pasid_free)(DualStageIOMMUObject *dsi_obj,
+                      uint32_t pasid);
+};
+
+/*
+ * This is an abstraction of a dual-stage IOMMU.
+ */
+struct DualStageIOMMUObject {
+    DualStageIOMMUOps *ops;
+};
+
+int ds_iommu_pasid_alloc(DualStageIOMMUObject *dsi_obj, uint32_t min,
+                         uint32_t max, uint32_t *pasid);
+int ds_iommu_pasid_free(DualStageIOMMUObject *dsi_obj, uint32_t pasid);
+
+void ds_iommu_object_init(DualStageIOMMUObject *dsi_obj,
+                          DualStageIOMMUOps *ops);
+void ds_iommu_object_destroy(DualStageIOMMUObject *dsi_obj);
+
+#endif