mbox series

[RFC,v2,00/28] vSMMUv3/pSMMUv3 2 stage VFIO integration

Message ID 20180921081819.9203-1-eric.auger@redhat.com
Headers show
Series vSMMUv3/pSMMUv3 2 stage VFIO integration | expand

Message

Eric Auger Sept. 21, 2018, 8:17 a.m. UTC
Up to now vSMMUv3 has not been integrated with VFIO. VFIO
integration requires to program the physical IOMMU consistently
with the guest mappings. However, as opposed to VTD, SMMUv3 has
no "Caching Mode" which allows easy trapping of guest mappings.
This means the vSMMUV3 cannot use the same VFIO integration as VTD.

However SMMUv3 has 2 translation stages. This was devised with
virtualization use case in mind where stage 1 is "owned" by the
guest whereas the host uses stage 2 for VM isolation.

This series sets up this nested translation stage. It only works
if there is one physical SMMUv3 used along with QEMU vSMMUv3 (in
other words, it does not work if there is a physical SMMUv2).

The series uses a new kernel user API [1], still under definition.

- We force the host to use stage 2 instead of stage 1, when we
  detect a vSMMUV3 is behind a VFIO device. For a VFIO device
  without any virtual IOMMU, we still use stage 1 as many existing
  SMMUs expect this behavior.
- We introduce new IOTLB "config" notifiers, requested to notify
  changes in the config of a given iommu memory region. So now
  we have notifiers for IOTLB changes and config changes.
- vSMMUv3 calls config notifiers when STE (Stream Table Entries)
  are updated by the guest.
- We implement a specific UNMAP notifier that conveys guest
  IOTLB invalidations to the host
- We implement a new MAP notifiers only used for MSI IOVAs so
  that the host can build a nested stage translation for MSI IOVAs
- As the legacy MAP notifier is not called anymore, we must make
  sure stage 2 mappings are set. This is achieved through another
  memory listener.
- Physical SMMUs faults are reported to the guest via en eventfd
  mechanism and reinjected into this latter.

Note: some iommu memory notifier rework related patches are close
to those previously published by Peter and Liu. I will be pleased to
add their Signed-off-by if they agree/wish.

Note: The 2 first patches were sent separately in:
[PATCH 0/2] ARM SMMUv3: Fix event queue handling and memory region names
 as they are fixes of current SMMU emulation code.

Best Regards

Eric

This series can be found at:
https://github.com/eauger/qemu/tree/v3.0.0_2stage-rfc-v2

Testing:
- For testing use my kernel branch
  https://github.com/eauger/linux/tree/v4.19-rc4-2stage-rfc-v2 [1]
- Tested on Qualcomm HW with 1/2 assigned e1000e NICs. Hotplug
  was also tested OK.
- Known limitation:
  - currently sending an NH_ASID command instead of NH_VA
    upon guest NH_VA. This may cause important perf downgrade.
    Propagating NH_VA does not work at the moment.

References:
- [1] [RFC v2 00/20] SMMUv3 Nested Stage Setup
  (https://lkml.org/lkml/2018/9/18/1087)
  The User API still is unstable and under discussion.

History:
v1 -> v2:
- Fixed dual assignment (asid now correctly propagated on TLB invalidations)
- Integrated fault reporting

Next Steps:
- Mature the user API with people involved in SVA work (KVM forum may be a
  good opportunity to meet)
- Submit the IOMMU cfg notifier changes and some VFIO changes separately, to
  progress independently on the kernel dependency

Eric Auger (28):
  hw/arm/smmu-common: Fix the name of the iommu memory regions
  hw/arm/smmuv3: fix eventq recording and IRQ triggerring
  update-linux-headers: Import iommu.h
  linux-headers: Partial header update
  memory: add IOMMU_ATTR_VFIO_NESTED IOMMU memory region attribute
  hw/arm/smmuv3: Implement get_attr API to report IOMMU_ATTR_VFIO_NESTED
  hw/vfio/common: Refactor container initialization
  hw/vfio/common: Force nested if iommu requires it
  memory: Introduce IOMMUIOLTBNotifier
  memory: rename memory_region notify_iommu, notify_one
  memory: Add IOMMUConfigNotifier
  memory: Add arch_id in IOTLBEntry
  hw/arm/smmuv3: Store s1ctrptr in translation config data
  hw/arm/smmuv3: Implement dummy replay
  hw/arm/smmuv3: Notify on config changes
  hw/arm/smmuv3: Fill the IOTLBEntry arch_id on NH_VA invalidation
  hw/vfio/common: Introduce vfio_alloc_guest_iommu helper
  hw/vfio/common: Introduce vfio_dma_(un)map_ram_section helpers
  hw/vfio/common: Register specific nested mode notifiers and
    memory_listener
  hw/vfio/common: Register MAP notifier for MSI binding
  target/arm/kvm: Notifies IOMMU on MSI stage 1 binding
  vfio/pci: Always set up MSI route before enabling vectors
  hw/arm/smmuv3: Remove warning about unsupported MAP notifiers
  memory: Introduce IOMMU_NOTIFIER_INIT_CFG IOMMU Config Notifier
  memory: Introduce IOMMU Memory Region inject_faults API
  hw/vfio/common: Handle fault_handler
  hw/arm/smmuv3: Init fault handling
  hw/arm/smmuv3: Implement fault injection

 exec.c                          |  12 +-
 hw/arm/smmu-common.c            |  12 +-
 hw/arm/smmuv3-internal.h        |  26 +-
 hw/arm/smmuv3.c                 | 189 +++++++--
 hw/i386/intel_iommu.c           |  16 +-
 hw/misc/tz-mpc.c                |   8 +-
 hw/ppc/spapr_iommu.c            |   2 +-
 hw/s390x/s390-pci-inst.c        |   4 +-
 hw/vfio/common.c                | 666 ++++++++++++++++++++++++--------
 hw/vfio/pci.c                   |   1 +
 hw/vfio/trace-events            |   4 +-
 hw/virtio/vhost.c               |  12 +-
 include/exec/memory.h           | 140 +++++--
 include/hw/arm/smmu-common.h    |   1 +
 include/hw/vfio/vfio-common.h   |   1 +
 linux-headers/linux/iommu.h     | 237 ++++++++++++
 linux-headers/linux/vfio.h      |  48 +++
 memory.c                        |  66 +++-
 scripts/update-linux-headers.sh |   2 +-
 target/arm/kvm.c                |  46 +--
 20 files changed, 1206 insertions(+), 287 deletions(-)
 create mode 100644 linux-headers/linux/iommu.h

Comments

Yi Liu Oct. 18, 2018, 10:30 a.m. UTC | #1
Hi Eric,

> From: Eric Auger [mailto:eric.auger@redhat.com]
> Sent: Friday, September 21, 2018 4:18 PM
> Subject: [RFC v2 00/28] vSMMUv3/pSMMUv3 2 stage VFIO integration
> 
> Up to now vSMMUv3 has not been integrated with VFIO. VFIO
> integration requires to program the physical IOMMU consistently
> with the guest mappings. However, as opposed to VTD, SMMUv3 has
> no "Caching Mode" which allows easy trapping of guest mappings.
> This means the vSMMUV3 cannot use the same VFIO integration as VTD.
> 
> However SMMUv3 has 2 translation stages. This was devised with
> virtualization use case in mind where stage 1 is "owned" by the
> guest whereas the host uses stage 2 for VM isolation.
> 
> This series sets up this nested translation stage. It only works
> if there is one physical SMMUv3 used along with QEMU vSMMUv3 (in
> other words, it does not work if there is a physical SMMUv2).
> 
> The series uses a new kernel user API [1], still under definition.
> 
> - We force the host to use stage 2 instead of stage 1, when we
>   detect a vSMMUV3 is behind a VFIO device. For a VFIO device
>   without any virtual IOMMU, we still use stage 1 as many existing
>   SMMUs expect this behavior.
> - We introduce new IOTLB "config" notifiers, requested to notify
>   changes in the config of a given iommu memory region. So now
>   we have notifiers for IOTLB changes and config changes.
> - vSMMUv3 calls config notifiers when STE (Stream Table Entries)
>   are updated by the guest.
> - We implement a specific UNMAP notifier that conveys guest
>   IOTLB invalidations to the host
> - We implement a new MAP notifiers only used for MSI IOVAs so
>   that the host can build a nested stage translation for MSI IOVAs
> - As the legacy MAP notifier is not called anymore, we must make
>   sure stage 2 mappings are set. This is achieved through another
>   memory listener.
> - Physical SMMUs faults are reported to the guest via en eventfd
>   mechanism and reinjected into this latter.
> 
> Note: some iommu memory notifier rework related patches are close
> to those previously published by Peter and Liu. I will be pleased to
> add their Signed-off-by if they agree/wish.

Yeah, feel free to add mine if it's originated from our previous work.

BTW. Curiously, are you planning to implement all vIOMMU related
operations through MemoryRegion notifiers? Honestly, I did it in such
way in early RFC for vSVA work. However, we encountered two issues
at that time. First, how to check whether the notifier should be registered.
Second, there are cases in which the vfio_listener_region_add is not
triggered but vIOMMU exists. Details can be got by the link below:
(http://lists.gnu.org/archive/html/qemu-devel/2018-03/msg00078.html)

As thus, we had some discussions in community. Last time, PCIPASIDOps
was proposed. It is to add callbacks in PCIDevice. VFIO would register its
implementations in vfio_realize(). Supposedly, pasid_table_bind,
page_table_bind, sva_tlb_invalidation_passdown and other vIOMMU
related operations can be done in such way. The sample patch below
may show how it looks like. (the full patch is in my sandbox, planned to
send out with Scalable Mode emulation patch).

Sample patch:
include/hw/pci/pci.h:
 +typedef struct PCIPASIDOps PCIPASIDOps;
 +struct PCIPASIDOps {
 +    void (*pasid_bind_table)(PCIBus *bus, int32_t devfn,
 +                            struct pasid_table_config *ptbl_cfg);
 +    void (*pasid_invalidate_extend_iotlb)(PCIBus *bus, int32_t devfn,
 +                            struct tlb_invalidate_info *inv_info);
 +};

 @@ -350,6 +359,7 @@ struct PCIDevice {
      MSIVectorUseNotifier msix_vector_use_notifier;
      MSIVectorReleaseNotifier msix_vector_release_notifier;
      MSIVectorPollNotifier msix_vector_poll_notifier;
 +    PCIPASIDOps *pasid_ops;
  };

hw/pci/pci.c:
+void pci_setup_pasid_ops(PCIDevice *dev, PCIPASIDOps *ops)
 +{
 +    assert(ops && !dev->pasid_ops);
 +    dev->pasid_ops = ops;
 +}
 +
 +void pci_device_pasid_bind_table(PCIBus *bus, int32_t devfn,
 +                                 struct pasid_table_config *ptbl_cfg)
 +{
 +    PCIDevice *dev;
 +
 +    if (!bus) {
 +        return;
 +    }
 +
 +    dev = bus->devices[devfn];
 +    if (dev && dev->pasid_ops) {
 +        dev->pasid_ops->pasid_bind_table(bus, devfn, ptbl_cfg);
 +    }
 +}

hw/vfio/pci.c:
+static PCIPASIDOps vfio_pci_pasid_ops = {
 +    .pasid_bind_table = vfio_pci_device_pasid_bind_table,
 +    .pasid_invalidate_extend_iotlb = vfio_pci_device_pasid_invalidate_eiotlb,
 +};
 +
  static void vfio_realize(PCIDevice *pdev, Error **errp)
  {
      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
 @@ -3048,6 +3068,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
      vfio_register_req_notifier(vdev);
      vfio_setup_resetfn_quirk(vdev);

 +    pci_setup_pasid_ops(pdev, &vfio_pci_pasid_ops);
 +
      return;

Regards,
Yi Liu
Eric Auger Oct. 18, 2018, 3:16 p.m. UTC | #2
Hi Yi,

On 10/18/18 12:30 PM, Liu, Yi L wrote:
> Hi Eric,
> 
>> From: Eric Auger [mailto:eric.auger@redhat.com]
>> Sent: Friday, September 21, 2018 4:18 PM
>> Subject: [RFC v2 00/28] vSMMUv3/pSMMUv3 2 stage VFIO integration
>>
>> Up to now vSMMUv3 has not been integrated with VFIO. VFIO
>> integration requires to program the physical IOMMU consistently
>> with the guest mappings. However, as opposed to VTD, SMMUv3 has
>> no "Caching Mode" which allows easy trapping of guest mappings.
>> This means the vSMMUV3 cannot use the same VFIO integration as VTD.
>>
>> However SMMUv3 has 2 translation stages. This was devised with
>> virtualization use case in mind where stage 1 is "owned" by the
>> guest whereas the host uses stage 2 for VM isolation.
>>
>> This series sets up this nested translation stage. It only works
>> if there is one physical SMMUv3 used along with QEMU vSMMUv3 (in
>> other words, it does not work if there is a physical SMMUv2).
>>
>> The series uses a new kernel user API [1], still under definition.
>>
>> - We force the host to use stage 2 instead of stage 1, when we
>>   detect a vSMMUV3 is behind a VFIO device. For a VFIO device
>>   without any virtual IOMMU, we still use stage 1 as many existing
>>   SMMUs expect this behavior.
>> - We introduce new IOTLB "config" notifiers, requested to notify
>>   changes in the config of a given iommu memory region. So now
>>   we have notifiers for IOTLB changes and config changes.
>> - vSMMUv3 calls config notifiers when STE (Stream Table Entries)
>>   are updated by the guest.
>> - We implement a specific UNMAP notifier that conveys guest
>>   IOTLB invalidations to the host
>> - We implement a new MAP notifiers only used for MSI IOVAs so
>>   that the host can build a nested stage translation for MSI IOVAs
>> - As the legacy MAP notifier is not called anymore, we must make
>>   sure stage 2 mappings are set. This is achieved through another
>>   memory listener.
>> - Physical SMMUs faults are reported to the guest via en eventfd
>>   mechanism and reinjected into this latter.
>>
>> Note: some iommu memory notifier rework related patches are close
>> to those previously published by Peter and Liu. I will be pleased to
>> add their Signed-off-by if they agree/wish.
> 
> Yeah, feel free to add mine if it's originated from our previous work.
OK
> 
> BTW. Curiously, are you planning to implement all vIOMMU related
> operations through MemoryRegion notifiers? Honestly, I did it in such
> way in early RFC for vSVA work. However, we encountered two issues
> at that time. First, how to check whether the notifier should be registered.
On my side I think I resolved this by querying the iommu mr about the
new IOMMU_ATTR_VFIO_NESTED IOMMU attribute  in vfio_connect_container

See patches 5 and 6, 8

This tells me whether the nested mode must be setup and choose the right
container->iommu_type which is then used in vfio_listener_region_add()
to decide whether the specific notifiers mustd to be registered.


> Second, there are cases in which the vfio_listener_region_add is not
> triggered but vIOMMU exists. Details can be got by the link below:
> (http://lists.gnu.org/archive/html/qemu-devel/2018-03/msg00078.html)

Yes I remember this, due to the PT=1 case. On ARM we don't have this
specificity hence this integration.
> 
> As thus, we had some discussions in community. Last time, PCIPASIDOps
> was proposed. It is to add callbacks in PCIDevice. VFIO would register its
> implementations in vfio_realize(). Supposedly, pasid_table_bind,
> page_table_bind, sva_tlb_invalidation_passdown and other vIOMMU
> related operations can be done in such way. The sample patch below
> may show how it looks like. (the full patch is in my sandbox, planned to
> send out with Scalable Mode emulation patch).

To be honest, I lost track of the series and did not see this
PCIPASIDOps proposal. I will study whether this can fit my needs.

Thank you for the link!

Best Regards

Eric
> 
> Sample patch:
> include/hw/pci/pci.h:
>  +typedef struct PCIPASIDOps PCIPASIDOps;
>  +struct PCIPASIDOps {
>  +    void (*pasid_bind_table)(PCIBus *bus, int32_t devfn,
>  +                            struct pasid_table_config *ptbl_cfg);
>  +    void (*pasid_invalidate_extend_iotlb)(PCIBus *bus, int32_t devfn,
>  +                            struct tlb_invalidate_info *inv_info);
>  +};
> 
>  @@ -350,6 +359,7 @@ struct PCIDevice {
>       MSIVectorUseNotifier msix_vector_use_notifier;
>       MSIVectorReleaseNotifier msix_vector_release_notifier;
>       MSIVectorPollNotifier msix_vector_poll_notifier;
>  +    PCIPASIDOps *pasid_ops;
>   };
> 
> hw/pci/pci.c:
> +void pci_setup_pasid_ops(PCIDevice *dev, PCIPASIDOps *ops)
>  +{
>  +    assert(ops && !dev->pasid_ops);
>  +    dev->pasid_ops = ops;
>  +}
>  +
>  +void pci_device_pasid_bind_table(PCIBus *bus, int32_t devfn,
>  +                                 struct pasid_table_config *ptbl_cfg)
>  +{
>  +    PCIDevice *dev;
>  +
>  +    if (!bus) {
>  +        return;
>  +    }
>  +
>  +    dev = bus->devices[devfn];
>  +    if (dev && dev->pasid_ops) {
>  +        dev->pasid_ops->pasid_bind_table(bus, devfn, ptbl_cfg);
>  +    }
>  +}
> 
> hw/vfio/pci.c:
> +static PCIPASIDOps vfio_pci_pasid_ops = {
>  +    .pasid_bind_table = vfio_pci_device_pasid_bind_table,
>  +    .pasid_invalidate_extend_iotlb = vfio_pci_device_pasid_invalidate_eiotlb,
>  +};
>  +
>   static void vfio_realize(PCIDevice *pdev, Error **errp)
>   {
>       VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>  @@ -3048,6 +3068,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>       vfio_register_req_notifier(vdev);
>       vfio_setup_resetfn_quirk(vdev);
> 
>  +    pci_setup_pasid_ops(pdev, &vfio_pci_pasid_ops);
>  +
>       return;
> 
> Regards,
> Yi Liu
> 
>
Yi Liu Oct. 19, 2018, 8:02 a.m. UTC | #3
Hi Eric,

> From: Auger Eric [mailto:eric.auger@redhat.com]
> Sent: Thursday, October 18, 2018 11:16 PM
> 
> Hi Yi,
> 
> On 10/18/18 12:30 PM, Liu, Yi L wrote:
> > Hi Eric,
> >
> >> From: Eric Auger [mailto:eric.auger@redhat.com]
> >> Sent: Friday, September 21, 2018 4:18 PM
> >> Subject: [RFC v2 00/28] vSMMUv3/pSMMUv3 2 stage VFIO integration
> >>
> >> Up to now vSMMUv3 has not been integrated with VFIO. VFIO integration
> >> requires to program the physical IOMMU consistently with the guest
> >> mappings. However, as opposed to VTD, SMMUv3 has no "Caching Mode"
> >> which allows easy trapping of guest mappings.
> >> This means the vSMMUV3 cannot use the same VFIO integration as VTD.
> >>
> >> However SMMUv3 has 2 translation stages. This was devised with
> >> virtualization use case in mind where stage 1 is "owned" by the guest
> >> whereas the host uses stage 2 for VM isolation.
> >>
> >> This series sets up this nested translation stage. It only works if
> >> there is one physical SMMUv3 used along with QEMU vSMMUv3 (in other
> >> words, it does not work if there is a physical SMMUv2).
> >>
> >> The series uses a new kernel user API [1], still under definition.
> >>
> >> - We force the host to use stage 2 instead of stage 1, when we
> >>   detect a vSMMUV3 is behind a VFIO device. For a VFIO device
> >>   without any virtual IOMMU, we still use stage 1 as many existing
> >>   SMMUs expect this behavior.
> >> - We introduce new IOTLB "config" notifiers, requested to notify
> >>   changes in the config of a given iommu memory region. So now
> >>   we have notifiers for IOTLB changes and config changes.
> >> - vSMMUv3 calls config notifiers when STE (Stream Table Entries)
> >>   are updated by the guest.
> >> - We implement a specific UNMAP notifier that conveys guest
> >>   IOTLB invalidations to the host
> >> - We implement a new MAP notifiers only used for MSI IOVAs so
> >>   that the host can build a nested stage translation for MSI IOVAs
> >> - As the legacy MAP notifier is not called anymore, we must make
> >>   sure stage 2 mappings are set. This is achieved through another
> >>   memory listener.
> >> - Physical SMMUs faults are reported to the guest via en eventfd
> >>   mechanism and reinjected into this latter.
> >>
> >> Note: some iommu memory notifier rework related patches are close to
> >> those previously published by Peter and Liu. I will be pleased to add
> >> their Signed-off-by if they agree/wish.
> >
> > Yeah, feel free to add mine if it's originated from our previous work.
> OK
> >
> > BTW. Curiously, are you planning to implement all vIOMMU related
> > operations through MemoryRegion notifiers? Honestly, I did it in such
> > way in early RFC for vSVA work. However, we encountered two issues at
> > that time. First, how to check whether the notifier should be registered.
> On my side I think I resolved this by querying the iommu mr about the new
> IOMMU_ATTR_VFIO_NESTED IOMMU attribute  in vfio_connect_container

So it's to detect it by checking iommu mr? I think it is also similar to my early
RFC.
https://patchwork.kernel.org/patch/9701003/

+    /* Check if vIOMMU exists */
+    QTAILQ_FOREACH(subregion, &as->root->subregions, subregions_link) {
+        if (memory_region_is_iommu(subregion)) {
+            IOMMUNotifier n1;
+
+            /*
+             FIXME: current iommu notifier is actually designed for
+             IOMMUTLB MAP/UNMAP. However, vIOMMU emulator may need
+             notifiers other than MAP/UNMAP, so it'll be better to
+             split the non-IOMMUTLB notifier from the current IOMMUTLB
+             notifier framewrok.
+             */
+            iommu_notifier_init(&n1, vfio_iommu_bind_pasid_tbl_notify,
+                                IOMMU_NOTIFIER_SVM_PASIDT_BIND,
+                                0,
+                                0);
+            vfio_register_notifier(group->container,
+                                   subregion,
+                                   0,
+                                   &n1);
+        }
+    }

For VT-d, we only have 1 iommu mr. But I was told ther may be multiple iommu mr
on other platform. So I switched to use PCISVAOps. Thoughts?

> See patches 5 and 6, 8
> 
> This tells me whether the nested mode must be setup and choose the right
> container->iommu_type which is then used in vfio_listener_region_add()
> to decide whether the specific notifiers mustd to be registered.
> 
> > Second, there are cases in which the vfio_listener_region_add is not
> > triggered but vIOMMU exists. Details can be got by the link below:
> > (http://lists.gnu.org/archive/html/qemu-devel/2018-03/msg00078.html)
> 
> Yes I remember this, due to the PT=1 case. On ARM we don't have this specificity
> hence this integration.
> >
> > As thus, we had some discussions in community. Last time, PCIPASIDOps
> > was proposed. It is to add callbacks in PCIDevice. VFIO would register
> > its implementations in vfio_realize(). Supposedly, pasid_table_bind,
> > page_table_bind, sva_tlb_invalidation_passdown and other vIOMMU
> > related operations can be done in such way. The sample patch below may
> > show how it looks like. (the full patch is in my sandbox, planned to
> > send out with Scalable Mode emulation patch).
> 
> To be honest, I lost track of the series and did not see this PCIPASIDOps proposal. I
> will study whether this can fit my needs.

It's proposed in below link. The name at that time is PCISVAOps. I made it to be
PCIPASIDOps to be more generic.

http://lists.gnu.org/archive/html/qemu-devel/2018-03/msg00081.html

Regards,
Yi Liu
Shameerali Kolothum Thodi Nov. 23, 2018, 4:28 p.m. UTC | #4
> -----Original Message-----
> From: Qemu-devel [mailto:qemu-devel-
> bounces+shameerali.kolothum.thodi=huawei.com@nongnu.org] On Behalf Of
> Eric Auger
> Sent: 21 September 2018 09:18
> To: eric.auger.pro@gmail.com; eric.auger@redhat.com; qemu-
> devel@nongnu.org; qemu-arm@nongnu.org; peter.maydell@linaro.org
> Cc: yi.l.liu@intel.com; cdall@kernel.org; mst@redhat.com; jean-
> philippe.brucker@arm.com; peterx@redhat.com; alex.williamson@redhat.com
> Subject: [Qemu-devel] [RFC v2 00/28] vSMMUv3/pSMMUv3 2 stage VFIO
> integration
> 

[...]

> This series can be found at:
> https://github.com/eauger/qemu/tree/v3.0.0_2stage-rfc-v2
> 
> Testing:
> - For testing use my kernel branch
>   https://github.com/eauger/linux/tree/v4.19-rc4-2stage-rfc-v2 [1]
> - Tested on Qualcomm HW with 1/2 assigned e1000e NICs. Hotplug
>   was also tested OK.
> - Known limitation:
>   - currently sending an NH_ASID command instead of NH_VA
>     upon guest NH_VA. This may cause important perf downgrade.
>     Propagating NH_VA does not work at the moment.

Hi Eric,

I had a  go with this RFC on one of our platform and noted that when the Guest 
boots with ACPI , the ping doesn't work with the assigned ixgbevf NIC though 
the network device gets detected and added to the iommu group. 

It's all fine when I boot the Guest with DT. Have you tested this with ACPI yet or
am I missing anything obvious here?

Please let me know.

Thanks,
Shameer
 
> References:
> - [1] [RFC v2 00/20] SMMUv3 Nested Stage Setup
>   (https://lkml.org/lkml/2018/9/18/1087)
>   The User API still is unstable and under discussion.
> 
> History:
> v1 -> v2:
> - Fixed dual assignment (asid now correctly propagated on TLB invalidations)
> - Integrated fault reporting
> 
> Next Steps:
> - Mature the user API with people involved in SVA work (KVM forum may be a
>   good opportunity to meet)
> - Submit the IOMMU cfg notifier changes and some VFIO changes separately,
> to
>   progress independently on the kernel dependency
> 
> Eric Auger (28):
>   hw/arm/smmu-common: Fix the name of the iommu memory regions
>   hw/arm/smmuv3: fix eventq recording and IRQ triggerring
>   update-linux-headers: Import iommu.h
>   linux-headers: Partial header update
>   memory: add IOMMU_ATTR_VFIO_NESTED IOMMU memory region attribute
>   hw/arm/smmuv3: Implement get_attr API to report
> IOMMU_ATTR_VFIO_NESTED
>   hw/vfio/common: Refactor container initialization
>   hw/vfio/common: Force nested if iommu requires it
>   memory: Introduce IOMMUIOLTBNotifier
>   memory: rename memory_region notify_iommu, notify_one
>   memory: Add IOMMUConfigNotifier
>   memory: Add arch_id in IOTLBEntry
>   hw/arm/smmuv3: Store s1ctrptr in translation config data
>   hw/arm/smmuv3: Implement dummy replay
>   hw/arm/smmuv3: Notify on config changes
>   hw/arm/smmuv3: Fill the IOTLBEntry arch_id on NH_VA invalidation
>   hw/vfio/common: Introduce vfio_alloc_guest_iommu helper
>   hw/vfio/common: Introduce vfio_dma_(un)map_ram_section helpers
>   hw/vfio/common: Register specific nested mode notifiers and
>     memory_listener
>   hw/vfio/common: Register MAP notifier for MSI binding
>   target/arm/kvm: Notifies IOMMU on MSI stage 1 binding
>   vfio/pci: Always set up MSI route before enabling vectors
>   hw/arm/smmuv3: Remove warning about unsupported MAP notifiers
>   memory: Introduce IOMMU_NOTIFIER_INIT_CFG IOMMU Config Notifier
>   memory: Introduce IOMMU Memory Region inject_faults API
>   hw/vfio/common: Handle fault_handler
>   hw/arm/smmuv3: Init fault handling
>   hw/arm/smmuv3: Implement fault injection
> 
>  exec.c                          |  12 +-
>  hw/arm/smmu-common.c            |  12 +-
>  hw/arm/smmuv3-internal.h        |  26 +-
>  hw/arm/smmuv3.c                 | 189 +++++++--
>  hw/i386/intel_iommu.c           |  16 +-
>  hw/misc/tz-mpc.c                |   8 +-
>  hw/ppc/spapr_iommu.c            |   2 +-
>  hw/s390x/s390-pci-inst.c        |   4 +-
>  hw/vfio/common.c                | 666 ++++++++++++++++++++++++--------
>  hw/vfio/pci.c                   |   1 +
>  hw/vfio/trace-events            |   4 +-
>  hw/virtio/vhost.c               |  12 +-
>  include/exec/memory.h           | 140 +++++--
>  include/hw/arm/smmu-common.h    |   1 +
>  include/hw/vfio/vfio-common.h   |   1 +
>  linux-headers/linux/iommu.h     | 237 ++++++++++++
>  linux-headers/linux/vfio.h      |  48 +++
>  memory.c                        |  66 +++-
>  scripts/update-linux-headers.sh |   2 +-
>  target/arm/kvm.c                |  46 +--
>  20 files changed, 1206 insertions(+), 287 deletions(-)
>  create mode 100644 linux-headers/linux/iommu.h
> 
> --
> 2.17.1
>
Eric Auger Nov. 26, 2018, 9:56 a.m. UTC | #5
Hi Shameer,

On 11/23/18 5:28 PM, Shameerali Kolothum Thodi wrote:
> 
> 
>> -----Original Message-----
>> From: Qemu-devel [mailto:qemu-devel-
>> bounces+shameerali.kolothum.thodi=huawei.com@nongnu.org] On Behalf Of
>> Eric Auger
>> Sent: 21 September 2018 09:18
>> To: eric.auger.pro@gmail.com; eric.auger@redhat.com; qemu-
>> devel@nongnu.org; qemu-arm@nongnu.org; peter.maydell@linaro.org
>> Cc: yi.l.liu@intel.com; cdall@kernel.org; mst@redhat.com; jean-
>> philippe.brucker@arm.com; peterx@redhat.com; alex.williamson@redhat.com
>> Subject: [Qemu-devel] [RFC v2 00/28] vSMMUv3/pSMMUv3 2 stage VFIO
>> integration
>>
> 
> [...]
> 
>> This series can be found at:
>> https://github.com/eauger/qemu/tree/v3.0.0_2stage-rfc-v2
>>
>> Testing:
>> - For testing use my kernel branch
>>   https://github.com/eauger/linux/tree/v4.19-rc4-2stage-rfc-v2 [1]
>> - Tested on Qualcomm HW with 1/2 assigned e1000e NICs. Hotplug
>>   was also tested OK.
>> - Known limitation:
>>   - currently sending an NH_ASID command instead of NH_VA
>>     upon guest NH_VA. This may cause important perf downgrade.
>>     Propagating NH_VA does not work at the moment.
> 
> Hi Eric,
> 
> I had a  go with this RFC on one of our platform and noted that when the Guest 
> boots with ACPI , the ping doesn't work with the assigned ixgbevf NIC though 
> the network device gets detected and added to the iommu group. 
> 
> It's all fine when I boot the Guest with DT. Have you tested this with ACPI yet or
> am I missing anything obvious here?

Thank you for reporting. You are not missing anything obvious: I can
reproduce the issue as well on one of my platform. I will debug and come
back to you as soon as possible.

Thanks

Eric
> 
> Please let me know.
> 
> Thanks,
> Shameer
>  
>> References:
>> - [1] [RFC v2 00/20] SMMUv3 Nested Stage Setup
>>   (https://lkml.org/lkml/2018/9/18/1087)
>>   The User API still is unstable and under discussion.
>>
>> History:
>> v1 -> v2:
>> - Fixed dual assignment (asid now correctly propagated on TLB invalidations)
>> - Integrated fault reporting
>>
>> Next Steps:
>> - Mature the user API with people involved in SVA work (KVM forum may be a
>>   good opportunity to meet)
>> - Submit the IOMMU cfg notifier changes and some VFIO changes separately,
>> to
>>   progress independently on the kernel dependency
>>
>> Eric Auger (28):
>>   hw/arm/smmu-common: Fix the name of the iommu memory regions
>>   hw/arm/smmuv3: fix eventq recording and IRQ triggerring
>>   update-linux-headers: Import iommu.h
>>   linux-headers: Partial header update
>>   memory: add IOMMU_ATTR_VFIO_NESTED IOMMU memory region attribute
>>   hw/arm/smmuv3: Implement get_attr API to report
>> IOMMU_ATTR_VFIO_NESTED
>>   hw/vfio/common: Refactor container initialization
>>   hw/vfio/common: Force nested if iommu requires it
>>   memory: Introduce IOMMUIOLTBNotifier
>>   memory: rename memory_region notify_iommu, notify_one
>>   memory: Add IOMMUConfigNotifier
>>   memory: Add arch_id in IOTLBEntry
>>   hw/arm/smmuv3: Store s1ctrptr in translation config data
>>   hw/arm/smmuv3: Implement dummy replay
>>   hw/arm/smmuv3: Notify on config changes
>>   hw/arm/smmuv3: Fill the IOTLBEntry arch_id on NH_VA invalidation
>>   hw/vfio/common: Introduce vfio_alloc_guest_iommu helper
>>   hw/vfio/common: Introduce vfio_dma_(un)map_ram_section helpers
>>   hw/vfio/common: Register specific nested mode notifiers and
>>     memory_listener
>>   hw/vfio/common: Register MAP notifier for MSI binding
>>   target/arm/kvm: Notifies IOMMU on MSI stage 1 binding
>>   vfio/pci: Always set up MSI route before enabling vectors
>>   hw/arm/smmuv3: Remove warning about unsupported MAP notifiers
>>   memory: Introduce IOMMU_NOTIFIER_INIT_CFG IOMMU Config Notifier
>>   memory: Introduce IOMMU Memory Region inject_faults API
>>   hw/vfio/common: Handle fault_handler
>>   hw/arm/smmuv3: Init fault handling
>>   hw/arm/smmuv3: Implement fault injection
>>
>>  exec.c                          |  12 +-
>>  hw/arm/smmu-common.c            |  12 +-
>>  hw/arm/smmuv3-internal.h        |  26 +-
>>  hw/arm/smmuv3.c                 | 189 +++++++--
>>  hw/i386/intel_iommu.c           |  16 +-
>>  hw/misc/tz-mpc.c                |   8 +-
>>  hw/ppc/spapr_iommu.c            |   2 +-
>>  hw/s390x/s390-pci-inst.c        |   4 +-
>>  hw/vfio/common.c                | 666 ++++++++++++++++++++++++--------
>>  hw/vfio/pci.c                   |   1 +
>>  hw/vfio/trace-events            |   4 +-
>>  hw/virtio/vhost.c               |  12 +-
>>  include/exec/memory.h           | 140 +++++--
>>  include/hw/arm/smmu-common.h    |   1 +
>>  include/hw/vfio/vfio-common.h   |   1 +
>>  linux-headers/linux/iommu.h     | 237 ++++++++++++
>>  linux-headers/linux/vfio.h      |  48 +++
>>  memory.c                        |  66 +++-
>>  scripts/update-linux-headers.sh |   2 +-
>>  target/arm/kvm.c                |  46 +--
>>  20 files changed, 1206 insertions(+), 287 deletions(-)
>>  create mode 100644 linux-headers/linux/iommu.h
>>
>> --
>> 2.17.1
>>
> 
>
Eric Auger Nov. 26, 2018, 3:48 p.m. UTC | #6
Hi Shameer,

On 11/23/18 5:28 PM, Shameerali Kolothum Thodi wrote:
> 
> 
>> -----Original Message-----
>> From: Qemu-devel [mailto:qemu-devel-
>> bounces+shameerali.kolothum.thodi=huawei.com@nongnu.org] On Behalf Of
>> Eric Auger
>> Sent: 21 September 2018 09:18
>> To: eric.auger.pro@gmail.com; eric.auger@redhat.com; qemu-
>> devel@nongnu.org; qemu-arm@nongnu.org; peter.maydell@linaro.org
>> Cc: yi.l.liu@intel.com; cdall@kernel.org; mst@redhat.com; jean-
>> philippe.brucker@arm.com; peterx@redhat.com; alex.williamson@redhat.com
>> Subject: [Qemu-devel] [RFC v2 00/28] vSMMUv3/pSMMUv3 2 stage VFIO
>> integration
>>
> 
> [...]
> 
>> This series can be found at:
>> https://github.com/eauger/qemu/tree/v3.0.0_2stage-rfc-v2
>>
>> Testing:
>> - For testing use my kernel branch
>>   https://github.com/eauger/linux/tree/v4.19-rc4-2stage-rfc-v2 [1]
>> - Tested on Qualcomm HW with 1/2 assigned e1000e NICs. Hotplug
>>   was also tested OK.
>> - Known limitation:
>>   - currently sending an NH_ASID command instead of NH_VA
>>     upon guest NH_VA. This may cause important perf downgrade.
>>     Propagating NH_VA does not work at the moment.
> 
> Hi Eric,
> 
> I had a  go with this RFC on one of our platform and noted that when the Guest 
> boots with ACPI , the ping doesn't work with the assigned ixgbevf NIC though 
> the network device gets detected and added to the iommu group. 
> 
> It's all fine when I boot the Guest with DT. Have you tested this with ACPI yet or
> am I missing anything obvious here?

I just submitted the following QEMU patch:

[PATCH for-3.1] hw/arm/virt-acpi-build: Fix SMMUv3 ACPI integration

Please can you apply it and test it?

Thank you in advance

Best Regards

Eric
> 
> Please let me know.
> 
> Thanks,
> Shameer
>  
>> References:
>> - [1] [RFC v2 00/20] SMMUv3 Nested Stage Setup
>>   (https://lkml.org/lkml/2018/9/18/1087)
>>   The User API still is unstable and under discussion.
>>
>> History:
>> v1 -> v2:
>> - Fixed dual assignment (asid now correctly propagated on TLB invalidations)
>> - Integrated fault reporting
>>
>> Next Steps:
>> - Mature the user API with people involved in SVA work (KVM forum may be a
>>   good opportunity to meet)
>> - Submit the IOMMU cfg notifier changes and some VFIO changes separately,
>> to
>>   progress independently on the kernel dependency
>>
>> Eric Auger (28):
>>   hw/arm/smmu-common: Fix the name of the iommu memory regions
>>   hw/arm/smmuv3: fix eventq recording and IRQ triggerring
>>   update-linux-headers: Import iommu.h
>>   linux-headers: Partial header update
>>   memory: add IOMMU_ATTR_VFIO_NESTED IOMMU memory region attribute
>>   hw/arm/smmuv3: Implement get_attr API to report
>> IOMMU_ATTR_VFIO_NESTED
>>   hw/vfio/common: Refactor container initialization
>>   hw/vfio/common: Force nested if iommu requires it
>>   memory: Introduce IOMMUIOLTBNotifier
>>   memory: rename memory_region notify_iommu, notify_one
>>   memory: Add IOMMUConfigNotifier
>>   memory: Add arch_id in IOTLBEntry
>>   hw/arm/smmuv3: Store s1ctrptr in translation config data
>>   hw/arm/smmuv3: Implement dummy replay
>>   hw/arm/smmuv3: Notify on config changes
>>   hw/arm/smmuv3: Fill the IOTLBEntry arch_id on NH_VA invalidation
>>   hw/vfio/common: Introduce vfio_alloc_guest_iommu helper
>>   hw/vfio/common: Introduce vfio_dma_(un)map_ram_section helpers
>>   hw/vfio/common: Register specific nested mode notifiers and
>>     memory_listener
>>   hw/vfio/common: Register MAP notifier for MSI binding
>>   target/arm/kvm: Notifies IOMMU on MSI stage 1 binding
>>   vfio/pci: Always set up MSI route before enabling vectors
>>   hw/arm/smmuv3: Remove warning about unsupported MAP notifiers
>>   memory: Introduce IOMMU_NOTIFIER_INIT_CFG IOMMU Config Notifier
>>   memory: Introduce IOMMU Memory Region inject_faults API
>>   hw/vfio/common: Handle fault_handler
>>   hw/arm/smmuv3: Init fault handling
>>   hw/arm/smmuv3: Implement fault injection
>>
>>  exec.c                          |  12 +-
>>  hw/arm/smmu-common.c            |  12 +-
>>  hw/arm/smmuv3-internal.h        |  26 +-
>>  hw/arm/smmuv3.c                 | 189 +++++++--
>>  hw/i386/intel_iommu.c           |  16 +-
>>  hw/misc/tz-mpc.c                |   8 +-
>>  hw/ppc/spapr_iommu.c            |   2 +-
>>  hw/s390x/s390-pci-inst.c        |   4 +-
>>  hw/vfio/common.c                | 666 ++++++++++++++++++++++++--------
>>  hw/vfio/pci.c                   |   1 +
>>  hw/vfio/trace-events            |   4 +-
>>  hw/virtio/vhost.c               |  12 +-
>>  include/exec/memory.h           | 140 +++++--
>>  include/hw/arm/smmu-common.h    |   1 +
>>  include/hw/vfio/vfio-common.h   |   1 +
>>  linux-headers/linux/iommu.h     | 237 ++++++++++++
>>  linux-headers/linux/vfio.h      |  48 +++
>>  memory.c                        |  66 +++-
>>  scripts/update-linux-headers.sh |   2 +-
>>  target/arm/kvm.c                |  46 +--
>>  20 files changed, 1206 insertions(+), 287 deletions(-)
>>  create mode 100644 linux-headers/linux/iommu.h
>>
>> --
>> 2.17.1
>>
> 
>