[3/3] intel-iommu: build iova tree during IOMMU translation

Message ID 20221129081037.12099-4-jasowang@redhat.com
State New
Series Fix UNMAP notifier for intel-iommu

Commit Message

Jason Wang Nov. 29, 2022, 8:10 a.m. UTC
The IOVA tree is only built during the page walk, which breaks devices
that use only the UNMAP notifier. One example is vhost-net: it uses the
UNMAP notifier when the vIOMMU doesn't support the DEVIOTLB_UNMAP
notifier (e.g. when dt mode is not enabled). The interesting part is
that it doesn't use MAP, since it can query the IOMMU translation by
itself upon an IOTLB miss.

This doesn't work because QEMU doesn't build the IOVA tree during IOMMU
translation, which means the UNMAP notifier won't be triggered during
the page walk, since QEMU thinks the range was never mapped. This can be
observed when a vIOMMU is used with vhost_net but dt is disabled.

Fix this by building the IOVA tree during IOMMU translation, which
makes sure the UNMAP notifier event can be identified during the page
walk. This also means that during PSI we need to walk the page table
for UNMAP notifiers as well, not only for MAP notifiers.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/i386/intel_iommu.c | 43 ++++++++++++++++++-------------------------
 1 file changed, 18 insertions(+), 25 deletions(-)

Comments

Peter Xu Nov. 29, 2022, 3:57 p.m. UTC | #1
On Tue, Nov 29, 2022 at 04:10:37PM +0800, Jason Wang wrote:
> The IOVA tree is only built during page walk this breaks the device
> that tries to use UNMAP notifier only. One example is vhost-net, it
> tries to use UNMAP notifier when vIOMMU doesn't support DEVIOTLB_UNMAP
> notifier (e.g when dt mode is not enabled). The interesting part is
> that it doesn't use MAP since it can query the IOMMU translation by
> itself upon a IOTLB miss.
> 
> This doesn't work since Qemu doesn't build IOVA tree in IOMMU
> translation which means the UNMAP notifier won't be triggered during
> the page walk since Qemu think it is never mapped. This could be
> noticed when vIOMMU is used with vhost_net but dt is disabled.
> 
> Fixing this by build the iova tree during IOMMU translation, this
> makes sure the UNMAP notifier event could be identified during page
> walk. And we need to walk page table not only for UNMAP notifier but
> for MAP notifier during PSI.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  hw/i386/intel_iommu.c | 43 ++++++++++++++++++-------------------------
>  1 file changed, 18 insertions(+), 25 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index d025ef2873..edeb62f4b2 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -1834,6 +1834,8 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
>      uint8_t access_flags;
>      bool rid2pasid = (pasid == PCI_NO_PASID) && s->root_scalable;
>      VTDIOTLBEntry *iotlb_entry;
> +    const DMAMap *mapped;
> +    DMAMap target;
>  
>      /*
>       * We have standalone memory region for interrupt addresses, we
> @@ -1954,6 +1956,21 @@ out:
>      entry->translated_addr = vtd_get_slpte_addr(slpte, s->aw_bits) & page_mask;
>      entry->addr_mask = ~page_mask;
>      entry->perm = access_flags;
> +
> +    target.iova = entry->iova;
> +    target.size = entry->addr_mask;
> +    target.translated_addr = entry->translated_addr;
> +    target.perm = entry->perm;
> +
> +    mapped = iova_tree_find(vtd_as->iova_tree, &target);
> +    if (!mapped) {
> +        /* To make UNMAP notifier work, we need build iova tree here
> +         * in order to have the UNMAP iommu notifier to be triggered
> +         * during the page walk.
> +         */
> +        iova_tree_insert(vtd_as->iova_tree, &target);
> +    }
> +
>      return true;
>  
>  error:
> @@ -2161,31 +2178,7 @@ static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
>          ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
>                                         vtd_as->devfn, &ce);
>          if (!ret && domain_id == vtd_get_domain_id(s, &ce, vtd_as->pasid)) {
> -            if (vtd_as_has_map_notifier(vtd_as)) {
> -                /*
> -                 * As long as we have MAP notifications registered in
> -                 * any of our IOMMU notifiers, we need to sync the
> -                 * shadow page table.
> -                 */
> -                vtd_sync_shadow_page_table_range(vtd_as, &ce, addr, size);
> -            } else {
> -                /*
> -                 * For UNMAP-only notifiers, we don't need to walk the
> -                 * page tables.  We just deliver the PSI down to
> -                 * invalidate caches.
> -                 */
> -                IOMMUTLBEvent event = {
> -                    .type = IOMMU_NOTIFIER_UNMAP,
> -                    .entry = {
> -                        .target_as = &address_space_memory,
> -                        .iova = addr,
> -                        .translated_addr = 0,
> -                        .addr_mask = size - 1,
> -                        .perm = IOMMU_NONE,
> -                    },
> -                };
> -                memory_region_notify_iommu(&vtd_as->iommu, 0, event);

Isn't this path the one responsible for passing UNMAP events through
from the guest to vhost when there's no MAP notifier requested?

At least that's what I expected when introducing the iova tree, because for
an unmap-only device hierarchy I thought we didn't need the tree at all.

Jason, do you know what I'm missing?

Thanks,

> -            }
> +            vtd_sync_shadow_page_table_range(vtd_as, &ce, addr, size);
>          }
>      }
>  }
> -- 
> 2.25.1
>
Jason Wang Nov. 30, 2022, 6:33 a.m. UTC | #2
On Tue, Nov 29, 2022 at 11:57 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Nov 29, 2022 at 04:10:37PM +0800, Jason Wang wrote:
> > The IOVA tree is only built during page walk this breaks the device
> > that tries to use UNMAP notifier only. One example is vhost-net, it
> > tries to use UNMAP notifier when vIOMMU doesn't support DEVIOTLB_UNMAP
> > notifier (e.g when dt mode is not enabled). The interesting part is
> > that it doesn't use MAP since it can query the IOMMU translation by
> > itself upon a IOTLB miss.
> >
> > This doesn't work since Qemu doesn't build IOVA tree in IOMMU
> > translation which means the UNMAP notifier won't be triggered during
> > the page walk since Qemu think it is never mapped. This could be
> > noticed when vIOMMU is used with vhost_net but dt is disabled.
> >
> > Fixing this by build the iova tree during IOMMU translation, this
> > makes sure the UNMAP notifier event could be identified during page
> > walk. And we need to walk page table not only for UNMAP notifier but
> > for MAP notifier during PSI.
> >
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> >  hw/i386/intel_iommu.c | 43 ++++++++++++++++++-------------------------
> >  1 file changed, 18 insertions(+), 25 deletions(-)
> >
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index d025ef2873..edeb62f4b2 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -1834,6 +1834,8 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> >      uint8_t access_flags;
> >      bool rid2pasid = (pasid == PCI_NO_PASID) && s->root_scalable;
> >      VTDIOTLBEntry *iotlb_entry;
> > +    const DMAMap *mapped;
> > +    DMAMap target;
> >
> >      /*
> >       * We have standalone memory region for interrupt addresses, we
> > @@ -1954,6 +1956,21 @@ out:
> >      entry->translated_addr = vtd_get_slpte_addr(slpte, s->aw_bits) & page_mask;
> >      entry->addr_mask = ~page_mask;
> >      entry->perm = access_flags;
> > +
> > +    target.iova = entry->iova;
> > +    target.size = entry->addr_mask;
> > +    target.translated_addr = entry->translated_addr;
> > +    target.perm = entry->perm;
> > +
> > +    mapped = iova_tree_find(vtd_as->iova_tree, &target);
> > +    if (!mapped) {
> > +        /* To make UNMAP notifier work, we need build iova tree here
> > +         * in order to have the UNMAP iommu notifier to be triggered
> > +         * during the page walk.
> > +         */
> > +        iova_tree_insert(vtd_as->iova_tree, &target);
> > +    }
> > +
> >      return true;
> >
> >  error:
> > @@ -2161,31 +2178,7 @@ static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
> >          ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
> >                                         vtd_as->devfn, &ce);
> >          if (!ret && domain_id == vtd_get_domain_id(s, &ce, vtd_as->pasid)) {
> > -            if (vtd_as_has_map_notifier(vtd_as)) {
> > -                /*
> > -                 * As long as we have MAP notifications registered in
> > -                 * any of our IOMMU notifiers, we need to sync the
> > -                 * shadow page table.
> > -                 */
> > -                vtd_sync_shadow_page_table_range(vtd_as, &ce, addr, size);
> > -            } else {
> > -                /*
> > -                 * For UNMAP-only notifiers, we don't need to walk the
> > -                 * page tables.  We just deliver the PSI down to
> > -                 * invalidate caches.
> > -                 */
> > -                IOMMUTLBEvent event = {
> > -                    .type = IOMMU_NOTIFIER_UNMAP,
> > -                    .entry = {
> > -                        .target_as = &address_space_memory,
> > -                        .iova = addr,
> > -                        .translated_addr = 0,
> > -                        .addr_mask = size - 1,
> > -                        .perm = IOMMU_NONE,
> > -                    },
> > -                };
> > -                memory_region_notify_iommu(&vtd_as->iommu, 0, event);
>
> Isn't this path the one that will be responsible for pass-through the UNMAP
> events from guest to vhost when there's no MAP notifier requested?

Yes, but it doesn't remove anything from the iova tree. More below.

>
> At least that's what I expected when introducing the iova tree, because for
> unmap-only device hierachy I thought we didn't need the tree at all.

Then the problem is that the UNMAP notifier won't be triggered at all during
the DSI page walk in vtd_page_walk_one(), because there's no DMAMap stored
in the iova tree:

        if (!mapped) {
            /* Skip since we didn't map this range at all */
            trace_vtd_page_walk_one_skip_unmap(entry->iova, entry->addr_mask);
            return 0;
        }

So I chose to build the iova tree during translation, so that we won't hit
the above condition.
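
To illustrate the skip described above, here is a minimal toy model (not QEMU code). The names ToyMap, ToyAddressSpace, toy_translate() and toy_page_walk_unmap() are hypothetical, and a small array stands in for the real iova tree. It only shows that an UNMAP seen during the page walk is dropped unless the range was recorded earlier, which is what recording at translation time is meant to address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_MAPS 16

/* Hypothetical stand-in for DMAMap: size is inclusive (length - 1). */
typedef struct {
    uint64_t iova;
    uint64_t size;
} ToyMap;

/* Hypothetical stand-in for an address space with its iova tree. */
typedef struct {
    ToyMap maps[TOY_MAX_MAPS];
    size_t count;
    unsigned unmap_events;  /* UNMAP notifications actually delivered */
} ToyAddressSpace;

/* Two inclusive ranges overlap (no overflow handling in this toy). */
static bool toy_overlaps(const ToyMap *a, const ToyMap *b)
{
    return a->iova <= b->iova + b->size && b->iova <= a->iova + a->size;
}

/* Return the first recorded range overlapping target, or NULL. */
static ToyMap *toy_find(ToyAddressSpace *as, const ToyMap *target)
{
    for (size_t i = 0; i < as->count; i++) {
        if (toy_overlaps(&as->maps[i], target)) {
            return &as->maps[i];
        }
    }
    return NULL;
}

/* Models translation; 'track' models recording the range at that point. */
static void toy_translate(ToyAddressSpace *as, uint64_t iova, uint64_t size,
                          bool track)
{
    ToyMap target = { .iova = iova, .size = size };

    if (track && !toy_find(as, &target)) {
        assert(as->count < TOY_MAX_MAPS);
        as->maps[as->count++] = target;
    }
}

/* Models the unmap branch of the page walk for one range. */
static void toy_page_walk_unmap(ToyAddressSpace *as, uint64_t iova,
                                uint64_t size)
{
    ToyMap target = { .iova = iova, .size = size };
    ToyMap *mapped = toy_find(as, &target);

    if (!mapped) {
        return;  /* Skip: this range was never recorded as mapped. */
    }
    /* Simplification: drop only the first overlapping entry. */
    *mapped = as->maps[--as->count];
    as->unmap_events++;
}
```

With track set to false the later toy_page_walk_unmap() call delivers nothing; with track set to true it delivers one event and removes the entry.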

Thanks

>
> Jason, do you know where I miss?
>
> Thanks,
>
> > -            }
> > +            vtd_sync_shadow_page_table_range(vtd_as, &ce, addr, size);
> >          }
> >      }
> >  }
> > --
> > 2.25.1
> >
>
> --
> Peter Xu
>
Peter Xu Nov. 30, 2022, 3:17 p.m. UTC | #3
On Wed, Nov 30, 2022 at 02:33:51PM +0800, Jason Wang wrote:
> On Tue, Nov 29, 2022 at 11:57 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Tue, Nov 29, 2022 at 04:10:37PM +0800, Jason Wang wrote:
> > > The IOVA tree is only built during page walk this breaks the device
> > > that tries to use UNMAP notifier only. One example is vhost-net, it
> > > tries to use UNMAP notifier when vIOMMU doesn't support DEVIOTLB_UNMAP
> > > notifier (e.g when dt mode is not enabled). The interesting part is
> > > that it doesn't use MAP since it can query the IOMMU translation by
> > > itself upon a IOTLB miss.
> > >
> > > This doesn't work since Qemu doesn't build IOVA tree in IOMMU
> > > translation which means the UNMAP notifier won't be triggered during
> > > the page walk since Qemu think it is never mapped. This could be
> > > noticed when vIOMMU is used with vhost_net but dt is disabled.
> > >
> > > Fixing this by build the iova tree during IOMMU translation, this
> > > makes sure the UNMAP notifier event could be identified during page
> > > walk. And we need to walk page table not only for UNMAP notifier but
> > > for MAP notifier during PSI.
> > >
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > ---
> > >  hw/i386/intel_iommu.c | 43 ++++++++++++++++++-------------------------
> > >  1 file changed, 18 insertions(+), 25 deletions(-)
> > >
> > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > index d025ef2873..edeb62f4b2 100644
> > > --- a/hw/i386/intel_iommu.c
> > > +++ b/hw/i386/intel_iommu.c
> > > @@ -1834,6 +1834,8 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> > >      uint8_t access_flags;
> > >      bool rid2pasid = (pasid == PCI_NO_PASID) && s->root_scalable;
> > >      VTDIOTLBEntry *iotlb_entry;
> > > +    const DMAMap *mapped;
> > > +    DMAMap target;
> > >
> > >      /*
> > >       * We have standalone memory region for interrupt addresses, we
> > > @@ -1954,6 +1956,21 @@ out:
> > >      entry->translated_addr = vtd_get_slpte_addr(slpte, s->aw_bits) & page_mask;
> > >      entry->addr_mask = ~page_mask;
> > >      entry->perm = access_flags;
> > > +
> > > +    target.iova = entry->iova;
> > > +    target.size = entry->addr_mask;
> > > +    target.translated_addr = entry->translated_addr;
> > > +    target.perm = entry->perm;
> > > +
> > > +    mapped = iova_tree_find(vtd_as->iova_tree, &target);
> > > +    if (!mapped) {
> > > +        /* To make UNMAP notifier work, we need build iova tree here
> > > +         * in order to have the UNMAP iommu notifier to be triggered
> > > +         * during the page walk.
> > > +         */
> > > +        iova_tree_insert(vtd_as->iova_tree, &target);
> > > +    }
> > > +
> > >      return true;
> > >
> > >  error:
> > > @@ -2161,31 +2178,7 @@ static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
> > >          ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
> > >                                         vtd_as->devfn, &ce);
> > >          if (!ret && domain_id == vtd_get_domain_id(s, &ce, vtd_as->pasid)) {
> > > -            if (vtd_as_has_map_notifier(vtd_as)) {
> > > -                /*
> > > -                 * As long as we have MAP notifications registered in
> > > -                 * any of our IOMMU notifiers, we need to sync the
> > > -                 * shadow page table.
> > > -                 */
> > > -                vtd_sync_shadow_page_table_range(vtd_as, &ce, addr, size);
> > > -            } else {
> > > -                /*
> > > -                 * For UNMAP-only notifiers, we don't need to walk the
> > > -                 * page tables.  We just deliver the PSI down to
> > > -                 * invalidate caches.
> > > -                 */
> > > -                IOMMUTLBEvent event = {
> > > -                    .type = IOMMU_NOTIFIER_UNMAP,
> > > -                    .entry = {
> > > -                        .target_as = &address_space_memory,
> > > -                        .iova = addr,
> > > -                        .translated_addr = 0,
> > > -                        .addr_mask = size - 1,
> > > -                        .perm = IOMMU_NONE,
> > > -                    },
> > > -                };
> > > -                memory_region_notify_iommu(&vtd_as->iommu, 0, event);
> >
> > Isn't this path the one that will be responsible for pass-through the UNMAP
> > events from guest to vhost when there's no MAP notifier requested?
> 
> Yes, but it doesn't do the iova tree removing. More below.
> 
> >
> > At least that's what I expected when introducing the iova tree, because for
> > unmap-only device hierachy I thought we didn't need the tree at all.
> 
> Then the problem is the UNMAP notifier won't be trigger at all during
> DSI page walk in vtd_page_walk_one() because there's no DMAMap stored
> in the iova tree.:
> 
>         if (!mapped) {
>             /* Skip since we didn't map this range at all */
>             trace_vtd_page_walk_one_skip_unmap(entry->iova, entry->addr_mask);
>             return 0;
>         }
> 
> So I choose to build the iova tree in translate then we won't go
> within the above condition.

That's also why it's weird, because IIUC we should never walk a page table
at all if there's no MAP notifier registered.

When looking at the walk callers I found that there is indeed one path
missing a check, which can cause it to actually walk the pgtables without a
MAP notifier. Then I also noticed commit f7701e2c7983b6, and I'm wondering
whether what we really want is something like this:

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index a08ee85edf..c46f3db992 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1536,7 +1536,7 @@ static int vtd_sync_shadow_page_table(VTDAddressSpace *vtd_as)
     VTDContextEntry ce;
     IOMMUNotifier *n;

-    if (!(vtd_as->iommu.iommu_notify_flags & IOMMU_NOTIFIER_IOTLB_EVENTS)) {
+    if (!vtd_as_has_map_notifier(vtd_as)) {
         return 0;
     }

So I'm not sure whether this patch is what resolves the problem; so far I feel
like it's patch 2 that does the real fix.  Then we can have the above
one-liner so we stop any walks when there are no MAP notifiers.

Thanks,
Jason Wang Dec. 1, 2022, 8:35 a.m. UTC | #4
On Wed, Nov 30, 2022 at 11:17 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Wed, Nov 30, 2022 at 02:33:51PM +0800, Jason Wang wrote:
> > On Tue, Nov 29, 2022 at 11:57 PM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Tue, Nov 29, 2022 at 04:10:37PM +0800, Jason Wang wrote:
> > > > The IOVA tree is only built during page walk this breaks the device
> > > > that tries to use UNMAP notifier only. One example is vhost-net, it
> > > > tries to use UNMAP notifier when vIOMMU doesn't support DEVIOTLB_UNMAP
> > > > notifier (e.g when dt mode is not enabled). The interesting part is
> > > > that it doesn't use MAP since it can query the IOMMU translation by
> > > > itself upon a IOTLB miss.
> > > >
> > > > This doesn't work since Qemu doesn't build IOVA tree in IOMMU
> > > > translation which means the UNMAP notifier won't be triggered during
> > > > the page walk since Qemu think it is never mapped. This could be
> > > > noticed when vIOMMU is used with vhost_net but dt is disabled.
> > > >
> > > > Fixing this by build the iova tree during IOMMU translation, this
> > > > makes sure the UNMAP notifier event could be identified during page
> > > > walk. And we need to walk page table not only for UNMAP notifier but
> > > > for MAP notifier during PSI.
> > > >
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > ---
> > > >  hw/i386/intel_iommu.c | 43 ++++++++++++++++++-------------------------
> > > >  1 file changed, 18 insertions(+), 25 deletions(-)
> > > >
> > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > > index d025ef2873..edeb62f4b2 100644
> > > > --- a/hw/i386/intel_iommu.c
> > > > +++ b/hw/i386/intel_iommu.c
> > > > @@ -1834,6 +1834,8 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> > > >      uint8_t access_flags;
> > > >      bool rid2pasid = (pasid == PCI_NO_PASID) && s->root_scalable;
> > > >      VTDIOTLBEntry *iotlb_entry;
> > > > +    const DMAMap *mapped;
> > > > +    DMAMap target;
> > > >
> > > >      /*
> > > >       * We have standalone memory region for interrupt addresses, we
> > > > @@ -1954,6 +1956,21 @@ out:
> > > >      entry->translated_addr = vtd_get_slpte_addr(slpte, s->aw_bits) & page_mask;
> > > >      entry->addr_mask = ~page_mask;
> > > >      entry->perm = access_flags;
> > > > +
> > > > +    target.iova = entry->iova;
> > > > +    target.size = entry->addr_mask;
> > > > +    target.translated_addr = entry->translated_addr;
> > > > +    target.perm = entry->perm;
> > > > +
> > > > +    mapped = iova_tree_find(vtd_as->iova_tree, &target);
> > > > +    if (!mapped) {
> > > > +        /* To make UNMAP notifier work, we need build iova tree here
> > > > +         * in order to have the UNMAP iommu notifier to be triggered
> > > > +         * during the page walk.
> > > > +         */
> > > > +        iova_tree_insert(vtd_as->iova_tree, &target);
> > > > +    }
> > > > +
> > > >      return true;
> > > >
> > > >  error:
> > > > @@ -2161,31 +2178,7 @@ static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
> > > >          ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
> > > >                                         vtd_as->devfn, &ce);
> > > >          if (!ret && domain_id == vtd_get_domain_id(s, &ce, vtd_as->pasid)) {
> > > > -            if (vtd_as_has_map_notifier(vtd_as)) {
> > > > -                /*
> > > > -                 * As long as we have MAP notifications registered in
> > > > -                 * any of our IOMMU notifiers, we need to sync the
> > > > -                 * shadow page table.
> > > > -                 */
> > > > -                vtd_sync_shadow_page_table_range(vtd_as, &ce, addr, size);
> > > > -            } else {
> > > > -                /*
> > > > -                 * For UNMAP-only notifiers, we don't need to walk the
> > > > -                 * page tables.  We just deliver the PSI down to
> > > > -                 * invalidate caches.
> > > > -                 */
> > > > -                IOMMUTLBEvent event = {
> > > > -                    .type = IOMMU_NOTIFIER_UNMAP,
> > > > -                    .entry = {
> > > > -                        .target_as = &address_space_memory,
> > > > -                        .iova = addr,
> > > > -                        .translated_addr = 0,
> > > > -                        .addr_mask = size - 1,
> > > > -                        .perm = IOMMU_NONE,
> > > > -                    },
> > > > -                };
> > > > -                memory_region_notify_iommu(&vtd_as->iommu, 0, event);
> > >
> > > Isn't this path the one that will be responsible for pass-through the UNMAP
> > > events from guest to vhost when there's no MAP notifier requested?
> >
> > Yes, but it doesn't do the iova tree removing. More below.
> >
> > >
> > > At least that's what I expected when introducing the iova tree, because for
> > > unmap-only device hierachy I thought we didn't need the tree at all.
> >
> > Then the problem is the UNMAP notifier won't be trigger at all during
> > DSI page walk in vtd_page_walk_one() because there's no DMAMap stored
> > in the iova tree.:
> >
> >         if (!mapped) {
> >             /* Skip since we didn't map this range at all */
> >             trace_vtd_page_walk_one_skip_unmap(entry->iova, entry->addr_mask);
> >             return 0;
> >         }
> >
> > So I choose to build the iova tree in translate then we won't go
> > within the above condition.
>
> That's also why it's weird because IIUC we should never walk a page table
> at all if there's no MAP notifier regiestered.

If this is true, we probably need to document this somewhere.

>
> When I'm looking at the walk callers I found that indeed there's one path
> missing where can cause it to actually walk the pgtables without !MAP, then
> I also noticed commit f7701e2c7983b6, and I'm wondering what we really want
> is something like this:
>
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index a08ee85edf..c46f3db992 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -1536,7 +1536,7 @@ static int vtd_sync_shadow_page_table(VTDAddressSpace *vtd_as)
>      VTDContextEntry ce;
>      IOMMUNotifier *n;
>
> -    if (!(vtd_as->iommu.iommu_notify_flags & IOMMU_NOTIFIER_IOTLB_EVENTS)) {
> +    if (!vtd_as_has_map_notifier(vtd_as)) {
>          return 0;
>      }
>
> So I'm not sure whether this patch is the problem resolver; so far I feel
> like it's patch 2 who does the real fix.  Then we can have the above
> oneliner so we stop any walks when there's no map notifiers.
>
> Thanks,

I may be missing something, but as stated above, the problem is a missing
UNMAP notification during DSI when there's only an UNMAP notifier.

To solve it we have two options:

1) build the iova tree during iommu translation, so that we can correctly
trigger UNMAP during the page walk caused by DSI
2) don't do the iova tree walk for !MAP notifiers, which needs new logic to
trigger the UNMAP notifier in PSI/DSI

This patch chooses 1) (which seems easier, at least for -stable).
Do you mean you prefer to go with 2)?

Thanks

>
> --
> Peter Xu
>
Peter Xu Dec. 1, 2022, 2:58 p.m. UTC | #5
On Thu, Dec 01, 2022 at 04:35:48PM +0800, Jason Wang wrote:
> On Wed, Nov 30, 2022 at 11:17 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Wed, Nov 30, 2022 at 02:33:51PM +0800, Jason Wang wrote:
> > > On Tue, Nov 29, 2022 at 11:57 PM Peter Xu <peterx@redhat.com> wrote:
> > > >
> > > > On Tue, Nov 29, 2022 at 04:10:37PM +0800, Jason Wang wrote:
> > > > > The IOVA tree is only built during page walk this breaks the device
> > > > > that tries to use UNMAP notifier only. One example is vhost-net, it
> > > > > tries to use UNMAP notifier when vIOMMU doesn't support DEVIOTLB_UNMAP
> > > > > notifier (e.g when dt mode is not enabled). The interesting part is
> > > > > that it doesn't use MAP since it can query the IOMMU translation by
> > > > > itself upon a IOTLB miss.
> > > > >
> > > > > This doesn't work since Qemu doesn't build IOVA tree in IOMMU
> > > > > translation which means the UNMAP notifier won't be triggered during
> > > > > the page walk since Qemu think it is never mapped. This could be
> > > > > noticed when vIOMMU is used with vhost_net but dt is disabled.
> > > > >
> > > > > Fixing this by build the iova tree during IOMMU translation, this
> > > > > makes sure the UNMAP notifier event could be identified during page
> > > > > walk. And we need to walk page table not only for UNMAP notifier but
> > > > > for MAP notifier during PSI.
> > > > >
> > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > ---
> > > > >  hw/i386/intel_iommu.c | 43 ++++++++++++++++++-------------------------
> > > > >  1 file changed, 18 insertions(+), 25 deletions(-)
> > > > >
> > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > > > index d025ef2873..edeb62f4b2 100644
> > > > > --- a/hw/i386/intel_iommu.c
> > > > > +++ b/hw/i386/intel_iommu.c
> > > > > @@ -1834,6 +1834,8 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> > > > >      uint8_t access_flags;
> > > > >      bool rid2pasid = (pasid == PCI_NO_PASID) && s->root_scalable;
> > > > >      VTDIOTLBEntry *iotlb_entry;
> > > > > +    const DMAMap *mapped;
> > > > > +    DMAMap target;
> > > > >
> > > > >      /*
> > > > >       * We have standalone memory region for interrupt addresses, we
> > > > > @@ -1954,6 +1956,21 @@ out:
> > > > >      entry->translated_addr = vtd_get_slpte_addr(slpte, s->aw_bits) & page_mask;
> > > > >      entry->addr_mask = ~page_mask;
> > > > >      entry->perm = access_flags;
> > > > > +
> > > > > +    target.iova = entry->iova;
> > > > > +    target.size = entry->addr_mask;
> > > > > +    target.translated_addr = entry->translated_addr;
> > > > > +    target.perm = entry->perm;
> > > > > +
> > > > > +    mapped = iova_tree_find(vtd_as->iova_tree, &target);
> > > > > +    if (!mapped) {
> > > > > +        /* To make UNMAP notifier work, we need build iova tree here
> > > > > +         * in order to have the UNMAP iommu notifier to be triggered
> > > > > +         * during the page walk.
> > > > > +         */
> > > > > +        iova_tree_insert(vtd_as->iova_tree, &target);
> > > > > +    }
> > > > > +
> > > > >      return true;
> > > > >
> > > > >  error:
> > > > > @@ -2161,31 +2178,7 @@ static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
> > > > >          ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
> > > > >                                         vtd_as->devfn, &ce);
> > > > >          if (!ret && domain_id == vtd_get_domain_id(s, &ce, vtd_as->pasid)) {
> > > > > -            if (vtd_as_has_map_notifier(vtd_as)) {
> > > > > -                /*
> > > > > -                 * As long as we have MAP notifications registered in
> > > > > -                 * any of our IOMMU notifiers, we need to sync the
> > > > > -                 * shadow page table.
> > > > > -                 */
> > > > > -                vtd_sync_shadow_page_table_range(vtd_as, &ce, addr, size);
> > > > > -            } else {
> > > > > -                /*
> > > > > -                 * For UNMAP-only notifiers, we don't need to walk the
> > > > > -                 * page tables.  We just deliver the PSI down to
> > > > > -                 * invalidate caches.
> > > > > -                 */
> > > > > -                IOMMUTLBEvent event = {
> > > > > -                    .type = IOMMU_NOTIFIER_UNMAP,
> > > > > -                    .entry = {
> > > > > -                        .target_as = &address_space_memory,
> > > > > -                        .iova = addr,
> > > > > -                        .translated_addr = 0,
> > > > > -                        .addr_mask = size - 1,
> > > > > -                        .perm = IOMMU_NONE,
> > > > > -                    },
> > > > > -                };
> > > > > -                memory_region_notify_iommu(&vtd_as->iommu, 0, event);

[1]

> > > >
> > > > Isn't this path the one that will be responsible for pass-through the UNMAP
> > > > events from guest to vhost when there's no MAP notifier requested?
> > >
> > > Yes, but it doesn't do the iova tree removing. More below.
> > >
> > > >
> > > > At least that's what I expected when introducing the iova tree, because for
> > > > unmap-only device hierachy I thought we didn't need the tree at all.
> > >
> > > Then the problem is the UNMAP notifier won't be trigger at all during
> > > DSI page walk in vtd_page_walk_one() because there's no DMAMap stored
> > > in the iova tree.:
> > >
> > >         if (!mapped) {
> > >             /* Skip since we didn't map this range at all */
> > >             trace_vtd_page_walk_one_skip_unmap(entry->iova, entry->addr_mask);
> > >             return 0;
> > >         }
> > >
> > > So I choose to build the iova tree in translate then we won't go
> > > within the above condition.
> >
> > That's also why it's weird because IIUC we should never walk a page table
> > at all if there's no MAP notifier regiestered.
> 
> If this is true, we probably need to document this somewhere.

Agreed.  I'll post a patch.

> 
> >
> > When I'm looking at the walk callers I found that indeed there's one path
> > missing where can cause it to actually walk the pgtables without !MAP, then
> > I also noticed commit f7701e2c7983b6, and I'm wondering what we really want
> > is something like this:
> >
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index a08ee85edf..c46f3db992 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -1536,7 +1536,7 @@ static int vtd_sync_shadow_page_table(VTDAddressSpace *vtd_as)
> >      VTDContextEntry ce;
> >      IOMMUNotifier *n;
> >
> > -    if (!(vtd_as->iommu.iommu_notify_flags & IOMMU_NOTIFIER_IOTLB_EVENTS)) {
> > +    if (!vtd_as_has_map_notifier(vtd_as)) {
> >          return 0;
> >      }
> >
> > So I'm not sure whether this patch is the problem resolver; so far I feel
> > like it's patch 2 who does the real fix.  Then we can have the above
> > oneliner so we stop any walks when there's no map notifiers.
> >
> > Thanks,
> 
> I may miss something but as state above, the problem is a missing
> UNMAP notification during DSI when there's only UNMAP notifier.

I'm also confused about why we didn't already notify UNMAP for DSI. That's so
weird, because I thought it should either be there or have been broken for a
long time, as we have discussed this code multiple times:

		/*
		 * Fallback to domain selective flush if no PSI support or
		 * the size is too big.
		 */
		if (!cap_pgsel_inv(iommu->cap) ||
		    mask > cap_max_amask_val(iommu->cap))
			iommu->flush.flush_iotlb(iommu, did, 0, 0,
							DMA_TLB_DSI_FLUSH);
		else
			iommu->flush.flush_iotlb(iommu, did, addr | ih, mask,
							DMA_TLB_PSI_FLUSH);

I guess we were just always lucky?..
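
As a rough sketch of the fallback in the guest driver snippet above: a range invalidation is sent as PSI only if page-selective invalidation is supported and the address mask fits. The toy below assumes the mask is the base-2 logarithm of the page count rounded up to a power of two; the names toy_psi_mask() and toy_choose_flush() are hypothetical and this is not the kernel's code.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    TOY_FLUSH_PSI,  /* page-selective invalidation */
    TOY_FLUSH_DSI,  /* domain-selective invalidation */
} ToyFlushType;

/* Smallest mask with (1 << mask) >= pages; assumes pages >= 1. */
static unsigned toy_psi_mask(uint64_t pages)
{
    unsigned mask = 0;

    while (mask < 63 && ((uint64_t)1 << mask) < pages) {
        mask++;
    }
    return mask;
}

/*
 * Fall back to DSI when PSI is unsupported or the range is too big
 * for the maximum address mask the (toy) hardware advertises.
 */
static ToyFlushType toy_choose_flush(uint64_t pages, bool has_psi,
                                     unsigned max_amask)
{
    if (!has_psi || toy_psi_mask(pages) > max_amask) {
        return TOY_FLUSH_DSI;
    }
    return TOY_FLUSH_PSI;
}
```

Under this model a large unmap reaches the vIOMMU as a DSI rather than a PSI, which is why an UNMAP-only notifier also has to be notified on DSI.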

> 
> To solve it we might have two ways:
> 
> 1) build the iova tree during iommu translation then we can correctly
> trigger UNMAP during page walk caused by DSI
> 2) don't do the iova tree walk for !MAP notifier, need new logic to
> trigger UNMAP notifier in PSI/DSI
> 
> This patch choose to go 1) (which seems easier at least for -stable).
> Do you mean you prefer to go with 2)?

Yes.

The IOVA tree is unnecessary overhead IMHO, because UNMAP (either IOTLB or
DEVIOTLB unmap) shouldn't need that complexity at all.  Using the iova tree
can be accurate about which pages got unmapped when the kernel driver uses
DSI for a large PSI as shown above, but IMHO it needs more justification
that the pgtable walk is worth the effort.  Not to mention that if a device
in QEMU wants to use the iova tree for some reason, it can just register
with MAP and ignore all MAP events, while by default we keep UNMAP simple.

So we could do:

  (1) Rename vtd_sync_shadow_page_table() to vtd_sync_domain()
  (2) instead of optimizing dev-iotlb only there:
  
      if (!(vtd_as->iommu.iommu_notify_flags & IOMMU_NOTIFIER_IOTLB_EVENTS)) {
          return 0;
      }

      we should first check whether UNMAP or DEVIOTLB_UNMAP is registered
      and, if so, directly send a notification for the whole domain.  We
      need to choose the event type the notifier registered with, not both.
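
A minimal sketch of the decision described in (2), written as standalone C
so it can be compiled outside QEMU.  Everything here is a stand-in: the
flag names only echo the ones used in this thread, their values are
arbitrary, and the function is hypothetical rather than the eventual
vtd_sync_domain().  Giving UNMAP priority when both UNMAP and
DEVIOTLB_UNMAP are set is also just an assumption for illustration.

```c
#include <assert.h>

/* Stand-in notifier flags; values are illustrative, not QEMU's. */
enum sync_sketch_flag {
    SYNC_SKETCH_UNMAP          = 1 << 0,
    SYNC_SKETCH_MAP            = 1 << 1,
    SYNC_SKETCH_DEVIOTLB_UNMAP = 1 << 2,
};

/* What a domain sync would do for a given set of registered flags. */
enum sync_sketch_action {
    SYNC_SKETCH_NOTHING,
    SYNC_SKETCH_WALK_PAGE_TABLE,       /* MAP registered: full shadow sync */
    SYNC_SKETCH_UNMAP_DOMAIN,          /* one UNMAP for the whole domain */
    SYNC_SKETCH_DEVIOTLB_UNMAP_DOMAIN, /* one DEVIOTLB_UNMAP for the domain */
};

static enum sync_sketch_action sync_sketch_decide(unsigned flags)
{
    if (flags & SYNC_SKETCH_MAP) {
        /* Only MAP users need the page walk (and the iova tree). */
        return SYNC_SKETCH_WALK_PAGE_TABLE;
    }
    if (flags & SYNC_SKETCH_UNMAP) {
        /* No walk: send the event type the notifier registered with. */
        return SYNC_SKETCH_UNMAP_DOMAIN;
    }
    if (flags & SYNC_SKETCH_DEVIOTLB_UNMAP) {
        return SYNC_SKETCH_DEVIOTLB_UNMAP_DOMAIN;
    }
    return SYNC_SKETCH_NOTHING;
}
```

With this shape an UNMAP-only user such as vhost would never reach the page
walk, so it would never depend on the iova tree being populated.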

This also reminds me to ask whether we should sanity check iommu notifiers
for some invalid cases.  E.g. it seems to me that a notifier registered with
DEVIOTLB_UNMAP should not also register with MAP or UNMAP, since that
doesn't make sense.

One step further, I'm wondering whether the DEV_IOTLB event should exist at
all.  Maybe we want to have only a DEV_IOTLB typed iommu notifier, and then
when we get a dev-iotlb PSI/DSI we notify with type=UNMAP just like normal
UNMAP events; then in (2) above we can always send UNMAP as long as
!MAP.  But even so, that'll just be another optional change on top.

Thanks,
Jason Wang Dec. 5, 2022, 4:12 a.m. UTC | #6
On Thu, Dec 1, 2022 at 10:59 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Dec 01, 2022 at 04:35:48PM +0800, Jason Wang wrote:
> > On Wed, Nov 30, 2022 at 11:17 PM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Wed, Nov 30, 2022 at 02:33:51PM +0800, Jason Wang wrote:
> > > > On Tue, Nov 29, 2022 at 11:57 PM Peter Xu <peterx@redhat.com> wrote:
> > > > >
> > > > > On Tue, Nov 29, 2022 at 04:10:37PM +0800, Jason Wang wrote:
> > > > > > The IOVA tree is only built during page walk this breaks the device
> > > > > > that tries to use UNMAP notifier only. One example is vhost-net, it
> > > > > > tries to use UNMAP notifier when vIOMMU doesn't support DEVIOTLB_UNMAP
> > > > > > notifier (e.g when dt mode is not enabled). The interesting part is
> > > > > > that it doesn't use MAP since it can query the IOMMU translation by
> > > > > > itself upon a IOTLB miss.
> > > > > >
> > > > > > This doesn't work since Qemu doesn't build IOVA tree in IOMMU
> > > > > > translation which means the UNMAP notifier won't be triggered during
> > > > > > the page walk since Qemu think it is never mapped. This could be
> > > > > > noticed when vIOMMU is used with vhost_net but dt is disabled.
> > > > > >
> > > > > > Fixing this by build the iova tree during IOMMU translation, this
> > > > > > makes sure the UNMAP notifier event could be identified during page
> > > > > > walk. And we need to walk page table not only for UNMAP notifier but
> > > > > > for MAP notifier during PSI.
> > > > > >
> > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > ---
> > > > > >  hw/i386/intel_iommu.c | 43 ++++++++++++++++++-------------------------
> > > > > >  1 file changed, 18 insertions(+), 25 deletions(-)
> > > > > >
> > > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > > > > index d025ef2873..edeb62f4b2 100644
> > > > > > --- a/hw/i386/intel_iommu.c
> > > > > > +++ b/hw/i386/intel_iommu.c
> > > > > > @@ -1834,6 +1834,8 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> > > > > >      uint8_t access_flags;
> > > > > >      bool rid2pasid = (pasid == PCI_NO_PASID) && s->root_scalable;
> > > > > >      VTDIOTLBEntry *iotlb_entry;
> > > > > > +    const DMAMap *mapped;
> > > > > > +    DMAMap target;
> > > > > >
> > > > > >      /*
> > > > > >       * We have standalone memory region for interrupt addresses, we
> > > > > > @@ -1954,6 +1956,21 @@ out:
> > > > > >      entry->translated_addr = vtd_get_slpte_addr(slpte, s->aw_bits) & page_mask;
> > > > > >      entry->addr_mask = ~page_mask;
> > > > > >      entry->perm = access_flags;
> > > > > > +
> > > > > > +    target.iova = entry->iova;
> > > > > > +    target.size = entry->addr_mask;
> > > > > > +    target.translated_addr = entry->translated_addr;
> > > > > > +    target.perm = entry->perm;
> > > > > > +
> > > > > > +    mapped = iova_tree_find(vtd_as->iova_tree, &target);
> > > > > > +    if (!mapped) {
> > > > > > +        /* To make UNMAP notifier work, we need build iova tree here
> > > > > > +         * in order to have the UNMAP iommu notifier to be triggered
> > > > > > +         * during the page walk.
> > > > > > +         */
> > > > > > +        iova_tree_insert(vtd_as->iova_tree, &target);
> > > > > > +    }
> > > > > > +
> > > > > >      return true;
> > > > > >
> > > > > >  error:
> > > > > > @@ -2161,31 +2178,7 @@ static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
> > > > > >          ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
> > > > > >                                         vtd_as->devfn, &ce);
> > > > > >          if (!ret && domain_id == vtd_get_domain_id(s, &ce, vtd_as->pasid)) {
> > > > > > -            if (vtd_as_has_map_notifier(vtd_as)) {
> > > > > > -                /*
> > > > > > -                 * As long as we have MAP notifications registered in
> > > > > > -                 * any of our IOMMU notifiers, we need to sync the
> > > > > > -                 * shadow page table.
> > > > > > -                 */
> > > > > > -                vtd_sync_shadow_page_table_range(vtd_as, &ce, addr, size);
> > > > > > -            } else {
> > > > > > -                /*
> > > > > > -                 * For UNMAP-only notifiers, we don't need to walk the
> > > > > > -                 * page tables.  We just deliver the PSI down to
> > > > > > -                 * invalidate caches.
> > > > > > -                 */
> > > > > > -                IOMMUTLBEvent event = {
> > > > > > -                    .type = IOMMU_NOTIFIER_UNMAP,
> > > > > > -                    .entry = {
> > > > > > -                        .target_as = &address_space_memory,
> > > > > > -                        .iova = addr,
> > > > > > -                        .translated_addr = 0,
> > > > > > -                        .addr_mask = size - 1,
> > > > > > -                        .perm = IOMMU_NONE,
> > > > > > -                    },
> > > > > > -                };
> > > > > > -                memory_region_notify_iommu(&vtd_as->iommu, 0, event);
>
> [1]
>
> > > > >
> > > > > Isn't this path the one that will be responsible for pass-through the UNMAP
> > > > > events from guest to vhost when there's no MAP notifier requested?
> > > >
> > > > Yes, but it doesn't do the iova tree removing. More below.
> > > >
> > > > >
> > > > > At least that's what I expected when introducing the iova tree, because for
> > > > > unmap-only device hierarchy I thought we didn't need the tree at all.
> > > >
> > > > Then the problem is the UNMAP notifier won't be triggered at all during
> > > > DSI page walk in vtd_page_walk_one() because there's no DMAMap stored
> > > > in the iova tree:
> > > >
> > > >         if (!mapped) {
> > > >             /* Skip since we didn't map this range at all */
> > > >             trace_vtd_page_walk_one_skip_unmap(entry->iova, entry->addr_mask);
> > > >             return 0;
> > > >         }
> > > >
> > > > So I choose to build the iova tree in translate then we won't go
> > > > within the above condition.
> > >
> > > That's also why it's weird because IIUC we should never walk a page table
> > > at all if there's no MAP notifier registered.
> >
> > If this is true, we probably need to document this somewhere.
>
> Agree.  I'll post a patch.
>
> >
> > >
> > > When I'm looking at the walk callers I found that indeed there's one path
> > > missing where can cause it to actually walk the pgtables without !MAP, then
> > > I also noticed commit f7701e2c7983b6, and I'm wondering what we really want
> > > is something like this:
> > >
> > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > index a08ee85edf..c46f3db992 100644
> > > --- a/hw/i386/intel_iommu.c
> > > +++ b/hw/i386/intel_iommu.c
> > > @@ -1536,7 +1536,7 @@ static int vtd_sync_shadow_page_table(VTDAddressSpace *vtd_as)
> > >      VTDContextEntry ce;
> > >      IOMMUNotifier *n;
> > >
> > > -    if (!(vtd_as->iommu.iommu_notify_flags & IOMMU_NOTIFIER_IOTLB_EVENTS)) {
> > > +    if (!vtd_as_has_map_notifier(vtd_as)) {
> > >          return 0;
> > >      }
> > >
> > > So I'm not sure whether this patch is the problem resolver; so far I feel
> > > like it's patch 2 who does the real fix.  Then we can have the above
> > > oneliner so we stop any walks when there's no map notifiers.
> > >
> > > Thanks,
> >
> > I may be missing something, but as stated above, the problem is a missing
> > UNMAP notification during DSI when there's only an UNMAP notifier.
>
> I got confused too on why we didn't notify UNMAP for DSI already, that's so
> weird because I thought it should be there or it should be broken for a
> long time.. as we discussed multiple times around this one:
>
>                 /*
>                  * Fallback to domain selective flush if no PSI support or
>                  * the size is too big.
>                  */
>                 if (!cap_pgsel_inv(iommu->cap) ||
>                     mask > cap_max_amask_val(iommu->cap))
>                         iommu->flush.flush_iotlb(iommu, did, 0, 0,
>                                                         DMA_TLB_DSI_FLUSH);
>                 else
>                         iommu->flush.flush_iotlb(iommu, did, addr | ih, mask,
>                                                         DMA_TLB_PSI_FLUSH);
>
> I guess we were just always lucky?..

Probably, or the reason is that an UNMAP-only notifier was not commonly
used until patch 2.

>
> >
> > To solve it we might have two ways:
> >
> > 1) build the iova tree during iommu translation then we can correctly
> > trigger UNMAP during page walk caused by DSI
> > 2) don't do the iova tree walk for !MAP notifier, need new logic to
> > trigger UNMAP notifier in PSI/DSI
> >
> > This patch choose to go 1) (which seems easier at least for -stable).
> > Do you mean you prefer to go with 2)?
>
> Yes.
>
> IOVA tree is unnecessary overhead IMHO because UNMAP (both IOTLB or
> DEVIOTLB unmap) shouldn't need that complexity at all.  Using the iova tree
> can be accurate on which page got unmapped when the kernel driver used DSI
> for a large PSI as shown above, however IMHO it needs more justification
> that the pgtable walk is worth the effort.  Not to mention if a device in
> QEMU that wants to use the iova tree for some reason, one can just register
> with MAP and ignore all MAP events, while by default we keep UNMAP simple.
>
> So we could do:
>
>   (1) Rename vtd_sync_shadow_page_table() to vtd_sync_domain()
>   (2) instead of optimizing dev-iotlb only there:
>
>       if (!(vtd_as->iommu.iommu_notify_flags & IOMMU_NOTIFIER_IOTLB_EVENTS)) {
>           return 0;
>       }
>
>       we should firstly check if UNMAP or DEVUNMAP registered, we directly
>       send a notification to the whole domain.  We need to choose the event
>       that the register happens with but not both.
>
> This also reminded me that whether we should sanity check on iommu
> notifiers on some invalid cases.  E.g. it seems to me when registered with
> DEVIOTLB_UNMAP it should not register with either MAP or UNMAP anymore or
> it doesn't make sense.

It seems it doesn't hurt to allow both UNMAP and DEVIOTLB_UNMAP to work.

>
> One step further, I'm wondering whether the DEV_IOTLB event should exist at
> all.  Maybe we want to have DEV_IOTLB typed iommu notifier only,

This seems cleaner (I remember we had some discussion before).

> but then
> when we got dev-iotlb PSI/DSI we notify with type=UNMAP just like normal
> UNMAP events, then in above (2) we can send UNMAP constantly as long as
> !MAP.  But even if so that'll just be another optional change on top.

I'm fine with going without the iova-tree. Would you mind posting patches
for the fix? I can test them and include them in this series then.

Thanks

>
> Thanks,
>
> --
> Peter Xu
>
Peter Xu Dec. 5, 2022, 11:18 p.m. UTC | #7
Jason,

On Mon, Dec 05, 2022 at 12:12:04PM +0800, Jason Wang wrote:
> I'm fine to go without iova-tree. Would you mind to post patches for
> fix? I can test and include it in this series then.

One sample patch attached, only compile tested.

I can also work on this, but I'll be slow in making progress, so I'll add it
to my todo.  If you can help fix this issue, that would be more than great.
Don't worry about the ownership or authorship of the patch: if you agree
with the change and want to move forward with it, modify it as needed and
just take it over!

Thanks!
Jason Wang Dec. 6, 2022, 3:18 a.m. UTC | #8
On Tue, Dec 6, 2022 at 7:19 AM Peter Xu <peterx@redhat.com> wrote:
>
> Jason,
>
> On Mon, Dec 05, 2022 at 12:12:04PM +0800, Jason Wang wrote:
> > I'm fine to go without iova-tree. Would you mind to post patches for
> > fix? I can test and include it in this series then.
>
> One sample patch attached, only compile tested.

I don't see any direct connection between the attached patch and the
intel-iommu?

>
> I can also work on this but I'll be slow in making progress, so I'll add it
> into my todo.  If you can help to fix this issue it'll be more than great.

Ok, let me try but it might take some time :)

> No worry on the ownership or authorship of the patch if you agree on the
> change and moving forward with this when modifying - just take it over!

Ok.

Thanks

>
> Thanks!
>
> --
> Peter Xu
Peter Xu Dec. 6, 2022, 1:58 p.m. UTC | #9
On Tue, Dec 06, 2022 at 11:18:03AM +0800, Jason Wang wrote:
> On Tue, Dec 6, 2022 at 7:19 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > Jason,
> >
> > On Mon, Dec 05, 2022 at 12:12:04PM +0800, Jason Wang wrote:
> > > I'm fine to go without iova-tree. Would you mind to post patches for
> > > fix? I can test and include it in this series then.
> >
> > One sample patch attached, only compile tested.
> 
> I don't see any direct connection between the attached patch and the
> intel-iommu?

Sorry!  Wrong tree dumped...  Trying again.

> 
> >
> > I can also work on this but I'll be slow in making progress, so I'll add it
> > into my todo.  If you can help to fix this issue it'll be more than great.
> 
> Ok, let me try but it might take some time :)

Sure. :)

I'll also add it into my todo (and I think the other similar one has been
there for a while.. :( ).
Jason Wang Dec. 23, 2022, 8:02 a.m. UTC | #10
On Tue, Dec 6, 2022 at 9:58 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Dec 06, 2022 at 11:18:03AM +0800, Jason Wang wrote:
> > On Tue, Dec 6, 2022 at 7:19 AM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > Jason,
> > >
> > > On Mon, Dec 05, 2022 at 12:12:04PM +0800, Jason Wang wrote:
> > > > I'm fine to go without iova-tree. Would you mind to post patches for
> > > > fix? I can test and include it in this series then.
> > >
> > > One sample patch attached, only compile tested.
> >
> > I don't see any direct connection between the attached patch and the
> > intel-iommu?
>
> Sorry!  Wrong tree dumped...  Trying again.

The HWADDR breaks memory_region_notify_iommu_one():

qemu-system-x86_64: ../softmmu/memory.c:1991:
memory_region_notify_iommu_one: Assertion `entry->iova >=
notifier->start && entry_end <= notifier->end' failed.

I wonder if we need either:

1) remove the assert

or

2) introduce a new memory_region_notify_unmap_all() to unmap from
notifier->start to notifier->end.
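
To make option 2) concrete, here is a rough standalone sketch of what such
a helper could compute per notifier.  memory_region_notify_unmap_all() is
only the name proposed above; the struct layouts, the flag value and the
helper below are all hypothetical stand-ins, not QEMU's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the UNMAP notifier flag; the value is illustrative. */
#define UNMAP_ALL_SKETCH_FLAG_UNMAP 0x1u

/* Stand-in for the parts of a notifier this sketch needs. */
struct unmap_all_sketch_notifier {
    unsigned flags;
    uint64_t start;   /* first address the notifier covers */
    uint64_t end;     /* last address the notifier covers (inclusive) */
};

/* Stand-in for the range carried by an UNMAP event. */
struct unmap_all_sketch_event {
    uint64_t iova;
    uint64_t addr_mask;   /* inclusive length: last address minus iova */
};

/*
 * Build one UNMAP event that spans exactly the notifier's own window, so
 * a range check like the failing assertion holds by construction.
 * Returns false if the notifier did not register for UNMAP.
 */
static bool unmap_all_sketch_build(const struct unmap_all_sketch_notifier *n,
                                   struct unmap_all_sketch_event *out)
{
    if (!(n->flags & UNMAP_ALL_SKETCH_FLAG_UNMAP)) {
        return false;
    }
    out->iova = n->start;
    out->addr_mask = n->end - n->start;
    return true;
}
```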

Thanks

>
> >
> > >
> > > I can also work on this but I'll be slow in making progress, so I'll add it
> > > into my todo.  If you can help to fix this issue it'll be more than great.
> >
> > Ok, let me try but it might take some time :)
>
> Sure. :)
>
> I'll also add it into my todo (and I think the other similar one has been
> there for a while.. :( ).
>
> --
> Peter Xu
Peter Xu Dec. 23, 2022, 4:22 p.m. UTC | #11
On Fri, Dec 23, 2022 at 04:02:29PM +0800, Jason Wang wrote:
> On Tue, Dec 6, 2022 at 9:58 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Tue, Dec 06, 2022 at 11:18:03AM +0800, Jason Wang wrote:
> > > On Tue, Dec 6, 2022 at 7:19 AM Peter Xu <peterx@redhat.com> wrote:
> > > >
> > > > Jason,
> > > >
> > > > On Mon, Dec 05, 2022 at 12:12:04PM +0800, Jason Wang wrote:
> > > > > I'm fine to go without iova-tree. Would you mind to post patches for
> > > > > fix? I can test and include it in this series then.
> > > >
> > > > One sample patch attached, only compile tested.
> > >
> > > I don't see any direct connection between the attached patch and the
> > > intel-iommu?
> >
> > Sorry!  Wrong tree dumped...  Trying again.
> 
> The HWADDR breaks memory_region_notify_iommu_one():
> 
> qemu-system-x86_64: ../softmmu/memory.c:1991:
> memory_region_notify_iommu_one: Assertion `entry->iova >=
> notifier->start && entry_end <= notifier->end' failed.
> 
> I wonder if we need either:
> 
> 1) remove the assert

I vote for this one.  Beyond removing the assertion, we should probably
also crop the range, just like we do for dev-iotlb unmaps?
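
Roughly, the cropping could look like the standalone sketch below.  The
struct and function names are made up for illustration, and the range is
assumed not to wrap around the end of the address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the (iova, addr_mask) pair of an IOMMU TLB entry. */
struct crop_sketch_range {
    uint64_t iova;
    uint64_t addr_mask;   /* inclusive length: last address minus iova */
};

/*
 * Clamp the range to the notifier window [start, end].  Returns false
 * when there is no overlap, in which case the notifier can be skipped.
 */
static bool crop_sketch_to_window(struct crop_sketch_range *r,
                                  uint64_t start, uint64_t end)
{
    uint64_t last = r->iova + r->addr_mask;   /* assumed not to overflow */

    if (r->iova > end || last < start) {
        return false;
    }
    if (r->iova < start) {
        r->iova = start;
    }
    if (last > end) {
        last = end;
    }
    r->addr_mask = last - r->iova;
    return true;
}
```

Note that after cropping, addr_mask is no longer guaranteed to be a
power-of-two minus one, so a receiver has to treat it as a plain inclusive
length.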

Thanks,

> 
> or
> 
> 2) introduce a new memory_region_notify_unmap_all() to unmap from
> notifier->start to notifier->end.

Patch

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index d025ef2873..edeb62f4b2 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1834,6 +1834,8 @@  static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
     uint8_t access_flags;
     bool rid2pasid = (pasid == PCI_NO_PASID) && s->root_scalable;
     VTDIOTLBEntry *iotlb_entry;
+    const DMAMap *mapped;
+    DMAMap target;
 
     /*
      * We have standalone memory region for interrupt addresses, we
@@ -1954,6 +1956,21 @@  out:
     entry->translated_addr = vtd_get_slpte_addr(slpte, s->aw_bits) & page_mask;
     entry->addr_mask = ~page_mask;
     entry->perm = access_flags;
+
+    target.iova = entry->iova;
+    target.size = entry->addr_mask;
+    target.translated_addr = entry->translated_addr;
+    target.perm = entry->perm;
+
+    mapped = iova_tree_find(vtd_as->iova_tree, &target);
+    if (!mapped) {
+        /* To make UNMAP notifier work, we need build iova tree here
+         * in order to have the UNMAP iommu notifier to be triggered
+         * during the page walk.
+         */
+        iova_tree_insert(vtd_as->iova_tree, &target);
+    }
+
     return true;
 
 error:
@@ -2161,31 +2178,7 @@  static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
         ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
                                        vtd_as->devfn, &ce);
         if (!ret && domain_id == vtd_get_domain_id(s, &ce, vtd_as->pasid)) {
-            if (vtd_as_has_map_notifier(vtd_as)) {
-                /*
-                 * As long as we have MAP notifications registered in
-                 * any of our IOMMU notifiers, we need to sync the
-                 * shadow page table.
-                 */
-                vtd_sync_shadow_page_table_range(vtd_as, &ce, addr, size);
-            } else {
-                /*
-                 * For UNMAP-only notifiers, we don't need to walk the
-                 * page tables.  We just deliver the PSI down to
-                 * invalidate caches.
-                 */
-                IOMMUTLBEvent event = {
-                    .type = IOMMU_NOTIFIER_UNMAP,
-                    .entry = {
-                        .target_as = &address_space_memory,
-                        .iova = addr,
-                        .translated_addr = 0,
-                        .addr_mask = size - 1,
-                        .perm = IOMMU_NONE,
-                    },
-                };
-                memory_region_notify_iommu(&vtd_as->iommu, 0, event);
-            }
+            vtd_sync_shadow_page_table_range(vtd_as, &ce, addr, size);
         }
     }
 }
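
For readers who want to try the find-then-insert pattern from the
translate hunk outside QEMU, a rough standalone sketch follows.  It uses a
small fixed array instead of the real iova tree, and every name in it is
hypothetical; like the patch, it stores addr_mask in the size field, i.e.
an inclusive length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a recorded mapping; size is inclusive (length minus one). */
struct xlate_sketch_map {
    uint64_t iova;
    uint64_t size;
};

#define XLATE_SKETCH_MAX 16

/* Tiny array-backed stand-in for the per-address-space iova tree. */
struct xlate_sketch_tree {
    struct xlate_sketch_map maps[XLATE_SKETCH_MAX];
    size_t count;
};

/* Return any recorded mapping that overlaps the target, or NULL. */
static const struct xlate_sketch_map *
xlate_sketch_find(const struct xlate_sketch_tree *t,
                  const struct xlate_sketch_map *target)
{
    for (size_t i = 0; i < t->count; i++) {
        const struct xlate_sketch_map *cur = &t->maps[i];

        if (cur->iova <= target->iova + target->size &&
            target->iova <= cur->iova + cur->size) {
            return cur;
        }
    }
    return NULL;
}

/*
 * Record a translated range only if nothing overlapping is known yet, so
 * that a later walk can recognise the range as mapped.  Returns true when
 * a new entry was added.
 */
static bool xlate_sketch_record(struct xlate_sketch_tree *t,
                                uint64_t iova, uint64_t addr_mask)
{
    struct xlate_sketch_map target = { .iova = iova, .size = addr_mask };

    if (xlate_sketch_find(t, &target) || t->count == XLATE_SKETCH_MAX) {
        return false;
    }
    t->maps[t->count++] = target;
    return true;
}
```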