diff mbox series

intel-iommu: Document iova_tree

Message ID 20221201162501.3864692-1-peterx@redhat.com
State New
Headers show
Series intel-iommu: Document iova_tree | expand

Commit Message

Peter Xu Dec. 1, 2022, 4:25 p.m. UTC
It seems not super clear on when iova_tree is used, and why.  Add a rich
comment above iova_tree to track why we needed the iova_tree, and when we
need it.

Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/hw/i386/intel_iommu.h | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

Comments

Eric Auger Dec. 1, 2022, 6:17 p.m. UTC | #1
Hi Peter

On 12/1/22 17:25, Peter Xu wrote:
> It seems not super clear on when iova_tree is used, and why.  Add a rich
> comment above iova_tree to track why we needed the iova_tree, and when we
> need it.
>
> Suggested-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  include/hw/i386/intel_iommu.h | 30 +++++++++++++++++++++++++++++-
>  1 file changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index 46d973e629..8d130ab2e3 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -109,7 +109,35 @@ struct VTDAddressSpace {
>      QLIST_ENTRY(VTDAddressSpace) next;
>      /* Superset of notifier flags that this address space has */
>      IOMMUNotifierFlag notifier_flags;
> -    IOVATree *iova_tree;          /* Traces mapped IOVA ranges */
> +    /*
> +     * @iova_tree traces mapped IOVA ranges.
> +     *
> +     * The tree is not needed if no MAP notifiers is registered with

s/no MAP notifiers/no MAP notifier
> +     * current VTD address space, because all UNMAP (including iotlb or
> +     * dev-iotlb) events can be transparently delivered to !MAP iommu
> +     * notifiers.
because all UNMAP notifications (iotlb or dev-iotlb) can be triggered
directly, as opposed to MAP notifications. (?)
> +     *
> +     * The tree OTOH is required for MAP typed iommu notifiers for a few
> +     * reasons.
> +     *
> +     * Firstly, there's no way to identify whether an PSI event is MAP or
maybe give the decryption of the 'PSI' and 'DSI" acronyms once ;-)
> +     * UNMAP within the PSI message itself.  Without having prior knowledge
> +     * of existing state vIOMMU doesn't know whether it should notify MAP
> +     * or UNMAP for a PSI message it received.
> +     *
> +     * Secondly, PSI received from guest driver (or even a large PSI can
> +     * grow into a DSI at least with Linux intel-iommu driver) can be
> +     * larger in range than the newly mapped ranges for either MAP or UNMAP
> +     * events. If it directly pass-throughs any such event it may confuse

If it directly notifies the registered device with the unmodified range, it may confuse the drivers ../..

So the range of the MAP notification can be adapted based on the existing IOVA mappings.  

> +     * the registered drivers (e.g. vfio-pci) on either: (1) trying to map
> +     * the same region more than once (for VFIO_IOMMU_MAP_DMA, -EEXIST will
> +     * trigger), or (2) trying to UNMAP a range that is still partially
> +     * mapped.  That accuracy is not required for UNMAP-only notifiers, but
> +     * it is a must-to-have for MAP-inclusive notifiers, because the vIOMMU
> +     * needs to make sure the shadow page table is always in sync with the
> +     * guest IOMMU pgtables for a device.
> +     */
> +    IOVATree *iova_tree;
>  };
>  
>  struct VTDIOTLBEntry {
Thanks

Eric
Peter Xu Dec. 1, 2022, 7:22 p.m. UTC | #2
On Thu, Dec 01, 2022 at 07:17:41PM +0100, Eric Auger wrote:
> Hi Peter

Hi, Eric,

> 
> On 12/1/22 17:25, Peter Xu wrote:
> > It seems not super clear on when iova_tree is used, and why.  Add a rich
> > comment above iova_tree to track why we needed the iova_tree, and when we
> > need it.
> >
> > Suggested-by: Jason Wang <jasowang@redhat.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  include/hw/i386/intel_iommu.h | 30 +++++++++++++++++++++++++++++-
> >  1 file changed, 29 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> > index 46d973e629..8d130ab2e3 100644
> > --- a/include/hw/i386/intel_iommu.h
> > +++ b/include/hw/i386/intel_iommu.h
> > @@ -109,7 +109,35 @@ struct VTDAddressSpace {
> >      QLIST_ENTRY(VTDAddressSpace) next;
> >      /* Superset of notifier flags that this address space has */
> >      IOMMUNotifierFlag notifier_flags;
> > -    IOVATree *iova_tree;          /* Traces mapped IOVA ranges */
> > +    /*
> > +     * @iova_tree traces mapped IOVA ranges.
> > +     *
> > +     * The tree is not needed if no MAP notifiers is registered with
> 
> s/no MAP notifiers/no MAP notifier

Will fix.

> > +     * current VTD address space, because all UNMAP (including iotlb or
> > +     * dev-iotlb) events can be transparently delivered to !MAP iommu
> > +     * notifiers.
> because all UNMAP notifications (iotlb or dev-iotlb) can be triggered
> directly, as opposed to MAP notifications. (?)

What I wanted to say is any PSI or DSI messages we got from the guest can
be transparently delivered to QEMU's iommu notifiers.  I'm not sure
"triggered directly" best describe the case here.

PSI: Page Selective Invalidations
DSI: Domain Selective Invalidations

Sorry to mention these terms again, but that's really what the "transparent
delivery" means here - we get the PSI/DSI messages, then we notify with the
same ranges in IOMMU notifiers.  They're not the same concept but we do
that transparently without changing the core of the messages.

Maybe I should spell out "!MAP" as "UNMAP-only"?  Would that help?

> > +     *
> > +     * The tree OTOH is required for MAP typed iommu notifiers for a few
> > +     * reasons.
> > +     *
> > +     * Firstly, there's no way to identify whether an PSI event is MAP or
> maybe give the decryption of the 'PSI' and 'DSI" acronyms once ;-)

Please see above. :)

These are VT-d terms used in multiple places in the .[ch] files, I assume
I'll just keep using them because otherwise I'll need to comment them
everytime we use any PSI/DSI terms.  It might become an overkill I'm afraid.

> > +     * UNMAP within the PSI message itself.  Without having prior knowledge
> > +     * of existing state vIOMMU doesn't know whether it should notify MAP
> > +     * or UNMAP for a PSI message it received.
> > +     *
> > +     * Secondly, PSI received from guest driver (or even a large PSI can
> > +     * grow into a DSI at least with Linux intel-iommu driver) can be
> > +     * larger in range than the newly mapped ranges for either MAP or UNMAP
> > +     * events. If it directly pass-throughs any such event it may confuse
> 
> If it directly notifies the registered device with the unmodified range, it may confuse the drivers ../..

Will fix.

> 
> So the range of the MAP notification can be adapted based on the existing IOVA mappings.  

Yes, e.g. the iova tree makes sure we don't map something again if it's mapped.

Thanks,

> 
> > +     * the registered drivers (e.g. vfio-pci) on either: (1) trying to map
> > +     * the same region more than once (for VFIO_IOMMU_MAP_DMA, -EEXIST will
> > +     * trigger), or (2) trying to UNMAP a range that is still partially
> > +     * mapped.  That accuracy is not required for UNMAP-only notifiers, but
> > +     * it is a must-to-have for MAP-inclusive notifiers, because the vIOMMU
> > +     * needs to make sure the shadow page table is always in sync with the
> > +     * guest IOMMU pgtables for a device.
> > +     */
> > +    IOVATree *iova_tree;
> >  };
> >  
> >  struct VTDIOTLBEntry {
> Thanks
> 
> Eric
>
Peter Xu Dec. 1, 2022, 7:44 p.m. UTC | #3
On Thu, Dec 01, 2022 at 02:22:27PM -0500, Peter Xu wrote:
> On Thu, Dec 01, 2022 at 07:17:41PM +0100, Eric Auger wrote:
> > Hi Peter
> 
> Hi, Eric,
> 
> > 
> > On 12/1/22 17:25, Peter Xu wrote:
> > > It seems not super clear on when iova_tree is used, and why.  Add a rich
> > > comment above iova_tree to track why we needed the iova_tree, and when we
> > > need it.
> > >
> > > Suggested-by: Jason Wang <jasowang@redhat.com>
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  include/hw/i386/intel_iommu.h | 30 +++++++++++++++++++++++++++++-
> > >  1 file changed, 29 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> > > index 46d973e629..8d130ab2e3 100644
> > > --- a/include/hw/i386/intel_iommu.h
> > > +++ b/include/hw/i386/intel_iommu.h
> > > @@ -109,7 +109,35 @@ struct VTDAddressSpace {
> > >      QLIST_ENTRY(VTDAddressSpace) next;
> > >      /* Superset of notifier flags that this address space has */
> > >      IOMMUNotifierFlag notifier_flags;
> > > -    IOVATree *iova_tree;          /* Traces mapped IOVA ranges */
> > > +    /*
> > > +     * @iova_tree traces mapped IOVA ranges.
> > > +     *
> > > +     * The tree is not needed if no MAP notifiers is registered with
> > 
> > s/no MAP notifiers/no MAP notifier
> 
> Will fix.
> 
> > > +     * current VTD address space, because all UNMAP (including iotlb or
> > > +     * dev-iotlb) events can be transparently delivered to !MAP iommu
> > > +     * notifiers.
> > because all UNMAP notifications (iotlb or dev-iotlb) can be triggered
> > directly, as opposed to MAP notifications. (?)
> 
> What I wanted to say is any PSI or DSI messages we got from the guest can
> be transparently delivered to QEMU's iommu notifiers.  I'm not sure
> "triggered directly" best describe the case here.
> 
> PSI: Page Selective Invalidations
> DSI: Domain Selective Invalidations
> 
> Sorry to mention these terms again, but that's really what the "transparent
> delivery" means here - we get the PSI/DSI messages, then we notify with the
> same ranges in IOMMU notifiers.  They're not the same concept but we do
> that transparently without changing the core of the messages.
> 
> Maybe I should spell out "!MAP" as "UNMAP-only"?  Would that help?
> 
> > > +     *
> > > +     * The tree OTOH is required for MAP typed iommu notifiers for a few
> > > +     * reasons.
> > > +     *
> > > +     * Firstly, there's no way to identify whether an PSI event is MAP or
> > maybe give the decryption of the 'PSI' and 'DSI" acronyms once ;-)
> 
> Please see above. :)
> 
> These are VT-d terms used in multiple places in the .[ch] files, I assume
> I'll just keep using them because otherwise I'll need to comment them
> everytime we use any PSI/DSI terms.  It might become an overkill I'm afraid.
> 
> > > +     * UNMAP within the PSI message itself.  Without having prior knowledge
> > > +     * of existing state vIOMMU doesn't know whether it should notify MAP
> > > +     * or UNMAP for a PSI message it received.
> > > +     *
> > > +     * Secondly, PSI received from guest driver (or even a large PSI can
> > > +     * grow into a DSI at least with Linux intel-iommu driver) can be
> > > +     * larger in range than the newly mapped ranges for either MAP or UNMAP
> > > +     * events. If it directly pass-throughs any such event it may confuse
> > 
> > If it directly notifies the registered device with the unmodified range, it may confuse the drivers ../..
> 
> Will fix.
> 
> > 
> > So the range of the MAP notification can be adapted based on the existing IOVA mappings.  
> 
> Yes, e.g. the iova tree makes sure we don't map something again if it's mapped.

I figured maybe simpler I just attach a v2..

Thanks,
Jason Wang Dec. 5, 2022, 4:23 a.m. UTC | #4
On Fri, Dec 2, 2022 at 12:25 AM Peter Xu <peterx@redhat.com> wrote:
>
> It seems not super clear on when iova_tree is used, and why.  Add a rich
> comment above iova_tree to track why we needed the iova_tree, and when we
> need it.
>
> Suggested-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  include/hw/i386/intel_iommu.h | 30 +++++++++++++++++++++++++++++-
>  1 file changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index 46d973e629..8d130ab2e3 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -109,7 +109,35 @@ struct VTDAddressSpace {
>      QLIST_ENTRY(VTDAddressSpace) next;
>      /* Superset of notifier flags that this address space has */
>      IOMMUNotifierFlag notifier_flags;
> -    IOVATree *iova_tree;          /* Traces mapped IOVA ranges */
> +    /*
> +     * @iova_tree traces mapped IOVA ranges.
> +     *
> +     * The tree is not needed if no MAP notifiers is registered with
> +     * current VTD address space, because all UNMAP (including iotlb or
> +     * dev-iotlb) events can be transparently delivered to !MAP iommu
> +     * notifiers.

So this means the UNMAP notifier doesn't need to be as accurate as
MAP. (Should we document it in the notifier headers)?

For MAP[a, b] MAP[b, c] we can do a UNMAP[a. c].

> +     *
> +     * The tree OTOH is required for MAP typed iommu notifiers for a few
> +     * reasons.
> +     *
> +     * Firstly, there's no way to identify whether an PSI event is MAP or
> +     * UNMAP within the PSI message itself.  Without having prior knowledge
> +     * of existing state vIOMMU doesn't know whether it should notify MAP
> +     * or UNMAP for a PSI message it received.
> +     *
> +     * Secondly, PSI received from guest driver (or even a large PSI can
> +     * grow into a DSI at least with Linux intel-iommu driver) can be
> +     * larger in range than the newly mapped ranges for either MAP or UNMAP
> +     * events.

Yes, so I think we need a document that the UNMAP handler should be
prepared for this.

Thanks

> If it directly pass-throughs any such event it may confuse
> +     * the registered drivers (e.g. vfio-pci) on either: (1) trying to map
> +     * the same region more than once (for VFIO_IOMMU_MAP_DMA, -EEXIST will
> +     * trigger), or (2) trying to UNMAP a range that is still partially
> +     * mapped.  That accuracy is not required for UNMAP-only notifiers, but
> +     * it is a must-to-have for MAP-inclusive notifiers, because the vIOMMU
> +     * needs to make sure the shadow page table is always in sync with the
> +     * guest IOMMU pgtables for a device.
> +     */
> +    IOVATree *iova_tree;
>  };
>
>  struct VTDIOTLBEntry {
> --
> 2.37.3
>
Peter Xu Dec. 5, 2022, 11:28 p.m. UTC | #5
On Mon, Dec 05, 2022 at 12:23:20PM +0800, Jason Wang wrote:
> On Fri, Dec 2, 2022 at 12:25 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > It seems not super clear on when iova_tree is used, and why.  Add a rich
> > comment above iova_tree to track why we needed the iova_tree, and when we
> > need it.
> >
> > Suggested-by: Jason Wang <jasowang@redhat.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  include/hw/i386/intel_iommu.h | 30 +++++++++++++++++++++++++++++-
> >  1 file changed, 29 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> > index 46d973e629..8d130ab2e3 100644
> > --- a/include/hw/i386/intel_iommu.h
> > +++ b/include/hw/i386/intel_iommu.h
> > @@ -109,7 +109,35 @@ struct VTDAddressSpace {
> >      QLIST_ENTRY(VTDAddressSpace) next;
> >      /* Superset of notifier flags that this address space has */
> >      IOMMUNotifierFlag notifier_flags;
> > -    IOVATree *iova_tree;          /* Traces mapped IOVA ranges */
> > +    /*
> > +     * @iova_tree traces mapped IOVA ranges.
> > +     *
> > +     * The tree is not needed if no MAP notifiers is registered with
> > +     * current VTD address space, because all UNMAP (including iotlb or
> > +     * dev-iotlb) events can be transparently delivered to !MAP iommu
> > +     * notifiers.
> 
> So this means the UNMAP notifier doesn't need to be as accurate as
> MAP. (Should we document it in the notifier headers)?

Yes.

> 
> For MAP[a, b] MAP[b, c] we can do a UNMAP[a. c].

IIUC a better way to say this is, for MAP[a, b] we can do an UNMAP[a-X,
b+Y] as long as the range covers [a, b]?

> 
> > +     *
> > +     * The tree OTOH is required for MAP typed iommu notifiers for a few
> > +     * reasons.
> > +     *
> > +     * Firstly, there's no way to identify whether an PSI event is MAP or
> > +     * UNMAP within the PSI message itself.  Without having prior knowledge
> > +     * of existing state vIOMMU doesn't know whether it should notify MAP
> > +     * or UNMAP for a PSI message it received.
> > +     *
> > +     * Secondly, PSI received from guest driver (or even a large PSI can
> > +     * grow into a DSI at least with Linux intel-iommu driver) can be
> > +     * larger in range than the newly mapped ranges for either MAP or UNMAP
> > +     * events.
> 
> Yes, so I think we need a document that the UNMAP handler should be
> prepared for this.

How about I squash below into this same patch?

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 91f8a2395a..c83bd11a68 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -129,6 +129,24 @@ struct IOMMUTLBEntry {
 /*
  * Bitmap for different IOMMUNotifier capabilities. Each notifier can
  * register with one or multiple IOMMU Notifier capability bit(s).
+ *
+ * Normally there're two use cases for the notifiers:
+ *
+ *   (1) When the device needs accurate synchronizations of the vIOMMU page
+ *       tables, it needs to register with both MAP|UNMAP notifies (which
+ *       is defined as IOMMU_NOTIFIER_IOTLB_EVENTS below).  As long as MAP
+ *       events are registered, the notifications will be accurate but
+ *       there's overhead on synchronizing the guest vIOMMU page tables.
+ *
+ *   (2) When the device doesn't need accurate synchronizations of the
+ *       vIOMMU page tables (when the device can both cache translations
+ *       and requesting to translate dynamically during DMA process), it
+ *       needs to register only with UNMAP or DEVIOTLB_UNMAP notifies.
+ *       Note that in such working mode shadow page table is not used for
+ *       vIOMMU unit on this address space, so the UNMAP messages can be
+ *       actually larger than the real invalidations (just like how the
+ *       Linux IOMMU driver normally works, where an invalidation can be
+ *       enlarged as long as it still covers the target range).
  */
 typedef enum {
     IOMMU_NOTIFIER_NONE = 0,

Thanks,
Jason Wang Dec. 6, 2022, 7:04 a.m. UTC | #6
On Tue, Dec 6, 2022 at 7:28 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, Dec 05, 2022 at 12:23:20PM +0800, Jason Wang wrote:
> > On Fri, Dec 2, 2022 at 12:25 AM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > It seems not super clear on when iova_tree is used, and why.  Add a rich
> > > comment above iova_tree to track why we needed the iova_tree, and when we
> > > need it.
> > >
> > > Suggested-by: Jason Wang <jasowang@redhat.com>
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  include/hw/i386/intel_iommu.h | 30 +++++++++++++++++++++++++++++-
> > >  1 file changed, 29 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> > > index 46d973e629..8d130ab2e3 100644
> > > --- a/include/hw/i386/intel_iommu.h
> > > +++ b/include/hw/i386/intel_iommu.h
> > > @@ -109,7 +109,35 @@ struct VTDAddressSpace {
> > >      QLIST_ENTRY(VTDAddressSpace) next;
> > >      /* Superset of notifier flags that this address space has */
> > >      IOMMUNotifierFlag notifier_flags;
> > > -    IOVATree *iova_tree;          /* Traces mapped IOVA ranges */
> > > +    /*
> > > +     * @iova_tree traces mapped IOVA ranges.
> > > +     *
> > > +     * The tree is not needed if no MAP notifiers is registered with
> > > +     * current VTD address space, because all UNMAP (including iotlb or
> > > +     * dev-iotlb) events can be transparently delivered to !MAP iommu
> > > +     * notifiers.
> >
> > So this means the UNMAP notifier doesn't need to be as accurate as
> > MAP. (Should we document it in the notifier headers)?
>
> Yes.
>
> >
> > For MAP[a, b] MAP[b, c] we can do a UNMAP[a. c].
>
> IIUC a better way to say this is, for MAP[a, b] we can do an UNMAP[a-X,
> b+Y] as long as the range covers [a, b]?

Right.

>
> >
> > > +     *
> > > +     * The tree OTOH is required for MAP typed iommu notifiers for a few
> > > +     * reasons.
> > > +     *
> > > +     * Firstly, there's no way to identify whether an PSI event is MAP or
> > > +     * UNMAP within the PSI message itself.  Without having prior knowledge
> > > +     * of existing state vIOMMU doesn't know whether it should notify MAP
> > > +     * or UNMAP for a PSI message it received.
> > > +     *
> > > +     * Secondly, PSI received from guest driver (or even a large PSI can
> > > +     * grow into a DSI at least with Linux intel-iommu driver) can be
> > > +     * larger in range than the newly mapped ranges for either MAP or UNMAP
> > > +     * events.
> >
> > Yes, so I think we need a document that the UNMAP handler should be
> > prepared for this.
>
> How about I squash below into this same patch?

Looks good to me.

Thanks

>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 91f8a2395a..c83bd11a68 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -129,6 +129,24 @@ struct IOMMUTLBEntry {
>  /*
>   * Bitmap for different IOMMUNotifier capabilities. Each notifier can
>   * register with one or multiple IOMMU Notifier capability bit(s).
> + *
> + * Normally there're two use cases for the notifiers:
> + *
> + *   (1) When the device needs accurate synchronizations of the vIOMMU page
> + *       tables, it needs to register with both MAP|UNMAP notifies (which
> + *       is defined as IOMMU_NOTIFIER_IOTLB_EVENTS below).  As long as MAP
> + *       events are registered, the notifications will be accurate but
> + *       there's overhead on synchronizing the guest vIOMMU page tables.
> + *
> + *   (2) When the device doesn't need accurate synchronizations of the
> + *       vIOMMU page tables (when the device can both cache translations
> + *       and requesting to translate dynamically during DMA process), it
> + *       needs to register only with UNMAP or DEVIOTLB_UNMAP notifies.
> + *       Note that in such working mode shadow page table is not used for
> + *       vIOMMU unit on this address space, so the UNMAP messages can be
> + *       actually larger than the real invalidations (just like how the
> + *       Linux IOMMU driver normally works, where an invalidation can be
> + *       enlarged as long as it still covers the target range).
>   */
>  typedef enum {
>      IOMMU_NOTIFIER_NONE = 0,
>
> Thanks,
>
> --
> Peter Xu
>
Eric Auger Dec. 6, 2022, 1:06 p.m. UTC | #7
Hi Peter,
On 12/1/22 20:22, Peter Xu wrote:
> On Thu, Dec 01, 2022 at 07:17:41PM +0100, Eric Auger wrote:
>> Hi Peter
> Hi, Eric,
>
>> On 12/1/22 17:25, Peter Xu wrote:
>>> It seems not super clear on when iova_tree is used, and why.  Add a rich
>>> comment above iova_tree to track why we needed the iova_tree, and when we
>>> need it.
>>>
>>> Suggested-by: Jason Wang <jasowang@redhat.com>
>>> Signed-off-by: Peter Xu <peterx@redhat.com>
>>> ---
>>>  include/hw/i386/intel_iommu.h | 30 +++++++++++++++++++++++++++++-
>>>  1 file changed, 29 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
>>> index 46d973e629..8d130ab2e3 100644
>>> --- a/include/hw/i386/intel_iommu.h
>>> +++ b/include/hw/i386/intel_iommu.h
>>> @@ -109,7 +109,35 @@ struct VTDAddressSpace {
>>>      QLIST_ENTRY(VTDAddressSpace) next;
>>>      /* Superset of notifier flags that this address space has */
>>>      IOMMUNotifierFlag notifier_flags;
>>> -    IOVATree *iova_tree;          /* Traces mapped IOVA ranges */
>>> +    /*
>>> +     * @iova_tree traces mapped IOVA ranges.
>>> +     *
>>> +     * The tree is not needed if no MAP notifiers is registered with
>> s/no MAP notifiers/no MAP notifier
> Will fix.
>
>>> +     * current VTD address space, because all UNMAP (including iotlb or
>>> +     * dev-iotlb) events can be transparently delivered to !MAP iommu
>>> +     * notifiers.
>> because all UNMAP notifications (iotlb or dev-iotlb) can be triggered
>> directly, as opposed to MAP notifications. (?)
> What I wanted to say is any PSI or DSI messages we got from the guest can
> be transparently delivered to QEMU's iommu notifiers.  I'm not sure
> "triggered directly" best describe the case here.
yes "transparently delivered" is OK. Or "guest invalidate commands can
be directly passed to the IOMMU UNMAP notifiers without any further
reshuffling". But that's nitpicking.
>
> PSI: Page Selective Invalidations
> DSI: Domain Selective Invalidations
>
> Sorry to mention these terms again, but that's really what the "transparent
> delivery" means here - we get the PSI/DSI messages, then we notify with the
> same ranges in IOMMU notifiers.  They're not the same concept but we do
> that transparently without changing the core of the messages.
>
> Maybe I should spell out "!MAP" as "UNMAP-only"?  Would that help?
yeah those are unmap notifiers if I am correct.
>
>>> +     *
>>> +     * The tree OTOH is required for MAP typed iommu notifiers for a few
>>> +     * reasons.
>>> +     *
>>> +     * Firstly, there's no way to identify whether an PSI event is MAP or
>> maybe give the decryption of the 'PSI' and 'DSI" acronyms once ;-)
> Please see above. :)
ok thanks
>
> These are VT-d terms used in multiple places in the .[ch] files, I assume
> I'll just keep using them because otherwise I'll need to comment them
> everytime we use any PSI/DSI terms.  It might become an overkill I'm afraid.
OK maybe just using the full terminology once is enough.
>
>>> +     * UNMAP within the PSI message itself.  Without having prior knowledge
>>> +     * of existing state vIOMMU doesn't know whether it should notify MAP
>>> +     * or UNMAP for a PSI message it received.
>>> +     *
>>> +     * Secondly, PSI received from guest driver (or even a large PSI can
>>> +     * grow into a DSI at least with Linux intel-iommu driver) can be
>>> +     * larger in range than the newly mapped ranges for either MAP or UNMAP
>>> +     * events. If it directly pass-throughs any such event it may confuse
>> If it directly notifies the registered device with the unmodified range, it may confuse the drivers ../..
> Will fix.
>
>> So the range of the MAP notification can be adapted based on the existing IOVA mappings.  
> Yes, e.g. the iova tree makes sure we don't map something again if it's mapped.

OK

Thanks

Eric
>
> Thanks,
>
>>> +     * the registered drivers (e.g. vfio-pci) on either: (1) trying to map
>>> +     * the same region more than once (for VFIO_IOMMU_MAP_DMA, -EEXIST will
>>> +     * trigger), or (2) trying to UNMAP a range that is still partially
>>> +     * mapped.  That accuracy is not required for UNMAP-only notifiers, but
>>> +     * it is a must-to-have for MAP-inclusive notifiers, because the vIOMMU
>>> +     * needs to make sure the shadow page table is always in sync with the
>>> +     * guest IOMMU pgtables for a device.
>>> +     */
>>> +    IOVATree *iova_tree;
>>>  };
>>>  
>>>  struct VTDIOTLBEntry {
>> Thanks
>>
>> Eric
>>
Eric Auger Dec. 6, 2022, 1:16 p.m. UTC | #8
Hi Peter,
On 12/6/22 00:28, Peter Xu wrote:
> On Mon, Dec 05, 2022 at 12:23:20PM +0800, Jason Wang wrote:
>> On Fri, Dec 2, 2022 at 12:25 AM Peter Xu <peterx@redhat.com> wrote:
>>> It seems not super clear on when iova_tree is used, and why.  Add a rich
>>> comment above iova_tree to track why we needed the iova_tree, and when we
>>> need it.
>>>
>>> Suggested-by: Jason Wang <jasowang@redhat.com>
>>> Signed-off-by: Peter Xu <peterx@redhat.com>
>>> ---
>>>  include/hw/i386/intel_iommu.h | 30 +++++++++++++++++++++++++++++-
>>>  1 file changed, 29 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
>>> index 46d973e629..8d130ab2e3 100644
>>> --- a/include/hw/i386/intel_iommu.h
>>> +++ b/include/hw/i386/intel_iommu.h
>>> @@ -109,7 +109,35 @@ struct VTDAddressSpace {
>>>      QLIST_ENTRY(VTDAddressSpace) next;
>>>      /* Superset of notifier flags that this address space has */
>>>      IOMMUNotifierFlag notifier_flags;
>>> -    IOVATree *iova_tree;          /* Traces mapped IOVA ranges */
>>> +    /*
>>> +     * @iova_tree traces mapped IOVA ranges.
>>> +     *
>>> +     * The tree is not needed if no MAP notifiers is registered with
>>> +     * current VTD address space, because all UNMAP (including iotlb or
>>> +     * dev-iotlb) events can be transparently delivered to !MAP iommu
>>> +     * notifiers.
>> So this means the UNMAP notifier doesn't need to be as accurate as
>> MAP. (Should we document it in the notifier headers)?
> Yes.
>
>> For MAP[a, b] MAP[b, c] we can do a UNMAP[a. c].
> IIUC a better way to say this is, for MAP[a, b] we can do an UNMAP[a-X,
> b+Y] as long as the range covers [a, b]?
>
>>> +     *
>>> +     * The tree OTOH is required for MAP typed iommu notifiers for a few
>>> +     * reasons.
>>> +     *
>>> +     * Firstly, there's no way to identify whether an PSI event is MAP or
>>> +     * UNMAP within the PSI message itself.  Without having prior knowledge
>>> +     * of existing state vIOMMU doesn't know whether it should notify MAP
>>> +     * or UNMAP for a PSI message it received.
>>> +     *
>>> +     * Secondly, PSI received from guest driver (or even a large PSI can
>>> +     * grow into a DSI at least with Linux intel-iommu driver) can be
>>> +     * larger in range than the newly mapped ranges for either MAP or UNMAP
>>> +     * events.
>> Yes, so I think we need a document that the UNMAP handler should be
>> prepared for this.
> How about I squash below into this same patch?
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 91f8a2395a..c83bd11a68 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -129,6 +129,24 @@ struct IOMMUTLBEntry {
>  /*
>   * Bitmap for different IOMMUNotifier capabilities. Each notifier can
>   * register with one or multiple IOMMU Notifier capability bit(s).
> + *
> + * Normally there're two use cases for the notifiers:
> + *
> + *   (1) When the device needs accurate synchronizations of the vIOMMU page
accurate synchronizations sound too vague & subjective to me.
> + *       tables, it needs to register with both MAP|UNMAP notifies (which
> + *       is defined as IOMMU_NOTIFIER_IOTLB_EVENTS below).  As long as MAP
> + *       events are registered, the notifications will be accurate but
> + *       there's overhead on synchronizing the guest vIOMMU page tables.
> + *
> + *   (2) When the device doesn't need accurate synchronizations of the
> + *       vIOMMU page tables (when the device can both cache translations
> + *       and requesting to translate dynamically during DMA process), it
s/requesting/request
> + *       needs to register only with UNMAP or DEVIOTLB_UNMAP notifies.
would be nice to clarify the distinction between both then
> + *       Note that in such working mode shadow page table is not used for
> + *       vIOMMU unit on this address space, so the UNMAP messages can be
I do not catch 'is not used for vIOMMU unit on this address space'
> + *       actually larger than the real invalidations (just like how the
> + *       Linux IOMMU driver normally works, where an invalidation can be
> + *       enlarged as long as it still covers the target range).
>   */
>  typedef enum {
>      IOMMU_NOTIFIER_NONE = 0,
>
> Thanks,
>
Thanks

Eric
Peter Xu Dec. 6, 2022, 4:02 p.m. UTC | #9
On Tue, Dec 06, 2022 at 02:06:54PM +0100, Eric Auger wrote:
> >>> +     * current VTD address space, because all UNMAP (including iotlb or
> >>> +     * dev-iotlb) events can be transparently delivered to !MAP iommu
> >>> +     * notifiers.
> >> because all UNMAP notifications (iotlb or dev-iotlb) can be triggered
> >> directly, as opposed to MAP notifications. (?)
> > What I wanted to say is any PSI or DSI messages we got from the guest can
> > be transparently delivered to QEMU's iommu notifiers.  I'm not sure
> > "triggered directly" best describe the case here.
> yes "transparently delivered" is OK. Or "guest invalidate commands can
> be directly passed to the IOMMU UNMAP notifiers without any further
> reshuffling". But that's nitpicking.

Will do.

> >
> > PSI: Page Selective Invalidations
> > DSI: Domain Selective Invalidations
> >
> > Sorry to mention these terms again, but that's really what the "transparent
> > delivery" means here - we get the PSI/DSI messages, then we notify with the
> > same ranges in IOMMU notifiers.  They're not the same concept but we do
> > that transparently without changing the core of the messages.
> >
> > Maybe I should spell out "!MAP" as "UNMAP-only"?  Would that help?
> yeah those are unmap notifiers if I am correct.
> >
> >>> +     *
> >>> +     * The tree OTOH is required for MAP typed iommu notifiers for a few
> >>> +     * reasons.
> >>> +     *
> >>> +     * Firstly, there's no way to identify whether an PSI event is MAP or
> >> maybe give the decryption of the 'PSI' and 'DSI" acronyms once ;-)
> > Please see above. :)
> ok thanks
> >
> > These are VT-d terms used in multiple places in the .[ch] files, I assume
> > I'll just keep using them because otherwise I'll need to comment them
> > everytime we use any PSI/DSI terms.  It might become an overkill I'm afraid.
> OK maybe just using the full terminology once is enough.

Ok, I'll add them.

Thanks Eric.
Peter Xu Dec. 6, 2022, 4:05 p.m. UTC | #10
On Tue, Dec 06, 2022 at 02:16:32PM +0100, Eric Auger wrote:
> Hi Peter,
> On 12/6/22 00:28, Peter Xu wrote:
> > On Mon, Dec 05, 2022 at 12:23:20PM +0800, Jason Wang wrote:
> >> On Fri, Dec 2, 2022 at 12:25 AM Peter Xu <peterx@redhat.com> wrote:
> >>> It seems not super clear on when iova_tree is used, and why.  Add a rich
> >>> comment above iova_tree to track why we needed the iova_tree, and when we
> >>> need it.
> >>>
> >>> Suggested-by: Jason Wang <jasowang@redhat.com>
> >>> Signed-off-by: Peter Xu <peterx@redhat.com>
> >>> ---
> >>>  include/hw/i386/intel_iommu.h | 30 +++++++++++++++++++++++++++++-
> >>>  1 file changed, 29 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> >>> index 46d973e629..8d130ab2e3 100644
> >>> --- a/include/hw/i386/intel_iommu.h
> >>> +++ b/include/hw/i386/intel_iommu.h
> >>> @@ -109,7 +109,35 @@ struct VTDAddressSpace {
> >>>      QLIST_ENTRY(VTDAddressSpace) next;
> >>>      /* Superset of notifier flags that this address space has */
> >>>      IOMMUNotifierFlag notifier_flags;
> >>> -    IOVATree *iova_tree;          /* Traces mapped IOVA ranges */
> >>> +    /*
> >>> +     * @iova_tree traces mapped IOVA ranges.
> >>> +     *
> >>> +     * The tree is not needed if no MAP notifiers is registered with
> >>> +     * current VTD address space, because all UNMAP (including iotlb or
> >>> +     * dev-iotlb) events can be transparently delivered to !MAP iommu
> >>> +     * notifiers.
> >> So this means the UNMAP notifier doesn't need to be as accurate as
> >> MAP. (Should we document it in the notifier headers)?
> > Yes.
> >
> >> For MAP[a, b] MAP[b, c] we can do a UNMAP[a. c].
> > IIUC a better way to say this is, for MAP[a, b] we can do an UNMAP[a-X,
> > b+Y] as long as the range covers [a, b]?
> >
> >>> +     *
> >>> +     * The tree OTOH is required for MAP typed iommu notifiers for a few
> >>> +     * reasons.
> >>> +     *
> >>> +     * Firstly, there's no way to identify whether an PSI event is MAP or
> >>> +     * UNMAP within the PSI message itself.  Without having prior knowledge
> >>> +     * of existing state vIOMMU doesn't know whether it should notify MAP
> >>> +     * or UNMAP for a PSI message it received.
> >>> +     *
> >>> +     * Secondly, PSI received from guest driver (or even a large PSI can
> >>> +     * grow into a DSI at least with Linux intel-iommu driver) can be
> >>> +     * larger in range than the newly mapped ranges for either MAP or UNMAP
> >>> +     * events.
> >> Yes, so I think we need a document that the UNMAP handler should be
> >> prepared for this.
> > How about I squash below into this same patch?
> >
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index 91f8a2395a..c83bd11a68 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -129,6 +129,24 @@ struct IOMMUTLBEntry {
> >  /*
> >   * Bitmap for different IOMMUNotifier capabilities. Each notifier can
> >   * register with one or multiple IOMMU Notifier capability bit(s).
> > + *
> > + * Normally there're two use cases for the notifiers:
> > + *
> > + *   (1) When the device needs accurate synchronizations of the vIOMMU page
> accurate synchronizations sound too vague & subjective to me.

Suggestions?

> > + *       tables, it needs to register with both MAP|UNMAP notifies (which
> > + *       is defined as IOMMU_NOTIFIER_IOTLB_EVENTS below).  As long as MAP
> > + *       events are registered, the notifications will be accurate but
> > + *       there's overhead on synchronizing the guest vIOMMU page tables.
> > + *
> > + *   (2) When the device doesn't need accurate synchronizations of the
> > + *       vIOMMU page tables (when the device can both cache translations
> > + *       and requesting to translate dynamically during DMA process), it
> s/requesting/request
> > + *       needs to register only with UNMAP or DEVIOTLB_UNMAP notifies.
> would be nice to clarify the distinction between both then
> > + *       Note that in such working mode shadow page table is not used for
> > + *       vIOMMU unit on this address space, so the UNMAP messages can be
> I do not catch 'is not used for vIOMMU unit on this address space'

How about: "Note that in this working mode the vIOMMU will not maintain a
shadowed page table for the address space, and the UNMAP messages can be.."?

> > + *       actually larger than the real invalidations (just like how the
> > + *       Linux IOMMU driver normally works, where an invalidation can be
> > + *       enlarged as long as it still covers the target range).
> >   */
> >  typedef enum {
> >      IOMMU_NOTIFIER_NONE = 0,
> >
> > Thanks,
> >
> Thanks
> 
> Eric
>
Eric Auger Dec. 6, 2022, 4:28 p.m. UTC | #11
On 12/6/22 17:05, Peter Xu wrote:
> On Tue, Dec 06, 2022 at 02:16:32PM +0100, Eric Auger wrote:
>> Hi Peter,
>> On 12/6/22 00:28, Peter Xu wrote:
>>> On Mon, Dec 05, 2022 at 12:23:20PM +0800, Jason Wang wrote:
>>>> On Fri, Dec 2, 2022 at 12:25 AM Peter Xu <peterx@redhat.com> wrote:
>>>>> It seems not super clear on when iova_tree is used, and why.  Add a rich
>>>>> comment above iova_tree to track why we needed the iova_tree, and when we
>>>>> need it.
>>>>>
>>>>> Suggested-by: Jason Wang <jasowang@redhat.com>
>>>>> Signed-off-by: Peter Xu <peterx@redhat.com>
>>>>> ---
>>>>>  include/hw/i386/intel_iommu.h | 30 +++++++++++++++++++++++++++++-
>>>>>  1 file changed, 29 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
>>>>> index 46d973e629..8d130ab2e3 100644
>>>>> --- a/include/hw/i386/intel_iommu.h
>>>>> +++ b/include/hw/i386/intel_iommu.h
>>>>> @@ -109,7 +109,35 @@ struct VTDAddressSpace {
>>>>>      QLIST_ENTRY(VTDAddressSpace) next;
>>>>>      /* Superset of notifier flags that this address space has */
>>>>>      IOMMUNotifierFlag notifier_flags;
>>>>> -    IOVATree *iova_tree;          /* Traces mapped IOVA ranges */
>>>>> +    /*
>>>>> +     * @iova_tree traces mapped IOVA ranges.
>>>>> +     *
>>>>> +     * The tree is not needed if no MAP notifiers is registered with
>>>>> +     * current VTD address space, because all UNMAP (including iotlb or
>>>>> +     * dev-iotlb) events can be transparently delivered to !MAP iommu
>>>>> +     * notifiers.
>>>> So this means the UNMAP notifier doesn't need to be as accurate as
>>>> MAP. (Should we document it in the notifier headers)?
>>> Yes.
>>>
>>>> For MAP[a, b] MAP[b, c] we can do a UNMAP[a. c].
>>> IIUC a better way to say this is, for MAP[a, b] we can do an UNMAP[a-X,
>>> b+Y] as long as the range covers [a, b]?
>>>
>>>>> +     *
>>>>> +     * The tree OTOH is required for MAP typed iommu notifiers for a few
>>>>> +     * reasons.
>>>>> +     *
>>>>> +     * Firstly, there's no way to identify whether an PSI event is MAP or
>>>>> +     * UNMAP within the PSI message itself.  Without having prior knowledge
>>>>> +     * of existing state vIOMMU doesn't know whether it should notify MAP
>>>>> +     * or UNMAP for a PSI message it received.
>>>>> +     *
>>>>> +     * Secondly, PSI received from guest driver (or even a large PSI can
>>>>> +     * grow into a DSI at least with Linux intel-iommu driver) can be
>>>>> +     * larger in range than the newly mapped ranges for either MAP or UNMAP
>>>>> +     * events.
>>>> Yes, so I think we need a document that the UNMAP handler should be
>>>> prepared for this.
>>> How about I squash below into this same patch?
>>>
>>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>>> index 91f8a2395a..c83bd11a68 100644
>>> --- a/include/exec/memory.h
>>> +++ b/include/exec/memory.h
>>> @@ -129,6 +129,24 @@ struct IOMMUTLBEntry {
>>>  /*
>>>   * Bitmap for different IOMMUNotifier capabilities. Each notifier can
>>>   * register with one or multiple IOMMU Notifier capability bit(s).
>>> + *
>>> + * Normally there're two use cases for the notifiers:
>>> + *
>>> + *   (1) When the device needs accurate synchronizations of the vIOMMU page
>> accurate synchronizations sound too vague & subjective to me.
> Suggestions?
Well I would say:
when the notified device maintains a shadow page table and must to be
notified on each guest MAP (page table entry creation) and UNMAP
(invalidation) events (VFIO). Both notifications must be accurate so
that the shadow page table is fully in sync with the guest view.
>
>>> + *       tables, it needs to register with both MAP|UNMAP notifies (which
>>> + *       is defined as IOMMU_NOTIFIER_IOTLB_EVENTS below).  As long as MAP
>>> + *       events are registered, the notifications will be accurate but
>>> + *       there's overhead on synchronizing the guest vIOMMU page tables.
>>> + *
>>> + *   (2) When the device doesn't need accurate synchronizations of the
>>> + *       vIOMMU page tables (when the device can both cache translations
>>> + *       and requesting to translate dynamically during DMA process), it
when the notified device maintains a cache of IOMMU translations (IOTLB)
and is able to fill that cache by requesting translations from the
vIOMMU through a protocol similar to ATS. In that case the notified
device only needs to register an UNMAP notifier. In that case the unmap
notifications are allower to be wider than the strict necessary.

However the problem is since you need to satisfy the VFIO use case, how
do you detect when you are allowed to invalidate more that the strict
necessary?

Eric
 
>> s/requesting/request
>>> + *       needs to register only with UNMAP or DEVIOTLB_UNMAP notifies.
>> would be nice to clarify the distinction between both then
>>> + *       Note that in such working mode shadow page table is not used for
>>> + *       vIOMMU unit on this address space, so the UNMAP messages can be
>> I do not catch 'is not used for vIOMMU unit on this address space'
> How about: "Note that in this working mode the vIOMMU will not maintain a
> shadowed page table for the address space, and the UNMAP messages can be.."?
>
>>> + *       actually larger than the real invalidations (just like how the
>>> + *       Linux IOMMU driver normally works, where an invalidation can be
>>> + *       enlarged as long as it still covers the target range).
>>>   */
>>>  typedef enum {
>>>      IOMMU_NOTIFIER_NONE = 0,
>>>
>>> Thanks,
>>>
>> Thanks
>>
>> Eric
>>
Peter Xu Dec. 6, 2022, 10:09 p.m. UTC | #12
On Tue, Dec 06, 2022 at 05:28:01PM +0100, Eric Auger wrote:
> 
> 
> On 12/6/22 17:05, Peter Xu wrote:
> > On Tue, Dec 06, 2022 at 02:16:32PM +0100, Eric Auger wrote:
> >> Hi Peter,
> >> On 12/6/22 00:28, Peter Xu wrote:
> >>> On Mon, Dec 05, 2022 at 12:23:20PM +0800, Jason Wang wrote:
> >>>> On Fri, Dec 2, 2022 at 12:25 AM Peter Xu <peterx@redhat.com> wrote:
> >>>>> It seems not super clear on when iova_tree is used, and why.  Add a rich
> >>>>> comment above iova_tree to track why we needed the iova_tree, and when we
> >>>>> need it.
> >>>>>
> >>>>> Suggested-by: Jason Wang <jasowang@redhat.com>
> >>>>> Signed-off-by: Peter Xu <peterx@redhat.com>
> >>>>> ---
> >>>>>  include/hw/i386/intel_iommu.h | 30 +++++++++++++++++++++++++++++-
> >>>>>  1 file changed, 29 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> >>>>> index 46d973e629..8d130ab2e3 100644
> >>>>> --- a/include/hw/i386/intel_iommu.h
> >>>>> +++ b/include/hw/i386/intel_iommu.h
> >>>>> @@ -109,7 +109,35 @@ struct VTDAddressSpace {
> >>>>>      QLIST_ENTRY(VTDAddressSpace) next;
> >>>>>      /* Superset of notifier flags that this address space has */
> >>>>>      IOMMUNotifierFlag notifier_flags;
> >>>>> -    IOVATree *iova_tree;          /* Traces mapped IOVA ranges */
> >>>>> +    /*
> >>>>> +     * @iova_tree traces mapped IOVA ranges.
> >>>>> +     *
> >>>>> +     * The tree is not needed if no MAP notifiers is registered with
> >>>>> +     * current VTD address space, because all UNMAP (including iotlb or
> >>>>> +     * dev-iotlb) events can be transparently delivered to !MAP iommu
> >>>>> +     * notifiers.
> >>>> So this means the UNMAP notifier doesn't need to be as accurate as
> >>>> MAP. (Should we document it in the notifier headers)?
> >>> Yes.
> >>>
> >>>> For MAP[a, b] MAP[b, c] we can do a UNMAP[a. c].
> >>> IIUC a better way to say this is, for MAP[a, b] we can do an UNMAP[a-X,
> >>> b+Y] as long as the range covers [a, b]?
> >>>
> >>>>> +     *
> >>>>> +     * The tree OTOH is required for MAP typed iommu notifiers for a few
> >>>>> +     * reasons.
> >>>>> +     *
> >>>>> +     * Firstly, there's no way to identify whether an PSI event is MAP or
> >>>>> +     * UNMAP within the PSI message itself.  Without having prior knowledge
> >>>>> +     * of existing state vIOMMU doesn't know whether it should notify MAP
> >>>>> +     * or UNMAP for a PSI message it received.
> >>>>> +     *
> >>>>> +     * Secondly, PSI received from guest driver (or even a large PSI can
> >>>>> +     * grow into a DSI at least with Linux intel-iommu driver) can be
> >>>>> +     * larger in range than the newly mapped ranges for either MAP or UNMAP
> >>>>> +     * events.
> >>>> Yes, so I think we need a document that the UNMAP handler should be
> >>>> prepared for this.
> >>> How about I squash below into this same patch?
> >>>
> >>> diff --git a/include/exec/memory.h b/include/exec/memory.h
> >>> index 91f8a2395a..c83bd11a68 100644
> >>> --- a/include/exec/memory.h
> >>> +++ b/include/exec/memory.h
> >>> @@ -129,6 +129,24 @@ struct IOMMUTLBEntry {
> >>>  /*
> >>>   * Bitmap for different IOMMUNotifier capabilities. Each notifier can
> >>>   * register with one or multiple IOMMU Notifier capability bit(s).
> >>> + *
> >>> + * Normally there're two use cases for the notifiers:
> >>> + *
> >>> + *   (1) When the device needs accurate synchronizations of the vIOMMU page
> >> accurate synchronizations sound too vague & subjective to me.
> > Suggestions?
> Well I would say:
> when the notified device maintains a shadow page table and must to be

s/to//

> notified on each guest MAP (page table entry creation) and UNMAP
> (invalidation) events (VFIO). Both notifications must be accurate so
> that the shadow page table is fully in sync with the guest view.

Thanks, I'll try to squash this into the new version.

> >
> >>> + *       tables, it needs to register with both MAP|UNMAP notifies (which
> >>> + *       is defined as IOMMU_NOTIFIER_IOTLB_EVENTS below).  As long as MAP
> >>> + *       events are registered, the notifications will be accurate but
> >>> + *       there's overhead on synchronizing the guest vIOMMU page tables.
> >>> + *
> >>> + *   (2) When the device doesn't need accurate synchronizations of the
> >>> + *       vIOMMU page tables (when the device can both cache translations
> >>> + *       and requesting to translate dynamically during DMA process), it
> when the notified device maintains a cache of IOMMU translations (IOTLB)
> and is able to fill that cache by requesting translations from the
> vIOMMU through a protocol similar to ATS. In that case the notified
> device only needs to register an UNMAP notifier. In that case the unmap
> notifications are allower to be wider than the strict necessary.

Same here.

> 
> However the problem is since you need to satisfy the VFIO use case, how
> do you detect when you are allowed to invalidate more that the strict
> necessary?

We detect that by checking whether the vtd_as has map notifier registered.
Please feel free to have a look at all sites of vtd_as_has_map_notifier().
We maintain the iova tree only for MAP case currently.
diff mbox series

Patch

diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 46d973e629..8d130ab2e3 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -109,7 +109,35 @@  struct VTDAddressSpace {
     QLIST_ENTRY(VTDAddressSpace) next;
     /* Superset of notifier flags that this address space has */
     IOMMUNotifierFlag notifier_flags;
-    IOVATree *iova_tree;          /* Traces mapped IOVA ranges */
+    /*
+     * @iova_tree traces mapped IOVA ranges.
+     *
+     * The tree is not needed if no MAP notifiers is registered with
+     * current VTD address space, because all UNMAP (including iotlb or
+     * dev-iotlb) events can be transparently delivered to !MAP iommu
+     * notifiers.
+     *
+     * The tree OTOH is required for MAP typed iommu notifiers for a few
+     * reasons.
+     *
+     * Firstly, there's no way to identify whether an PSI event is MAP or
+     * UNMAP within the PSI message itself.  Without having prior knowledge
+     * of existing state vIOMMU doesn't know whether it should notify MAP
+     * or UNMAP for a PSI message it received.
+     *
+     * Secondly, PSI received from guest driver (or even a large PSI can
+     * grow into a DSI at least with Linux intel-iommu driver) can be
+     * larger in range than the newly mapped ranges for either MAP or UNMAP
+     * events. If it directly pass-throughs any such event it may confuse
+     * the registered drivers (e.g. vfio-pci) on either: (1) trying to map
+     * the same region more than once (for VFIO_IOMMU_MAP_DMA, -EEXIST will
+     * trigger), or (2) trying to UNMAP a range that is still partially
+     * mapped.  That accuracy is not required for UNMAP-only notifiers, but
+     * it is a must-to-have for MAP-inclusive notifiers, because the vIOMMU
+     * needs to make sure the shadow page table is always in sync with the
+     * guest IOMMU pgtables for a device.
+     */
+    IOVATree *iova_tree;
 };
 
 struct VTDIOTLBEntry {