
[v3] memory: Optimize replay of guest mapping

Message ID 20230413110019.48922-1-zhenzhong.duan@intel.com
State New
Series [v3] memory: Optimize replay of guest mapping

Commit Message

Duan, Zhenzhong April 13, 2023, 11 a.m. UTC
On x86, two notifiers are registered because the vtd-ir memory
region splits the entire address space. During replay of the
address space for each notifier, the whole address space is
scanned, which is unnecessary: we only need to scan the range
that the notifier monitors.

While on x86 the IOMMU memory region spans the entire address
space, on some other platforms (e.g. Arm mps3-an547) the IOMMU
memory region is only a window within the address space, and a
user could register a notifier with an arbitrary scope beyond the
IOMMU memory region. Although in the current implementation replay
is only triggered by VFIO and dirty page sync, with notifiers
derived from a memory region section, this isn't guaranteed in the
future.

So, replay only the intersection of the IOMMU memory region and
the IOMMU notifier range in memory_region_iommu_replay().

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
v3: Fix assert failure on mps3-an547
v2: Add an assert per Peter
Tested on x86 with a network card passed through to the guest (kvm/tcg); ping/ssh pass.
Also did a simple boot test with mps3-an547.

 hw/i386/intel_iommu.c | 2 +-
 softmmu/memory.c      | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)
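
For illustration only (not part of the patch): on x86 the vtd-ir region
splits the space so that two notifiers are registered, and with this
change each replay walks only the intersection of its notifier's range
with the IOMMU region. A minimal standalone C sketch of that clamping,
with hypothetical names and example numbers:

/* Standalone sketch of the clamped replay walk; names and numbers are
 * illustrative only, not QEMU code. */
#include <stdint.h>
#include <stdio.h>

typedef uint64_t hwaddr;

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Count how many pages a replay over [start, end) of a region walks. */
static uint64_t pages_walked(hwaddr start, hwaddr end,
                             hwaddr region_size, hwaddr granularity)
{
    hwaddr clamped_end = MIN(end, region_size);

    if (start >= clamped_end) {
        return 0;
    }
    return (clamped_end - start) / granularity;
}

int main(void)
{
    /* Example numbers only: two notifiers split around a carved-out
     * interrupt window, as happens on x86 with vtd-ir. */
    hwaddr region_size = 1ULL << 39;
    hwaddr granularity = 4096;

    uint64_t low  = pages_walked(0x0,        0xfee00000, region_size, granularity);
    uint64_t high = pages_walked(0xfef00000, UINT64_MAX, region_size, granularity);
    uint64_t full = pages_walked(0x0,        UINT64_MAX, region_size, granularity);

    /* Before the change, a replay for each notifier walked 'full' pages;
     * now each walks only its own 'low'/'high' share. */
    printf("low=%llu high=%llu full=%llu\n",
           (unsigned long long)low, (unsigned long long)high,
           (unsigned long long)full);
    return 0;
}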

Comments

Peter Xu April 13, 2023, 6:53 p.m. UTC | #1
On Thu, Apr 13, 2023 at 07:00:19PM +0800, Zhenzhong Duan wrote:
> On x86, there are two notifiers registered due to vtd-ir memory
> region splitting the entire address space. During replay of the
> address space for each notifier, the whole address space is
> scanned which is unnecessary. We only need to scan the space
> belong to notifier monitored space.
> 
> While on x86 IOMMU memory region spans over entire address space,
> but on some other platforms(e.g. arm mps3-an547), IOMMU memory
> region is only a window in the whole address space. user could
> register a notifier with arbitrary scope beyond IOMMU memory
> region. Though in current implementation replay is only triggered
> by VFIO and dirty page sync with notifiers derived from memory
> region section, but this isn't guaranteed in the future.
> 
> So, we replay the intersection part of IOMMU memory region and
> IOMMU notifier in memory_region_iommu_replay().
> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>

Reviewed-by: Peter Xu <peterx@redhat.com>
Peter Maydell April 18, 2023, 10:13 a.m. UTC | #2
On Thu, 13 Apr 2023 at 12:12, Zhenzhong Duan <zhenzhong.duan@intel.com> wrote:
>
> On x86, there are two notifiers registered due to vtd-ir memory
> region splitting the entire address space. During replay of the
> address space for each notifier, the whole address space is
> scanned which is unnecessary. We only need to scan the space
> belong to notifier monitored space.
>
> While on x86 IOMMU memory region spans over entire address space,
> but on some other platforms(e.g. arm mps3-an547), IOMMU memory
> region is only a window in the whole address space. user could
> register a notifier with arbitrary scope beyond IOMMU memory
> region. Though in current implementation replay is only triggered
> by VFIO and dirty page sync with notifiers derived from memory
> region section, but this isn't guaranteed in the future.
>
> So, we replay the intersection part of IOMMU memory region and
> IOMMU notifier in memory_region_iommu_replay().
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> v3: Fix assert failure on mps3-an547
> v2: Add an assert per Peter
> Tested on x86 with a net card passed to guest(kvm/tcg), ping/ssh pass.
> Also did simple bootup test with mps3-an547
>
>  hw/i386/intel_iommu.c | 2 +-
>  softmmu/memory.c      | 5 +++--
>  2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index a62896759c78..faade7def867 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -3850,7 +3850,7 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
>                  .domain_id = vtd_get_domain_id(s, &ce, vtd_as->pasid),
>              };
>
> -            vtd_page_walk(s, &ce, 0, ~0ULL, &info, vtd_as->pasid);
> +            vtd_page_walk(s, &ce, n->start, n->end, &info, vtd_as->pasid);
>          }
>      } else {
>          trace_vtd_replay_ce_invalid(bus_n, PCI_SLOT(vtd_as->devfn),
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index b1a6cae6f583..f7af691991de 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -1925,7 +1925,7 @@ void memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
>  {
>      MemoryRegion *mr = MEMORY_REGION(iommu_mr);
>      IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
> -    hwaddr addr, granularity;
> +    hwaddr addr, end, granularity;
>      IOMMUTLBEntry iotlb;
>
>      /* If the IOMMU has its own replay callback, override */
> @@ -1935,8 +1935,9 @@ void memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
>      }
>
>      granularity = memory_region_iommu_get_min_page_size(iommu_mr);
> +    end = MIN(n->end, memory_region_size(mr));
>
> -    for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
> +    for (addr = n->start; addr < end; addr += granularity) {
>          iotlb = imrc->translate(iommu_mr, addr, IOMMU_NONE, n->iommu_idx);
>          if (iotlb.perm != IOMMU_NONE) {
>              n->notify(n, &iotlb);


The documentation for the replay method of IOMMUMemoryRegionClass
says:
     * The default implementation of memory_region_iommu_replay() is to
     * call the IOMMU translate method for every page in the address space
     * with flag == IOMMU_NONE and then call the notifier if translate
     * returns a valid mapping. If this method is implemented then it
     * overrides the default behaviour, and must provide the full semantics
     * of memory_region_iommu_replay(), by calling @notifier for every
     * translation present in the IOMMU.

This commit changes the default implementation so it's no longer
doing this for every page in the address space. If the change is
correct, we should update the doc comment too.

Oddly, the doc comment for memory_region_iommu_replay() itself
doesn't very clearly state what its semantics are; it could
probably be improved.

Anyway, this change is OK for the TCG use of iommu notifiers,
because that doesn't care about replay.

thanks
-- PMM
Peter Xu April 18, 2023, 1:56 p.m. UTC | #3
On Tue, Apr 18, 2023 at 11:13:57AM +0100, Peter Maydell wrote:
> On Thu, 13 Apr 2023 at 12:12, Zhenzhong Duan <zhenzhong.duan@intel.com> wrote:
> >
> > On x86, there are two notifiers registered due to vtd-ir memory
> > region splitting the entire address space. During replay of the
> > address space for each notifier, the whole address space is
> > scanned which is unnecessary. We only need to scan the space
> > belong to notifier monitored space.
> >
> > While on x86 IOMMU memory region spans over entire address space,
> > but on some other platforms(e.g. arm mps3-an547), IOMMU memory
> > region is only a window in the whole address space. user could
> > register a notifier with arbitrary scope beyond IOMMU memory
> > region. Though in current implementation replay is only triggered
> > by VFIO and dirty page sync with notifiers derived from memory
> > region section, but this isn't guaranteed in the future.
> >
> > So, we replay the intersection part of IOMMU memory region and
> > IOMMU notifier in memory_region_iommu_replay().
> >
> > Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> > ---
> > v3: Fix assert failure on mps3-an547
> > v2: Add an assert per Peter
> > Tested on x86 with a net card passed to guest(kvm/tcg), ping/ssh pass.
> > Also did simple bootup test with mps3-an547
> >
> >  hw/i386/intel_iommu.c | 2 +-
> >  softmmu/memory.c      | 5 +++--
> >  2 files changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index a62896759c78..faade7def867 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -3850,7 +3850,7 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
> >                  .domain_id = vtd_get_domain_id(s, &ce, vtd_as->pasid),
> >              };
> >
> > -            vtd_page_walk(s, &ce, 0, ~0ULL, &info, vtd_as->pasid);
> > +            vtd_page_walk(s, &ce, n->start, n->end, &info, vtd_as->pasid);
> >          }
> >      } else {
> >          trace_vtd_replay_ce_invalid(bus_n, PCI_SLOT(vtd_as->devfn),
> > diff --git a/softmmu/memory.c b/softmmu/memory.c
> > index b1a6cae6f583..f7af691991de 100644
> > --- a/softmmu/memory.c
> > +++ b/softmmu/memory.c
> > @@ -1925,7 +1925,7 @@ void memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
> >  {
> >      MemoryRegion *mr = MEMORY_REGION(iommu_mr);
> >      IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
> > -    hwaddr addr, granularity;
> > +    hwaddr addr, end, granularity;
> >      IOMMUTLBEntry iotlb;
> >
> >      /* If the IOMMU has its own replay callback, override */
> > @@ -1935,8 +1935,9 @@ void memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
> >      }
> >
> >      granularity = memory_region_iommu_get_min_page_size(iommu_mr);
> > +    end = MIN(n->end, memory_region_size(mr));
> >
> > -    for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
> > +    for (addr = n->start; addr < end; addr += granularity) {
> >          iotlb = imrc->translate(iommu_mr, addr, IOMMU_NONE, n->iommu_idx);
> >          if (iotlb.perm != IOMMU_NONE) {
> >              n->notify(n, &iotlb);
> 
> 
> The documentation for the replay method of IOMMUMemoryRegionClass
> says:
>      * The default implementation of memory_region_iommu_replay() is to
>      * call the IOMMU translate method for every page in the address space
>      * with flag == IOMMU_NONE and then call the notifier if translate
>      * returns a valid mapping. If this method is implemented then it
>      * overrides the default behaviour, and must provide the full semantics
>      * of memory_region_iommu_replay(), by calling @notifier for every
>      * translation present in the IOMMU.
> 
> This commit changes the default implementation so it's no longer
> doing this for every page in the address space. If the change is
> correct, we should update the doc comment too.
> 
> Oddly, the doc comment for memory_region_iommu_replay() itself
> doesn't very clearly state what its semantics are; it could
> probably be improved.
> 
> Anyway, this change is OK for the TCG use of iommu notifiers,
> because that doesn't care about replay.

Since the notifier contains the range information, I'd say the change
shouldn't affect any caller; it's purely a performance difference.  Indeed
it would be nicer if the documentation were updated too.  Thanks,
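
For context, the range a notifier monitors is fixed when the notifier is
initialised, so a walk clamped to that range cannot skip anything the
notifier would be told about anyway. A minimal sketch of how a caller sets
up and replays such a notifier with the QEMU API referenced in this thread
(the callback and helper names here are made up for illustration):

#include "qemu/osdep.h"
#include "qapi/error.h"
#include "exec/memory.h"

/* Hypothetical callback: only invoked for events inside [n->start, n->end]. */
static void demo_iommu_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
{
    /* React to a MAP/UNMAP event covering iotlb->iova .. iova + addr_mask. */
}

/* Hypothetical helper: register a notifier over [start, end] and replay it. */
static void demo_register_and_replay(IOMMUMemoryRegion *iommu_mr,
                                     IOMMUNotifier *n,
                                     hwaddr start, hwaddr end, int iommu_idx)
{
    iommu_notifier_init(n, demo_iommu_notify, IOMMU_NOTIFIER_IOTLB_EVENTS,
                        start, end, iommu_idx);
    memory_region_register_iommu_notifier(MEMORY_REGION(iommu_mr), n,
                                          &error_abort);

    /* With this patch, the default replay walks only the intersection of
     * [start, end] with the IOMMU region instead of the whole region. */
    memory_region_iommu_replay(iommu_mr, n);
}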
Duan, Zhenzhong April 19, 2023, 2:38 a.m. UTC | #4
>-----Original Message-----
>From: Peter Xu <peterx@redhat.com>
>Sent: Tuesday, April 18, 2023 9:56 PM
>To: Peter Maydell <peter.maydell@linaro.org>
>Cc: Duan, Zhenzhong <zhenzhong.duan@intel.com>; qemu-
>devel@nongnu.org; mst@redhat.com; jasowang@redhat.com;
>marcel.apfelbaum@gmail.com; pbonzini@redhat.com;
>richard.henderson@linaro.org; eduardo@habkost.net; david@redhat.com;
>philmd@linaro.org
>Subject: Re: [PATCH v3] memory: Optimize replay of guest mapping
>
>On Tue, Apr 18, 2023 at 11:13:57AM +0100, Peter Maydell wrote:
>> On Thu, 13 Apr 2023 at 12:12, Zhenzhong Duan
><zhenzhong.duan@intel.com> wrote:
>> >
>> > On x86, there are two notifiers registered due to vtd-ir memory
>> > region splitting the entire address space. During replay of the
>> > address space for each notifier, the whole address space is scanned
>> > which is unnecessary. We only need to scan the space belong to
>> > notifier monitored space.
>> >
>> > While on x86 IOMMU memory region spans over entire address space,
>> > but on some other platforms(e.g. arm mps3-an547), IOMMU memory
>> > region is only a window in the whole address space. user could
>> > register a notifier with arbitrary scope beyond IOMMU memory region.
>> > Though in current implementation replay is only triggered by VFIO
>> > and dirty page sync with notifiers derived from memory region
>> > section, but this isn't guaranteed in the future.
>> >
>> > So, we replay the intersection part of IOMMU memory region and IOMMU
>> > notifier in memory_region_iommu_replay().
>> >
>> > Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> > ---
>> > v3: Fix assert failure on mps3-an547
>> > v2: Add an assert per Peter
>> > Tested on x86 with a net card passed to guest(kvm/tcg), ping/ssh pass.
>> > Also did simple bootup test with mps3-an547
>> >
>> >  hw/i386/intel_iommu.c | 2 +-
>> >  softmmu/memory.c      | 5 +++--
>> >  2 files changed, 4 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index
>> > a62896759c78..faade7def867 100644
>> > --- a/hw/i386/intel_iommu.c
>> > +++ b/hw/i386/intel_iommu.c
>> > @@ -3850,7 +3850,7 @@ static void
>vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
>> >                  .domain_id = vtd_get_domain_id(s, &ce, vtd_as->pasid),
>> >              };
>> >
>> > -            vtd_page_walk(s, &ce, 0, ~0ULL, &info, vtd_as->pasid);
>> > +            vtd_page_walk(s, &ce, n->start, n->end, &info,
>> > + vtd_as->pasid);
>> >          }
>> >      } else {
>> >          trace_vtd_replay_ce_invalid(bus_n, PCI_SLOT(vtd_as->devfn),
>> > diff --git a/softmmu/memory.c b/softmmu/memory.c index
>> > b1a6cae6f583..f7af691991de 100644
>> > --- a/softmmu/memory.c
>> > +++ b/softmmu/memory.c
>> > @@ -1925,7 +1925,7 @@ void
>> > memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr,
>IOMMUNotifier *n)  {
>> >      MemoryRegion *mr = MEMORY_REGION(iommu_mr);
>> >      IOMMUMemoryRegionClass *imrc =
>IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
>> > -    hwaddr addr, granularity;
>> > +    hwaddr addr, end, granularity;
>> >      IOMMUTLBEntry iotlb;
>> >
>> >      /* If the IOMMU has its own replay callback, override */ @@
>> > -1935,8 +1935,9 @@ void
>memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr,
>IOMMUNotifier *n)
>> >      }
>> >
>> >      granularity = memory_region_iommu_get_min_page_size(iommu_mr);
>> > +    end = MIN(n->end, memory_region_size(mr));
>> >
>> > -    for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
>> > +    for (addr = n->start; addr < end; addr += granularity) {
>> >          iotlb = imrc->translate(iommu_mr, addr, IOMMU_NONE, n-
>>iommu_idx);
>> >          if (iotlb.perm != IOMMU_NONE) {
>> >              n->notify(n, &iotlb);
>>
>>
>> The documentation for the replay method of IOMMUMemoryRegionClass
>> says:
>>      * The default implementation of memory_region_iommu_replay() is to
>>      * call the IOMMU translate method for every page in the address space
>>      * with flag == IOMMU_NONE and then call the notifier if translate
>>      * returns a valid mapping. If this method is implemented then it
>>      * overrides the default behaviour, and must provide the full semantics
>>      * of memory_region_iommu_replay(), by calling @notifier for every
>>      * translation present in the IOMMU.
>>
>> This commit changes the default implementation so it's no longer doing
>> this for every page in the address space. If the change is correct, we
>> should update the doc comment too.
>>
>> Oddly, the doc comment for memory_region_iommu_replay() itself doesn't
>> very clearly state what its semantics are; it could probably be
>> improved.
>>
>> Anyway, this change is OK for the TCG use of iommu notifiers, because
>> that doesn't care about replay.
>
>Since the notifier contains the range information I'd say the change shouldn't
>affect any caller but only a pure performance difference.  Indeed it'll be nicer
>the documentation can be updated too.  Thanks,
>
>--
>Peter Xu
Thanks for Peter Maydell's and Peter Xu's comments; I will add the doc update.
May I ask whether it's preferred to add the doc update to this patch or as a separate patch?

Regards
Zhenzhong
Peter Maydell April 19, 2023, 9:49 a.m. UTC | #5
On Wed, 19 Apr 2023 at 03:38, Duan, Zhenzhong <zhenzhong.duan@intel.com> wrote:

> >> The documentation for the replay method of IOMMUMemoryRegionClass
> >> says:
> >>      * The default implementation of memory_region_iommu_replay() is to
> >>      * call the IOMMU translate method for every page in the address space
> >>      * with flag == IOMMU_NONE and then call the notifier if translate
> >>      * returns a valid mapping. If this method is implemented then it
> >>      * overrides the default behaviour, and must provide the full semantics
> >>      * of memory_region_iommu_replay(), by calling @notifier for every
> >>      * translation present in the IOMMU.
> >>
> >> This commit changes the default implementation so it's no longer doing
> >> this for every page in the address space. If the change is correct, we
> >> should update the doc comment too.
> >>
> >> Oddly, the doc comment for memory_region_iommu_replay() itself doesn't
> >> very clearly state what its semantics are; it could probably be
> >> improved.
> >>
> >> Anyway, this change is OK for the TCG use of iommu notifiers, because
> >> that doesn't care about replay.
> >
> >Since the notifier contains the range information I'd say the change shouldn't
> >affect any caller but only a pure performance difference.  Indeed it'll be nicer
> >the documentation can be updated too.  Thanks,

> Thanks Peter Maydell and Peter Xu's comments, will add doc update.
> May I ask if it's preferred to add doc update to this patch or a separate patch?

I suggest doing it in this same patch.

thanks
-- PMM
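
For reference, one possible wording for the IOMMUMemoryRegionClass replay
doc comment being discussed above (a sketch only, not necessarily the text
that was eventually merged):

/*
 * The default implementation of memory_region_iommu_replay() is to
 * call the IOMMU translate method for every page in the notifier's
 * range that lies within the IOMMU memory region (i.e. the
 * intersection of the two), with flag == IOMMU_NONE, and then call
 * the notifier if translate returns a valid mapping.  If this method
 * is implemented then it overrides the default behaviour, and must
 * provide the full semantics of memory_region_iommu_replay(), by
 * calling @notifier for every translation present in the IOMMU.
 */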

Patch

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index a62896759c78..faade7def867 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3850,7 +3850,7 @@  static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
                 .domain_id = vtd_get_domain_id(s, &ce, vtd_as->pasid),
             };
 
-            vtd_page_walk(s, &ce, 0, ~0ULL, &info, vtd_as->pasid);
+            vtd_page_walk(s, &ce, n->start, n->end, &info, vtd_as->pasid);
         }
     } else {
         trace_vtd_replay_ce_invalid(bus_n, PCI_SLOT(vtd_as->devfn),
diff --git a/softmmu/memory.c b/softmmu/memory.c
index b1a6cae6f583..f7af691991de 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1925,7 +1925,7 @@  void memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
 {
     MemoryRegion *mr = MEMORY_REGION(iommu_mr);
     IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
-    hwaddr addr, granularity;
+    hwaddr addr, end, granularity;
     IOMMUTLBEntry iotlb;
 
     /* If the IOMMU has its own replay callback, override */
@@ -1935,8 +1935,9 @@  void memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
     }
 
     granularity = memory_region_iommu_get_min_page_size(iommu_mr);
+    end = MIN(n->end, memory_region_size(mr));
 
-    for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
+    for (addr = n->start; addr < end; addr += granularity) {
         iotlb = imrc->translate(iommu_mr, addr, IOMMU_NONE, n->iommu_idx);
         if (iotlb.perm != IOMMU_NONE) {
             n->notify(n, &iotlb);
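
As the commit message notes, today's replay callers derive their notifier
range from a memory region section. A rough sketch of that derivation,
modelled loosely on the VFIO listener (the helper name is hypothetical and
details such as clamping to the region are omitted):

#include "qemu/osdep.h"
#include "exec/memory.h"

/* Hypothetical helper: initialise a notifier covering one section of an
 * IOMMU memory region, i.e. the kind of range the replay is clamped to. */
static void demo_notifier_init_from_section(IOMMUNotifier *n,
                                            MemoryRegionSection *section,
                                            IOMMUNotify fn, int iommu_idx)
{
    hwaddr start = section->offset_within_region;
    hwaddr end = start + int128_get64(section->size) - 1;  /* inclusive */

    iommu_notifier_init(n, fn, IOMMU_NOTIFIER_IOTLB_EVENTS,
                        start, end, iommu_idx);
}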