intel_iommu: handle invalid ce for shadow sync

Message ID 20180913075517.11140-1-peterx@redhat.com
State New

Commit Message

Peter Xu Sept. 13, 2018, 7:55 a.m. UTC
There are two callers of vtd_sync_shadow_page_table_range(): one
provides a valid context entry and one does not.  Move the context
entry fetching into the caller vtd_sync_shadow_page_table(), which is
where we need to fetch it.

Meanwhile, we should handle VTD_FR_CONTEXT_ENTRY_P properly when
synchronizing shadow page tables.  Having an invalid context entry there
is perfectly valid when we move a device out of an existing domain.  When
that happens, instead of posting an error we invalidate the whole region.

Without this patch, QEMU will crash if we do these steps:

(1) start QEMU with VT-d IOMMU and two 10G NICs (ixgbe)
(2) bind the NICs with vfio-pci in the guest
(3) start testpmd with the NICs applied
(4) stop testpmd
(5) rebind the NIC back to ixgbe kernel driver

The patch should fix it.

Reported-by: Pei Zhang <pezhang@redhat.com>
Tested-by: Pei Zhang <pezhang@redhat.com>
CC: Pei Zhang <pezhang@redhat.com>
CC: Alex Williamson <alex.williamson@redhat.com>
CC: Jason Wang <jasowang@redhat.com>
CC: Maxime Coquelin <maxime.coquelin@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
CC: QEMU Stable <qemu-stable@nongnu.org>
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1627272
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c | 54 ++++++++++++++++++++++++++-----------------
 1 file changed, 33 insertions(+), 21 deletions(-)

Comments

Maxime Coquelin Sept. 13, 2018, 8:16 a.m. UTC | #1
Hi Peter,

On 09/13/2018 09:55 AM, Peter Xu wrote:
> There are two callers for vtd_sync_shadow_page_table_range(), one
> provided a valid context entry and one not.  Move that fetching
> operation into the caller vtd_sync_shadow_page_table() where we need to
> fetch the context entry.
> 
> Meanwhile, we should handle VTD_FR_CONTEXT_ENTRY_P properly when
> synchronizing shadow page tables.  Having invalid context entry there is
> perfectly valid when we move a device out of an existing domain.  When
> that happens, instead of posting an error we invalidate the whole region.
> 
> Without this patch, QEMU will crash if we do these steps:
> 
> (1) start QEMU with VT-d IOMMU and two 10G NICs (ixgbe)
> (2) bind the NICs with vfio-pci in the guest
> (3) start testpmd with the NICs applied
> (4) stop testpmd
> (5) rebind the NIC back to ixgbe kernel driver
> 
> The patch should fix it.
> 
> Reported-by: Pei Zhang <pezhang@redhat.com>
> Tested-by: Pei Zhang <pezhang@redhat.com>
> CC: Pei Zhang <pezhang@redhat.com>
> CC: Alex Williamson <alex.williamson@redhat.com>
> CC: Jason Wang <jasowang@redhat.com>
> CC: Maxime Coquelin <maxime.coquelin@redhat.com>
> CC: Michael S. Tsirkin <mst@redhat.com>
> CC: QEMU Stable <qemu-stable@nongnu.org>
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1627272

It seems like a regression, as it wasn't reported earlier, doesn't it?
If it is, do you know which commit is at fault?

> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>   hw/i386/intel_iommu.c | 54 ++++++++++++++++++++++++++-----------------
>   1 file changed, 33 insertions(+), 21 deletions(-)

Other than that, the patch looks good to me.
FWIW:
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime

Peter Xu Sept. 13, 2018, 8:33 a.m. UTC | #2
On Thu, Sep 13, 2018 at 10:16:20AM +0200, Maxime Coquelin wrote:
> Hi Peter,
> 
> On 09/13/2018 09:55 AM, Peter Xu wrote:
> > There are two callers for vtd_sync_shadow_page_table_range(), one
> > provided a valid context entry and one not.  Move that fetching
> > operation into the caller vtd_sync_shadow_page_table() where we need to
> > fetch the context entry.
> > 
> > Meanwhile, we should handle VTD_FR_CONTEXT_ENTRY_P properly when
> > synchronizing shadow page tables.  Having invalid context entry there is
> > perfectly valid when we move a device out of an existing domain.  When
> > that happens, instead of posting an error we invalidate the whole region.
> > 
> > Without this patch, QEMU will crash if we do these steps:
> > 
> > (1) start QEMU with VT-d IOMMU and two 10G NICs (ixgbe)
> > (2) bind the NICs with vfio-pci in the guest
> > (3) start testpmd with the NICs applied
> > (4) stop testpmd
> > (5) rebind the NIC back to ixgbe kernel driver
> > 
> > The patch should fix it.
> > 
> > Reported-by: Pei Zhang <pezhang@redhat.com>
> > Tested-by: Pei Zhang <pezhang@redhat.com>
> > CC: Pei Zhang <pezhang@redhat.com>
> > CC: Alex Williamson <alex.williamson@redhat.com>
> > CC: Jason Wang <jasowang@redhat.com>
> > CC: Maxime Coquelin <maxime.coquelin@redhat.com>
> > CC: Michael S. Tsirkin <mst@redhat.com>
> > CC: QEMU Stable <qemu-stable@nongnu.org>
> > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1627272
> 
> It seems like a regression as it wasn't reported earlier, isn't it?
> If it is, do you know what is the faulty commit?

I think it should be 63b88968f1 ("intel-iommu: rework the page walk
logic", 2018-05-23).  The old code would possibly unmap everything
(using the old replay logic) before we introduced the shadow page sync
helpers, and I must have overlooked that point.

> 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >   hw/i386/intel_iommu.c | 54 ++++++++++++++++++++++++++-----------------
> >   1 file changed, 33 insertions(+), 21 deletions(-)
> 
> Other than that, the patch looks good to me.
> FWIW:
> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,

Peter Xu Sept. 28, 2018, 5:23 a.m. UTC | #3
On Thu, Sep 13, 2018 at 03:55:17PM +0800, Peter Xu wrote:
> There are two callers for vtd_sync_shadow_page_table_range(), one
> provided a valid context entry and one not.  Move that fetching
> operation into the caller vtd_sync_shadow_page_table() where we need to
> fetch the context entry.
> 
> Meanwhile, we should handle VTD_FR_CONTEXT_ENTRY_P properly when
> synchronizing shadow page tables.  Having invalid context entry there is
> perfectly valid when we move a device out of an existing domain.  When
> that happens, instead of posting an error we invalidate the whole region.
> 
> Without this patch, QEMU will crash if we do these steps:
> 
> (1) start QEMU with VT-d IOMMU and two 10G NICs (ixgbe)
> (2) bind the NICs with vfio-pci in the guest
> (3) start testpmd with the NICs applied
> (4) stop testpmd
> (5) rebind the NIC back to ixgbe kernel driver
> 
> The patch should fix it.

Ping?

Regards,

Eric Auger Oct. 1, 2018, 11:36 a.m. UTC | #4
Hi Peter,
On 9/13/18 9:55 AM, Peter Xu wrote:
> There are two callers for vtd_sync_shadow_page_table_range(), one
> provided a valid context entry and one not.  Move that fetching
> operation into the caller vtd_sync_shadow_page_table() where we need to
> fetch the context entry.
> 
> Meanwhile, we should handle VTD_FR_CONTEXT_ENTRY_P properly when
> synchronizing shadow page tables.  Having invalid context entry there is
> perfectly valid when we move a device out of an existing domain.  When
> that happens, instead of posting an error we invalidate the whole region.
> 
> Without this patch, QEMU will crash if we do these steps:
> 
> (1) start QEMU with VT-d IOMMU and two 10G NICs (ixgbe)
> (2) bind the NICs with vfio-pci in the guest
> (3) start testpmd with the NICs applied
> (4) stop testpmd
> (5) rebind the NIC back to ixgbe kernel driver
> 
> The patch should fix it.
> 
> Reported-by: Pei Zhang <pezhang@redhat.com>
> Tested-by: Pei Zhang <pezhang@redhat.com>
> CC: Pei Zhang <pezhang@redhat.com>
> CC: Alex Williamson <alex.williamson@redhat.com>
> CC: Jason Wang <jasowang@redhat.com>
> CC: Maxime Coquelin <maxime.coquelin@redhat.com>
> CC: Michael S. Tsirkin <mst@redhat.com>
> CC: QEMU Stable <qemu-stable@nongnu.org>
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1627272
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/i386/intel_iommu.c | 54 ++++++++++++++++++++++++++-----------------
>  1 file changed, 33 insertions(+), 21 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 3dfada19a6..2509520d6f 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -37,6 +37,8 @@
>  #include "kvm_i386.h"
>  #include "trace.h"
>  
> +static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n);
> +
>  static void vtd_define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
>                              uint64_t wmask, uint64_t w1cmask)
>  {
Comment above is outdated:
/* If context entry is NULL, we'll try to fetch it on our own. */
> @@ -1047,39 +1049,49 @@ static int vtd_sync_shadow_page_table_range(VTDAddressSpace *vtd_as,
>          .notify_unmap = true,
>          .aw = s->aw_bits,
>          .as = vtd_as,
> +        .domain_id = VTD_CONTEXT_ENTRY_DID(ce->hi),
>      };
> -    VTDContextEntry ce_cache;
> +
> +    return vtd_page_walk(ce, addr, addr + size, &info);
> +}
Maybe the change would gain in clarity if split into two patches: the
code reorganization on one side and the fix on the other.

Thanks

Eric
> +
> +static int vtd_sync_shadow_page_table(VTDAddressSpace *vtd_as)
> +{
>      int ret;
> +    VTDContextEntry ce;
> +    IOMMUNotifier *n;
>  
> -    if (ce) {
> -        /* If the caller provided context entry, use it */
> -        ce_cache = *ce;
> -    } else {
> -        /* If the caller didn't provide ce, try to fetch */
> -        ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
> -                                       vtd_as->devfn, &ce_cache);
> -        if (ret) {
> +    ret = vtd_dev_to_context_entry(vtd_as->iommu_state,
> +                                   pci_bus_num(vtd_as->bus),
> +                                   vtd_as->devfn, &ce);
> +    if (ret) {
> +        if (ret == -VTD_FR_CONTEXT_ENTRY_P) {
> +            /*
> +             * It's a valid scenario to have a context entry that is
> +             * not present.  For example, when a device is removed
> +             * from an existing domain then the context entry will be
> +             * zeroed by the guest before it was put into another
> +             * domain.  When this happens, instead of synchronizing
> +             * the shadow pages we should invalidate all existing
> +             * mappings and notify the backends.
> +             */
> +            IOMMU_NOTIFIER_FOREACH(n, &vtd_as->iommu) {
> +                vtd_address_space_unmap(vtd_as, n);
> +            }
> +        } else {
>              /*
>               * This should not really happen, but in case it happens,
>               * we just skip the sync for this time.  After all we even
>               * don't have the root table pointer!
>               */
>              error_report_once("%s: invalid context entry for bus 0x%x"
> -                              " devfn 0x%x",
> -                              __func__, pci_bus_num(vtd_as->bus),
> -                              vtd_as->devfn);
> -            return 0;
> +                              " devfn 0x%x", __func__,
> +                              pci_bus_num(vtd_as->bus), vtd_as->devfn);
>          }
> +        return 0;
>      }
>  
> -    info.domain_id = VTD_CONTEXT_ENTRY_DID(ce_cache.hi);
> -
> -    return vtd_page_walk(&ce_cache, addr, addr + size, &info);
> -}
> -
> -static int vtd_sync_shadow_page_table(VTDAddressSpace *vtd_as)
> -{
> -    return vtd_sync_shadow_page_table_range(vtd_as, NULL, 0, UINT64_MAX);
> +    return vtd_sync_shadow_page_table_range(vtd_as, &ce, 0, UINT64_MAX);
>  }
>  
>  /*
>

Jason Wang Oct. 8, 2018, 3:08 a.m. UTC | #5
On 09/13/2018 15:55, Peter Xu wrote:
> There are two callers for vtd_sync_shadow_page_table_range(), one
> provided a valid context entry and one not.  Move that fetching
> operation into the caller vtd_sync_shadow_page_table() where we need to
> fetch the context entry.
>
> Meanwhile, we should handle VTD_FR_CONTEXT_ENTRY_P properly when
> synchronizing shadow page tables.  Having invalid context entry there is
> perfectly valid when we move a device out of an existing domain.  When
> that happens, instead of posting an error we invalidate the whole region.
>
> Without this patch, QEMU will crash if we do these steps:
>
> (1) start QEMU with VT-d IOMMU and two 10G NICs (ixgbe)
> (2) bind the NICs with vfio-pci in the guest
> (3) start testpmd with the NICs applied
> (4) stop testpmd
> (5) rebind the NIC back to ixgbe kernel driver
>
> The patch should fix it.
>
> Reported-by: Pei Zhang <pezhang@redhat.com>
> Tested-by: Pei Zhang <pezhang@redhat.com>
> CC: Pei Zhang <pezhang@redhat.com>
> CC: Alex Williamson <alex.williamson@redhat.com>
> CC: Jason Wang <jasowang@redhat.com>
> CC: Maxime Coquelin <maxime.coquelin@redhat.com>
> CC: Michael S. Tsirkin <mst@redhat.com>
> CC: QEMU Stable <qemu-stable@nongnu.org>
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1627272
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>   hw/i386/intel_iommu.c | 54 ++++++++++++++++++++++++++-----------------
>   1 file changed, 33 insertions(+), 21 deletions(-)

Reviewed-by: Jason Wang <jasowang@redhat.com>

Some nits, see below.

>
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 3dfada19a6..2509520d6f 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -37,6 +37,8 @@
>   #include "kvm_i386.h"
>   #include "trace.h"
>   
> +static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n);
> +
>   static void vtd_define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
>                               uint64_t wmask, uint64_t w1cmask)
>   {
> @@ -1047,39 +1049,49 @@ static int vtd_sync_shadow_page_table_range(VTDAddressSpace *vtd_as,
>           .notify_unmap = true,
>           .aw = s->aw_bits,
>           .as = vtd_as,
> +        .domain_id = VTD_CONTEXT_ENTRY_DID(ce->hi),
>       };
> -    VTDContextEntry ce_cache;
> +
> +    return vtd_page_walk(ce, addr, addr + size, &info);
> +}
> +
> +static int vtd_sync_shadow_page_table(VTDAddressSpace *vtd_as)
> +{
>       int ret;
> +    VTDContextEntry ce;
> +    IOMMUNotifier *n;
>   
> -    if (ce) {
> -        /* If the caller provided context entry, use it */
> -        ce_cache = *ce;
> -    } else {
> -        /* If the caller didn't provide ce, try to fetch */
> -        ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
> -                                       vtd_as->devfn, &ce_cache);
> -        if (ret) {
> +    ret = vtd_dev_to_context_entry(vtd_as->iommu_state,
> +                                   pci_bus_num(vtd_as->bus),
> +                                   vtd_as->devfn, &ce);
> +    if (ret) {
> +        if (ret == -VTD_FR_CONTEXT_ENTRY_P) {
> +            /*
> +             * It's a valid scenario to have a context entry that is
> +             * not present.  For example, when a device is removed
> +             * from an existing domain then the context entry will be
> +             * zeroed by the guest before it was put into another
> +             * domain.  When this happens, instead of synchronizing
> +             * the shadow pages we should invalidate all existing
> +             * mappings and notify the backends.
> +             */
> +            IOMMU_NOTIFIER_FOREACH(n, &vtd_as->iommu) {
> +                vtd_address_space_unmap(vtd_as, n);
> +            }
> +        } else {
>               /*
>                * This should not really happen, but in case it happens,
>                * we just skip the sync for this time.  After all we even
>                * don't have the root table pointer!
>                */

It looks to me like the comment is not accurate: a missing root pointer
is not the only reason vtd_dev_to_context_entry() can fail.

>               error_report_once("%s: invalid context entry for bus 0x%x"
> -                              " devfn 0x%x",
> -                              __func__, pci_bus_num(vtd_as->bus),
> -                              vtd_as->devfn);
> -            return 0;

I'm not quite sure error_report_once() is really needed here since all
failures have been traced.

> +                              " devfn 0x%x", __func__,
> +                              pci_bus_num(vtd_as->bus), vtd_as->devfn);
>           }
> +        return 0;
>       }
>   
> -    info.domain_id = VTD_CONTEXT_ENTRY_DID(ce_cache.hi);
> -
> -    return vtd_page_walk(&ce_cache, addr, addr + size, &info);
> -}
> -
> -static int vtd_sync_shadow_page_table(VTDAddressSpace *vtd_as)
> -{
> -    return vtd_sync_shadow_page_table_range(vtd_as, NULL, 0, UINT64_MAX);
> +    return vtd_sync_shadow_page_table_range(vtd_as, &ce, 0, UINT64_MAX);
>   }

As has been discussed, this will leave out addr UINT64_MAX; it's better
to have [start, end] instead of (start, range).

Thanks

>   
>   /*

Peter Xu Oct. 8, 2018, 5:59 a.m. UTC | #6
On Mon, Oct 01, 2018 at 01:36:50PM +0200, Auger Eric wrote:
> Hi Peter,
> On 9/13/18 9:55 AM, Peter Xu wrote:
> > There are two callers for vtd_sync_shadow_page_table_range(), one
> > provided a valid context entry and one not.  Move that fetching
> > operation into the caller vtd_sync_shadow_page_table() where we need to
> > fetch the context entry.
> > 
> > Meanwhile, we should handle VTD_FR_CONTEXT_ENTRY_P properly when
> > synchronizing shadow page tables.  Having invalid context entry there is
> > perfectly valid when we move a device out of an existing domain.  When
> > that happens, instead of posting an error we invalidate the whole region.
> > 
> > Without this patch, QEMU will crash if we do these steps:
> > 
> > (1) start QEMU with VT-d IOMMU and two 10G NICs (ixgbe)
> > (2) bind the NICs with vfio-pci in the guest
> > (3) start testpmd with the NICs applied
> > (4) stop testpmd
> > (5) rebind the NIC back to ixgbe kernel driver
> > 
> > The patch should fix it.
> > 
> > Reported-by: Pei Zhang <pezhang@redhat.com>
> > Tested-by: Pei Zhang <pezhang@redhat.com>
> > CC: Pei Zhang <pezhang@redhat.com>
> > CC: Alex Williamson <alex.williamson@redhat.com>
> > CC: Jason Wang <jasowang@redhat.com>
> > CC: Maxime Coquelin <maxime.coquelin@redhat.com>
> > CC: Michael S. Tsirkin <mst@redhat.com>
> > CC: QEMU Stable <qemu-stable@nongnu.org>
> > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1627272
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  hw/i386/intel_iommu.c | 54 ++++++++++++++++++++++++++-----------------
> >  1 file changed, 33 insertions(+), 21 deletions(-)
> > 
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 3dfada19a6..2509520d6f 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -37,6 +37,8 @@
> >  #include "kvm_i386.h"
> >  #include "trace.h"
> >  
> > +static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n);
> > +
> >  static void vtd_define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
> >                              uint64_t wmask, uint64_t w1cmask)
> >  {
> Comment above is outdated:
> /* If context entry is NULL, we'll try to fetch it on our own. */

Indeed.

> > @@ -1047,39 +1049,49 @@ static int vtd_sync_shadow_page_table_range(VTDAddressSpace *vtd_as,
> >          .notify_unmap = true,
> >          .aw = s->aw_bits,
> >          .as = vtd_as,
> > +        .domain_id = VTD_CONTEXT_ENTRY_DID(ce->hi),
> >      };
> > -    VTDContextEntry ce_cache;
> > +
> > +    return vtd_page_walk(ce, addr, addr + size, &info);
> > +}
> Maybe change would gain in clarity if split into 2 patches, code
> reorganization and fix on the side.

Sure.  Thanks for reviewing!

Peter Xu Oct. 8, 2018, 6:06 a.m. UTC | #7
On Mon, Oct 08, 2018 at 11:08:31AM +0800, Jason Wang wrote:

[...]

> > +static int vtd_sync_shadow_page_table(VTDAddressSpace *vtd_as)
> > +{
> >       int ret;
> > +    VTDContextEntry ce;
> > +    IOMMUNotifier *n;
> > -    if (ce) {
> > -        /* If the caller provided context entry, use it */
> > -        ce_cache = *ce;
> > -    } else {
> > -        /* If the caller didn't provide ce, try to fetch */
> > -        ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
> > -                                       vtd_as->devfn, &ce_cache);
> > -        if (ret) {
> > +    ret = vtd_dev_to_context_entry(vtd_as->iommu_state,
> > +                                   pci_bus_num(vtd_as->bus),
> > +                                   vtd_as->devfn, &ce);
> > +    if (ret) {
> > +        if (ret == -VTD_FR_CONTEXT_ENTRY_P) {
> > +            /*
> > +             * It's a valid scenario to have a context entry that is
> > +             * not present.  For example, when a device is removed
> > +             * from an existing domain then the context entry will be
> > +             * zeroed by the guest before it was put into another
> > +             * domain.  When this happens, instead of synchronizing
> > +             * the shadow pages we should invalidate all existing
> > +             * mappings and notify the backends.
> > +             */
> > +            IOMMU_NOTIFIER_FOREACH(n, &vtd_as->iommu) {
> > +                vtd_address_space_unmap(vtd_as, n);
> > +            }
> > +        } else {
> >               /*
> >                * This should not really happen, but in case it happens,
> >                * we just skip the sync for this time.  After all we even
> >                * don't have the root table pointer!
> >                */
> 
> It looks to me the comment is not accurate, no root pointer is not the only
> reason for the failure of vtd_dev_to_context_entry().
> 
> >               error_report_once("%s: invalid context entry for bus 0x%x"
> > -                              " devfn 0x%x",
> > -                              __func__, pci_bus_num(vtd_as->bus),
> > -                              vtd_as->devfn);
> > -            return 0;
> 
> I'm not quite sure error_report_once() is really needed here since all
> failures has been traced.

True; I'll then consider converting all of them to error_report_once()
and dropping the one here.

> 
> > +                              " devfn 0x%x", __func__,
> > +                              pci_bus_num(vtd_as->bus), vtd_as->devfn);
> >           }
> > +        return 0;
> >       }
> > -    info.domain_id = VTD_CONTEXT_ENTRY_DID(ce_cache.hi);
> > -
> > -    return vtd_page_walk(&ce_cache, addr, addr + size, &info);
> > -}
> > -
> > -static int vtd_sync_shadow_page_table(VTDAddressSpace *vtd_as)
> > -{
> > -    return vtd_sync_shadow_page_table_range(vtd_as, NULL, 0, UINT64_MAX);
> > +    return vtd_sync_shadow_page_table_range(vtd_as, &ce, 0, UINT64_MAX);
> >   }
> 
> As has been discussed, this will left addr UINT64_MAX, it's better to have
> [start, end] instead of (start, range).

Hmm, this size is inclusive, so we should be fine.  Though I'll take
your advice to use a start/end pair to be clearer.

Thanks!

Peter Xu Oct. 8, 2018, 6:33 a.m. UTC | #8
On Mon, Oct 08, 2018 at 02:06:20PM +0800, Peter Xu wrote:
> > > -static int vtd_sync_shadow_page_table(VTDAddressSpace *vtd_as)
> > > -{
> > > -    return vtd_sync_shadow_page_table_range(vtd_as, NULL, 0, UINT64_MAX);
> > > +    return vtd_sync_shadow_page_table_range(vtd_as, &ce, 0, UINT64_MAX);
> > >   }
> > 
> > As has been discussed, this will left addr UINT64_MAX, it's better to have
> > [start, end] instead of (start, range).
> 
> Hmm, this size is inclusive, so we should be fine.  Though I'll take
> your advise to use start/end pair to be clearer.

Sorry it's not...

Actually vtd_page_walk() itself is not inclusive, so we would need to
touch that up if we want the whole stack to use the [start, end]
inclusive convention.  However, I would try not to bother with it:
after all, the page sync operation is at per-small-page granularity, so
IMHO missing the last addr UINT64_MAX is okay (as long as we're covering
the last page, which starts at UINT64_MAX - PAGE_SIZE + 1).  I'm not
sure whether it would be worth changing the whole stack to the inclusive
convention, so I'll temporarily keep it as is.

Regards,

Patch

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 3dfada19a6..2509520d6f 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -37,6 +37,8 @@ 
 #include "kvm_i386.h"
 #include "trace.h"
 
+static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n);
+
 static void vtd_define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
                             uint64_t wmask, uint64_t w1cmask)
 {
@@ -1047,39 +1049,49 @@  static int vtd_sync_shadow_page_table_range(VTDAddressSpace *vtd_as,
         .notify_unmap = true,
         .aw = s->aw_bits,
         .as = vtd_as,
+        .domain_id = VTD_CONTEXT_ENTRY_DID(ce->hi),
     };
-    VTDContextEntry ce_cache;
+
+    return vtd_page_walk(ce, addr, addr + size, &info);
+}
+
+static int vtd_sync_shadow_page_table(VTDAddressSpace *vtd_as)
+{
     int ret;
+    VTDContextEntry ce;
+    IOMMUNotifier *n;
 
-    if (ce) {
-        /* If the caller provided context entry, use it */
-        ce_cache = *ce;
-    } else {
-        /* If the caller didn't provide ce, try to fetch */
-        ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
-                                       vtd_as->devfn, &ce_cache);
-        if (ret) {
+    ret = vtd_dev_to_context_entry(vtd_as->iommu_state,
+                                   pci_bus_num(vtd_as->bus),
+                                   vtd_as->devfn, &ce);
+    if (ret) {
+        if (ret == -VTD_FR_CONTEXT_ENTRY_P) {
+            /*
+             * It's a valid scenario to have a context entry that is
+             * not present.  For example, when a device is removed
+             * from an existing domain then the context entry will be
+             * zeroed by the guest before it was put into another
+             * domain.  When this happens, instead of synchronizing
+             * the shadow pages we should invalidate all existing
+             * mappings and notify the backends.
+             */
+            IOMMU_NOTIFIER_FOREACH(n, &vtd_as->iommu) {
+                vtd_address_space_unmap(vtd_as, n);
+            }
+        } else {
             /*
              * This should not really happen, but in case it happens,
              * we just skip the sync for this time.  After all we even
              * don't have the root table pointer!
              */
             error_report_once("%s: invalid context entry for bus 0x%x"
-                              " devfn 0x%x",
-                              __func__, pci_bus_num(vtd_as->bus),
-                              vtd_as->devfn);
-            return 0;
+                              " devfn 0x%x", __func__,
+                              pci_bus_num(vtd_as->bus), vtd_as->devfn);
         }
+        return 0;
     }
 
-    info.domain_id = VTD_CONTEXT_ENTRY_DID(ce_cache.hi);
-
-    return vtd_page_walk(&ce_cache, addr, addr + size, &info);
-}
-
-static int vtd_sync_shadow_page_table(VTDAddressSpace *vtd_as)
-{
-    return vtd_sync_shadow_page_table_range(vtd_as, NULL, 0, UINT64_MAX);
+    return vtd_sync_shadow_page_table_range(vtd_as, &ce, 0, UINT64_MAX);
 }
 
 /*