diff mbox series

[PULL,49/53] intel_iommu: Fix a potential issue in VFIO dirty page sync

Message ID d9acc3fa87f9e6cbedd6702f6d042ffe81a40b49.1687782442.git.mst@redhat.com
State New
Headers show
Series [PULL,01/53] bswap: Add the ability to store to an unaligned 24 bit field | expand

Commit Message

Michael S. Tsirkin June 26, 2023, 12:30 p.m. UTC
From: Zhenzhong Duan <zhenzhong.duan@intel.com>

Peter Xu found a potential issue:

"The other thing is when I am looking at the new code I found that we
actually extended the replay() to be used also in dirty tracking of vfio,
in vfio_sync_dirty_bitmap().  For that maybe it's already broken if
unmap_all() because afaiu log_sync() can be called in migration thread
anytime during DMA so I think it means the device is prone to DMA with the
IOMMU pgtable quickly erased and rebuilt here, which means the DMA could
fail unexpectedly.  Copy Alex, Kirti and Neo."

Fix it by replacing the unmap_all() to only evacuate the iova tree
(keeping all host mappings untouched, IOW, don't notify UNMAP), and
do a full resync in page walk which will notify all existing mappings
as MAP. This way we don't interrupt with any existing mapping if there
is (e.g. for the dirty sync case), meanwhile we keep sync too to latest
(for moving a vfio device into an existing iommu group).

Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20230615032626.314476-2-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/i386/intel_iommu.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)
diff mbox series

Patch

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 94d52f4205..34af12f392 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3825,13 +3825,10 @@  static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
     IntelIOMMUState *s = vtd_as->iommu_state;
     uint8_t bus_n = pci_bus_num(vtd_as->bus);
     VTDContextEntry ce;
+    DMAMap map = { .iova = 0, .size = HWADDR_MAX };
 
-    /*
-     * The replay can be triggered by either a invalidation or a newly
-     * created entry. No matter what, we release existing mappings
-     * (it means flushing caches for UNMAP-only registers).
-     */
-    vtd_address_space_unmap(vtd_as, n);
+    /* replay is protected by BQL, page walk will re-setup it safely */
+    iova_tree_remove(vtd_as->iova_tree, map);
 
     if (vtd_dev_to_context_entry(s, bus_n, vtd_as->devfn, &ce) == 0) {
         trace_vtd_replay_ce_valid(s->root_scalable ? "scalable mode" :