diff mbox series

[v10,03/12] vfio/migration: Allow migration without VFIO IOMMU dirty tracking support

Message ID 20230209192043.14885-4-avihaih@nvidia.com
State New
Headers show
Series vfio/migration: Implement VFIO migration protocol v2 | expand

Commit Message

Avihai Horon Feb. 9, 2023, 7:20 p.m. UTC
Currently, if IOMMU of a VFIO container doesn't support dirty page
tracking, migration is blocked. This is because a DMA-able VFIO device
can dirty RAM pages without updating QEMU about it, thus breaking the
migration.

However, this doesn't mean that migration can't be done at all.
In such case, allow migration and let QEMU VFIO code mark all pages
dirty.

This guarantees that all pages that might have gotten dirty are reported
back, and thus guarantees a valid migration even without VFIO IOMMU
dirty tracking support.

The motivation for this patch is the introduction of iommufd [1].
iommufd can directly implement the /dev/vfio/vfio container IOCTLs by
mapping them into its internal ops, allowing the usage of these IOCTLs
over iommufd. However, VFIO IOMMU dirty tracking is not supported by
this VFIO compatibility API.

This patch will allow migration by hosts that use the VFIO compatibility
API and prevent migration regressions caused by the lack of VFIO IOMMU
dirty tracking support.

[1]
https://lore.kernel.org/kvm/0-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com/

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
 hw/vfio/common.c    | 20 ++++++++++++++++++--
 hw/vfio/migration.c |  3 +--
 2 files changed, 19 insertions(+), 4 deletions(-)

Comments

Juan Quintela Feb. 15, 2023, 12:43 p.m. UTC | #1
Avihai Horon <avihaih@nvidia.com> wrote:
> Currently, if IOMMU of a VFIO container doesn't support dirty page
> tracking, migration is blocked. This is because a DMA-able VFIO device
> can dirty RAM pages without updating QEMU about it, thus breaking the
> migration.
>
> However, this doesn't mean that migration can't be done at all.
> In such case, allow migration and let QEMU VFIO code mark all pages
> dirty.
>
> This guarantees that all pages that might have gotten dirty are reported
> back, and thus guarantees a valid migration even without VFIO IOMMU
> dirty tracking support.
>
> The motivation for this patch is the introduction of iommufd [1].
> iommufd can directly implement the /dev/vfio/vfio container IOCTLs by
> mapping them into its internal ops, allowing the usage of these IOCTLs
> over iommufd. However, VFIO IOMMU dirty tracking is not supported by
> this VFIO compatibility API.
>
> This patch will allow migration by hosts that use the VFIO compatibility
> API and prevent migration regressions caused by the lack of VFIO IOMMU
> dirty tracking support.
>
> [1]
> https://lore.kernel.org/kvm/0-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com/
>
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Reviewed-by: Cédric Le Goater <clg@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

I know why you are doing this.

But I think this should print a warning, error, somewhere.

You are just dirtying all pages each time we arrive here.

Even calling the featura "experimental" is an understatement.

Later, Juan.
Avihai Horon Feb. 15, 2023, 5:47 p.m. UTC | #2
On 15/02/2023 14:43, Juan Quintela wrote:
> External email: Use caution opening links or attachments
>
>
> Avihai Horon <avihaih@nvidia.com> wrote:
>> Currently, if IOMMU of a VFIO container doesn't support dirty page
>> tracking, migration is blocked. This is because a DMA-able VFIO device
>> can dirty RAM pages without updating QEMU about it, thus breaking the
>> migration.
>>
>> However, this doesn't mean that migration can't be done at all.
>> In such case, allow migration and let QEMU VFIO code mark all pages
>> dirty.
>>
>> This guarantees that all pages that might have gotten dirty are reported
>> back, and thus guarantees a valid migration even without VFIO IOMMU
>> dirty tracking support.
>>
>> The motivation for this patch is the introduction of iommufd [1].
>> iommufd can directly implement the /dev/vfio/vfio container IOCTLs by
>> mapping them into its internal ops, allowing the usage of these IOCTLs
>> over iommufd. However, VFIO IOMMU dirty tracking is not supported by
>> this VFIO compatibility API.
>>
>> This patch will allow migration by hosts that use the VFIO compatibility
>> API and prevent migration regressions caused by the lack of VFIO IOMMU
>> dirty tracking support.
>>
>> [1]
>> https://lore.kernel.org/kvm/0-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com/
>>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> Reviewed-by: Cédric Le Goater <clg@redhat.com>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
>
> I know why you are doing this.
>
> But I think this should print a warning, error, somewhere.

IMHO, I'm not sure it's really necessary.

To enable VFIO migration the user must explicitly set x-enable-migration=on.
I guess in this case the user is well aware of the dirty tracking 
capabilities the VFIO device has or doesn't have.
So I don't think adding this error/warning will help much.

Thanks.

>
> You are just dirtying all pages each time we arrive here.
>
> Even calling the featura "experimental" is an understatement.
>
> Later, Juan.
>
Juan Quintela Feb. 15, 2023, 6:04 p.m. UTC | #3
Avihai Horon <avihaih@nvidia.com> wrote:
> On 15/02/2023 14:43, Juan Quintela wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> Avihai Horon <avihaih@nvidia.com> wrote:
>>> Currently, if IOMMU of a VFIO container doesn't support dirty page
>>> tracking, migration is blocked. This is because a DMA-able VFIO device
>>> can dirty RAM pages without updating QEMU about it, thus breaking the
>>> migration.
>>>
>>> However, this doesn't mean that migration can't be done at all.
>>> In such case, allow migration and let QEMU VFIO code mark all pages
>>> dirty.
>>>
>>> This guarantees that all pages that might have gotten dirty are reported
>>> back, and thus guarantees a valid migration even without VFIO IOMMU
>>> dirty tracking support.
>>>
>>> The motivation for this patch is the introduction of iommufd [1].
>>> iommufd can directly implement the /dev/vfio/vfio container IOCTLs by
>>> mapping them into its internal ops, allowing the usage of these IOCTLs
>>> over iommufd. However, VFIO IOMMU dirty tracking is not supported by
>>> this VFIO compatibility API.
>>>
>>> This patch will allow migration by hosts that use the VFIO compatibility
>>> API and prevent migration regressions caused by the lack of VFIO IOMMU
>>> dirty tracking support.
>>>
>>> [1]
>>> https://lore.kernel.org/kvm/0-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com/
>>>
>>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>>> Reviewed-by: Cédric Le Goater <clg@redhat.com>
>> Reviewed-by: Juan Quintela <quintela@redhat.com>
>>
>> I know why you are doing this.
>>
>> But I think this should print a warning, error, somewhere.
>
> IMHO, I'm not sure it's really necessary.
>
> To enable VFIO migration the user must explicitly set x-enable-migration=on.
> I guess in this case the user is well aware of the dirty tracking
> capabilities the VFIO device has or doesn't have.
> So I don't think adding this error/warning will help much.

Oops.  I missed that bit.
I retire my objection.

Sorry, Juan.

> Thanks.
>
>>
>> You are just dirtying all pages each time we arrive here.
>>
>> Even calling the featura "experimental" is an understatement.
>>
>> Later, Juan.
>>
Alex Williamson Feb. 15, 2023, 8:14 p.m. UTC | #4
On Wed, 15 Feb 2023 19:04:33 +0100
Juan Quintela <quintela@redhat.com> wrote:

> Avihai Horon <avihaih@nvidia.com> wrote:
> > On 15/02/2023 14:43, Juan Quintela wrote:  
> >> External email: Use caution opening links or attachments
> >>
> >>
> >> Avihai Horon <avihaih@nvidia.com> wrote:  
> >>> Currently, if IOMMU of a VFIO container doesn't support dirty page
> >>> tracking, migration is blocked. This is because a DMA-able VFIO device
> >>> can dirty RAM pages without updating QEMU about it, thus breaking the
> >>> migration.
> >>>
> >>> However, this doesn't mean that migration can't be done at all.
> >>> In such case, allow migration and let QEMU VFIO code mark all pages
> >>> dirty.
> >>>
> >>> This guarantees that all pages that might have gotten dirty are reported
> >>> back, and thus guarantees a valid migration even without VFIO IOMMU
> >>> dirty tracking support.
> >>>
> >>> The motivation for this patch is the introduction of iommufd [1].
> >>> iommufd can directly implement the /dev/vfio/vfio container IOCTLs by
> >>> mapping them into its internal ops, allowing the usage of these IOCTLs
> >>> over iommufd. However, VFIO IOMMU dirty tracking is not supported by
> >>> this VFIO compatibility API.
> >>>
> >>> This patch will allow migration by hosts that use the VFIO compatibility
> >>> API and prevent migration regressions caused by the lack of VFIO IOMMU
> >>> dirty tracking support.
> >>>
> >>> [1]
> >>> https://lore.kernel.org/kvm/0-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com/
> >>>
> >>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> >>> Reviewed-by: Cédric Le Goater <clg@redhat.com>  
> >> Reviewed-by: Juan Quintela <quintela@redhat.com>
> >>
> >> I know why you are doing this.
> >>
> >> But I think this should print a warning, error, somewhere.  
> >
> > IMHO, I'm not sure it's really necessary.
> >
> > To enable VFIO migration the user must explicitly set x-enable-migration=on.
> > I guess in this case the user is well aware of the dirty tracking
> > capabilities the VFIO device has or doesn't have.
> > So I don't think adding this error/warning will help much.  
> 
> Oops.  I missed that bit.
> I retire my objection.

Keep it in mind though as hopefully we'll be making vfio migration
non-experimental soon and enabled by default where devices support it.
We'll need to consider whether we want to keep "dumb" dirty tracking,
or even any form of dirty tracking in the type1 uAPI, under an
experimental opt-in.  Thanks,

Alex
Jason Gunthorpe Feb. 15, 2023, 8:38 p.m. UTC | #5
On Wed, Feb 15, 2023 at 01:14:35PM -0700, Alex Williamson wrote:

> We'll need to consider whether we want to keep "dumb" dirty tracking,
> or even any form of dirty tracking in the type1 uAPI, under an
> experimental opt-in.  Thanks,

I was expecting we'd delete the kernel code for type 1 dirty tracking
once the v2 parts are merged to qemu since we don't and won't have any
kernel implementation of it..

The big point of this to allow qmeu to continue on with a future
kernel that no longer reports it supports this.

Jason
Alex Williamson Feb. 15, 2023, 9:02 p.m. UTC | #6
On Wed, 15 Feb 2023 16:38:10 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Wed, Feb 15, 2023 at 01:14:35PM -0700, Alex Williamson wrote:
> 
> > We'll need to consider whether we want to keep "dumb" dirty tracking,
> > or even any form of dirty tracking in the type1 uAPI, under an
> > experimental opt-in.  Thanks,  
> 
> I was expecting we'd delete the kernel code for type 1 dirty tracking
> once the v2 parts are merged to qemu since we don't and won't have any
> kernel implementation of it..
> 
> The big point of this to allow qmeu to continue on with a future
> kernel that no longer reports it supports this.

Right, in the part-1 series adding v2 support, we have no other dirty
tracking, so it serves a purpose until we have device-level dirty
tracking in the part-2 series.  After that, we can certainly at least
remove the type1 dirty tracking since there never were, and likely never
will be, any in-kernel implementations.  I could go either way if we
want to keep the in-QEMU dirty-everything tracking around as an
experimental option to aid migration driver development/debug (ie.
maybe enabling it preempts device dirty tracking as a comparison for
debug).  Thanks,

Alex
diff mbox series

Patch

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 130e5d1dc7..f6dd571549 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -488,6 +488,12 @@  static int vfio_dma_unmap(VFIOContainer *container,
         return -errno;
     }
 
+    if (iotlb && vfio_devices_all_running_and_saving(container)) {
+        cpu_physical_memory_set_dirty_range(iotlb->translated_addr, size,
+                                            tcg_enabled() ? DIRTY_CLIENTS_ALL :
+                                            DIRTY_CLIENTS_NOCODE);
+    }
+
     return 0;
 }
 
@@ -1201,6 +1207,10 @@  static void vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
         .argsz = sizeof(dirty),
     };
 
+    if (!container->dirty_pages_supported) {
+        return;
+    }
+
     if (start) {
         dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START;
     } else {
@@ -1236,6 +1246,13 @@  static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
     uint64_t pages;
     int ret;
 
+    if (!container->dirty_pages_supported) {
+        cpu_physical_memory_set_dirty_range(ram_addr, size,
+                                            tcg_enabled() ? DIRTY_CLIENTS_ALL :
+                                            DIRTY_CLIENTS_NOCODE);
+        return 0;
+    }
+
     dbitmap = g_malloc0(sizeof(*dbitmap) + sizeof(*range));
 
     dbitmap->argsz = sizeof(*dbitmap) + sizeof(*range);
@@ -1409,8 +1426,7 @@  static void vfio_listener_log_sync(MemoryListener *listener,
 {
     VFIOContainer *container = container_of(listener, VFIOContainer, listener);
 
-    if (vfio_listener_skipped_section(section) ||
-        !container->dirty_pages_supported) {
+    if (vfio_listener_skipped_section(section)) {
         return;
     }
 
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 02db30fe8d..c9f3117986 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -861,11 +861,10 @@  int64_t vfio_mig_bytes_transferred(void)
 
 int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
 {
-    VFIOContainer *container = vbasedev->group->container;
     struct vfio_region_info *info = NULL;
     int ret = -ENOTSUP;
 
-    if (!vbasedev->enable_migration || !container->dirty_pages_supported) {
+    if (!vbasedev->enable_migration) {
         goto add_blocker;
     }