Message ID | 20221229110345.12480-1-avihaih@nvidia.com |
---|---|
Series | vfio/migration: Implement VFIO migration protocol v2 |
On Thu, 29 Dec 2022 13:03:31 +0200
Avihai Horon <avihaih@nvidia.com> wrote:

> Hello,
>
> Now that QEMU 8.0 development cycle has started and MIG_DATA_SIZE ioctl
> is in kernel v6.2-rc1, I am sending v5 of this series with linux headers
> update and with the preview patches in v4 merged into this series.
>
> Following VFIO migration protocol v2 acceptance in kernel, this series
> implements VFIO migration according to the new v2 protocol and replaces
> the now deprecated v1 implementation.
>
> The main differences between v1 and v2 migration protocols are:
> 1. VFIO device state is represented as a finite state machine instead of
>    a bitmap.
>
> 2. The migration interface with kernel is done using VFIO_DEVICE_FEATURE
>    ioctl and normal read() and write() instead of the migration region
>    used in v1.
>
> 3. Pre-copy is made optional in v2 protocol. Support for pre-copy will
>    be added later on.
>
> Full description of the v2 protocol and the differences from v1 can be
> found here [1].
>
> Patch list:
>
> Patch 1 updates linux headers so we will have the MIG_DATA_SIZE ioctl.
>
> Patches 2-3 are patches taken from Juan's RFC [2].
> As discussed in the KVM call, since we have a new ioctl to get device
> data size while it's RUNNING, we don't need the stop and resume VM
> functionality from the RFC.
>
> Patches 4-9 are prep patches fixing bugs, adding QEMUFile function
> that will be used later and refactoring v1 protocol code to make it
> easier to add v2 protocol.
>
> Patches 10-14 implement v2 protocol and remove v1 protocol.

Missing from the series is the all important question of what happens to "x-enable-migration" now. We have two in-kernel drivers supporting v2 migration, so while hardware and firmware may still be difficult to bring together, it does seem possible for the upstream community to test and maintain this.

To declare this supported and not to impose any additional requirements on management tools, I think migration needs to be enabled by default for devices that support it. Is there any utility to keeping around some sort of device option to force it ON/OFF? My interpretation of ON seems rather redundant to the -only-migratable option, i.e. fail the device if migration is not supported, and I can't think of any production use cases for OFF. So maybe we simply drop the option as an implicit AUTO feature and we can consider an experimental or supported explicit feature later for the more esoteric use cases as they develop? Thanks,

Alex
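
For readers following the ON/OFF/AUTO discussion, here is a minimal sketch in C of how such a tri-state option could resolve against device capability. It is only an illustration of the semantics described in the message above. The names `mig_opt`, `mig_result` and `resolve_migration` are hypothetical and are not taken from QEMU or from this series.

```c
#include <stdbool.h>

/* Hypothetical tri-state option; illustrative names, not QEMU's. */
typedef enum { MIG_OPT_AUTO, MIG_OPT_ON, MIG_OPT_OFF } mig_opt;

/* Hypothetical outcome of applying the option to one device. */
typedef enum {
    MIG_ENABLED,      /* device takes part in migration */
    MIG_DISABLED,     /* device works, but migration is not offered */
    MIG_DEVICE_FAILS  /* device setup fails outright */
} mig_result;

/*
 * Resolve the option against whether the device supports v2 migration.
 * ON fails the device when migration is unsupported, OFF always disables
 * migration, and AUTO simply follows device support.
 */
static mig_result resolve_migration(mig_opt opt, bool device_supports_v2)
{
    switch (opt) {
    case MIG_OPT_OFF:
        return MIG_DISABLED;
    case MIG_OPT_ON:
        return device_supports_v2 ? MIG_ENABLED : MIG_DEVICE_FAILS;
    case MIG_OPT_AUTO:
    default:
        return device_supports_v2 ? MIG_ENABLED : MIG_DISABLED;
    }
}
```

Under these assumptions, ON differs from AUTO only for a device without migration support, which is the overlap with -only-migratable noted above.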

On Fri, Jan 06, 2023 at 04:36:09PM -0700, Alex Williamson wrote:

> Missing from the series is the all important question of what happens
> to "x-enable-migration" now. We have two in-kernel drivers supporting
> v2 migration, so while hardware and firmware may still be difficult to
> bring together, it does seem possible for the upstream community to
> test and maintain this.

My post-break memory is a bit hazy, but don't we still need a qemu series for the new dirty tracking uAPI? I suggest that is the right spot to declare victory on this, as it is actually production usable and testable.

I'm also hopeful we can see the system iommu dirty tracking

> To declare this supported and not to impose any additional requirements
> on management tools, I think migration needs to be enabled by default
> for devices that support it.

At least for mlx5 there will be a switch that causes the VF to not support migration, and that will probably be the default.

> Is there any utility to keeping around
> some sort of device option to force it ON/OFF?

I think not at the qemu level. Even for testing purposes it is easy to disable live migration by not loading the variant vfio driver.

Jason