mbox series

[V4,00/14] allow cpr-reboot for vfio

Message ID 1708622920-68779-1-git-send-email-steven.sistare@oracle.com
Headers show
Series allow cpr-reboot for vfio | expand

Message

Steven Sistare Feb. 22, 2024, 5:28 p.m. UTC
Allow cpr-reboot for vfio if the guest is in the suspended runstate.  The
guest drivers' suspend methods flush outstanding requests and re-initialize
the devices, and thus there is no device state to save and restore.  The
user is responsible for suspending the guest before initiating cpr, such as
by issuing guest-suspend-ram to the qemu guest agent.

Most of the patches in this series enhance migration notifiers so they can
return an error status and message.  The last few patches register a notifier
for vfio that returns an error if the guest is not suspended.

Changes in V3:
  * update to tip, add RB's
  * replace MigrationStatus with new enum MigrationEventType
  * simplify migrate_fd_connect error recovery
  * support vfio iommufd containers
  * add patches:
      migration: stop vm for cpr
      migration: update cpr-reboot description

Changes in V4:
  * rebase to tip, add RB's
  * add patch to prevent options such as precopy from being used with cpr.
      migration: options incompatible with cpr
  * refactor subroutines in "stop vm for cpr"
  * simplify "notifier error checking" patch by restricting that only
    notifiers for MIG_EVENT_PRECOPY_SETUP may return an error.
  * doc that a fail event may be sent without a prior setup event

Steve Sistare (14):
  notify: pass error to notifier with return
  migration: remove error from notifier data
  migration: convert to NotifierWithReturn
  migration: MigrationEvent for notifiers
  migration: remove postcopy_after_devices
  migration: MigrationNotifyFunc
  migration: per-mode notifiers
  migration: refactor migrate_fd_connect failures
  migration: notifier error checking
  migration: stop vm for cpr
  vfio: register container for cpr
  vfio: allow cpr-reboot migration if suspended
  migration: update cpr-reboot description
  migration: options incompatible with cpr

 hw/net/virtio-net.c                   |  13 +--
 hw/vfio/common.c                      |   2 +-
 hw/vfio/container.c                   |  11 ++-
 hw/vfio/cpr.c                         |  39 +++++++++
 hw/vfio/iommufd.c                     |   6 ++
 hw/vfio/meson.build                   |   1 +
 hw/vfio/migration.c                   |  15 ++--
 hw/vfio/trace-events                  |   2 +-
 hw/virtio/vhost-user.c                |  10 +--
 hw/virtio/virtio-balloon.c            |   3 +-
 include/hw/vfio/vfio-common.h         |   5 +-
 include/hw/vfio/vfio-container-base.h |   1 +
 include/hw/virtio/virtio-net.h        |   2 +-
 include/migration/misc.h              |  47 +++++++++--
 include/qemu/notify.h                 |   8 +-
 migration/migration.c                 | 148 +++++++++++++++++++++++-----------
 migration/migration.h                 |   4 -
 migration/postcopy-ram.c              |   3 +-
 migration/postcopy-ram.h              |   1 -
 migration/ram.c                       |   3 +-
 net/vhost-vdpa.c                      |  14 ++--
 qapi/migration.json                   |  37 ++++++---
 roms/seabios-hppa                     |   2 +-
 ui/spice-core.c                       |  17 ++--
 util/notify.c                         |   5 +-
 25 files changed, 275 insertions(+), 124 deletions(-)
 create mode 100644 hw/vfio/cpr.c

Comments

Steven Sistare Feb. 22, 2024, 5:33 p.m. UTC | #1
Peter (and David if interested): these patches still need RB:
  migration: notifier error checking
  migration: stop vm for cpr
  migration: update cpr-reboot description
  migration: options incompatible with cpr

Alex, these patches still need RB:
  vfio: register container for cpr
  vfio: allow cpr-reboot migration if suspended

Thanks!

- Steve

On 2/22/2024 12:28 PM, Steve Sistare wrote:
> Allow cpr-reboot for vfio if the guest is in the suspended runstate.  The
> guest drivers' suspend methods flush outstanding requests and re-initialize
> the devices, and thus there is no device state to save and restore.  The
> user is responsible for suspending the guest before initiating cpr, such as
> by issuing guest-suspend-ram to the qemu guest agent.
> 
> Most of the patches in this series enhance migration notifiers so they can
> return an error status and message.  The last few patches register a notifier
> for vfio that returns an error if the guest is not suspended.
> 
> Changes in V3:
>   * update to tip, add RB's
>   * replace MigrationStatus with new enum MigrationEventType
>   * simplify migrate_fd_connect error recovery
>   * support vfio iommufd containers
>   * add patches:
>       migration: stop vm for cpr
>       migration: update cpr-reboot description
> 
> Changes in V4:
>   * rebase to tip, add RB's
>   * add patch to prevent options such as precopy from being used with cpr.
>       migration: options incompatible with cpr
>   * refactor subroutines in "stop vm for cpr"
>   * simplify "notifier error checking" patch by restricting that only
>     notifiers for MIG_EVENT_PRECOPY_SETUP may return an error.
>   * doc that a fail event may be sent without a prior setup event
> 
> Steve Sistare (14):
>   notify: pass error to notifier with return
>   migration: remove error from notifier data
>   migration: convert to NotifierWithReturn
>   migration: MigrationEvent for notifiers
>   migration: remove postcopy_after_devices
>   migration: MigrationNotifyFunc
>   migration: per-mode notifiers
>   migration: refactor migrate_fd_connect failures
>   migration: notifier error checking
>   migration: stop vm for cpr
>   vfio: register container for cpr
>   vfio: allow cpr-reboot migration if suspended
>   migration: update cpr-reboot description
>   migration: options incompatible with cpr
> 
>  hw/net/virtio-net.c                   |  13 +--
>  hw/vfio/common.c                      |   2 +-
>  hw/vfio/container.c                   |  11 ++-
>  hw/vfio/cpr.c                         |  39 +++++++++
>  hw/vfio/iommufd.c                     |   6 ++
>  hw/vfio/meson.build                   |   1 +
>  hw/vfio/migration.c                   |  15 ++--
>  hw/vfio/trace-events                  |   2 +-
>  hw/virtio/vhost-user.c                |  10 +--
>  hw/virtio/virtio-balloon.c            |   3 +-
>  include/hw/vfio/vfio-common.h         |   5 +-
>  include/hw/vfio/vfio-container-base.h |   1 +
>  include/hw/virtio/virtio-net.h        |   2 +-
>  include/migration/misc.h              |  47 +++++++++--
>  include/qemu/notify.h                 |   8 +-
>  migration/migration.c                 | 148 +++++++++++++++++++++++-----------
>  migration/migration.h                 |   4 -
>  migration/postcopy-ram.c              |   3 +-
>  migration/postcopy-ram.h              |   1 -
>  migration/ram.c                       |   3 +-
>  net/vhost-vdpa.c                      |  14 ++--
>  qapi/migration.json                   |  37 ++++++---
>  roms/seabios-hppa                     |   2 +-
>  ui/spice-core.c                       |  17 ++--
>  util/notify.c                         |   5 +-
>  25 files changed, 275 insertions(+), 124 deletions(-)
>  create mode 100644 hw/vfio/cpr.c
>
Peter Xu Feb. 26, 2024, 2:14 a.m. UTC | #2
On Thu, Feb 22, 2024 at 12:33:42PM -0500, Steven Sistare wrote:
> Peter (and David if interested): these patches still need RB:
>   migration: notifier error checking
>   migration: stop vm for cpr
>   migration: update cpr-reboot description
>   migration: options incompatible with cpr

These all look fine to me.

> 
> Alex, these patches still need RB:
>   vfio: register container for cpr
>   vfio: allow cpr-reboot migration if suspended

I'll need to wait for comment from either Alex/Cedric on these.

As I asked in the other thread, afaict crp-reboot keeps changing behavior,
maybe I can merge migration patches first, then keep vfio patches
separately merged / discussed?  I always see cpr-reboot mode experimental
from this regard.  Please consider adding a patch to declare cpr-reboot
mode experimental if that matches your expectation, until all relevant
patches are merged, to make sure the ABI becomes stable.

Thanks,
Cédric Le Goater Feb. 26, 2024, 8:49 a.m. UTC | #3
On 2/26/24 03:14, Peter Xu wrote:
> On Thu, Feb 22, 2024 at 12:33:42PM -0500, Steven Sistare wrote:
>> Peter (and David if interested): these patches still need RB:
>>    migration: notifier error checking
>>    migration: stop vm for cpr
>>    migration: update cpr-reboot description
>>    migration: options incompatible with cpr
> 
> These all look fine to me.
> 
>>
>> Alex, these patches still need RB:
>>    vfio: register container for cpr
>>    vfio: allow cpr-reboot migration if suspended
> 
> I'll need to wait for comment from either Alex/Cedric on these.

Yes. It's on my list.

> As I asked in the other thread, afaict crp-reboot keeps changing behavior,
> maybe I can merge migration patches first, 

Go ahead. It will help me for the changes I am doing on error reporting
for VFIO migration. I will rebase on top.

> then keep vfio patches separately merged / discussed?  

Sure.

> I always see cpr-reboot mode experimental from this regard.  

This makes sense to me also.

Thanks,

C.



> Please consider adding a patch to declare cpr-reboot
> mode experimental if that matches your expectation, until all relevant
> patches are merged, to make sure the ABI becomes stable.
> 
> Thanks,
>
Peter Xu Feb. 26, 2024, 9:01 a.m. UTC | #4
On Mon, Feb 26, 2024 at 09:49:46AM +0100, Cédric Le Goater wrote:
> Go ahead. It will help me for the changes I am doing on error reporting
> for VFIO migration. I will rebase on top.

Thanks for confirming.  I queued the migration patches then, but leave the
two vfio one for further discussion.
Steven Sistare Feb. 26, 2024, 8:21 p.m. UTC | #5
On 2/26/2024 4:01 AM, Peter Xu wrote:
> On Mon, Feb 26, 2024 at 09:49:46AM +0100, Cédric Le Goater wrote:
>> Go ahead. It will help me for the changes I am doing on error reporting
>> for VFIO migration. I will rebase on top.
> 
> Thanks for confirming.  I queued the migration patches then, but leave the
> two vfio one for further discussion.

Very good, thanks.  I am always happy to move the ball a few yards closer to
the goal line :)

- Steve
Steven Sistare Feb. 26, 2024, 10:08 p.m. UTC | #6
On 2/26/2024 3:21 PM, Steven Sistare wrote:
> On 2/26/2024 4:01 AM, Peter Xu wrote:
>> On Mon, Feb 26, 2024 at 09:49:46AM +0100, Cédric Le Goater wrote:
>>> Go ahead. It will help me for the changes I am doing on error reporting
>>> for VFIO migration. I will rebase on top.
>>
>> Thanks for confirming.  I queued the migration patches then, but leave the
>> two vfio one for further discussion.
> 
> Very good, thanks.  I am always happy to move the ball a few yards closer to
> the goal line :)

Peter, beware that patch 3 needs an edit before being queued.
This hunk snuck in and should be deleted:

[PATCH V4 03/14] migration: convert to NotifierWithReturn
diff --git a/roms/seabios-hppa b/roms/seabios-hppa
index 03774ed..e4eac85 160000
--- a/roms/seabios-hppa
+++ b/roms/seabios-hppa
@@ -1 +1 @@
-Subproject commit 03774edaad3bfae090ac96ca5450353c641637d1
+Subproject commit e4eac85880e8677f96d8b9e94de9f2eec9c0751f


- Steve
Peter Xu Feb. 27, 2024, 4:07 a.m. UTC | #7
On Mon, Feb 26, 2024 at 05:08:05PM -0500, Steven Sistare wrote:
> On 2/26/2024 3:21 PM, Steven Sistare wrote:
> > On 2/26/2024 4:01 AM, Peter Xu wrote:
> >> On Mon, Feb 26, 2024 at 09:49:46AM +0100, Cédric Le Goater wrote:
> >>> Go ahead. It will help me for the changes I am doing on error reporting
> >>> for VFIO migration. I will rebase on top.
> >>
> >> Thanks for confirming.  I queued the migration patches then, but leave the
> >> two vfio one for further discussion.
> > 
> > Very good, thanks.  I am always happy to move the ball a few yards closer to
> > the goal line :)
> 
> Peter, beware that patch 3 needs an edit before being queued.
> This hunk snuck in and should be deleted:
> 
> [PATCH V4 03/14] migration: convert to NotifierWithReturn
> diff --git a/roms/seabios-hppa b/roms/seabios-hppa
> index 03774ed..e4eac85 160000
> --- a/roms/seabios-hppa
> +++ b/roms/seabios-hppa
> @@ -1 +1 @@
> -Subproject commit 03774edaad3bfae090ac96ca5450353c641637d1
> +Subproject commit e4eac85880e8677f96d8b9e94de9f2eec9c0751f

I see, I dropped this change in the patch.

https://gitlab.com/peterx/qemu/-/commit/ccea71f8f222593c47670366d7d86554586e596e
Cédric Le Goater March 8, 2024, 4:52 p.m. UTC | #8
On 2/22/24 18:33, Steven Sistare wrote:
> Peter (and David if interested): these patches still need RB:
>    migration: notifier error checking
>    migration: stop vm for cpr
>    migration: update cpr-reboot description
>    migration: options incompatible with cpr
> 
> Alex, these patches still need RB:
>    vfio: register container for cpr
>    vfio: allow cpr-reboot migration if suspended

Applied to vfio-next.

Thanks,

C.