mbox series

[0/2] migration: Fix RP shutdown order

Message ID 20240226203122.22894-1-farosas@suse.de
Headers show
Series migration: Fix RP shutdown order | expand

Message

Fabiano Rosas Feb. 26, 2024, 8:31 p.m. UTC
These were extracted from:
[PATCH 00/14] migration: Improve error reporting
https://lore.kernel.org/r/20240207133347.1115903-1-clg@redhat.com

We're currently relying on the presence of a QEMUFile error in the
to_dst_file to infer whether the return path file (rp_state.from_dst_file)
might be hanging at a recvmsg() system call.

This has always been buggy because we actually clear the to_dst_file
pointer and close the file before attempting any of the above.

Move the RP cleanup before the to_dst_file cleanup, at both the
migrate_fd_cleanup() and postcopy_pause() paths to make sure we
reference a valid and open to_dst_file.

Also replace the error checking to use s->error instead of
f->last_error. This removes one more dependency on
QEMUFile::last_error.

CI run: https://gitlab.com/farosas/qemu/-/pipelines/1191131909

Cédric Le Goater (1):
  migration: Use migrate_has_error() in close_return_path_on_source()

Fabiano Rosas (1):
  migration: Join the return path thread before releasing to_dst_file

 migration/migration.c | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

Comments

Peter Xu Feb. 27, 2024, 8:16 a.m. UTC | #1
On Mon, Feb 26, 2024 at 05:31:20PM -0300, Fabiano Rosas wrote:
> These were extracted from:
> [PATCH 00/14] migration: Improve error reporting
> https://lore.kernel.org/r/20240207133347.1115903-1-clg@redhat.com
> 
> We're currently relying on the presence of a QEMUFile error in the
> to_dst_file to infer whether the return path file (rp_state.from_dst_file)
> might be hanging at a recvmsg() system call.
> 
> This has always been buggy because we actually clear the to_dst_file
> pointer and close the file before attempting any of the above.
> 
> Move the RP cleanup before the to_dst_file cleanup, at both the
> migrate_fd_cleanup() and postcopy_pause() paths to make sure we
> reference a valid and open to_dst_file.
> 
> Also replace the error checking to use s->error instead of
> f->last_error. This removes one more dependency on
> QEMUFile::last_error.
> 
> CI run: https://gitlab.com/farosas/qemu/-/pipelines/1191131909
> 
> Cédric Le Goater (1):
>   migration: Use migrate_has_error() in close_return_path_on_source()
> 
> Fabiano Rosas (1):
>   migration: Join the return path thread before releasing to_dst_file
> 
>  migration/migration.c | 25 ++++++++++---------------
>  1 file changed, 10 insertions(+), 15 deletions(-)

Queued for now.  Comments / tags still welcomed if any.