Message ID | 20210415135851.862406-2-berrange@redhat.com |
---|---|
State | New |
Series | block, migration: improve debugging of migration bdrv_flush failure |
* Daniel P. Berrangé (berrange@redhat.com) wrote:
> This is a critical failure scenario for migration that is hard to
> diagnose from existing probes. Most likely it is caused by an error
> from bdrv_flush(), but we're not logging the errno anywhere, hence
> this new probe.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c  | 1 +
>  migration/trace-events | 1 +
>  2 files changed, 2 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 8ca034136b..bee0dcd501 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3121,6 +3121,7 @@ static void migration_completion(MigrationState *s)
>          if (!ret) {
>              bool inactivate = !migrate_colo_enabled();
>              ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> +            trace_migration_completion_vm_stop(ret);
>              if (ret >= 0) {
>                  ret = migration_maybe_pause(s, &current_active_state,
>                                              MIGRATION_STATUS_DEVICE);
> diff --git a/migration/trace-events b/migration/trace-events
> index 668c562fed..8ec28432eb 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -149,6 +149,7 @@ migrate_pending(uint64_t size, uint64_t max, uint64_t pre, uint64_t compat, uint
>  migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
>  migrate_send_rp_recv_bitmap(char *name, int64_t size) "block '%s' size 0x%"PRIi64
>  migration_completion_file_err(void) ""
> +migration_completion_vm_stop(int ret) "ret %d"
>  migration_completion_postcopy_end(void) ""
>  migration_completion_postcopy_end_after_complete(void) ""
>  migration_rate_limit_pre(int ms) "%d ms"
> --
> 2.30.2
>
diff --git a/migration/migration.c b/migration/migration.c
index 8ca034136b..bee0dcd501 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3121,6 +3121,7 @@ static void migration_completion(MigrationState *s)
         if (!ret) {
             bool inactivate = !migrate_colo_enabled();
             ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+            trace_migration_completion_vm_stop(ret);
             if (ret >= 0) {
                 ret = migration_maybe_pause(s, &current_active_state,
                                             MIGRATION_STATUS_DEVICE);
diff --git a/migration/trace-events b/migration/trace-events
index 668c562fed..8ec28432eb 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -149,6 +149,7 @@ migrate_pending(uint64_t size, uint64_t max, uint64_t pre, uint64_t compat, uint
 migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
 migrate_send_rp_recv_bitmap(char *name, int64_t size) "block '%s' size 0x%"PRIi64
 migration_completion_file_err(void) ""
+migration_completion_vm_stop(int ret) "ret %d"
 migration_completion_postcopy_end(void) ""
 migration_completion_postcopy_end_after_complete(void) ""
 migration_rate_limit_pre(int ms) "%d ms"
This is a critical failure scenario for migration that is hard to
diagnose from existing probes. Most likely it is caused by an error
from bdrv_flush(), but we're not logging the errno anywhere, hence
this new probe.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 migration/migration.c  | 1 +
 migration/trace-events | 1 +
 2 files changed, 2 insertions(+)