Message ID: 1371242153-11262-15-git-send-email-mrhines@linux.vnet.ibm.com
State: New
On 06/14/2013 09:35 PM, mrhines@linux.vnet.ibm.com wrote:
> From: "Michael R. Hines" <mrhines@us.ibm.com>
>
> For very large virtual machines, pinning can take a long time.
> While this does not affect the migration's *actual* time itself,
> it is still important for the user to know what's going on and
> to know what component of the total time is actual taken up by
> pinning.
>
> For example, using a 14GB virtual machine, pinning can take as
> long as 5 seconds, for which the user would not otherwise know
> what was happening.
>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
> ---
> +++ b/qapi-schema.json
> @@ -586,7 +586,8 @@
>            '*xbzrle-cache': 'XBZRLECacheStats',
>            '*total-time': 'int',
>            '*expected-downtime': 'int',
> -          '*downtime': 'int'} }
> +          '*downtime': 'int',
> +          '*pin-all-time' : 'int'} }

Missing documentation of the new field (compare how @downtime is
documented as a (since 1.3) field).
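For context on Eric's point: each optional MigrationInfo member carries a doc-comment entry in qapi-schema.json marking when it was introduced. A sketch of what the missing entry might look like, modeled on the @downtime pattern (the exact wording and the "since" version are assumptions here, not what any posted revision actually contains):

```
# @pin-all-time: #optional time in milliseconds spent pinning guest
#        memory when the rdma-pin-all capability is enabled (since 1.6)
```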
On 15/06/2013 05:06, Eric Blake wrote:
> On 06/14/2013 09:35 PM, mrhines@linux.vnet.ibm.com wrote:
>> From: "Michael R. Hines" <mrhines@us.ibm.com>
>>
>> [...]
>>
>> +++ b/qapi-schema.json
>> @@ -586,7 +586,8 @@
>>            '*xbzrle-cache': 'XBZRLECacheStats',
>>            '*total-time': 'int',
>>            '*expected-downtime': 'int',
>> -          '*downtime': 'int'} }
>> +          '*downtime': 'int',
>> +          '*pin-all-time' : 'int'} }
>
> Missing documentation of the new field (compare how @downtime is
> documented as a (since 1.3) field).

Can you just time the setup phase instead?

Paolo
On 06/15/2013 05:06 AM, Eric Blake wrote:
> On 06/14/2013 09:35 PM, mrhines@linux.vnet.ibm.com wrote:
>> [...]
>> +          '*pin-all-time' : 'int'} }
>
> Missing documentation of the new field (compare how @downtime is
> documented as a (since 1.3) field).

Acknowledged....
On 06/15/2013 01:32 PM, Paolo Bonzini wrote:
> On 15/06/2013 05:06, Eric Blake wrote:
>> On 06/14/2013 09:35 PM, mrhines@linux.vnet.ibm.com wrote:
>>> [...]
>>> +          '*pin-all-time' : 'int'} }
>>
>> Missing documentation of the new field (compare how @downtime is
>> documented as a (since 1.3) field).
>
> Can you just time the setup phase instead?
>
> Paolo

That's a great idea..... better to make it a more generic time.

I'll send out a V10 with everybody's suggestions.

- Michael
On 6/14/2013 1:35 PM, mrhines@linux.vnet.ibm.com wrote:
> From: "Michael R. Hines" <mrhines@us.ibm.com>
>
> For very large virtual machines, pinning can take a long time.
> While this does not affect the migration's *actual* time itself,
> it is still important for the user to know what's going on and
> to know what component of the total time is actual taken up by
> pinning.
>
> For example, using a 14GB virtual machine, pinning can take as
> long as 5 seconds, for which the user would not otherwise know
> what was happening.
>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
> ---
>  hmp.c                         |  4 +++
>  include/migration/migration.h |  1 +
>  migration-rdma.c              | 55 +++++++++++++++++++++++++++--------------
>  migration.c                   | 13 +++++++++-
>  qapi-schema.json              |  3 ++-
>  5 files changed, 56 insertions(+), 20 deletions(-)
>
> [...]
>
> diff --git a/migration.c b/migration.c
> index 62c6b85..17531e9 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -189,7 +190,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
>          break;
>      case MIG_STATE_ACTIVE:
>          info->has_status = true;
> -        info->status = g_strdup("active");
> +        info->status = g_strdup(s->pin_all_time != 0 ? "active" : "pinning");

Please check the logic for setting the migration status correctly. In the
current logic, where you rely on "s->pin_all_time", if we did a non-RDMA
migration the migration status will remain in "pinning" (instead of
"active") till the migration completes....

Thx
Vinod

> [...]
On 06/16/2013 02:37 PM, Chegu Vinod wrote:
> On 6/14/2013 1:35 PM, mrhines@linux.vnet.ibm.com wrote:
>> From: "Michael R. Hines" <mrhines@us.ibm.com>
>>
>> [...]
>>
>> @@ -189,7 +190,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
>>          break;
>>      case MIG_STATE_ACTIVE:
>>          info->has_status = true;
>> -        info->status = g_strdup("active");
>> +        info->status = g_strdup(s->pin_all_time != 0 ? "active" : "pinning");
>
> Pl. check the logic for setting the migration status correctly. In the
> current logic where you rely on "s->pin_all_time" if we did non-rdma
> migration the migration status will remain in "pinning" (instead of
> "active") till the migration completes....
>
> Thx
> Vinod

I just sent out a V10 which I have also validated with a separate patch
to libvirt. (I'm maintaining a sister patch for RDMA to libvirt on
github as well, in case you need to use libvirt too).

- Michael
diff --git a/hmp.c b/hmp.c
index 148a3fb..90c55f2 100644
--- a/hmp.c
+++ b/hmp.c
@@ -164,6 +164,10 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
             monitor_printf(mon, "downtime: %" PRIu64 " milliseconds\n",
                            info->downtime);
         }
+        if (info->has_pin_all_time) {
+            monitor_printf(mon, "pin-all: %" PRIu64 " milliseconds\n",
+                           info->pin_all_time);
+        }
     }
 
     if (info->has_ram) {
diff --git a/include/migration/migration.h b/include/migration/migration.h
index b49e68b..d2ca75b 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -49,6 +49,7 @@ struct MigrationState
     bool enabled_capabilities[MIGRATION_CAPABILITY_MAX];
     int64_t xbzrle_cache_size;
     double mbps;
+    int64_t pin_all_time;
 };
 
 void process_incoming_migration(QEMUFile *f);
diff --git a/migration-rdma.c b/migration-rdma.c
index 853de18..e407dce 100644
--- a/migration-rdma.c
+++ b/migration-rdma.c
@@ -699,11 +699,11 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma)
     return 0;
 }
 
-static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma,
-                RDMALocalBlocks *rdma_local_ram_blocks)
+static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
 {
     int i;
-    uint64_t start = qemu_get_clock_ms(rt_clock);
+    int64_t start = qemu_get_clock_ms(host_clock);
+    RDMALocalBlocks *rdma_local_ram_blocks = &rdma->local_ram_blocks;
     (void)start;
 
     for (i = 0; i < rdma_local_ram_blocks->num_blocks; i++) {
@@ -721,7 +721,8 @@ static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma,
         rdma->total_registrations++;
     }
 
-    DPRINTF("lock time: %" PRIu64 "\n", qemu_get_clock_ms(rt_clock) - start);
+    DPRINTF("local lock time: %" PRId64 "\n",
+            qemu_get_clock_ms(host_clock) - start);
 
     if (i >= rdma_local_ram_blocks->num_blocks) {
         return 0;
@@ -1262,7 +1263,8 @@ static void qemu_rdma_move_header(RDMAContext *rdma, int idx,
  */
 static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                    uint8_t *data, RDMAControlHeader *resp,
-                                   int *resp_idx)
+                                   int *resp_idx,
+                                   int (*callback)(RDMAContext *rdma))
 {
     int ret = 0;
     int idx = 0;
@@ -1315,6 +1317,14 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
      * If we're expecting a response, block and wait for it.
      */
     if (resp) {
+        if (callback) {
+            DPRINTF("Issuing callback before receiving response...\n");
+            ret = callback(rdma);
+            if (ret < 0) {
+                return ret;
+            }
+        }
+
         DDPRINTF("Waiting for response %s\n", control_desc[resp->type]);
         ret = qemu_rdma_exchange_get_response(rdma, resp, resp->type, idx + 1);
 
@@ -1464,7 +1474,7 @@ static int qemu_rdma_write_one(QEMUFile *f, RDMAContext *rdma,
                 chunk, sge.length, current_index, offset);
 
         ret = qemu_rdma_exchange_send(rdma, &head,
-                (uint8_t *) &comp, NULL, NULL);
+                (uint8_t *) &comp, NULL, NULL, NULL);
 
         if (ret < 0) {
             return -EIO;
@@ -1487,7 +1497,7 @@ static int qemu_rdma_write_one(QEMUFile *f, RDMAContext *rdma,
                 chunk, sge.length, current_index, offset);
 
         ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
-                                      &resp, &reg_result_idx);
+                                      &resp, &reg_result_idx, NULL);
         if (ret < 0) {
             return ret;
         }
@@ -2126,7 +2136,7 @@ static int qemu_rdma_put_buffer(void *opaque, const uint8_t *buf,
         head.len = r->len;
         head.type = RDMA_CONTROL_QEMU_FILE;
 
-        ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL);
+        ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);
 
         if (ret < 0) {
             rdma->error_state = ret;
@@ -2482,7 +2492,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque,
             DPRINTF("Initial setup info requested.\n");
 
             if (rdma->pin_all) {
-                ret = qemu_rdma_reg_whole_ram_blocks(rdma, &rdma->local_ram_blocks);
+                ret = qemu_rdma_reg_whole_ram_blocks(rdma);
                 if (ret) {
                     fprintf(stderr, "rdma migration: error dest "
                                     "registering ram blocks!\n");
@@ -2614,10 +2624,22 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
     }
 
     if (flags == RAM_CONTROL_SETUP) {
+        int64_t start = qemu_get_clock_ms(host_clock);
+        MigrationState *s = migrate_get_current();
+
         head.type = RDMA_CONTROL_RAM_BLOCKS_REQUEST;
         DPRINTF("Sending registration setup for ram blocks...\n");
 
-        ret = qemu_rdma_exchange_send(rdma, &head, NULL, &resp, &reg_result_idx);
+        /*
+         * Make sure that we parallelize the pinning on both size.
+         * For very large guests, doing this serially takes a really
+         * long time, so we have to 'interleave' the pinning locally
+         * by performing it before we receive the response from the
+         * destination that the pinning has completed.
+         */
+        ret = qemu_rdma_exchange_send(rdma, &head, NULL, &resp,
+                                      &reg_result_idx, rdma->pin_all ?
+                                      qemu_rdma_reg_whole_ram_blocks : NULL);
         if (ret < 0) {
             ERROR(errp, "receiving remote info!\n");
             return ret;
@@ -2634,12 +2656,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
         }
 
         if (rdma->pin_all) {
-            ret = qemu_rdma_reg_whole_ram_blocks(rdma, &rdma->local_ram_blocks);
-            if (ret) {
-                fprintf(stderr, "rdma migration: error source "
-                                "registering ram blocks!\n");
-                return ret;
-            }
+            s->pin_all_time = qemu_get_clock_ms(host_clock) - start;
         } else {
             int x = 0;
             for (x = 0; x < rdma->local_ram_blocks.num_blocks; x++) {
@@ -2653,7 +2670,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
     DDDPRINTF("Sending registration finish %" PRIu64 "...\n", flags);
 
     head.type = RDMA_CONTROL_REGISTER_FINISHED;
-    ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL);
+    ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL);
 
     if (ret < 0) {
         goto err;
@@ -2801,8 +2818,10 @@ void rdma_start_outgoing_migration(void *opaque,
 
     DPRINTF("qemu_rdma_source_connect success\n");
 
+    if (rdma->pin_all) {
+        s->pin_all_time = 0;
+    }
     s->file = qemu_fopen_rdma(rdma, "wb");
-    s->total_time = qemu_get_clock_ms(rt_clock);
     migrate_fd_connect(s);
     return;
 err:
diff --git a/migration.c b/migration.c
index 62c6b85..17531e9 100644
--- a/migration.c
+++ b/migration.c
@@ -67,6 +67,7 @@ MigrationState *migrate_get_current(void)
         .bandwidth_limit = MAX_THROTTLE,
         .xbzrle_cache_size = DEFAULT_MIGRATE_CACHE_SIZE,
         .mbps = -1,
+        .pin_all_time = -1,
     };
 
     return &current_migration;
@@ -189,7 +190,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
         break;
     case MIG_STATE_ACTIVE:
         info->has_status = true;
-        info->status = g_strdup("active");
+        info->status = g_strdup(s->pin_all_time != 0 ? "active" : "pinning");
         info->has_total_time = true;
         info->total_time = qemu_get_clock_ms(rt_clock)
             - s->total_time;
@@ -208,6 +209,11 @@ MigrationInfo *qmp_query_migrate(Error **errp)
         info->ram->dirty_pages_rate = s->dirty_pages_rate;
         info->ram->mbps = s->mbps;
 
+        if (s->pin_all_time >= 0) {
+            info->has_pin_all_time = true;
+            info->pin_all_time = s->pin_all_time;
+        }
+
         if (blk_mig_active()) {
             info->has_disk = true;
             info->disk = g_malloc0(sizeof(*info->disk));
@@ -227,6 +233,11 @@ MigrationInfo *qmp_query_migrate(Error **errp)
         info->has_downtime = true;
         info->downtime = s->downtime;
 
+        if (s->pin_all_time >= 0) {
+            info->has_pin_all_time = true;
+            info->pin_all_time = s->pin_all_time;
+        }
+
         info->has_ram = true;
         info->ram = g_malloc0(sizeof(*info->ram));
         info->ram->transferred = ram_bytes_transferred();
diff --git a/qapi-schema.json b/qapi-schema.json
index 4e9cd6d..dfae59a 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -586,7 +586,8 @@
            '*xbzrle-cache': 'XBZRLECacheStats',
            '*total-time': 'int',
            '*expected-downtime': 'int',
-           '*downtime': 'int'} }
+           '*downtime': 'int',
+           '*pin-all-time' : 'int'} }
 
 ##
 # @query-migrate