Message ID | 1533562177-16447-11-git-send-email-lidongchen@tencent.com |
---|---|
State | New |
Headers | show |
Series | Enable postcopy RDMA live migration | expand |
* Lidong Chen (jemmy858585@gmail.com) wrote: > It's not necessary to send RDMA_CONTROL_ERROR when clean up rdma resource. > If rdma->error_state is ture, the message may not send successfully. > and the cm event can also notify the peer qemu. > > Signed-off-by: Lidong Chen <lidongchen@tencent.com> How does this keep 'cancel' working; I added 32bce196344 last year to make that code also send the RDMA_CONTROL_ERROR in 'cancelling'. Dave > --- > migration/rdma.c | 11 ----------- > 1 file changed, 11 deletions(-) > > diff --git a/migration/rdma.c b/migration/rdma.c > index ae07515..e1498f2 100644 > --- a/migration/rdma.c > +++ b/migration/rdma.c > @@ -2305,17 +2305,6 @@ static void qemu_rdma_cleanup(RDMAContext *rdma) > int idx; > > if (rdma->cm_id && rdma->connected) { > - if ((rdma->error_state || > - migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) && > - !rdma->received_error) { > - RDMAControlHeader head = { .len = 0, > - .type = RDMA_CONTROL_ERROR, > - .repeat = 1, > - }; > - error_report("Early error. Sending error."); > - qemu_rdma_post_send_control(rdma, NULL, &head); > - } > - > rdma_disconnect(rdma->cm_id); > trace_qemu_rdma_cleanup_disconnect(); > rdma->connected = false; > -- > 1.8.3.1 > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Fri, Aug 17, 2018 at 10:04 PM, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: > * Lidong Chen (jemmy858585@gmail.com) wrote: >> It's not necessary to send RDMA_CONTROL_ERROR when clean up rdma resource. >> If rdma->error_state is ture, the message may not send successfully. >> and the cm event can also notify the peer qemu. >> >> Signed-off-by: Lidong Chen <lidongchen@tencent.com> > > How does this keep 'cancel' working; I added 32bce196344 last year to > make that code also send the RDMA_CONTROL_ERROR in 'cancelling'. I guess send the RDMA_CONTROL_ERROR is to notify peer qemu close rdma connection, and to make sure the receive rdma_disconnect event. But the two sides qemu should cleanup rdma independently. maybe the destination qemu is hang. 1.the current qemu version already not wait for RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect, 2.for peer qemu, it's already poll rdma->channel->fd, compare to send RDMA_CONTROL_ERROR, maybe use cm event to notify peer qemu to quit is better. maybe the rdma is already in error_state, and RDMA_CONTROL_ERROR cannot send successfully. when cancel migraiton, the destination will receive RDMA_CM_EVENT_DISCONNECTED. 3. I prefer use RDMA_CONTROL_ERROR to notify some code logic error, not rdma connection error. > > Dave > >> --- >> migration/rdma.c | 11 ----------- >> 1 file changed, 11 deletions(-) >> >> diff --git a/migration/rdma.c b/migration/rdma.c >> index ae07515..e1498f2 100644 >> --- a/migration/rdma.c >> +++ b/migration/rdma.c >> @@ -2305,17 +2305,6 @@ static void qemu_rdma_cleanup(RDMAContext *rdma) >> int idx; >> >> if (rdma->cm_id && rdma->connected) { >> - if ((rdma->error_state || >> - migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) && >> - !rdma->received_error) { >> - RDMAControlHeader head = { .len = 0, >> - .type = RDMA_CONTROL_ERROR, >> - .repeat = 1, >> - }; >> - error_report("Early error. Sending error."); >> - qemu_rdma_post_send_control(rdma, NULL, &head); >> - } >> - >> rdma_disconnect(rdma->cm_id); >> trace_qemu_rdma_cleanup_disconnect(); >> rdma->connected = false; >> -- >> 1.8.3.1 >> > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
diff --git a/migration/rdma.c b/migration/rdma.c index ae07515..e1498f2 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -2305,17 +2305,6 @@ static void qemu_rdma_cleanup(RDMAContext *rdma) int idx; if (rdma->cm_id && rdma->connected) { - if ((rdma->error_state || - migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) && - !rdma->received_error) { - RDMAControlHeader head = { .len = 0, - .type = RDMA_CONTROL_ERROR, - .repeat = 1, - }; - error_report("Early error. Sending error."); - qemu_rdma_post_send_control(rdma, NULL, &head); - } - rdma_disconnect(rdma->cm_id); trace_qemu_rdma_cleanup_disconnect(); rdma->connected = false;
It's not necessary to send RDMA_CONTROL_ERROR when clean up rdma resource. If rdma->error_state is ture, the message may not send successfully. and the cm event can also notify the peer qemu. Signed-off-by: Lidong Chen <lidongchen@tencent.com> --- migration/rdma.c | 11 ----------- 1 file changed, 11 deletions(-)