Message ID | 20190122173111.29821-1-dgilbert@redhat.com |
---|---|
State | New |
Headers | show |
Series | migration/rdma: unegister fd handler | expand |
On Tue, Jan 22, 2019 at 05:31:11PM +0000, Dr. David Alan Gilbert (git) wrote: > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > Unregister the fd handler before we destroy the channel, > otherwise we've got a race where we might land in the > fd handler just as we're closing the device. > > (The race is quite data dependent, you just have to have > the right set of devices for it to trigger). > > Corresponds to RH bz: https://bugzilla.redhat.com/show_bug.cgi?id=1666601 > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> (Could the crash happened because the same fd number is re-used after the RDMA channel was destroyed? Then when the fd has an event, it'll be delivered to rdma_cm_poll_handler() while the fd is not really the RDMA channel handle any more) Reviewed-by: Peter Xu <peterx@redhat.com> Regards,
* Peter Xu (peterx@redhat.com) wrote: > On Tue, Jan 22, 2019 at 05:31:11PM +0000, Dr. David Alan Gilbert (git) wrote: > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > > > Unregister the fd handler before we destroy the channel, > > otherwise we've got a race where we might land in the > > fd handler just as we're closing the device. > > > > (The race is quite data dependent, you just have to have > > the right set of devices for it to trigger). > > > > Corresponds to RH bz: https://bugzilla.redhat.com/show_bug.cgi?id=1666601 > > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > > (Could the crash happened because the same fd number is re-used after > the RDMA channel was destroyed? Then when the fd has an event, it'll > be delivered to rdma_cm_poll_handler() while the fd is not really the > RDMA channel handle any more) That's an interesting thought, I'd assumed it was just a race, but being dependent on the fd numbering would explain why it was so delicate to reproduce it. > Reviewed-by: Peter Xu <peterx@redhat.com> Thanks! Dave > Regards, > > -- > Peter Xu -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote: > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > Unregister the fd handler before we destroy the channel, > otherwise we've got a race where we might land in the > fd handler just as we're closing the device. > > (The race is quite data dependent, you just have to have > the right set of devices for it to trigger). > > Corresponds to RH bz: https://bugzilla.redhat.com/show_bug.cgi?id=1666601 > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Queued > --- > migration/rdma.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/migration/rdma.c b/migration/rdma.c > index 9b2e7e10aa..54a3c11540 100644 > --- a/migration/rdma.c > +++ b/migration/rdma.c > @@ -2321,6 +2321,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma) > rdma->connected = false; > } > > + qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL); > g_free(rdma->dest_blocks); > rdma->dest_blocks = NULL; > > -- > 2.20.1 > > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 1/23/19 12:44 PM, Dr. David Alan Gilbert wrote: > * Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote: >> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> >> >> Unregister the fd handler before we destroy the channel, >> otherwise we've got a race where we might land in the >> fd handler just as we're closing the device. >> >> (The race is quite data dependent, you just have to have >> the right set of devices for it to trigger). >> >> Corresponds to RH bz: https://bugzilla.redhat.com/show_bug.cgi?id=1666601 >> >> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > > Queued Did you fixed the patch subject typo? "un(r)egister" >> --- >> migration/rdma.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/migration/rdma.c b/migration/rdma.c >> index 9b2e7e10aa..54a3c11540 100644 >> --- a/migration/rdma.c >> +++ b/migration/rdma.c >> @@ -2321,6 +2321,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma) >> rdma->connected = false; >> } >> >> + qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL); >> g_free(rdma->dest_blocks); >> rdma->dest_blocks = NULL; >> >> -- >> 2.20.1 >> >> > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK >
* Philippe Mathieu-Daudé (philmd@redhat.com) wrote: > On 1/23/19 12:44 PM, Dr. David Alan Gilbert wrote: > > * Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote: > >> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > >> > >> Unregister the fd handler before we destroy the channel, > >> otherwise we've got a race where we might land in the > >> fd handler just as we're closing the device. > >> > >> (The race is quite data dependent, you just have to have > >> the right set of devices for it to trigger). > >> > >> Corresponds to RH bz: https://bugzilla.redhat.com/show_bug.cgi?id=1666601 > >> > >> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > > > > Queued > > Did you fixed the patch subject typo? "un(r)egister" Yes; fortunately I spotted it during building the pull :-) Dave > >> --- > >> migration/rdma.c | 1 + > >> 1 file changed, 1 insertion(+) > >> > >> diff --git a/migration/rdma.c b/migration/rdma.c > >> index 9b2e7e10aa..54a3c11540 100644 > >> --- a/migration/rdma.c > >> +++ b/migration/rdma.c > >> @@ -2321,6 +2321,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma) > >> rdma->connected = false; > >> } > >> > >> + qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL); > >> g_free(rdma->dest_blocks); > >> rdma->dest_blocks = NULL; > >> > >> -- > >> 2.20.1 > >> > >> > > -- > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Tue, 22 Jan 2019 at 19:08, Dr. David Alan Gilbert (git) <dgilbert@redhat.com> wrote: > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > Unregister the fd handler before we destroy the channel, > otherwise we've got a race where we might land in the > fd handler just as we're closing the device. > > (The race is quite data dependent, you just have to have > the right set of devices for it to trigger). > > Corresponds to RH bz: https://bugzilla.redhat.com/show_bug.cgi?id=1666601 > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > --- > migration/rdma.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/migration/rdma.c b/migration/rdma.c > index 9b2e7e10aa..54a3c11540 100644 > --- a/migration/rdma.c > +++ b/migration/rdma.c > @@ -2321,6 +2321,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma) > rdma->connected = false; > } > > + qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL); > g_free(rdma->dest_blocks); > rdma->dest_blocks = NULL; Hi -- this patch makes coverity complain (CID 1398634), because here we use rdma->channel without checking that it is NULL, but later in the function we have an "if (rdma->channel)" test. Should this code be conditional on rmda->channel being non-NULL, or is the later test incorrect? thanks -- PMM
* Peter Maydell (peter.maydell@linaro.org) wrote: > On Tue, 22 Jan 2019 at 19:08, Dr. David Alan Gilbert (git) > <dgilbert@redhat.com> wrote: > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > > > Unregister the fd handler before we destroy the channel, > > otherwise we've got a race where we might land in the > > fd handler just as we're closing the device. > > > > (The race is quite data dependent, you just have to have > > the right set of devices for it to trigger). > > > > Corresponds to RH bz: https://bugzilla.redhat.com/show_bug.cgi?id=1666601 > > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > > --- > > migration/rdma.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/migration/rdma.c b/migration/rdma.c > > index 9b2e7e10aa..54a3c11540 100644 > > --- a/migration/rdma.c > > +++ b/migration/rdma.c > > @@ -2321,6 +2321,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma) > > rdma->connected = false; > > } > > > > + qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL); > > g_free(rdma->dest_blocks); > > rdma->dest_blocks = NULL; > > Hi -- this patch makes coverity complain (CID 1398634), > because here we use rdma->channel without checking that it is NULL, > but later in the function we have an "if (rdma->channel)" test. > Should this code be conditional on rmda->channel being non-NULL, > or is the later test incorrect? Yes, it's got a point - I can seg that. I'll post a fix. Dave > thanks > -- PMM -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
diff --git a/migration/rdma.c b/migration/rdma.c index 9b2e7e10aa..54a3c11540 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -2321,6 +2321,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma) rdma->connected = false; } + qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL); g_free(rdma->dest_blocks); rdma->dest_blocks = NULL;