migration/rdma: unegister fd handler

Message ID 20190122173111.29821-1-dgilbert@redhat.com
State New
Series migration/rdma: unegister fd handler

Commit Message

Dr. David Alan Gilbert Jan. 22, 2019, 5:31 p.m. UTC
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Unregister the fd handler before we destroy the channel,
otherwise we've got a race where we might land in the
fd handler just as we're closing the device.

(The race is quite data dependent, you just have to have
the right set of devices for it to trigger).

Corresponds to RH bz: https://bugzilla.redhat.com/show_bug.cgi?id=1666601

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Peter Xu Jan. 23, 2019, 8:13 a.m. UTC | #1
On Tue, Jan 22, 2019 at 05:31:11PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Unregister the fd handler before we destroy the channel,
> otherwise we've got a race where we might land in the
> fd handler just as we're closing the device.
> 
> (The race is quite data dependent, you just have to have
> the right set of devices for it to trigger).
> 
> Corresponds to RH bz: https://bugzilla.redhat.com/show_bug.cgi?id=1666601
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

(Could the crash have happened because the same fd number is re-used after
 the RDMA channel was destroyed?  Then when the fd has an event, it'll
 be delivered to rdma_cm_poll_handler() while the fd is not really the
 RDMA channel handle any more)

Reviewed-by: Peter Xu <peterx@redhat.com>

Regards,
Dr. David Alan Gilbert Jan. 23, 2019, 9:31 a.m. UTC | #2
* Peter Xu (peterx@redhat.com) wrote:
> On Tue, Jan 22, 2019 at 05:31:11PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Unregister the fd handler before we destroy the channel,
> > otherwise we've got a race where we might land in the
> > fd handler just as we're closing the device.
> > 
> > (The race is quite data dependent, you just have to have
> > the right set of devices for it to trigger).
> > 
> > Corresponds to RH bz: https://bugzilla.redhat.com/show_bug.cgi?id=1666601
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> (Could the crash have happened because the same fd number is re-used after
>  the RDMA channel was destroyed?  Then when the fd has an event, it'll
>  be delivered to rdma_cm_poll_handler() while the fd is not really the
>  RDMA channel handle any more)

That's an interesting thought, I'd assumed it was just a race, but being
dependent on the fd numbering would explain why it was so delicate to
reproduce it.

> Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks!

Dave

> Regards,
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
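
Peter Xu's fd-reuse theory in comment #1 rests on a POSIX guarantee: open()
returns the lowest-numbered descriptor not currently open, so a number freed
by close() is typically handed out again by the next open. A minimal
single-threaded sketch of that behaviour follows. fd_number_is_reused() is a
hypothetical helper written for illustration; it is not part of QEMU and it
does not reproduce the crash itself.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical helper: open /dev/null, close it, open it again, and
 * report whether the second open() returned the same descriptor number.
 * Assumes a single-threaded process so nothing else grabs the number
 * between close() and the second open().
 */
static bool fd_number_is_reused(void)
{
    int first = open("/dev/null", O_RDONLY);
    assert(first >= 0);     /* demo only: treat failure as fatal */
    close(first);           /* the number is now free again */

    int second = open("/dev/null", O_RDONLY);
    assert(second >= 0);
    close(second);

    /* Same number, but it named two different open files over time. */
    return second == first;
}
```

In a single-threaded process the helper is expected to return true. That is
how a handler left registered for the old number could end up receiving
events for an unrelated descriptor.
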
Dr. David Alan Gilbert Jan. 23, 2019, 11:44 a.m. UTC | #3
* Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Unregister the fd handler before we destroy the channel,
> otherwise we've got a race where we might land in the
> fd handler just as we're closing the device.
> 
> (The race is quite data dependent, you just have to have
> the right set of devices for it to trigger).
> 
> Corresponds to RH bz: https://bugzilla.redhat.com/show_bug.cgi?id=1666601
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Queued

> ---
>  migration/rdma.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 9b2e7e10aa..54a3c11540 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2321,6 +2321,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>          rdma->connected = false;
>      }
>  
> +    qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL);
>      g_free(rdma->dest_blocks);
>      rdma->dest_blocks = NULL;
>  
> -- 
> 2.20.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Philippe Mathieu-Daudé Jan. 23, 2019, 6:30 p.m. UTC | #4
On 1/23/19 12:44 PM, Dr. David Alan Gilbert wrote:
> * Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:
>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>
>> Unregister the fd handler before we destroy the channel,
>> otherwise we've got a race where we might land in the
>> fd handler just as we're closing the device.
>>
>> (The race is quite data dependent, you just have to have
>> the right set of devices for it to trigger).
>>
>> Corresponds to RH bz: https://bugzilla.redhat.com/show_bug.cgi?id=1666601
>>
>> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Queued

Did you fix the patch subject typo? "un(r)egister"

>> ---
>>  migration/rdma.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index 9b2e7e10aa..54a3c11540 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -2321,6 +2321,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>>          rdma->connected = false;
>>      }
>>  
>> +    qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL);
>>      g_free(rdma->dest_blocks);
>>      rdma->dest_blocks = NULL;
>>  
>> -- 
>> 2.20.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
Dr. David Alan Gilbert Jan. 23, 2019, 6:31 p.m. UTC | #5
* Philippe Mathieu-Daudé (philmd@redhat.com) wrote:
> On 1/23/19 12:44 PM, Dr. David Alan Gilbert wrote:
> > * Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:
> >> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >>
> >> Unregister the fd handler before we destroy the channel,
> >> otherwise we've got a race where we might land in the
> >> fd handler just as we're closing the device.
> >>
> >> (The race is quite data dependent, you just have to have
> >> the right set of devices for it to trigger).
> >>
> >> Corresponds to RH bz: https://bugzilla.redhat.com/show_bug.cgi?id=1666601
> >>
> >> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > 
> > Queued
> 
> Did you fix the patch subject typo? "un(r)egister"

Yes; fortunately I spotted it while building the pull :-)

Dave

> >> ---
> >>  migration/rdma.c | 1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> diff --git a/migration/rdma.c b/migration/rdma.c
> >> index 9b2e7e10aa..54a3c11540 100644
> >> --- a/migration/rdma.c
> >> +++ b/migration/rdma.c
> >> @@ -2321,6 +2321,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
> >>          rdma->connected = false;
> >>      }
> >>  
> >> +    qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL);
> >>      g_free(rdma->dest_blocks);
> >>      rdma->dest_blocks = NULL;
> >>  
> >> -- 
> >> 2.20.1
> >>
> >>
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Peter Maydell Feb. 14, 2019, 6:08 p.m. UTC | #6
On Tue, 22 Jan 2019 at 19:08, Dr. David Alan Gilbert (git)
<dgilbert@redhat.com> wrote:
>
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Unregister the fd handler before we destroy the channel,
> otherwise we've got a race where we might land in the
> fd handler just as we're closing the device.
>
> (The race is quite data dependent, you just have to have
> the right set of devices for it to trigger).
>
> Corresponds to RH bz: https://bugzilla.redhat.com/show_bug.cgi?id=1666601
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  migration/rdma.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 9b2e7e10aa..54a3c11540 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2321,6 +2321,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>          rdma->connected = false;
>      }
>
> +    qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL);
>      g_free(rdma->dest_blocks);
>      rdma->dest_blocks = NULL;

Hi -- this patch makes coverity complain (CID 1398634),
because here we use rdma->channel without checking whether it is NULL,
but later in the function we have an "if (rdma->channel)" test.
Should this code be conditional on rdma->channel being non-NULL,
or is the later test incorrect?

thanks
-- PMM
Dr. David Alan Gilbert Feb. 14, 2019, 6:37 p.m. UTC | #7
* Peter Maydell (peter.maydell@linaro.org) wrote:
> On Tue, 22 Jan 2019 at 19:08, Dr. David Alan Gilbert (git)
> <dgilbert@redhat.com> wrote:
> >
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Unregister the fd handler before we destroy the channel,
> > otherwise we've got a race where we might land in the
> > fd handler just as we're closing the device.
> >
> > (The race is quite data dependent, you just have to have
> > the right set of devices for it to trigger).
> >
> > Corresponds to RH bz: https://bugzilla.redhat.com/show_bug.cgi?id=1666601
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  migration/rdma.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/migration/rdma.c b/migration/rdma.c
> > index 9b2e7e10aa..54a3c11540 100644
> > --- a/migration/rdma.c
> > +++ b/migration/rdma.c
> > @@ -2321,6 +2321,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
> >          rdma->connected = false;
> >      }
> >
> > +    qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL);
> >      g_free(rdma->dest_blocks);
> >      rdma->dest_blocks = NULL;
> 
> Hi -- this patch makes coverity complain (CID 1398634),
> because here we use rdma->channel without checking whether it is NULL,
> but later in the function we have an "if (rdma->channel)" test.
> Should this code be conditional on rdma->channel being non-NULL,
> or is the later test incorrect?

Yes, it's got a point - I can make that segfault.

I'll post a fix.

Dave

> thanks
> -- PMM
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Patch

diff --git a/migration/rdma.c b/migration/rdma.c
index 9b2e7e10aa..54a3c11540 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2321,6 +2321,7 @@  static void qemu_rdma_cleanup(RDMAContext *rdma)
         rdma->connected = false;
     }
 
+    qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL);
     g_free(rdma->dest_blocks);
     rdma->dest_blocks = NULL;
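
One way to address the Coverity report in comment #6 is to make the new call
conditional on rdma->channel, matching the check later in the function. The
sketch below assumes simplified stand-ins (RDMAContextSketch,
set_fd_handler_sketch, cleanup_sketch, noop_handler) in place of QEMU's real
types and qemu_set_fd_handler(). It shows only the ordering and the NULL
guard, and is not the fix that was eventually posted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT QEMU's or librdmacm's definitions. */
struct event_channel_sketch { int fd; };

typedef struct {
    struct event_channel_sketch *channel; /* may be NULL if setup failed early */
    void *dest_blocks;
} RDMAContextSketch;

typedef void IOHandlerSketch(void *opaque);

#define MAX_FDS 16
static IOHandlerSketch *read_handlers[MAX_FDS];

/* Example handler so there is something to register and later clear. */
static void noop_handler(void *opaque) { (void)opaque; }

/* Mock registration: a NULL handler unregisters the fd. */
static void set_fd_handler_sketch(int fd, IOHandlerSketch *fd_read)
{
    assert(fd >= 0 && fd < MAX_FDS);
    read_handlers[fd] = fd_read;
}

static void cleanup_sketch(RDMAContextSketch *rdma)
{
    /* Unregister first, but only if a channel was ever created. */
    if (rdma->channel) {
        set_fd_handler_sketch(rdma->channel->fd, NULL);
    }

    free(rdma->dest_blocks);
    rdma->dest_blocks = NULL;

    /* Later check from the report; free() stands in for destroying the channel. */
    if (rdma->channel) {
        free(rdma->channel);
        rdma->channel = NULL;
    }
}
```

With the guard in place, calling cleanup_sketch() on a context whose channel
is NULL skips the unregister step instead of dereferencing a NULL pointer.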