
[v4,04/12] migration: avoid concurrent invocation of channel_close by different threads

Message ID: 1527673416-31268-5-git-send-email-lidongchen@tencent.com
State: New
Series: Enable postcopy RDMA live migration

Commit Message

858585 jemmy May 30, 2018, 9:43 a.m. UTC
From: Lidong Chen <jemmy858585@gmail.com>

channel_close may be invoked by different threads. For example, the source
qemu invokes qemu_fclose in the main thread, the migration thread and the
return path thread. The destination qemu invokes qemu_fclose in the main
thread, the listen thread and the COLO incoming thread.

Add a mutex to the QEMUFile struct to avoid concurrent invocation of
channel_close.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 migration/qemu-file.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Dr. David Alan Gilbert May 30, 2018, 2:45 p.m. UTC | #1
* Lidong Chen (jemmy858585@gmail.com) wrote:
> From: Lidong Chen <jemmy858585@gmail.com>
> 
> The channel_close maybe invoked by different threads. For example, source
> qemu invokes qemu_fclose in main thread, migration thread and return path
> thread. Destination qemu invokes qemu_fclose in main thread, listen thread
> and COLO incoming thread.
> 
> Add a mutex in QEMUFile struct to avoid concurrent invoke channel_close.
> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> ---
>  migration/qemu-file.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index 977b9ae..87d0f05 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -52,6 +52,7 @@ struct QEMUFile {
>      unsigned int iovcnt;
>  
>      int last_error;
> +    QemuMutex lock;

That could do with a comment saying what you're protecting

>  };
>  
>  /*
> @@ -96,6 +97,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops)
>  
>      f = g_new0(QEMUFile, 1);
>  
> +    qemu_mutex_init(&f->lock);
>      f->opaque = opaque;
>      f->ops = ops;
>      return f;
> @@ -328,7 +330,9 @@ int qemu_fclose(QEMUFile *f)
>      ret = qemu_file_get_error(f);
>  
>      if (f->ops->close) {
> +        qemu_mutex_lock(&f->lock);
>          int ret2 = f->ops->close(f->opaque);
> +        qemu_mutex_unlock(&f->lock);

OK, and at least for the RDMA code, if it calls
close a 2nd time, rioc->rdma is checked so it won't actually free stuff a
2nd time.
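
[Editorial note: the pattern being referred to looks roughly like this -
a sketch of the idempotent-close idea, simplified from
qio_channel_rdma_close(); details of the real code may differ:

    /* Only the first close tears anything down: it nulls out rioc->rdma,
     * so a second call finds NULL and returns without freeing twice. */
    static int qio_channel_rdma_close(QIOChannel *ioc, Error **errp)
    {
        QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);

        if (rioc->rdma) {
            qemu_rdma_cleanup(rioc->rdma);
            g_free(rioc->rdma);
            rioc->rdma = NULL;
        }
        return 0;
    }
]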

>          if (ret >= 0) {
>              ret = ret2;
>          }
> @@ -339,6 +343,7 @@ int qemu_fclose(QEMUFile *f)
>      if (f->last_error) {
>          ret = f->last_error;
>      }
> +    qemu_mutex_destroy(&f->lock);
>      g_free(f);

Hmm, but that's not safe; if two things really do call qemu_fclose()
on the same structure they race here and can end up destroying the lock
twice, or touching f->lock after the 1st one has already done g_free(f).
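
[Editorial note: to make that race concrete, one bad interleaving, as a
sketch rather than code from the patch:

    /* Two threads call qemu_fclose(f) on the same QEMUFile:
     *
     *   Thread A                         Thread B
     *   qemu_mutex_lock(&f->lock);
     *   f->ops->close(f->opaque);
     *   qemu_mutex_unlock(&f->lock);
     *                                    qemu_mutex_lock(&f->lock);
     *   qemu_mutex_destroy(&f->lock);    <- destroys a lock B still holds
     *   g_free(f);
     *                                    qemu_mutex_unlock(&f->lock);
     *                                    <- unlock touches freed memory
     */
]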


So let's go back a step.
I think:
  a) There should always be a separate QEMUFile* for
     to_src_file and from_src_file - I don't see where you open
     the 2nd one; I don't see your implementation of
     f->ops->get_return_path.
  b) I *think* that while the different threads might all call
     fclose(), I think there should only ever be one qemu_fclose
     call for each direction on the QEMUFile.

But now we have two problems:
  If (a) is true then f->lock is separate on each one, so it
   doesn't really protect if the two directions are closed
   at once. (Assuming (b) is true.)

  If (a) is false and we actually share a single QEMUFile then
 that race at the end happens.

Dave


>      trace_qemu_file_fclose();
>      return ret;
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
858585 jemmy May 31, 2018, 7:07 a.m. UTC | #2
On Wed, May 30, 2018 at 10:45 PM, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
> * Lidong Chen (jemmy858585@gmail.com) wrote:
>> From: Lidong Chen <jemmy858585@gmail.com>
>>
>> The channel_close maybe invoked by different threads. For example, source
>> qemu invokes qemu_fclose in main thread, migration thread and return path
>> thread. Destination qemu invokes qemu_fclose in main thread, listen thread
>> and COLO incoming thread.
>>
>> Add a mutex in QEMUFile struct to avoid concurrent invoke channel_close.
>>
>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>> ---
>>  migration/qemu-file.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
>> index 977b9ae..87d0f05 100644
>> --- a/migration/qemu-file.c
>> +++ b/migration/qemu-file.c
>> @@ -52,6 +52,7 @@ struct QEMUFile {
>>      unsigned int iovcnt;
>>
>>      int last_error;
>> +    QemuMutex lock;
>
> That could do with a comment saying what you're protecting
>
>>  };
>>
>>  /*
>> @@ -96,6 +97,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops)
>>
>>      f = g_new0(QEMUFile, 1);
>>
>> +    qemu_mutex_init(&f->lock);
>>      f->opaque = opaque;
>>      f->ops = ops;
>>      return f;
>> @@ -328,7 +330,9 @@ int qemu_fclose(QEMUFile *f)
>>      ret = qemu_file_get_error(f);
>>
>>      if (f->ops->close) {
>> +        qemu_mutex_lock(&f->lock);
>>          int ret2 = f->ops->close(f->opaque);
>> +        qemu_mutex_unlock(&f->lock);
>
> OK, and at least for the RDMA code, if it calls
> close a 2nd time, rioc->rdma is checked so it wont actually free stuff a
> 2nd time.
>
>>          if (ret >= 0) {
>>              ret = ret2;
>>          }
>> @@ -339,6 +343,7 @@ int qemu_fclose(QEMUFile *f)
>>      if (f->last_error) {
>>          ret = f->last_error;
>>      }
>> +    qemu_mutex_destroy(&f->lock);
>>      g_free(f);
>
> Hmm but that's not safe; if two things really do call qemu_fclose()
> on the same structure they race here and can end up destroying the lock
> twice, or doing f->lock  after the 1st one has already g_free(f).

>
>
> So lets go back a step.
> I think:
>   a) There should always be a separate QEMUFile* for
>      to_src_file and from_src_file - I don't see where you open
>      the 2nd one; I don't see your implementation of
>      f->ops->get_return_path.

Yes, the current qemu version uses a separate QEMUFile* for to_src_file
and from_src_file, and the two QEMUFiles point to one QIOChannelRDMA.

The f->ops->get_return_path is implemented by channel_output_ops or
channel_input_ops.

>   b) I *think* that while the different threads might all call
>      fclose(), I think there should only ever be one qemu_fclose
>      call for each direction on the QEMUFile.
>
> But now we have two problems:
>   If (a) is true then f->lock  is separate on each one so
>    doesn't really protect if the two directions are closed
>    at once. (Assuming (b) is true)

Yes, you are right. So I should add a QemuMutex to the QIOChannel
structure, not the QEMUFile structure, and qemu_mutex_destroy that
QemuMutex in qio_channel_finalize.

Thank you.
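
[Editorial note: a minimal sketch of that proposal - an illustration, not
the actual follow-up patch; qio_channel_finalize is the existing
finalizer, while the instance-init hook name here is an assumption:

    /* in include/io/channel.h: both QEMUFile directions share this object */
    struct QIOChannel {
        Object parent;
        /* ... existing fields ... */
        QemuMutex lock;   /* serializes close across threads */
    };

    /* in io/channel.c */
    static void qio_channel_instance_init(Object *obj)   /* assumed name */
    {
        QIOChannel *ioc = QIO_CHANNEL(obj);
        qemu_mutex_init(&ioc->lock);
    }

    static void qio_channel_finalize(Object *obj)
    {
        QIOChannel *ioc = QIO_CHANNEL(obj);
        /* runs once, when the last reference is dropped, so there is no
         * double-destroy race like the one in qemu_fclose() above */
        qemu_mutex_destroy(&ioc->lock);
    }
]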

>
>   If (a) is false and we actually share a single QEMUFile then
>  that race at the end happens.
>
> Dave
>
>
>>      trace_qemu_file_fclose();
>>      return ret;
>> --
>> 1.8.3.1
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Dr. David Alan Gilbert May 31, 2018, 10:52 a.m. UTC | #3
* 858585 jemmy (jemmy858585@gmail.com) wrote:
> On Wed, May 30, 2018 at 10:45 PM, Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
> > * Lidong Chen (jemmy858585@gmail.com) wrote:
> >> From: Lidong Chen <jemmy858585@gmail.com>
> >>
> >> The channel_close maybe invoked by different threads. For example, source
> >> qemu invokes qemu_fclose in main thread, migration thread and return path
> >> thread. Destination qemu invokes qemu_fclose in main thread, listen thread
> >> and COLO incoming thread.
> >>
> >> Add a mutex in QEMUFile struct to avoid concurrent invoke channel_close.
> >>
> >> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> >> ---
> >>  migration/qemu-file.c | 5 +++++
> >>  1 file changed, 5 insertions(+)
> >>
> >> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> >> index 977b9ae..87d0f05 100644
> >> --- a/migration/qemu-file.c
> >> +++ b/migration/qemu-file.c
> >> @@ -52,6 +52,7 @@ struct QEMUFile {
> >>      unsigned int iovcnt;
> >>
> >>      int last_error;
> >> +    QemuMutex lock;
> >
> > That could do with a comment saying what you're protecting
> >
> >>  };
> >>
> >>  /*
> >> @@ -96,6 +97,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops)
> >>
> >>      f = g_new0(QEMUFile, 1);
> >>
> >> +    qemu_mutex_init(&f->lock);
> >>      f->opaque = opaque;
> >>      f->ops = ops;
> >>      return f;
> >> @@ -328,7 +330,9 @@ int qemu_fclose(QEMUFile *f)
> >>      ret = qemu_file_get_error(f);
> >>
> >>      if (f->ops->close) {
> >> +        qemu_mutex_lock(&f->lock);
> >>          int ret2 = f->ops->close(f->opaque);
> >> +        qemu_mutex_unlock(&f->lock);
> >
> > OK, and at least for the RDMA code, if it calls
> > close a 2nd time, rioc->rdma is checked so it wont actually free stuff a
> > 2nd time.
> >
> >>          if (ret >= 0) {
> >>              ret = ret2;
> >>          }
> >> @@ -339,6 +343,7 @@ int qemu_fclose(QEMUFile *f)
> >>      if (f->last_error) {
> >>          ret = f->last_error;
> >>      }
> >> +    qemu_mutex_destroy(&f->lock);
> >>      g_free(f);
> >
> > Hmm but that's not safe; if two things really do call qemu_fclose()
> > on the same structure they race here and can end up destroying the lock
> > twice, or doing f->lock  after the 1st one has already g_free(f).
> 
> >
> >
> > So lets go back a step.
> > I think:
> >   a) There should always be a separate QEMUFile* for
> >      to_src_file and from_src_file - I don't see where you open
> >      the 2nd one; I don't see your implementation of
> >      f->ops->get_return_path.
> 
> yes, current qemu version use a separate QEMUFile* for to_src_file and
> from_src_file.
> and the two QEMUFile point to one QIOChannelRDMA.
> 
> the f->ops->get_return_path is implemented by channel_output_ops or
> channel_input_ops.

Ah OK, yes that makes sense.

> >   b) I *think* that while the different threads might all call
> >      fclose(), I think there should only ever be one qemu_fclose
> >      call for each direction on the QEMUFile.
> >
> > But now we have two problems:
> >   If (a) is true then f->lock  is separate on each one so
> >    doesn't really protect if the two directions are closed
> >    at once. (Assuming (b) is true)
> 
> yes, you are right.  so I should add a QemuMutex in QIOChannel structure, not
> QEMUFile structure. and qemu_mutex_destroy the QemuMutex in
> qio_channel_finalize.

OK, that sounds better.

Dave

> Thank you.
> 
> >
> >   If (a) is false and we actually share a single QEMUFile then
> >  that race at the end happens.
> >
> > Dave
> >
> >
> >>      trace_qemu_file_fclose();
> >>      return ret;
> >> --
> >> 1.8.3.1
> >>
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
858585 jemmy June 3, 2018, 1:50 p.m. UTC | #4
On Thu, May 31, 2018 at 6:52 PM, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
> * 858585 jemmy (jemmy858585@gmail.com) wrote:
>> On Wed, May 30, 2018 at 10:45 PM, Dr. David Alan Gilbert
>> <dgilbert@redhat.com> wrote:
>> > * Lidong Chen (jemmy858585@gmail.com) wrote:
>> >> From: Lidong Chen <jemmy858585@gmail.com>
>> >>
>> >> The channel_close maybe invoked by different threads. For example, source
>> >> qemu invokes qemu_fclose in main thread, migration thread and return path
>> >> thread. Destination qemu invokes qemu_fclose in main thread, listen thread
>> >> and COLO incoming thread.
>> >>
>> >> Add a mutex in QEMUFile struct to avoid concurrent invoke channel_close.
>> >>
>> >> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>> >> ---
>> >>  migration/qemu-file.c | 5 +++++
>> >>  1 file changed, 5 insertions(+)
>> >>
>> >> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
>> >> index 977b9ae..87d0f05 100644
>> >> --- a/migration/qemu-file.c
>> >> +++ b/migration/qemu-file.c
>> >> @@ -52,6 +52,7 @@ struct QEMUFile {
>> >>      unsigned int iovcnt;
>> >>
>> >>      int last_error;
>> >> +    QemuMutex lock;
>> >
>> > That could do with a comment saying what you're protecting
>> >
>> >>  };
>> >>
>> >>  /*
>> >> @@ -96,6 +97,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops)
>> >>
>> >>      f = g_new0(QEMUFile, 1);
>> >>
>> >> +    qemu_mutex_init(&f->lock);
>> >>      f->opaque = opaque;
>> >>      f->ops = ops;
>> >>      return f;
>> >> @@ -328,7 +330,9 @@ int qemu_fclose(QEMUFile *f)
>> >>      ret = qemu_file_get_error(f);
>> >>
>> >>      if (f->ops->close) {
>> >> +        qemu_mutex_lock(&f->lock);
>> >>          int ret2 = f->ops->close(f->opaque);
>> >> +        qemu_mutex_unlock(&f->lock);
>> >
>> > OK, and at least for the RDMA code, if it calls
>> > close a 2nd time, rioc->rdma is checked so it wont actually free stuff a
>> > 2nd time.
>> >
>> >>          if (ret >= 0) {
>> >>              ret = ret2;
>> >>          }
>> >> @@ -339,6 +343,7 @@ int qemu_fclose(QEMUFile *f)
>> >>      if (f->last_error) {
>> >>          ret = f->last_error;
>> >>      }
>> >> +    qemu_mutex_destroy(&f->lock);
>> >>      g_free(f);
>> >
>> > Hmm but that's not safe; if two things really do call qemu_fclose()
>> > on the same structure they race here and can end up destroying the lock
>> > twice, or doing f->lock  after the 1st one has already g_free(f).
>>
>> >
>> >
>> > So lets go back a step.
>> > I think:
>> >   a) There should always be a separate QEMUFile* for
>> >      to_src_file and from_src_file - I don't see where you open
>> >      the 2nd one; I don't see your implementation of
>> >      f->ops->get_return_path.
>>
>> yes, current qemu version use a separate QEMUFile* for to_src_file and
>> from_src_file.
>> and the two QEMUFile point to one QIOChannelRDMA.
>>
>> the f->ops->get_return_path is implemented by channel_output_ops or
>> channel_input_ops.
>
> Ah OK, yes that makes sense.
>
>> >   b) I *think* that while the different threads might all call
>> >      fclose(), I think there should only ever be one qemu_fclose
>> >      call for each direction on the QEMUFile.
>> >
>> > But now we have two problems:
>> >   If (a) is true then f->lock  is separate on each one so
>> >    doesn't really protect if the two directions are closed
>> >    at once. (Assuming (b) is true)
>>
>> yes, you are right.  so I should add a QemuMutex in QIOChannel structure, not
>> QEMUFile structure. and qemu_mutex_destroy the QemuMutex in
>> qio_channel_finalize.
>
> OK, that sounds better.
>
> Dave
>

Hi Dave:
    Another way is to protect channel_close in the migration code, like
QemuMutex rp_mutex does.
    As Daniel mentioned, QIOChannel impls are only intended to be used
from a single thread.
    https://www.mail-archive.com/qemu-devel@nongnu.org/msg530100.html

    Which way is better? Does QIOChannel have any plan to support
multi-threaded use?
    Not only channel_close needs a lock between different threads;
writev_buffer also needs one.

    thanks.
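
[Editorial note: a sketch of that migration-level alternative, with
hypothetical names - a mutex owned by migration code serializes the
close instead of QIOChannel doing it:

    /* Hypothetical: one mutex in migration code guards the close of both
     * directions, in the spirit of how rp_mutex guards return-path writes. */
    static QemuMutex channel_close_mutex;   /* qemu_mutex_init() at setup */

    static int migration_close_file(QEMUFile *f)
    {
        int ret;

        qemu_mutex_lock(&channel_close_mutex);
        ret = qemu_fclose(f);
        qemu_mutex_unlock(&channel_close_mutex);
        return ret;
    }
]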


>> Thank you.
>>
>> >
>> >   If (a) is false and we actually share a single QEMUFile then
>> >  that race at the end happens.
>> >
>> > Dave
>> >
>> >
>> >>      trace_qemu_file_fclose();
>> >>      return ret;
>> >> --
>> >> 1.8.3.1
>> >>
>> > --
>> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
858585 jemmy June 3, 2018, 2:43 p.m. UTC | #5
On Sun, Jun 3, 2018 at 9:50 PM, 858585 jemmy <jemmy858585@gmail.com> wrote:
> On Thu, May 31, 2018 at 6:52 PM, Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
>> * 858585 jemmy (jemmy858585@gmail.com) wrote:
>>> On Wed, May 30, 2018 at 10:45 PM, Dr. David Alan Gilbert
>>> <dgilbert@redhat.com> wrote:
>>> > * Lidong Chen (jemmy858585@gmail.com) wrote:
>>> >> From: Lidong Chen <jemmy858585@gmail.com>
>>> >>
>>> >> The channel_close maybe invoked by different threads. For example, source
>>> >> qemu invokes qemu_fclose in main thread, migration thread and return path
>>> >> thread. Destination qemu invokes qemu_fclose in main thread, listen thread
>>> >> and COLO incoming thread.
>>> >>
>>> >> Add a mutex in QEMUFile struct to avoid concurrent invoke channel_close.
>>> >>
>>> >> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>>> >> ---
>>> >>  migration/qemu-file.c | 5 +++++
>>> >>  1 file changed, 5 insertions(+)
>>> >>
>>> >> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
>>> >> index 977b9ae..87d0f05 100644
>>> >> --- a/migration/qemu-file.c
>>> >> +++ b/migration/qemu-file.c
>>> >> @@ -52,6 +52,7 @@ struct QEMUFile {
>>> >>      unsigned int iovcnt;
>>> >>
>>> >>      int last_error;
>>> >> +    QemuMutex lock;
>>> >
>>> > That could do with a comment saying what you're protecting
>>> >
>>> >>  };
>>> >>
>>> >>  /*
>>> >> @@ -96,6 +97,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops)
>>> >>
>>> >>      f = g_new0(QEMUFile, 1);
>>> >>
>>> >> +    qemu_mutex_init(&f->lock);
>>> >>      f->opaque = opaque;
>>> >>      f->ops = ops;
>>> >>      return f;
>>> >> @@ -328,7 +330,9 @@ int qemu_fclose(QEMUFile *f)
>>> >>      ret = qemu_file_get_error(f);
>>> >>
>>> >>      if (f->ops->close) {
>>> >> +        qemu_mutex_lock(&f->lock);
>>> >>          int ret2 = f->ops->close(f->opaque);
>>> >> +        qemu_mutex_unlock(&f->lock);
>>> >
>>> > OK, and at least for the RDMA code, if it calls
>>> > close a 2nd time, rioc->rdma is checked so it wont actually free stuff a
>>> > 2nd time.
>>> >
>>> >>          if (ret >= 0) {
>>> >>              ret = ret2;
>>> >>          }
>>> >> @@ -339,6 +343,7 @@ int qemu_fclose(QEMUFile *f)
>>> >>      if (f->last_error) {
>>> >>          ret = f->last_error;
>>> >>      }
>>> >> +    qemu_mutex_destroy(&f->lock);
>>> >>      g_free(f);
>>> >
>>> > Hmm but that's not safe; if two things really do call qemu_fclose()
>>> > on the same structure they race here and can end up destroying the lock
>>> > twice, or doing f->lock  after the 1st one has already g_free(f).
>>>
>>> >
>>> >
>>> > So lets go back a step.
>>> > I think:
>>> >   a) There should always be a separate QEMUFile* for
>>> >      to_src_file and from_src_file - I don't see where you open
>>> >      the 2nd one; I don't see your implementation of
>>> >      f->ops->get_return_path.
>>>
>>> yes, current qemu version use a separate QEMUFile* for to_src_file and
>>> from_src_file.
>>> and the two QEMUFile point to one QIOChannelRDMA.
>>>
>>> the f->ops->get_return_path is implemented by channel_output_ops or
>>> channel_input_ops.
>>
>> Ah OK, yes that makes sense.
>>
>>> >   b) I *think* that while the different threads might all call
>>> >      fclose(), I think there should only ever be one qemu_fclose
>>> >      call for each direction on the QEMUFile.
>>> >
>>> > But now we have two problems:
>>> >   If (a) is true then f->lock  is separate on each one so
>>> >    doesn't really protect if the two directions are closed
>>> >    at once. (Assuming (b) is true)
>>>
>>> yes, you are right.  so I should add a QemuMutex in QIOChannel structure, not
>>> QEMUFile structure. and qemu_mutex_destroy the QemuMutex in
>>> qio_channel_finalize.
>>
>> OK, that sounds better.
>>
>> Dave
>>
>
> Hi Dave:
>     Another way is protect channel_close in migration part, like
> QemuMutex rp_mutex.
>     As Daniel mentioned, QIOChannel impls are only intended to a single thread.
>     https://www.mail-archive.com/qemu-devel@nongnu.org/msg530100.html
>
>     which way is better? Does QIOChannel have the plan to support multi thread?
>     Not only channel_close need lock between different threads,
> writev_buffer write also
>     need.
>
>     thanks.
>
>

I find qemu does not call qemu_mutex_destroy to release rp_mutex in
migration_instance_finalize :(
Although qemu_mutex_destroy is not strictly necessary, it is good
practice to call it, so it's better we fix it.
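
[Editorial note: a sketch of that cleanup; whether rp_mutex lives
directly in MigrationState is an assumption here, made only for
illustration:

    static void migration_instance_finalize(Object *obj)
    {
        MigrationState *ms = MIGRATION_OBJ(obj);

        /* ... existing teardown ... */
        /* pair the qemu_mutex_init() done at setup with a destroy;
         * the field location is assumed for illustration */
        qemu_mutex_destroy(&ms->rp_mutex);
    }
]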

>>> Thank you.
>>>
>>> >
>>> >   If (a) is false and we actually share a single QEMUFile then
>>> >  that race at the end happens.
>>> >
>>> > Dave
>>> >
>>> >
>>> >>      trace_qemu_file_fclose();
>>> >>      return ret;
>>> >> --
>>> >> 1.8.3.1
>>> >>
>>> > --
>>> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>> --
>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Patch

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 977b9ae..87d0f05 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -52,6 +52,7 @@  struct QEMUFile {
     unsigned int iovcnt;
 
     int last_error;
+    QemuMutex lock;
 };
 
 /*
@@ -96,6 +97,7 @@  QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops)
 
     f = g_new0(QEMUFile, 1);
 
+    qemu_mutex_init(&f->lock);
     f->opaque = opaque;
     f->ops = ops;
     return f;
@@ -328,7 +330,9 @@  int qemu_fclose(QEMUFile *f)
     ret = qemu_file_get_error(f);
 
     if (f->ops->close) {
+        qemu_mutex_lock(&f->lock);
         int ret2 = f->ops->close(f->opaque);
+        qemu_mutex_unlock(&f->lock);
         if (ret >= 0) {
             ret = ret2;
         }
@@ -339,6 +343,7 @@  int qemu_fclose(QEMUFile *f)
     if (f->last_error) {
         ret = f->last_error;
     }
+    qemu_mutex_destroy(&f->lock);
     g_free(f);
     trace_qemu_file_fclose();
     return ret;