Patchwork do not call monitor_resume() from migrate_fd_put_buffer() error path

login
register
mail settings
Submitter Michael Tokarev
Date July 19, 2011, 11:46 a.m.
Message ID <20110719115511.609C553E5@gandalf.tls.msk.ru>
Download mbox | patch
Permalink /patch/105469/
State New
Headers show

Comments

Michael Tokarev - July 19, 2011, 11:46 a.m.
If we do, it results in double monitor_resume() (second being called
from migrate_fd_cleanup() anyway) and monitor suspend count becoming
negative.

Cc'ing people from `git blame' list for the lines in question: the
change fixes the problem but I'm not sure what the original intention
of this code was in this place.  Unfortunately noone replied to two
my attempts to raise this issue.

Signed-Off-By: Michael Tokarev <mjt@tls.msk.ru>
---
 migration.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)
Jan Kiszka - July 19, 2011, 9:48 p.m.
On 2011-07-19 13:46, Michael Tokarev wrote:
> If we do, it results in double monitor_resume() (second being called
> from migrate_fd_cleanup() anyway) and monitor suspend count becoming
> negative.
> 
> Cc'ing people from `git blame' list for the lines in question: the
> change fixes the problem but I'm not sure what the original intention
> of this code was in this place.  Unfortunately noone replied to two
> my attempts to raise this issue.
> 
> Signed-Off-By: Michael Tokarev <mjt@tls.msk.ru>
> ---
>  migration.c |    3 ---
>  1 files changed, 0 insertions(+), 3 deletions(-)
> 
> diff --git a/migration.c b/migration.c
> index af3a1f2..115588c 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -330,9 +330,6 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
>      if (ret == -EAGAIN) {
>          qemu_set_fd_handler2(s->fd, NULL, NULL, migrate_fd_put_notify, s);
>      } else if (ret < 0) {
> -        if (s->mon) {
> -            monitor_resume(s->mon);
> -        }
>          s->state = MIG_STATE_ERROR;
>          notifier_list_notify(&migration_state_notifiers);
>      }

Looks reasonable to me, but Marcelo should comment on this, specifically
which scenario once required the resume.

Jan
Marcelo Tosatti - July 20, 2011, 4:34 p.m.
On Tue, Jul 19, 2011 at 11:48:22PM +0200, Jan Kiszka wrote:
> On 2011-07-19 13:46, Michael Tokarev wrote:
> > If we do, it results in double monitor_resume() (second being called
> > from migrate_fd_cleanup() anyway) and monitor suspend count becoming
> > negative.
> > 
> > Cc'ing people from `git blame' list for the lines in question: the
> > change fixes the problem but I'm not sure what the original intention
> > of this code was in this place.  Unfortunately noone replied to two
> > my attempts to raise this issue.
> > 
> > Signed-Off-By: Michael Tokarev <mjt@tls.msk.ru>
> > ---
> >  migration.c |    3 ---
> >  1 files changed, 0 insertions(+), 3 deletions(-)
> > 
> > diff --git a/migration.c b/migration.c
> > index af3a1f2..115588c 100644
> > --- a/migration.c
> > +++ b/migration.c
> > @@ -330,9 +330,6 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
> >      if (ret == -EAGAIN) {
> >          qemu_set_fd_handler2(s->fd, NULL, NULL, migrate_fd_put_notify, s);
> >      } else if (ret < 0) {
> > -        if (s->mon) {
> > -            monitor_resume(s->mon);
> > -        }
> >          s->state = MIG_STATE_ERROR;
> >          notifier_list_notify(&migration_state_notifiers);
> >      }
> 
> Looks reasonable to me, but Marcelo should comment on this, specifically
> which scenario once required the resume.
> 
> Jan

If the monitor was suspended (migrate without -d), then this path must
resume. Should record that somewhere and check here.
Jan Kiszka - July 20, 2011, 10:06 p.m.
On 2011-07-20 18:34, Marcelo Tosatti wrote:
> On Tue, Jul 19, 2011 at 11:48:22PM +0200, Jan Kiszka wrote:
>> On 2011-07-19 13:46, Michael Tokarev wrote:
>>> If we do, it results in double monitor_resume() (second being called
>>> from migrate_fd_cleanup() anyway) and monitor suspend count becoming
>>> negative.
>>>
>>> Cc'ing people from `git blame' list for the lines in question: the
>>> change fixes the problem but I'm not sure what the original intention
>>> of this code was in this place.  Unfortunately noone replied to two
>>> my attempts to raise this issue.
>>>
>>> Signed-Off-By: Michael Tokarev <mjt@tls.msk.ru>
>>> ---
>>>  migration.c |    3 ---
>>>  1 files changed, 0 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/migration.c b/migration.c
>>> index af3a1f2..115588c 100644
>>> --- a/migration.c
>>> +++ b/migration.c
>>> @@ -330,9 +330,6 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
>>>      if (ret == -EAGAIN) {
>>>          qemu_set_fd_handler2(s->fd, NULL, NULL, migrate_fd_put_notify, s);
>>>      } else if (ret < 0) {
>>> -        if (s->mon) {
>>> -            monitor_resume(s->mon);
>>> -        }
>>>          s->state = MIG_STATE_ERROR;
>>>          notifier_list_notify(&migration_state_notifiers);
>>>      }
>>
>> Looks reasonable to me, but Marcelo should comment on this, specifically
>> which scenario once required the resume.
>>
>> Jan
> 
> If the monitor was suspended (migrate without -d), then this path must
> resume. Should record that somewhere and check here.

It's clear that we need to resume. The question is in which case
migrate_fd_cleanup may not be called after an error in
migrate_fd_put_buffer.

Jan
Michael Tokarev - Aug. 3, 2011, 7:38 a.m.
So, can we decide on this somehow?  I don't see a code
path where we don't call monitor_resume at the end,
so the "intermediate" monitor_resume can be dropped.
This way we fix real bug.  If there will be other
problem from that, it can be fixed later - this will
mean that code path is found...

Should I resend the initial patch again?

Thanks,

/mjt

21.07.2011 02:06, Jan Kiszka wriote:
> On 2011-07-20 18:34, Marcelo Tosatti wrote:
>> On Tue, Jul 19, 2011 at 11:48:22PM +0200, Jan Kiszka wrote:
>>> On 2011-07-19 13:46, Michael Tokarev wrote:
>>>> If we do, it results in double monitor_resume() (second being called
>>>> from migrate_fd_cleanup() anyway) and monitor suspend count becoming
>>>> negative.
>>>>
>>>> Cc'ing people from `git blame' list for the lines in question: the
>>>> change fixes the problem but I'm not sure what the original intention
>>>> of this code was in this place.  Unfortunately noone replied to two
>>>> my attempts to raise this issue.
>>>>
>>>> Signed-Off-By: Michael Tokarev <mjt@tls.msk.ru>
>>>> ---
>>>>  migration.c |    3 ---
>>>>  1 files changed, 0 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/migration.c b/migration.c
>>>> index af3a1f2..115588c 100644
>>>> --- a/migration.c
>>>> +++ b/migration.c
>>>> @@ -330,9 +330,6 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
>>>>      if (ret == -EAGAIN) {
>>>>          qemu_set_fd_handler2(s->fd, NULL, NULL, migrate_fd_put_notify, s);
>>>>      } else if (ret < 0) {
>>>> -        if (s->mon) {
>>>> -            monitor_resume(s->mon);
>>>> -        }
>>>>          s->state = MIG_STATE_ERROR;
>>>>          notifier_list_notify(&migration_state_notifiers);
>>>>      }
>>>
>>> Looks reasonable to me, but Marcelo should comment on this, specifically
>>> which scenario once required the resume.
>>>
>>> Jan
>>
>> If the monitor was suspended (migrate without -d), then this path must
>> resume. Should record that somewhere and check here.
> 
> It's clear that we need to resume. The question is in which case
> migrate_fd_cleanup may not be called after an error in
> migrate_fd_put_buffer.
> 
> Jan
>
Jan Kiszka - Aug. 3, 2011, 1:22 p.m.
On 2011-08-03 09:38, Michael Tokarev wrote:
> So, can we decide on this somehow?  I don't see a code
> path where we don't call monitor_resume at the end,
> so the "intermediate" monitor_resume can be dropped.
> This way we fix real bug.  If there will be other
> problem from that, it can be fixed later - this will
> mean that code path is found...

I do not see any reason for the spurious resume as well, so you may add my

Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>

> 
> Should I resend the initial patch again?

May help to finally trigger the merge. Direct it to Luiz Capitulino as
the subsystem maintainer.

Jan

Patch

diff --git a/migration.c b/migration.c
index af3a1f2..115588c 100644
--- a/migration.c
+++ b/migration.c
@@ -330,9 +330,6 @@  ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
     if (ret == -EAGAIN) {
         qemu_set_fd_handler2(s->fd, NULL, NULL, migrate_fd_put_notify, s);
     } else if (ret < 0) {
-        if (s->mon) {
-            monitor_resume(s->mon);
-        }
         s->state = MIG_STATE_ERROR;
         notifier_list_notify(&migration_state_notifiers);
     }