[for-2.8] migration: Fix return code of ram_save_iterate()

Message ID: 1478265017-5700-1-git-send-email-thuth@redhat.com
State: New

Commit Message

Thomas Huth Nov. 4, 2016, 1:10 p.m. UTC
qemu_savevm_state_iterate() expects the iterators to return 1
when they are done, and 0 if there is still something left to do.
However, ram_save_iterate() does not obey this rule and returns
the number of saved pages instead. This causes a fatal hang with
ppc64 guests when you run QEMU like this (also works with TCG):

 qemu-img create -f qcow2  /tmp/test.qcow2 1M
 qemu-system-ppc64 -nographic -nodefaults -m 256 \
                   -hda /tmp/test.qcow2 -serial mon:stdio

... then switch to the monitor by pressing CTRL-a c and try to
save a snapshot with "savevm test1" for example.

After the first iteration, ram_save_iterate() always returns 0 here,
so that qemu_savevm_state_iterate() hangs in an endless loop and you
can only "kill -9" the QEMU process.
Fix it by using proper return values in ram_save_iterate().

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 migration/ram.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
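
For context, the loop that consumes this return value looks roughly
like this (a paraphrased sketch of migration/savevm.c, not the literal
code):

 /* qemu_savevm_state_iterate() walks all registered handlers and
  * stops at the first one that is not done yet: */
 QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
     ret = se->ops->save_live_iterate(f, se->opaque);
     if (ret <= 0) {
         /* < 0 is an error; 0 means "still work to do", so stop
          * here and let the caller come around again */
         break;
     }
 }
 return ret;    /* 1 only once every handler has reported "done" */

 /* ...and qemu_savevm_state() keeps calling it until it sees > 0: */
 while (qemu_file_get_error(f) == 0) {
     if (qemu_savevm_state_iterate(f, false) > 0) {
         break;
     }
 }

Once ram_save_iterate() has nothing left to send, it keeps returning 0
("not done"), so the loop never gets past the RAM handler and never
terminates.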

Comments

David Gibson Nov. 8, 2016, 1:14 a.m. UTC | #1
On Fri, Nov 04, 2016 at 02:10:17PM +0100, Thomas Huth wrote:
> qemu_savevm_state_iterate() expects the iterators to return 1
> when they are done, and 0 if there is still something left to do.
> However, ram_save_iterate() does not obey this rule and returns
> the number of saved pages instead. This causes a fatal hang with
> ppc64 guests when you run QEMU like this (also works with TCG):
> 
>  qemu-img create -f qcow2  /tmp/test.qcow2 1M
>  qemu-system-ppc64 -nographic -nodefaults -m 256 \
>                    -hda /tmp/test.qcow2 -serial mon:stdio
> 
> ... then switch to the monitor by pressing CTRL-a c and try to
> save a snapshot with "savevm test1" for example.
> 
> After the first iteration, ram_save_iterate() always returns 0 here,
> so that qemu_savevm_state_iterate() hangs in an endless loop and you
> can only "kill -9" the QEMU process.
> Fix it by using proper return values in ram_save_iterate().
> 
> Signed-off-by: Thomas Huth <thuth@redhat.com>

Hmm.  I think the change is technically correct, but I'm uneasy with
this approach to the solution.  The whole reason this wasn't caught
earlier is that almost nothing looks at the return value.  Without
changing that I think it's very likely someone will mess this up
again.

I think it would be preferable to change the return type to void to
make it explicit that this function is not directly returning the
"completion" status, but instead that's calculated from the other
progress variables it updates.
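
Something like this, say (hypothetical and untested, just to
illustrate the shape):

 static void ram_save_iterate(QEMUFile *f, void *opaque)
 {
     /* send pages and update acct_info / bytes_transferred exactly
      * as today; whether we are "done" is then derived by the
      * caller from the pending/progress state, not from a return
      * value that is easy to get wrong */
 }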

> ---
>  migration/ram.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index fb9252d..a1c8089 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1987,7 +1987,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>      int ret;
>      int i;
>      int64_t t0;
> -    int pages_sent = 0;
> +    int done = 0;
>  
>      rcu_read_lock();
>      if (ram_list.version != last_version) {
> @@ -2007,9 +2007,9 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>          pages = ram_find_and_save_block(f, false, &bytes_transferred);
>          /* no more pages to sent */
>          if (pages == 0) {
> +            done = 1;
>              break;
>          }
> -        pages_sent += pages;
>          acct_info.iterations++;
>  
>          /* we want to check in the 1st loop, just in case it was the 1st time
> @@ -2044,7 +2044,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>          return ret;
>      }
>  
> -    return pages_sent;
> +    return done;
>  }
>  
>  /* Called with iothread lock */
Thomas Huth Nov. 8, 2016, 6:57 a.m. UTC | #2
On 08.11.2016 02:14, David Gibson wrote:
> On Fri, Nov 04, 2016 at 02:10:17PM +0100, Thomas Huth wrote:
>> qemu_savevm_state_iterate() expects the iterators to return 1
>> when they are done, and 0 if there is still something left to do.
>> However, ram_save_iterate() does not obey this rule and returns
>> the number of saved pages instead. This causes a fatal hang with
>> ppc64 guests when you run QEMU like this (also works with TCG):
>>
>>  qemu-img create -f qcow2  /tmp/test.qcow2 1M
>>  qemu-system-ppc64 -nographic -nodefaults -m 256 \
>>                    -hda /tmp/test.qcow2 -serial mon:stdio
>>
>> ... then switch to the monitor by pressing CTRL-a c and try to
>> save a snapshot with "savevm test1" for example.
>>
>> After the first iteration, ram_save_iterate() always returns 0 here,
>> so that qemu_savevm_state_iterate() hangs in an endless loop and you
>> can only "kill -9" the QEMU process.
>> Fix it by using proper return values in ram_save_iterate().
>>
>> Signed-off-by: Thomas Huth <thuth@redhat.com>
> 
> Hmm.  I think the change is technically correct, but I'm uneasy with
> this approach to the solution.  The whole reason this wasn't caught
> earlier is that almost nothing looks at the return value.  Without
> changing that I think it's very likely someone will mess this up
> again.
> 
> I think it would be preferable to change the return type to void to
> make it explicit that this function is not directly returning the
> "completion" status, but instead that's calculated from the other
> progress variables it updates.

I'm not sure what such a patch should finally look like. Could you
propose a patch?

Anyway, we're in soft freeze already ... do we still want such a major
change of the logic at this point in time? If not, we should maybe go
with fixing the return value only for 2.8, and do the major change for
2.9 instead?

 Thomas
Amit Shah Nov. 9, 2016, 7:18 a.m. UTC | #3
On (Fri) 04 Nov 2016 [14:10:17], Thomas Huth wrote:
> qemu_savevm_state_iterate() expects the iterators to return 1
> when they are done, and 0 if there is still something left to do.
> However, ram_save_iterate() does not obey this rule and returns
> the number of saved pages instead. This causes a fatal hang with
> ppc64 guests when you run QEMU like this (also works with TCG):

"works with" -- does that mean reproduces with?

>  qemu-img create -f qcow2  /tmp/test.qcow2 1M
>  qemu-system-ppc64 -nographic -nodefaults -m 256 \
>                    -hda /tmp/test.qcow2 -serial mon:stdio
> 
> ... then switch to the monitor by pressing CTRL-a c and try to
> save a snapshot with "savevm test1" for example.
> 
> After the first iteration, ram_save_iterate() always returns 0 here,
> so that qemu_savevm_state_iterate() hangs in an endless loop and you
> can only "kill -9" the QEMU process.
> Fix it by using proper return values in ram_save_iterate().
> 
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  migration/ram.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index fb9252d..a1c8089 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1987,7 +1987,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>      int ret;
>      int i;
>      int64_t t0;
> -    int pages_sent = 0;
> +    int done = 0;
>  
>      rcu_read_lock();
>      if (ram_list.version != last_version) {
> @@ -2007,9 +2007,9 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>          pages = ram_find_and_save_block(f, false, &bytes_transferred);
>          /* no more pages to sent */
>          if (pages == 0) {
> +            done = 1;
>              break;
>          }
> -        pages_sent += pages;
>          acct_info.iterations++;
>  
>          /* we want to check in the 1st loop, just in case it was the 1st time
> @@ -2044,7 +2044,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>          return ret;
>      }
>  
> -    return pages_sent;
> +    return done;
>  }

I agree with David, we can just remove the return value.  The first
patch of the series can do that; and this one could become the 2nd
patch.  Should be OK for the soft freeze.

		Amit
Thomas Huth Nov. 9, 2016, 7:46 a.m. UTC | #4
On 09.11.2016 08:18, Amit Shah wrote:
> On (Fri) 04 Nov 2016 [14:10:17], Thomas Huth wrote:
>> qemu_savevm_state_iterate() expects the iterators to return 1
>> when they are done, and 0 if there is still something left to do.
>> However, ram_save_iterate() does not obey this rule and returns
>> the number of saved pages instead. This causes a fatal hang with
>> ppc64 guests when you run QEMU like this (also works with TCG):
> 
> "works with" -- does that mean reproduces with?

Yes, that's what I meant: you can reproduce it with TCG (e.g. running
on an x86 system), too; there's no need for a real POWER machine with KVM
here.

>>  qemu-img create -f qcow2  /tmp/test.qcow2 1M
>>  qemu-system-ppc64 -nographic -nodefaults -m 256 \
>>                    -hda /tmp/test.qcow2 -serial mon:stdio
>>
>> ... then switch to the monitor by pressing CTRL-a c and try to
>> save a snapshot with "savevm test1" for example.
>>
>> After the first iteration, ram_save_iterate() always returns 0 here,
>> so that qemu_savevm_state_iterate() hangs in an endless loop and you
>> can only "kill -9" the QEMU process.
>> Fix it by using proper return values in ram_save_iterate().
>>
>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>> ---
>>  migration/ram.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/migration/ram.c b/migration/ram.c
>> index fb9252d..a1c8089 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -1987,7 +1987,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>>      int ret;
>>      int i;
>>      int64_t t0;
>> -    int pages_sent = 0;
>> +    int done = 0;
>>  
>>      rcu_read_lock();
>>      if (ram_list.version != last_version) {
>> @@ -2007,9 +2007,9 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>>          pages = ram_find_and_save_block(f, false, &bytes_transferred);
>>          /* no more pages to sent */
>>          if (pages == 0) {
>> +            done = 1;
>>              break;
>>          }
>> -        pages_sent += pages;
>>          acct_info.iterations++;
>>  
>>          /* we want to check in the 1st loop, just in case it was the 1st time
>> @@ -2044,7 +2044,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>>          return ret;
>>      }
>>  
>> -    return pages_sent;
>> +    return done;
>>  }
> 
> I agree with David, we can just remove the return value.  The first
> patch of the series can do that; and this one could become the 2nd
> patch.  Should be OK for the soft freeze.

Sorry, I still did not quite get it - if I'd change the return type of
ram_save_iterate() and the other iterate functions to "void", how is
qemu_savevm_state_iterate() supposed to know whether all iterators are
done or not? And other iterators also use negative return values to
signal errors - should that then be handled via an "Error **" parameter
instead? ... my gut feeling still says that such a big rework (we've
got to touch all iterators for this!) should rather not be done right in
the middle of the freeze period...
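
(Just to illustrate what I mean -- a hypothetical, untested sketch:

 static void ram_save_iterate(QEMUFile *f, void *opaque, Error **errp)
 {
     int ret;

     /* ... do the actual page-sending work as today ... */

     ret = qemu_file_get_error(f);
     if (ret < 0) {
         error_setg(errp, "error while writing RAM pages: %d", ret);
         return;
     }
 }

... which would mean touching every single iterator and all of their
callers.)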

 Thomas
David Gibson Nov. 9, 2016, 1:08 p.m. UTC | #5
On Wed, Nov 09, 2016 at 08:46:34AM +0100, Thomas Huth wrote:
> On 09.11.2016 08:18, Amit Shah wrote:
> > On (Fri) 04 Nov 2016 [14:10:17], Thomas Huth wrote:
> >> qemu_savevm_state_iterate() expects the iterators to return 1
> >> when they are done, and 0 if there is still something left to do.
> >> However, ram_save_iterate() does not obey this rule and returns
> >> the number of saved pages instead. This causes a fatal hang with
> >> ppc64 guests when you run QEMU like this (also works with TCG):
> > 
> > "works with" -- does that mean reproduces with?
> 
> Yes, that's what I meant: you can reproduce it with TCG (e.g. running
> on an x86 system), too; there's no need for a real POWER machine with KVM
> here.
> 
> >>  qemu-img create -f qcow2  /tmp/test.qcow2 1M
> >>  qemu-system-ppc64 -nographic -nodefaults -m 256 \
> >>                    -hda /tmp/test.qcow2 -serial mon:stdio
> >>
> >> ... then switch to the monitor by pressing CTRL-a c and try to
> >> save a snapshot with "savevm test1" for example.
> >>
> >> After the first iteration, ram_save_iterate() always returns 0 here,
> >> so that qemu_savevm_state_iterate() hangs in an endless loop and you
> >> can only "kill -9" the QEMU process.
> >> Fix it by using proper return values in ram_save_iterate().
> >>
> >> Signed-off-by: Thomas Huth <thuth@redhat.com>
> >> ---
> >>  migration/ram.c | 6 +++---
> >>  1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/migration/ram.c b/migration/ram.c
> >> index fb9252d..a1c8089 100644
> >> --- a/migration/ram.c
> >> +++ b/migration/ram.c
> >> @@ -1987,7 +1987,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
> >>      int ret;
> >>      int i;
> >>      int64_t t0;
> >> -    int pages_sent = 0;
> >> +    int done = 0;
> >>  
> >>      rcu_read_lock();
> >>      if (ram_list.version != last_version) {
> >> @@ -2007,9 +2007,9 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
> >>          pages = ram_find_and_save_block(f, false, &bytes_transferred);
> >>          /* no more pages to sent */
> >>          if (pages == 0) {
> >> +            done = 1;
> >>              break;
> >>          }
> >> -        pages_sent += pages;
> >>          acct_info.iterations++;
> >>  
> >>          /* we want to check in the 1st loop, just in case it was the 1st time
> >> @@ -2044,7 +2044,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
> >>          return ret;
> >>      }
> >>  
> >> -    return pages_sent;
> >> +    return done;
> >>  }
> > 
> > I agree with David, we can just remove the return value.  The first
> > patch of the series can do that; and this one could become the 2nd
> > patch.  Should be OK for the soft freeze.
> 
> Sorry, I still did not quite get it - if I'd change the return type of
> ram_save_iterate() and the other iterate functions to "void", how is
> qemu_savevm_state_iterate() supposed to know whether all iterators are
> done or not?

It doesn't - its return value is, in turn, mostly ignored by the
caller.

On the migration path we already determine whether to proceed or not
based purely on the separate state_pending callbacks.
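
Roughly like this, from memory (simplified; the real
migration_thread() code has more detail around postcopy):

 qemu_savevm_state_pending(f, max_size, &pend_nonpost, &pend_post);
 pending_size = pend_nonpost + pend_post;

 if (pending_size && pending_size >= max_size) {
     /* too much outstanding state: do another iteration round;
      * note the return value is not looked at here */
     qemu_savevm_state_iterate(f, false);
 } else {
     /* what remains fits the downtime budget: move on to the
      * completion stage */
 }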

For the savevm path, we don't really need the iteration phase at all -
we can jump straight to the completion phase, since downtime is not an
issue.

> And other iterators also use negative return values to
> signal errors

Ah.. that's a good point.  Possibly we should leave in the negative
codes for errors and just remove all positive return values.

> - should that then be handled via an "Error **" parameter
> instead? ... my gut feeling still says that such a big rework (we've
> got to touch all iterators for this!) should rather not be done right in
> the middle of the freeze period...

Yeah the errors could - and probably should - be handled with Error **
instead of return codes, but I also wonder if that's too much for soft
freeze.  I guess that's the call of the migration guys.
Dr. David Alan Gilbert Nov. 9, 2016, 3:13 p.m. UTC | #6
* Thomas Huth (thuth@redhat.com) wrote:
> On 09.11.2016 08:18, Amit Shah wrote:
> > On (Fri) 04 Nov 2016 [14:10:17], Thomas Huth wrote:
> >> qemu_savevm_state_iterate() expects the iterators to return 1
> >> when they are done, and 0 if there is still something left to do.
> >> However, ram_save_iterate() does not obey this rule and returns
> >> the number of saved pages instead. This causes a fatal hang with
> >> ppc64 guests when you run QEMU like this (also works with TCG):
> > 
> > "works with" -- does that mean reproduces with?
> 
> Yes, that's what I meant: you can reproduce it with TCG (e.g. running
> on an x86 system), too; there's no need for a real POWER machine with KVM
> here.

How did you trigger it on x86?

Dave

> >>  qemu-img create -f qcow2  /tmp/test.qcow2 1M
> >>  qemu-system-ppc64 -nographic -nodefaults -m 256 \
> >>                    -hda /tmp/test.qcow2 -serial mon:stdio
> >>
> >> ... then switch to the monitor by pressing CTRL-a c and try to
> >> save a snapshot with "savevm test1" for example.
> >>
> >> After the first iteration, ram_save_iterate() always returns 0 here,
> >> so that qemu_savevm_state_iterate() hangs in an endless loop and you
> >> can only "kill -9" the QEMU process.
> >> Fix it by using proper return values in ram_save_iterate().
> >>
> >> Signed-off-by: Thomas Huth <thuth@redhat.com>
> >> ---
> >>  migration/ram.c | 6 +++---
> >>  1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/migration/ram.c b/migration/ram.c
> >> index fb9252d..a1c8089 100644
> >> --- a/migration/ram.c
> >> +++ b/migration/ram.c
> >> @@ -1987,7 +1987,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
> >>      int ret;
> >>      int i;
> >>      int64_t t0;
> >> -    int pages_sent = 0;
> >> +    int done = 0;
> >>  
> >>      rcu_read_lock();
> >>      if (ram_list.version != last_version) {
> >> @@ -2007,9 +2007,9 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
> >>          pages = ram_find_and_save_block(f, false, &bytes_transferred);
> >>          /* no more pages to sent */
> >>          if (pages == 0) {
> >> +            done = 1;
> >>              break;
> >>          }
> >> -        pages_sent += pages;
> >>          acct_info.iterations++;
> >>  
> >>          /* we want to check in the 1st loop, just in case it was the 1st time
> >> @@ -2044,7 +2044,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
> >>          return ret;
> >>      }
> >>  
> >> -    return pages_sent;
> >> +    return done;
> >>  }
> > 
> > I agree with David, we can just remove the return value.  The first
> > patch of the series can do that; and this one could become the 2nd
> > patch.  Should be OK for the soft freeze.
> 
> Sorry, I still did not quite get it - if I'd change the return type of
> ram_save_iterate() and the other iterate functions to "void", how is
> qemu_savevm_state_iterate() supposed to know whether all iterators are
> done or not? And other iterators also use negative return values to
> signal errors - should that then be handled via an "Error **" parameter
> instead? ... my gut feeling still says that such a big rework (we've
> got to touch all iterators for this!) should rather not be done right in
> the middle of the freeze period...
> 
>  Thomas
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Thomas Huth Nov. 9, 2016, 3:28 p.m. UTC | #7
On 09.11.2016 16:13, Dr. David Alan Gilbert wrote:
> * Thomas Huth (thuth@redhat.com) wrote:
>> On 09.11.2016 08:18, Amit Shah wrote:
>>> On (Fri) 04 Nov 2016 [14:10:17], Thomas Huth wrote:
>>>> qemu_savevm_state_iterate() expects the iterators to return 1
>>>> when they are done, and 0 if there is still something left to do.
>>>> However, ram_save_iterate() does not obey this rule and returns
>>>> the number of saved pages instead. This causes a fatal hang with
>>>> ppc64 guests when you run QEMU like this (also works with TCG):
>>>
>>> "works with" -- does that mean reproduces with?
>>
>> Yes, that's what I meant: you can reproduce it with TCG (e.g. running
>> on an x86 system), too; there's no need for a real POWER machine with KVM
>> here.
> 
> How did you trigger it on x86?

As described below - qemu-img + qemu-system-ppc64 + savevm is enough to
trigger it on an x86 host.

> 
>>>>  qemu-img create -f qcow2  /tmp/test.qcow2 1M
>>>>  qemu-system-ppc64 -nographic -nodefaults -m 256 \
>>>>                    -hda /tmp/test.qcow2 -serial mon:stdio
>>>>
>>>> ... then switch to the monitor by pressing CTRL-a c and try to
>>>> save a snapshot with "savevm test1" for example.

 Thomas
Dr. David Alan Gilbert Nov. 9, 2016, 3:32 p.m. UTC | #8
* Thomas Huth (thuth@redhat.com) wrote:
> On 09.11.2016 16:13, Dr. David Alan Gilbert wrote:
> > * Thomas Huth (thuth@redhat.com) wrote:
> >> On 09.11.2016 08:18, Amit Shah wrote:
> >>> On (Fri) 04 Nov 2016 [14:10:17], Thomas Huth wrote:
> >>>> qemu_savevm_state_iterate() expects the iterators to return 1
> >>>> when they are done, and 0 if there is still something left to do.
> >>>> However, ram_save_iterate() does not obey this rule and returns
> >>>> the number of saved pages instead. This causes a fatal hang with
> >>>> ppc64 guests when you run QEMU like this (also works with TCG):
> >>>
> >>> "works with" -- does that mean reproduces with?
> >>
> >> Yes, that's what I meant: you can reproduce it with TCG (e.g. running
> >> on an x86 system), too; there's no need for a real POWER machine with KVM
> >> here.
> > 
> > How did you trigger it on x86?
> 
> As described below - qemu-img + qemu-system-ppc64 + savevm is enough to
> trigger it on an x86 host.

Oh OK; so yes still ppc64 target.

Dave

> > 
> >>>>  qemu-img create -f qcow2  /tmp/test.qcow2 1M
> >>>>  qemu-system-ppc64 -nographic -nodefaults -m 256 \
> >>>>                    -hda /tmp/test.qcow2 -serial mon:stdio
> >>>>
> >>>> ... then switch to the monitor by pressing CTRL-a c and try to
> >>>> save a snapshot with "savevm test1" for example.
> 
>  Thomas
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Juan Quintela Nov. 14, 2016, 6:34 p.m. UTC | #9
Thomas Huth <thuth@redhat.com> wrote:
> qemu_savevm_state_iterate() expects the iterators to return 1
> when they are done, and 0 if there is still something left to do.
> However, ram_save_iterate() does not obey this rule and returns
> the number of saved pages instead. This causes a fatal hang with
> ppc64 guests when you run QEMU like this (also works with TCG):
>
>  qemu-img create -f qcow2  /tmp/test.qcow2 1M
>  qemu-system-ppc64 -nographic -nodefaults -m 256 \
>                    -hda /tmp/test.qcow2 -serial mon:stdio
>
> ... then switch to the monitor by pressing CTRL-a c and try to
> save a snapshot with "savevm test1" for example.
>
> After the first iteration, ram_save_iterate() always returns 0 here,
> so that qemu_savevm_state_iterate() hangs in an endless loop and you
> can only "kill -9" the QEMU process.
> Fix it by using proper return values in ram_save_iterate().
>
> Signed-off-by: Thomas Huth <thuth@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

Applied.

I don't know how we broke this so much.

Thanks, Juan.
David Gibson Nov. 17, 2016, 3:45 a.m. UTC | #10
On Mon, Nov 14, 2016 at 07:34:59PM +0100, Juan Quintela wrote:
> Thomas Huth <thuth@redhat.com> wrote:
> > qemu_savevm_state_iterate() expects the iterators to return 1
> > when they are done, and 0 if there is still something left to do.
> > However, ram_save_iterate() does not obey this rule and returns
> > the number of saved pages instead. This causes a fatal hang with
> > ppc64 guests when you run QEMU like this (also works with TCG):
> >
> >  qemu-img create -f qcow2  /tmp/test.qcow2 1M
> >  qemu-system-ppc64 -nographic -nodefaults -m 256 \
> >                    -hda /tmp/test.qcow2 -serial mon:stdio
> >
> > ... then switch to the monitor by pressing CTRL-a c and try to
> > save a snapshot with "savevm test1" for example.
> >
> > After the first iteration, ram_save_iterate() always returns 0 here,
> > so that qemu_savevm_state_iterate() hangs in an endless loop and you
> > can only "kill -9" the QEMU process.
> > Fix it by using proper return values in ram_save_iterate().
> >
> > Signed-off-by: Thomas Huth <thuth@redhat.com>
> 
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> 
> Applied.
> 
> I don't know how we broke this so much.

Note that block save iterate has the same bug...
Thomas Huth Nov. 18, 2016, 8:13 a.m. UTC | #11
On 17.11.2016 04:45, David Gibson wrote:
> On Mon, Nov 14, 2016 at 07:34:59PM +0100, Juan Quintela wrote:
>> Thomas Huth <thuth@redhat.com> wrote:
>>> qemu_savevm_state_iterate() expects the iterators to return 1
>>> when they are done, and 0 if there is still something left to do.
>>> However, ram_save_iterate() does not obey this rule and returns
>>> the number of saved pages instead. This causes a fatal hang with
>>> ppc64 guests when you run QEMU like this (also works with TCG):
>>>
>>>  qemu-img create -f qcow2  /tmp/test.qcow2 1M
>>>  qemu-system-ppc64 -nographic -nodefaults -m 256 \
>>>                    -hda /tmp/test.qcow2 -serial mon:stdio
>>>
>>> ... then switch to the monitor by pressing CTRL-a c and try to
>>> save a snapshot with "savevm test1" for example.
>>>
>>> After the first iteration, ram_save_iterate() always returns 0 here,
>>> so that qemu_savevm_state_iterate() hangs in an endless loop and you
>>> can only "kill -9" the QEMU process.
>>> Fix it by using proper return values in ram_save_iterate().
>>>
>>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>>
>> Reviewed-by: Juan Quintela <quintela@redhat.com>
>>
>> Applied.
>>
>> I don't know how we broke this so much.
> 
> Note that block save iterate has the same bug...

I think you're right. Care to send a patch?

 Thomas
Thomas Huth Dec. 16, 2016, 4:55 p.m. UTC | #12
On 18.11.2016 09:13, Thomas Huth wrote:
> On 17.11.2016 04:45, David Gibson wrote:
>> On Mon, Nov 14, 2016 at 07:34:59PM +0100, Juan Quintela wrote:
>>> Thomas Huth <thuth@redhat.com> wrote:
>>>> qemu_savevm_state_iterate() expects the iterators to return 1
>>>> when they are done, and 0 if there is still something left to do.
>>>> However, ram_save_iterate() does not obey this rule and returns
>>>> the number of saved pages instead. This causes a fatal hang with
>>>> ppc64 guests when you run QEMU like this (also works with TCG):
>>>>
>>>>  qemu-img create -f qcow2  /tmp/test.qcow2 1M
>>>>  qemu-system-ppc64 -nographic -nodefaults -m 256 \
>>>>                    -hda /tmp/test.qcow2 -serial mon:stdio
>>>>
>>>> ... then switch to the monitor by pressing CTRL-a c and try to
>>>> save a snapshot with "savevm test1" for example.
>>>>
>>>> After the first iteration, ram_save_iterate() always returns 0 here,
>>>> so that qemu_savevm_state_iterate() hangs in an endless loop and you
>>>> can only "kill -9" the QEMU process.
>>>> Fix it by using proper return values in ram_save_iterate().
>>>>
>>>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>>>
>>> Reviewed-by: Juan Quintela <quintela@redhat.com>
>>>
>>> Applied.
>>>
>>> I don't know how we broke this so much.
>>
>> Note that block save iterate has the same bug...
> 
> I think you're right. Care to send a patch?

Looking at this issue again ... could it be that block_save_iterate() is
currently just dead code?
As far as I can see, the ->save_live_iterate() handlers are only called
from qemu_savevm_state_iterate(), right? And qemu_savevm_state_iterate()
only calls the handlers if se->ops->is_active(se->opaque) returns true.
But block_is_active() seems to only return 0 during savevm, most likely
because qemu_savevm_state() explicitly sets the "blk" and "shared"
MigrationParams to zero.
So to me, it looks like we could also just remove block_save_iterate()
completely ... or did I miss something here?
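
The gate I am looking at, paraphrased from qemu_savevm_state_iterate():

 QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
     if (se->ops->is_active && !se->ops->is_active(se->opaque)) {
         /* block_is_active() returning false during savevm means
          * the block handler would always be skipped here */
         continue;
     }
     ret = se->ops->save_live_iterate(f, se->opaque);
     /* ... */
 }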

 Thomas
Dr. David Alan Gilbert Dec. 16, 2016, 5:03 p.m. UTC | #13
* Thomas Huth (thuth@redhat.com) wrote:
> On 18.11.2016 09:13, Thomas Huth wrote:
> > On 17.11.2016 04:45, David Gibson wrote:
> >> On Mon, Nov 14, 2016 at 07:34:59PM +0100, Juan Quintela wrote:
> >>> Thomas Huth <thuth@redhat.com> wrote:
> >>>> qemu_savevm_state_iterate() expects the iterators to return 1
> >>>> when they are done, and 0 if there is still something left to do.
> >>>> However, ram_save_iterate() does not obey this rule and returns
> >>>> the number of saved pages instead. This causes a fatal hang with
> >>>> ppc64 guests when you run QEMU like this (also works with TCG):
> >>>>
> >>>>  qemu-img create -f qcow2  /tmp/test.qcow2 1M
> >>>>  qemu-system-ppc64 -nographic -nodefaults -m 256 \
> >>>>                    -hda /tmp/test.qcow2 -serial mon:stdio
> >>>>
> >>>> ... then switch to the monitor by pressing CTRL-a c and try to
> >>>> save a snapshot with "savevm test1" for example.
> >>>>
> >>>> After the first iteration, ram_save_iterate() always returns 0 here,
> >>>> so that qemu_savevm_state_iterate() hangs in an endless loop and you
> >>>> can only "kill -9" the QEMU process.
> >>>> Fix it by using proper return values in ram_save_iterate().
> >>>>
> >>>> Signed-off-by: Thomas Huth <thuth@redhat.com>
> >>>
> >>> Reviewed-by: Juan Quintela <quintela@redhat.com>
> >>>
> >>> Applied.
> >>>
> >>> I don't know how we broke this so much.
> >>
> >> Note that block save iterate has the same bug...
> > 
> > I think you're right. Care to send a patch?
> 
> Looking at this issue again ... could it be that block_save_iterate() is
> currently just dead code?
> As far as I can see, the ->save_live_iterate() handlers are only called
> from qemu_savevm_state_iterate(), right? And qemu_savevm_state_iterate()
> only calls the handlers if se->ops->is_active(se->opaque) returns true.
> But block_is_active() seems to only return 0 during savevm, most likely
> because qemu_savevm_state() explicitly sets the "blk" and "shared"
> MigrationParams to zero.
> So to me, it looks like we could also just remove block_save_iterate()
> completely ... or did I miss something here?

Doesn't it get called by migrate -b ?

Dave

>  Thomas
> 
> 



--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Thomas Huth Dec. 19, 2016, 4:30 p.m. UTC | #14
On 16.12.2016 18:03, Dr. David Alan Gilbert wrote:
> * Thomas Huth (thuth@redhat.com) wrote:
>> On 18.11.2016 09:13, Thomas Huth wrote:
>>> On 17.11.2016 04:45, David Gibson wrote:
>>>> On Mon, Nov 14, 2016 at 07:34:59PM +0100, Juan Quintela wrote:
>>>>> Thomas Huth <thuth@redhat.com> wrote:
>>>>>> qemu_savevm_state_iterate() expects the iterators to return 1
>>>>>> when they are done, and 0 if there is still something left to do.
>>>>>> However, ram_save_iterate() does not obey this rule and returns
>>>>>> the number of saved pages instead. This causes a fatal hang with
>>>>>> ppc64 guests when you run QEMU like this (also works with TCG):
>>>>>>
>>>>>>  qemu-img create -f qcow2  /tmp/test.qcow2 1M
>>>>>>  qemu-system-ppc64 -nographic -nodefaults -m 256 \
>>>>>>                    -hda /tmp/test.qcow2 -serial mon:stdio
>>>>>>
>>>>>> ... then switch to the monitor by pressing CTRL-a c and try to
>>>>>> save a snapshot with "savevm test1" for example.
>>>>>>
>>>>>> After the first iteration, ram_save_iterate() always returns 0 here,
>>>>>> so that qemu_savevm_state_iterate() hangs in an endless loop and you
>>>>>> can only "kill -9" the QEMU process.
>>>>>> Fix it by using proper return values in ram_save_iterate().
>>>>>>
>>>>>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>>>>>
>>>>> Reviewed-by: Juan Quintela <quintela@redhat.com>
>>>>>
>>>>> Applied.
>>>>>
>>>>> I don't know how we broke this so much.
>>>>
>>>> Note that block save iterate has the same bug...
>>>
>>> I think you're right. Care to send a patch?
>>
>> Looking at this issue again ... could it be that block_save_iterate() is
>> currently just dead code?
>> As far as I can see, the ->save_live_iterate() handlers are only called
>> from qemu_savevm_state_iterate(), right? And qemu_savevm_state_iterate()
>> only calls the handlers if se->ops->is_active(se->opaque) returns true.
>> But block_is_active() seems to only return 0 during savevm, most likely
>> because qemu_savevm_state() explicitly sets the "blk" and "shared"
>> MigrationParams to zero.
>> So to me, it looks like we could also just remove block_save_iterate()
>> completely ... or did I miss something here?
> 
> Doesn't it get called by migrate -b ?

Ah, right, yes, I somehow missed that ... I probably shouldn't do such
experiments at the end of Friday afternoon ;-)

OK, so it seems that
- block_save_iterate() is not called during savevm at all
  (and thus the bad return code does not matter here)
- migrate -b runs block_save_iterate() but the return code is ignored in
  migration_thread()

So we do not have a real problem here, but I think we should still clean
up the return code of block_save_iterate() to be on the safe side for
the future...

 Thomas
John Snow Dec. 19, 2016, 8:19 p.m. UTC | #15
On 12/19/2016 11:30 AM, Thomas Huth wrote:
> On 16.12.2016 18:03, Dr. David Alan Gilbert wrote:
>> * Thomas Huth (thuth@redhat.com) wrote:
>>> On 18.11.2016 09:13, Thomas Huth wrote:
>>>> On 17.11.2016 04:45, David Gibson wrote:
>>>>> On Mon, Nov 14, 2016 at 07:34:59PM +0100, Juan Quintela wrote:
>>>>>> Thomas Huth <thuth@redhat.com> wrote:
>>>>>>> qemu_savevm_state_iterate() expects the iterators to return 1
>>>>>>> when they are done, and 0 if there is still something left to do.
>>>>>>> However, ram_save_iterate() does not obey this rule and returns
>>>>>>> the number of saved pages instead. This causes a fatal hang with
>>>>>>> ppc64 guests when you run QEMU like this (also works with TCG):
>>>>>>>
>>>>>>>  qemu-img create -f qcow2  /tmp/test.qcow2 1M
>>>>>>>  qemu-system-ppc64 -nographic -nodefaults -m 256 \
>>>>>>>                    -hda /tmp/test.qcow2 -serial mon:stdio
>>>>>>>
>>>>>>> ... then switch to the monitor by pressing CTRL-a c and try to
>>>>>>> save a snapshot with "savevm test1" for example.
>>>>>>>
>>>>>>> After the first iteration, ram_save_iterate() always returns 0 here,
>>>>>>> so that qemu_savevm_state_iterate() hangs in an endless loop and you
>>>>>>> can only "kill -9" the QEMU process.
>>>>>>> Fix it by using proper return values in ram_save_iterate().
>>>>>>>
>>>>>>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>>>>>>
>>>>>> Reviewed-by: Juan Quintela <quintela@redhat.com>
>>>>>>
>>>>>> Applied.
>>>>>>
>>>>>> I don't know how we broke this so much.
>>>>>
>>>>> Note that block save iterate has the same bug...
>>>>
>>>> I think you're right. Care to send a patch?
>>>
>>> Looking at this issue again ... could it be that block_save_iterate() is
>>> currently just dead code?
>>> As far as I can see, the ->save_live_iterate() handlers are only called
>>> from qemu_savevm_state_iterate(), right? And qemu_savevm_state_iterate()
>>> only calls the handlers if se->ops->is_active(se->opaque) returns true.
>>> But block_is_active() seems to only return 0 during savevm, most likely
>>> because qemu_savevm_state() explicitly sets the "blk" and "shared"
>>> MigrationParams to zero.
>>> So to me, it looks like we could also just remove block_save_iterate()
>>> completely ... or did I miss something here?
>>
>> Doesn't it get called by migrate -b ?
> 
> Ah, right, yes, I somehow missed that ... I probably shouldn't do such
> experiments at the end of Friday afternoon ;-)
> 
> OK, so it seems that
> - block_save_iterate() is not called during savevm at all
>   (and thus the bad return code does not matter here)
> - migrate -b runs block_save_iterate() but the return code is ignored in
>   migration_thread()
> 
> So we do not have a real problem here, but I think we should still clean
> up the return code of block_save_iterate() to be on the safe side for
> the future...
> 
>  Thomas
> 
> 

If it confused you, it'll confuse someone else. Worth fixing for
consistency's sake alone.

--js

Patch

diff --git a/migration/ram.c b/migration/ram.c
index fb9252d..a1c8089 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1987,7 +1987,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
     int ret;
     int i;
     int64_t t0;
-    int pages_sent = 0;
+    int done = 0;
 
     rcu_read_lock();
     if (ram_list.version != last_version) {
@@ -2007,9 +2007,9 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
         pages = ram_find_and_save_block(f, false, &bytes_transferred);
         /* no more pages to sent */
         if (pages == 0) {
+            done = 1;
             break;
         }
-        pages_sent += pages;
         acct_info.iterations++;
 
         /* we want to check in the 1st loop, just in case it was the 1st time
@@ -2044,7 +2044,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
         return ret;
     }
 
-    return pages_sent;
+    return done;
 }
 
 /* Called with iothread lock */