
[RFC,0/7] Migration stats

Message ID 1344855057-32509-1-git-send-email-quintela@redhat.com
State New

Pull-request

http://repo.or.cz/r/qemu/quintela.git migration-stats

Message

Juan Quintela Aug. 13, 2012, 10:50 a.m. UTC
Hi

This modifies the output of info migrate/qmp_query_migrate to add the
stats that I got requests for.

- It moves total time to MigrationInfo instead of ram (Luiz's suggestion)
- Prints the real downtime that we have had

  Strictly, it prints the total downtime of the complete phase, but that
  downtime also includes the last ram_iterate phase.  Working on
  fixing that one.

- Prints the expected downtime as of the last time that we synchronized
  the dirty bitmap with KVM, so we have an idea of what downtime
  value we need for migration to converge.

- Prints the dirty_pages_rate, that is, the number of pages that have
  been dirtied (written to) in the last second.  For now this always
  prints zero.  To fill it in, I need the dirty bitmap changes from the
  migration_thread series.
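
To make the last two stats concrete, here is a minimal sketch in C of how
an expected downtime and a dirty page rate could be derived.  It assumes
that the expected downtime is the data still dirty at the last bitmap sync
divided by the measured bandwidth, which this message does not spell out.
The function names, parameters and units are hypothetical and are not
taken from the series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not QEMU's actual functions. */

/* Expected downtime in milliseconds: time needed to send everything that
 * is still dirty at the currently measured bandwidth (bytes per ms). */
uint64_t expected_downtime_ms(uint64_t remaining_pages,
                              uint64_t page_size,
                              uint64_t bytes_per_ms)
{
    assert(bytes_per_ms > 0);
    return remaining_pages * page_size / bytes_per_ms;
}

/* Dirty page rate in pages per second, from the number of pages dirtied
 * during an interval of elapsed_ms milliseconds. */
uint64_t dirty_pages_rate(uint64_t pages_dirtied, uint64_t elapsed_ms)
{
    assert(elapsed_ms > 0);
    return pages_dirtied * 1000 / elapsed_ms;
}

/* Migration can only finish once the expected downtime fits within the
 * maximum downtime the user is willing to accept. */
int migration_would_converge(uint64_t expected_ms, uint64_t max_downtime_ms)
{
    return expected_ms <= max_downtime_ms;
}
```

For example, 25600 dirty pages of 4 KiB at 102400 bytes/ms give an
expected downtime of 1024 ms, which would not converge against a maximum
downtime of 30 ms.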

This patch series applies on top of the migration-next-20120808 series sent
to Anthony.

What I want to know:

- Is there any stat that you want?  Once this is in, adding a new one should
  be easy.

- Examples are not done; I am waiting until people agree on which params
  are needed.

- Luiz added in case he has QMP comments.

- erik added for libvirt comments.

The link to the branch on my repository is included as well.

The following changes since commit 346fe0c4c0b88f11a3d0c01c34d9a170d73429cc:

  Merge remote-tracking branch 'stefanha/trivial-patches' into staging (2012-08-11 19:49:03 -0500)

are available in the git repository at:


  http://repo.or.cz/r/qemu/quintela.git migration-stats

for you to fetch changes up to e0599012abfc4f9a68185c6f0a10a7b98c0a180f:

  migration: Add dirty_pages_rate to query migrate output (2012-08-13 12:33:35 +0200)

Please review and comment.

Juan Quintela (7):
  migration: move total_time from ram stats to migration info
  migration: store end_time in a local variable
  migration: print total downtime for final phase of migration
  migration: rename expected_time to expected_downtime
  migration: export migration_get_current()
  migration: print expected downtime in info migrate
  migration: Add dirty_pages_rate to query migrate output

 arch_init.c      | 19 +++++++++++--------
 hmp.c            | 16 ++++++++++++++--
 migration.c      | 19 ++++++++++++++-----
 migration.h      |  4 ++++
 qapi-schema.json | 26 +++++++++++++++++++-------
 5 files changed, 62 insertions(+), 22 deletions(-)

Comments

Eric Blake Aug. 13, 2012, 2:59 p.m. UTC | #1
On 08/13/2012 04:50 AM, Juan Quintela wrote:
> Hi
> 
> This modifies the output of info migrate/qmp_query_migrate to add the
> stats that I got requests for.
> 
> - It moves total time to MigrationInfo instead of ram (luiz suggestion)

Now's the time to do this, since the stat is new to 1.2 and we haven't
yet made a release with it.

Should we also rename 'MigrationInfo' to 'MigrationRamInfo', so that we
don't make a similar mistake in the future of putting a stat in the
wrong category?

> What do I want to know:
> 
> - is there any stat that you want?  Once here, adding a new one should
>   be easy.

Should we have some sort of stat for the number of pages that are sent
more than once, and/or for the maximum count of times that a given page
was sent?  Having hot/cold page analysis might make it easier to decide
in the future which pages to avoid sending until the very end, and
knowing how many pages are sent multiple times as well as the maximum
times any one page is sent might help.
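
As a rough illustration of the two numbers suggested above, the sketch
below derives them from a per-page send counter.  The counter array, the
ResendStats type and resend_stats() are hypothetical; nothing like them
exists in the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of how often pages were retransmitted. */
typedef struct ResendStats {
    uint64_t pages_resent;    /* pages that were sent more than once */
    uint32_t max_send_count;  /* highest send count of any single page */
} ResendStats;

/* Walk a per-page counter array (one entry per guest page, holding the
 * number of times that page was sent) and summarize it. */
ResendStats resend_stats(const uint32_t *send_count, size_t nr_pages)
{
    ResendStats s = { 0, 0 };

    for (size_t i = 0; i < nr_pages; i++) {
        if (send_count[i] > 1) {
            s.pages_resent++;
        }
        if (send_count[i] > s.max_send_count) {
            s.max_send_count = send_count[i];
        }
    }
    return s;
}
```

With send counts of {1, 3, 0, 2, 1, 7}, three pages were sent more than
once and the hottest page was sent seven times.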

> 
> - examples are not done, waiting until people agree with what params
>   are needed.

Fair enough for RFC purposes.

> 
> - luiz added in case he has QMP comments.
> 
> - erik added for libvirt comments.

Eric, actually.
Juan Quintela Aug. 13, 2012, 3:08 p.m. UTC | #2
Eric Blake <eblake@redhat.com> wrote:
> On 08/13/2012 04:50 AM, Juan Quintela wrote:
>> Hi
>> 
>> This modifies the output of info migrate/qmp_query_migrate to add the
>> stats that I got requests for.
>> 
>> - It moves total time to MigrationInfo instead of ram (luiz suggestion)
>
> Now's the time to do this, since the stat is new to 1.2 and we haven't
> yet made a release with it.
>
> Should we also rename 'MigrationInfo' to 'MigrationRamInfo', so that we
> don't make a similar mistake in the future of putting a stat in the
> wrong category?

Luiz?  I don't care one way or the other O:-)

>> What do I want to know:
>> 
>> - is there any stat that you want?  Once here, adding a new one should
>>   be easy.
>
> Should we have some sort of stat for the number of pages that are sent
> more than once, and/or for the maximum count of times that a given page
> was sent?  Having hot/cold page analysis might make it easier to decide
> in the future which pages to avoid sending until the very end, and
> knowing how many pages are sent multiple times as well as the maximum
> times any one page is sent might help.

That is not trivial to do without "duplicating" the bitmap.  The bitmap is
already quite big on machines with huge memory.  Adding a field to show
how many times a page has been sent sounds like too much.  Furthermore, how
could this information be used externally?
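
The size concern can be put in numbers.  A small sketch, assuming one bit
per guest page for the dirty bitmap and a fixed-width counter per page for
the send count (the function names are illustrative only):

```c
#include <assert.h>
#include <stdint.h>

/* Number of guest pages for a given amount of RAM. */
uint64_t guest_page_count(uint64_t ram_bytes, uint64_t page_size)
{
    assert(page_size > 0);
    return ram_bytes / page_size;
}

/* Size of a dirty bitmap with one bit per page, rounded up to bytes. */
uint64_t dirty_bitmap_bytes(uint64_t pages)
{
    return (pages + 7) / 8;
}

/* Size of a per-page send counter array with counter_bytes per page. */
uint64_t send_counter_bytes(uint64_t pages, uint64_t counter_bytes)
{
    return pages * counter_bytes;
}
```

For a 1 TiB guest with 4 KiB pages this gives 2^28 pages: a 32 MiB
bitmap, 256 MiB for one-byte counters, and 1 GiB for 32-bit counters.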

>> 
>> - examples are not done, waiting until people agree with what params
>>   are needed.
>
> Fair enough for RFC purposes.
>
>> 
>> - luiz added in case he has QMP comments.
>> 
>> - erik added for libvirt comments.
>
> Eric, actually.

Sorry :p

Later, Juan.
Eric Blake Aug. 13, 2012, 3:14 p.m. UTC | #3
On 08/13/2012 09:08 AM, Juan Quintela wrote:
>>
>> Should we have some sort of stat for the number of pages that are sent
>> more than once, and/or for the maximum count of times that a given page
>> was sent?  Having hot/cold page analysis might make it easier to decide
>> in the future which pages to avoid sending until the very end, and
>> knowing how many pages are sent multiple times as well as the maximum
>> times any one page is sent might help.
> 
> That is not trivial to do without "duplicating" the bitmap.  The bitmap is
> already quite big on machines with huge memory.  Adding a field to show
> how many times a page has been sent sounds like too much.  Furthermore, how
> could this information be used externally?

No big loss, then.  I was just wondering out loud, and don't really have
a particular need for this information.

I guess with XBZRLE, you _do_ have a count of how many cache hits you
had, which is somewhat related (a cache hit implies sending a page more
than once).
Luiz Capitulino Aug. 13, 2012, 7:47 p.m. UTC | #4
On Mon, 13 Aug 2012 08:59:00 -0600
Eric Blake <eblake@redhat.com> wrote:

> On 08/13/2012 04:50 AM, Juan Quintela wrote:
> > Hi
> > 
> > This modifies the output of info migrate/qmp_query_migrate to add the
> > stats that I got requests for.
> > 
> > - It moves total time to MigrationInfo instead of ram (luiz suggestion)
> 
> Now's the time to do this, since the stat is new to 1.2 and we haven't
> yet made a release with it.
> 
> Should we also rename 'MigrationInfo' to 'MigrationRamInfo', so that we
> don't make a similar mistake in the future of putting a stat in the
> wrong category?

It also has disk migration info there.
Qunfang Zhang Aug. 16, 2012, 10:25 a.m. UTC | #5
Hi, Juan
I ran a brief test with these patches applied, and they are very useful.
Reading the downtime, etc. from this output is more precise and saves
time compared with calculating it by some other method.

Thank you,
Qunfang
