Patchwork: stop the iteration when too many pages are transferred

Submitter Wen Congyang
Date Nov. 18, 2010, 2:32 a.m.
Message ID <4CE49053.3000608@cn.fujitsu.com>
Permalink /patch/72031/
State New

Comments

Wen Congyang - Nov. 18, 2010, 2:32 a.m.
When the total size of the pages sent is larger than max_factor
times the size of the guest OS's memory, stop the
iteration.
The default value of max_factor is 3.

This is similar to Xen's behaviour.


Signed-off-by: Wen Congyang

---
 arch_init.c |   13 ++++++++++++-
 1 files changed, 12 insertions(+), 1 deletions(-)
Anthony Liguori - Nov. 20, 2010, 2:23 a.m.
On 11/17/2010 08:32 PM, Wen Congyang wrote:
> When the total sent page size is larger than max_factor
> times of the size of guest OS's memory, stop the
> iteration.
> The default value of max_factor is 3.
>
> This is similar to XEN.
>
>
> Signed-off-by: Wen Congyang
>   

I'm strongly opposed to doing this. I think Xen gets this totally wrong.

Migration is a contract. When you set the stop time, you're saying that
you only want the guest to experience a fixed amount of downtime.
Stopping the guest after some arbitrary number of iterations makes the
downtime non-deterministic. With a very large guest, this could wreak
havoc, causing dropped networking connections, etc.

It's totally unsafe.

If a management tool wants this behavior, it can set a timeout and
explicitly stop the guest during the live migration. IMHO, such a
management tool is not doing its job properly, but it can still be
implemented.

Regards,

Anthony Liguori

> ---
>  arch_init.c |   13 ++++++++++++-
>  1 files changed, 12 insertions(+), 1 deletions(-)
>
> diff --git a/arch_init.c b/arch_init.c
> index 4486925..67e90f8 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -212,6 +212,14 @@ uint64_t ram_bytes_total(void)
>      return total;
>  }
>  
> +static uint64_t ram_blocks_total(void)
> +{
> +    return ram_bytes_total() / TARGET_PAGE_SIZE;
> +}
> +
> +static uint64_t blocks_transferred = 0;
> +static int max_factor = 3;
> +
>  int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
>  {
>      ram_addr_t addr;
> @@ -234,6 +242,7 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
>          bytes_transferred = 0;
>          last_block = NULL;
>          last_offset = 0;
> +        blocks_transferred = 0;
>  
>          /* Make sure all dirty bits are set */
>          QLIST_FOREACH(block, &ram_list.blocks, next) {
> @@ -266,6 +275,7 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
>  
>          bytes_sent = ram_save_block(f);
>          bytes_transferred += bytes_sent;
> +        blocks_transferred += !!bytes_sent;
>          if (bytes_sent == 0) { /* no more blocks */
>              break;
>          }
> @@ -295,7 +305,8 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
>  
>      expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
>  
> -    return (stage == 2) && (expected_time <= migrate_max_downtime());
> +    return (stage == 2) && ((expected_time <= migrate_max_downtime())
> +            || (blocks_transferred > ram_blocks_total() * max_factor));
>  }
>  
>  static inline void *host_from_stream_offset(QEMUFile *f,
>
Wen Congyang - Nov. 22, 2010, 2:25 a.m.
On Nov. 20, 2010, 10:23 a.m., Anthony Liguori wrote:
> On 11/17/2010 08:32 PM, Wen Congyang wrote:
>> When the total sent page size is larger than max_factor
>> times of the size of guest OS's memory, stop the
>> iteration.
>> The default value of max_factor is 3.
>>
>> This is similar to XEN.
>>
>>
>> Signed-off-by: Wen Congyang
>>   
> 
> I'm strongly opposed to doing this. I think Xen gets this totally wrong.
> 
> Migration is a contract. When you set the stop time, you're saying that
> you want only want the guest to experience a fixed amount of downtime.
> Stopping the guest after some arbitrary number of iterations makes the
> downtime non-deterministic. With a very large guest, this could wreak
> havoc causing dropped networking connections, etc.
> 

Thanks for your comment.
As a developer, I know about the downtime.
But as a user, he does not know about it.
When a user migrates a very large guest live without setting the stop time,
he is not saying "I want the guest to experience a fixed amount of downtime";
he only wants to migrate the guest in a short time. The migration should
finish within some minutes, not run forever.

If we set the stop time too large, this could also wreak havoc, causing
dropped networking connections, etc.

I think we can do it as follows:
1. If the user does not set the stop time, we should complete the
   migration in a short time.
2. If the user sets the stop time, we do it as we do now.


> It's totally unsafe.
> 
> If a management tool wants this behavior, they can set a timeout and
> explicitly stop the guest during the live migration. IMHO, such a
> management tool is not doing it's job properly but it still can be
> implemented.
> 
> Regards,
> 
> Anthony Liguori
>
KAMEZAWA Hiroyuki - Nov. 22, 2010, 7:11 a.m.
On Fri, 19 Nov 2010 20:23:55 -0600
Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 11/17/2010 08:32 PM, Wen Congyang wrote:
> > When the total sent page size is larger than max_factor
> > times of the size of guest OS's memory, stop the
> > iteration.
> > The default value of max_factor is 3.
> >
> > This is similar to XEN.
> >
> >
> > Signed-off-by: Wen Congyang
> >   
> 
> I'm strongly opposed to doing this. I think Xen gets this totally wrong.
> 
> Migration is a contract. When you set the stop time, you're saying that
> you want only want the guest to experience a fixed amount of downtime.
> Stopping the guest after some arbitrary number of iterations makes the
> downtime non-deterministic. With a very large guest, this could wreak
> havoc causing dropped networking connections, etc.
> 
> It's totally unsafe.
> 
> If a management tool wants this behavior, they can set a timeout and
> explicitly stop the guest during the live migration. IMHO, such a
> management tool is not doing it's job properly but it still can be
> implemented.
> 

Hmm, is there any information available to management tools about
"the reason the migration failed is that it never ends because
 of new dirty pages", or something similar?

I would be glad to know, before stopping the machine, that cold migration
will succeed at a high rate even when live migration failed by timeout.
If the reason for the failure is that the network or the target node is
too busy, cold migration will also be in trouble, and we will see a longer
downtime than expected.
I think it would be helpful to show how the transfer went, e.g. "sent 3x
the guest's pages but failed."

Any ideas?

Thanks,
-Kame
Michael S. Tsirkin - Nov. 24, 2010, 9:23 a.m.
On Fri, Nov 19, 2010 at 08:23:55PM -0600, Anthony Liguori wrote:
> On 11/17/2010 08:32 PM, Wen Congyang wrote:
> > When the total sent page size is larger than max_factor
> > times of the size of guest OS's memory, stop the
> > iteration.
> > The default value of max_factor is 3.
> >
> > This is similar to XEN.
> >
> >
> > Signed-off-by: Wen Congyang
> >   
> 
> I'm strongly opposed to doing this. I think Xen gets this totally wrong.

Additionally, please don't just change the behaviour of existing commands.
Make any new behaviour opt-in.

Patch

diff --git a/arch_init.c b/arch_init.c
index 4486925..67e90f8 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -212,6 +212,14 @@  uint64_t ram_bytes_total(void)
     return total;
 }
 
+static uint64_t ram_blocks_total(void)
+{
+    return ram_bytes_total() / TARGET_PAGE_SIZE;
+}
+
+static uint64_t blocks_transferred = 0;
+static int max_factor = 3;
+
 int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
 {
     ram_addr_t addr;
@@ -234,6 +242,7 @@  int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
         bytes_transferred = 0;
         last_block = NULL;
         last_offset = 0;
+        blocks_transferred = 0;
 
         /* Make sure all dirty bits are set */
         QLIST_FOREACH(block, &ram_list.blocks, next) {
@@ -266,6 +275,7 @@  int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
 
         bytes_sent = ram_save_block(f);
         bytes_transferred += bytes_sent;
+        blocks_transferred += !!bytes_sent;
         if (bytes_sent == 0) { /* no more blocks */
             break;
         }
@@ -295,7 +305,8 @@  int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
 
     expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
 
-    return (stage == 2) && (expected_time <= migrate_max_downtime());
+    return (stage == 2) && ((expected_time <= migrate_max_downtime())
+            || (blocks_transferred > ram_blocks_total() * max_factor));
 }
 
 static inline void *host_from_stream_offset(QEMUFile *f,