| Message ID | 20100413123318.0f2cd334@redhat.com |
|---|---|
| State | New |
On Tue, 13 Apr 2010 12:33:18 +0300 Izik Eidus <ieidus@redhat.com> wrote:

> From f881b371e08760a67bf1f5b992a586c3de600f7a Mon Sep 17 00:00:00 2001
> From: Izik Eidus <ieidus@redhat.com>
> Date: Tue, 13 Apr 2010 12:24:57 +0300
> Subject: [PATCH] fix migration with large mem

Anyone ???

> In cases of guests with large memory that have pages whose bytes are
> all the same, we will spend a lot of time reading the memory from the
> guest (in is_dup_page()).
>
> This happens because ram_save_live() has a limit on how much we can
> send to the destination, but not on how much we read from the guest;
> when there are many is_dup_page() hits, we might read a huge amount
> of data without updating important things like the timers...
>
> The guest loses all its responsiveness and gets many soft lockups
> inside itself.
>
> This patch adds a limit on how much we can read from the guest in
> each iteration.
>
> Thanks.
>
> Signed-off-by: Izik Eidus <ieidus@redhat.com>
> ---
>  arch_init.c |    6 +++++-
>  1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/arch_init.c b/arch_init.c
> index cfc03ea..e27b1a0 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -88,6 +88,8 @@ const uint32_t arch_type = QEMU_ARCH;
>  #define RAM_SAVE_FLAG_PAGE     0x08
>  #define RAM_SAVE_FLAG_EOS      0x10
>
> +#define MAX_SAVE_BLOCK_READ 10 * 1024 * 1024
> +
>  static int is_dup_page(uint8_t *page, uint8_t ch)
>  {
>      uint32_t val = ch << 24 | ch << 16 | ch << 8 | ch;
> @@ -175,6 +177,7 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
>      uint64_t bytes_transferred_last;
>      double bwidth = 0;
>      uint64_t expected_time = 0;
> +    int data_read = 0;
>
>      if (stage < 0) {
>          cpu_physical_memory_set_dirty_tracking(0);
> @@ -205,10 +208,11 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
>      bytes_transferred_last = bytes_transferred;
>      bwidth = qemu_get_clock_ns(rt_clock);
>
> -    while (!qemu_file_rate_limit(f)) {
> +    while (!qemu_file_rate_limit(f) && data_read < MAX_SAVE_BLOCK_READ) {
>          int ret;
>
>          ret = ram_save_block(f);
> +        data_read += ret * TARGET_PAGE_SIZE;
>          bytes_transferred += ret * TARGET_PAGE_SIZE;
>          if (ret == 0) { /* no more blocks */
>              break;
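The quoted hunk above shows only the first line of is_dup_page(); the
diff context cuts the rest off. For readers without the tree handy,
this is a reconstruction of roughly what the function looked like in
arch_init.c at the time (a sketch inferred from the surrounding
context, not a verbatim copy):

    /* Return nonzero when every byte of the page equals 'ch', so the
     * caller can send a one-byte marker instead of the whole page.
     * Cheap per page, but nothing bounds how many pages get scanned. */
    static int is_dup_page(uint8_t *page, uint8_t ch)
    {
        uint32_t val = ch << 24 | ch << 16 | ch << 8 | ch;
        uint32_t *array = (uint32_t *)page;
        int i;

        for (i = 0; i < (TARGET_PAGE_SIZE / 4); i++) {
            if (array[i] != val) {
                return 0;
            }
        }

        return 1;
    }

Each all-same page costs a full TARGET_PAGE_SIZE scan but puts only a
few bytes on the wire (a page header plus the fill byte), which is why
qemu_file_rate_limit() alone never throttles a guest full of zero
pages.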
On 05/09/2010 02:29 PM, Izik Eidus wrote:
> On Tue, 13 Apr 2010 12:33:18 +0300 Izik Eidus <ieidus@redhat.com> wrote:
>
>> From f881b371e08760a67bf1f5b992a586c3de600f7a Mon Sep 17 00:00:00 2001
>> From: Izik Eidus <ieidus@redhat.com>
>> Date: Tue, 13 Apr 2010 12:24:57 +0300
>> Subject: [PATCH] fix migration with large mem
>
> Anyone ???

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Maybe this could be merged via uq/master.

Paolo
On 05/09/2010 04:57 PM, Paolo Bonzini wrote:
> On 05/09/2010 02:29 PM, Izik Eidus wrote:
>> On Tue, 13 Apr 2010 12:33:18 +0300 Izik Eidus <ieidus@redhat.com> wrote:
>>
>>> From f881b371e08760a67bf1f5b992a586c3de600f7a Mon Sep 17 00:00:00 2001
>>> From: Izik Eidus <ieidus@redhat.com>
>>> Date: Tue, 13 Apr 2010 12:24:57 +0300
>>> Subject: [PATCH] fix migration with large mem
>>
>> Anyone ???
>
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
>
> Maybe this could be merged via uq/master.

It's not intrinsically KVM-related.
On 04/13/2010 04:33 AM, Izik Eidus wrote:
> From f881b371e08760a67bf1f5b992a586c3de600f7a Mon Sep 17 00:00:00 2001
> From: Izik Eidus <ieidus@redhat.com>
> Date: Tue, 13 Apr 2010 12:24:57 +0300
> Subject: [PATCH] fix migration with large mem
>
> [...]
>
> -    while (!qemu_file_rate_limit(f)) {
> +    while (!qemu_file_rate_limit(f) && data_read < MAX_SAVE_BLOCK_READ) {

The effect of this patch is that we'll never send more than 10mb/s
during live migration? If so, it's totally wrong as a fix to the
problem.

It would be better to account the deduplicated pages as part of the
rate limiting calculations.

Regards,

Anthony Liguori
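A sketch of the accounting Anthony suggests, to make the idea concrete.
This is not the committed fix: qemu_file_credit_transfer() is a
hypothetical helper (the QEMUFile API of the day had no such call), and
ram_save_block() is assumed to return the number of pages it handled.
The loop in ram_save_live() would become something like:

    while (!qemu_file_rate_limit(f)) {
        int ret = ram_save_block(f);

        if (ret == 0) {    /* no more blocks */
            break;
        }
        bytes_transferred += ret * TARGET_PAGE_SIZE;
        /* Charge every page at full size against the rate limiter,
         * even when is_dup_page() let us send only a short marker.
         * This slightly over-charges non-dup pages (their real bytes
         * were already counted by the QEMUFile), but it bounds how
         * much guest memory one iteration may scan. */
        qemu_file_credit_transfer(f, ret * TARGET_PAGE_SIZE);
    }

The read cap then scales with the configured migration bandwidth
instead of being a compile-time constant.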
On Mon, 10 May 2010 15:24:20 -0500 Anthony Liguori
<anthony@codemonkey.ws> wrote:

> On 04/13/2010 04:33 AM, Izik Eidus wrote:
>> [...]
>> -    while (!qemu_file_rate_limit(f)) {
>> +    while (!qemu_file_rate_limit(f) && data_read < MAX_SAVE_BLOCK_READ) {
>
> The effect of this patch is that we'll never send more than 10mb/s
> during live migration? If so, it's totally wrong as a fix to the
> problem.

It is 100mb/s... (if I remember correctly, the migration code will run
this thing 10 times for each iteration)

My feeling is that limiting it to the network's 32mb/s limit is too
low; reading memory at 100mb/s is not such a problem, as long as we
don't read gigas out of memory every sec... (Still, we want to optimize
the billions-of-zeros case of Windows guests.)

Anyway, if the above does not make sense to you, I will just change it
into what you suggested.

So?

> It would be better to account the deduplicated pages as part of the
> rate limiting calculations.
>
> Regards,
>
> Anthony Liguori
On 05/10/2010 04:45 PM, Izik Eidus wrote:
> On Mon, 10 May 2010 15:24:20 -0500 Anthony Liguori
> <anthony@codemonkey.ws> wrote:
>
>> [...]
>> The effect of this patch is that we'll never send more than 10mb/s
>> during live migration? If so, it's totally wrong as a fix to the
>> problem.
>
> It is 100mb/s... (if I remember correctly, the migration code will
> run this thing 10 times for each iteration)

No, it only runs it once.

> My feeling is that limiting it to the network's 32mb/s limit is too
> low; reading memory at 100mb/s is not such a problem, as long as we
> don't read gigas out of memory every sec...

You've limited bandwidth to 10 mb/sec. Even if it were 100 mb/sec, a
fixed limit is wrong. On a 10gbit (or 40gbit) link, 100 mb/sec is not
enough.

> (Still, we want to optimize the billions-of-zeros case of Windows
> guests.)
>
> Anyway, if the above does not make sense to you, I will just change
> it into what you suggested.
>
> So?

That would work for me.

Regards,

Anthony Liguori
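To put rough numbers on the disagreement: the posted cap allows
10 * 1024 * 1024 bytes of pages to be read per ram_save_live() call.
If that call runs once per buffered-migration tick, and the tick is
the roughly-100ms timer of that era's buffered_file.c (an assumption
worth checking against the tree), the ceiling works out to about
10 MiB * 10/s = 100 MiB/s, while a 10gbit link can carry roughly
1.25 GB/s. Whatever the exact tick rate, any compile-time constant is
wrong on a fast enough link, whereas charging dup pages to the
existing rate limiter scales with whatever bandwidth the user
configured.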
diff --git a/arch_init.c b/arch_init.c
index cfc03ea..e27b1a0 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -88,6 +88,8 @@ const uint32_t arch_type = QEMU_ARCH;
 #define RAM_SAVE_FLAG_PAGE     0x08
 #define RAM_SAVE_FLAG_EOS      0x10
 
+#define MAX_SAVE_BLOCK_READ 10 * 1024 * 1024
+
 static int is_dup_page(uint8_t *page, uint8_t ch)
 {
     uint32_t val = ch << 24 | ch << 16 | ch << 8 | ch;
@@ -175,6 +177,7 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
     uint64_t bytes_transferred_last;
     double bwidth = 0;
     uint64_t expected_time = 0;
+    int data_read = 0;
 
     if (stage < 0) {
         cpu_physical_memory_set_dirty_tracking(0);
@@ -205,10 +208,11 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
     bytes_transferred_last = bytes_transferred;
     bwidth = qemu_get_clock_ns(rt_clock);
 
-    while (!qemu_file_rate_limit(f)) {
+    while (!qemu_file_rate_limit(f) && data_read < MAX_SAVE_BLOCK_READ) {
         int ret;
 
         ret = ram_save_block(f);
+        data_read += ret * TARGET_PAGE_SIZE;
         bytes_transferred += ret * TARGET_PAGE_SIZE;
         if (ret == 0) { /* no more blocks */
             break;
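One small nit in the patch as posted, separate from the rate-limit
debate: the macro body is unparenthesized. That is harmless in the
single data_read < MAX_SAVE_BLOCK_READ comparison above, but idiomatic
C wraps macro expressions so operator precedence cannot split them at
some future use site:

    /* Defensive form: parenthesize the body so precedence cannot
     * re-associate it inside a larger expression. */
    #define MAX_SAVE_BLOCK_READ (10 * 1024 * 1024)

    /* Why it matters -- a hypothetical future use:
     *   offset % MAX_SAVE_BLOCK_READ
     * would otherwise expand to
     *   offset % 10 * 1024 * 1024
     * which parses as ((offset % 10) * 1024) * 1024. */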