[v2,8/8] migration: do not flush_compressed_data at the end of each iteration

Message ID 20180719121520.30026-9-xiaoguangrong@tencent.com
State New
Series migration: compression optimization

Commit Message

Xiao Guangrong July 19, 2018, 12:15 p.m. UTC
From: Xiao Guangrong <xiaoguangrong@tencent.com>

flush_compressed_data() needs to wait for all compression threads to
finish their work; after that, all threads are idle until the migration
feeds new requests to them. Reducing the number of calls improves
throughput and uses CPU resources more effectively.

We do not need to flush all threads at the end of each iteration; the
data can be kept locally until the memory block changes or the memory
migration starts over. In the latter case we will meet a dirtied page
which may still exist in a compression thread's ring.
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
---
 migration/ram.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)
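
For context, flush_compressed_data() is expensive because every call
serializes on all of the compression workers before returning. The sketch
below is a paraphrase of the function as it looked in migration/ram.c around
this series, reconstructed for illustration only: the comp_param,
comp_done_lock and qemu_put_qemu_file names follow that file's conventions,
but the exact code may differ.

/*
 * Paraphrased sketch, not quoted from the series: block until every
 * worker has finished its in-flight page, then drain each worker's
 * buffered compressed output to the migration stream.
 */
static void flush_compressed_data(RAMState *rs)
{
    int idx, len, thread_count;

    if (!migrate_use_compression()) {
        return;
    }
    thread_count = migrate_compress_threads();

    qemu_mutex_lock(&comp_done_lock);
    for (idx = 0; idx < thread_count; idx++) {
        /* Wait for worker idx to finish the page it is compressing. */
        while (!comp_param[idx].done) {
            qemu_cond_wait(&comp_done_cond, &comp_done_lock);
        }
    }
    qemu_mutex_unlock(&comp_done_lock);

    for (idx = 0; idx < thread_count; idx++) {
        qemu_mutex_lock(&comp_param[idx].mutex);
        if (!comp_param[idx].quit) {
            /* Flush this worker's buffered pages onto the wire. */
            len = qemu_put_qemu_file(rs->f, comp_param[idx].file);
            ram_counters.transferred += len;
        }
        qemu_mutex_unlock(&comp_param[idx].mutex);
    }
}

Calling this once per ram_save_iterate() round therefore stalls the saver
behind comp_done_cond every time, which is the cost the patch avoids.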

Comments

Peter Xu July 23, 2018, 5:49 a.m. UTC | #1
On Thu, Jul 19, 2018 at 08:15:20PM +0800, guangrong.xiao@gmail.com wrote:
> From: Xiao Guangrong <xiaoguangrong@tencent.com>
> 
> flush_compressed_data() needs to wait for all compression threads to
> finish their work; after that, all threads are idle until the migration
> feeds new requests to them. Reducing the number of calls improves
> throughput and uses CPU resources more effectively.
> 
> We do not need to flush all threads at the end of each iteration; the
> data can be kept locally until the memory block changes or the memory
> migration starts over. In the latter case we will meet a dirtied page
> which may still exist in a compression thread's ring.
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
> ---
>  migration/ram.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 89305c7af5..fdab13821d 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -315,6 +315,8 @@ struct RAMState {
>      uint64_t iterations;
>      /* number of dirty bits in the bitmap */
>      uint64_t migration_dirty_pages;
> +    /* last dirty_sync_count we have seen */
> +    uint64_t dirty_sync_count;

Better suffix it with "_prev" as well?  So that we can quickly
identify that it's only a cache and it can be different from the one
in the ram_counters.

>      /* protects modification of the bitmap */
>      QemuMutex bitmap_mutex;
>      /* The RAMBlock used in the last src_page_requests */
> @@ -2532,6 +2534,7 @@ static void ram_save_cleanup(void *opaque)
>      }
>  
>      xbzrle_cleanup();
> +    flush_compressed_data(*rsp);

Could I ask why we need this, considering that we have
compress_threads_save_cleanup() right down there?

>      compress_threads_save_cleanup();
>      ram_state_cleanup(rsp);
>  }
> @@ -3203,6 +3206,17 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>  
>      ram_control_before_iterate(f, RAM_CONTROL_ROUND);
>  
> +    /*
> +     * If the memory migration starts over, we will meet a dirtied page
> +     * which may still exist in a compression thread's ring, so we should
> +     * flush the compressed data to make sure the new page is not
> +     * overwritten by the old one on the destination.
> +     */
> +    if (ram_counters.dirty_sync_count != rs->dirty_sync_count) {
> +        rs->dirty_sync_count = ram_counters.dirty_sync_count;
> +        flush_compressed_data(rs);
> +    }
> +
>      t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>      i = 0;
>      while ((ret = qemu_file_rate_limit(f)) == 0 ||
> @@ -3235,7 +3249,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>          }
>          i++;
>      }
> -    flush_compressed_data(rs);

This looks sane to me, but I'd like to see how other people would
think about it too...

>      rcu_read_unlock();
>  
>      /*
> -- 
> 2.14.4
> 

Regards,
Xiao Guangrong July 23, 2018, 8:05 a.m. UTC | #2
On 07/23/2018 01:49 PM, Peter Xu wrote:
> On Thu, Jul 19, 2018 at 08:15:20PM +0800, guangrong.xiao@gmail.com wrote:
>> From: Xiao Guangrong <xiaoguangrong@tencent.com>
>>
>> flush_compressed_data() needs to wait for all compression threads to
>> finish their work; after that, all threads are idle until the migration
>> feeds new requests to them. Reducing the number of calls improves
>> throughput and uses CPU resources more effectively.
>>
>> We do not need to flush all threads at the end of each iteration; the
>> data can be kept locally until the memory block changes or the memory
>> migration starts over. In the latter case we will meet a dirtied page
>> which may still exist in a compression thread's ring.
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
>> ---
>>   migration/ram.c | 15 ++++++++++++++-
>>   1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 89305c7af5..fdab13821d 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -315,6 +315,8 @@ struct RAMState {
>>       uint64_t iterations;
>>       /* number of dirty bits in the bitmap */
>>       uint64_t migration_dirty_pages;
>> +    /* last dirty_sync_count we have seen */
>> +    uint64_t dirty_sync_count;
> 
> Better suffix it with "_prev" as well?  So that we can quickly
> identify that it's only a cache and it can be different from the one
> in the ram_counters.

Indeed, will update it.

> 
>>       /* protects modification of the bitmap */
>>       QemuMutex bitmap_mutex;
>>       /* The RAMBlock used in the last src_page_requests */
>> @@ -2532,6 +2534,7 @@ static void ram_save_cleanup(void *opaque)
>>       }
>>   
>>       xbzrle_cleanup();
>> +    flush_compressed_data(*rsp);
> 
> Could I ask why we need this, considering that we have
> compress_threads_save_cleanup() right down there?

Dave asked this too. :(

"This is for the error condition: if any error occurs during live migration,
there is no chance to call ram_save_complete(). After moving to the lockless
multithreaded model, we assert that all requests have been handled before
destroying the work threads."

That makes sure there is nothing left in the threads before
compress_threads_save_cleanup() runs, matching the current behavior. For the
lockless multithreaded model, we check that all requests are free before
destroying them.

> 
>>       compress_threads_save_cleanup();
>>       ram_state_cleanup(rsp);
>>   }
>> @@ -3203,6 +3206,17 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>>   
>>       ram_control_before_iterate(f, RAM_CONTROL_ROUND);
>>   
>> +    /*
>> +     * If the memory migration starts over, we will meet a dirtied page
>> +     * which may still exist in a compression thread's ring, so we should
>> +     * flush the compressed data to make sure the new page is not
>> +     * overwritten by the old one on the destination.
>> +     */
>> +    if (ram_counters.dirty_sync_count != rs->dirty_sync_count) {
>> +        rs->dirty_sync_count = ram_counters.dirty_sync_count;
>> +        flush_compressed_data(rs);
>> +    }
>> +
>>       t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>>       i = 0;
>>       while ((ret = qemu_file_rate_limit(f)) == 0 ||
>> @@ -3235,7 +3249,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>>           }
>>           i++;
>>       }
>> -    flush_compressed_data(rs);
> 
> This looks sane to me, but I'd like to see how other people would
> think about it too...

Thank you a lot, Peter! :)
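
The hazard described by the new comment (a stale compressed copy of a page
overwriting a fresh one on the destination) can be made concrete with a toy
model. Nothing below is from the series; the stream array and send_page()
are invented stand-ins for the migration channel.

#include <stdio.h>
#include <string.h>

/*
 * Toy model of the ordering hazard: the destination applies pages in
 * arrival order, so if a worker's buffered (old) copy of page P drains
 * onto the stream after P's re-dirtied (new) copy was sent, the stale
 * data wins.
 */
#define MAX_PAGES 8

static const char *stream[MAX_PAGES];   /* the wire, in send order */
static int n_sent;

static void send_page(const char *contents)
{
    stream[n_sent++] = contents;
}

int main(void)
{
    char dest_page[16] = "";

    /*
     * Round 1: page P (old contents) is queued to a compression worker
     * and sits in its ring, not yet on the wire.
     */
    const char *buffered_old = "P-old";

    /*
     * Round 2: P was re-dirtied. Without a flush at the round boundary,
     * the new copy can hit the wire first...
     */
    send_page("P-new");
    /* ...and the worker's stale buffered copy drains afterwards. */
    send_page(buffered_old);

    for (int i = 0; i < n_sent; i++) {
        strcpy(dest_page, stream[i]);   /* destination applies in order */
    }

    printf("destination ends with: %s\n", dest_page);   /* "P-old": stale */
    return 0;
}

With the patch, flush_compressed_data() runs at the start of the round that
re-sends P, so the stale copy reaches the wire before P-new is ever queued.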
Peter Xu July 23, 2018, 8:35 a.m. UTC | #3
On Mon, Jul 23, 2018 at 04:05:21PM +0800, Xiao Guangrong wrote:
> 
> 
> On 07/23/2018 01:49 PM, Peter Xu wrote:
> > On Thu, Jul 19, 2018 at 08:15:20PM +0800, guangrong.xiao@gmail.com wrote:
> > > From: Xiao Guangrong <xiaoguangrong@tencent.com>
> > > 
> > > flush_compressed_data() needs to wait for all compression threads to
> > > finish their work; after that, all threads are idle until the migration
> > > feeds new requests to them. Reducing the number of calls improves
> > > throughput and uses CPU resources more effectively.
> > > 
> > > We do not need to flush all threads at the end of each iteration; the
> > > data can be kept locally until the memory block changes or the memory
> > > migration starts over. In the latter case we will meet a dirtied page
> > > which may still exist in a compression thread's ring.
> > > 
> > > Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
> > > ---
> > >   migration/ram.c | 15 ++++++++++++++-
> > >   1 file changed, 14 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/migration/ram.c b/migration/ram.c
> > > index 89305c7af5..fdab13821d 100644
> > > --- a/migration/ram.c
> > > +++ b/migration/ram.c
> > > @@ -315,6 +315,8 @@ struct RAMState {
> > >       uint64_t iterations;
> > >       /* number of dirty bits in the bitmap */
> > >       uint64_t migration_dirty_pages;
> > > +    /* last dirty_sync_count we have seen */
> > > +    uint64_t dirty_sync_count;
> > 
> > Better suffix it with "_prev" as well?  So that we can quickly
> > identify that it's only a cache and it can be different from the one
> > in the ram_counters.
> 
> Indeed, will update it.
> 
> > 
> > >       /* protects modification of the bitmap */
> > >       QemuMutex bitmap_mutex;
> > >       /* The RAMBlock used in the last src_page_requests */
> > > @@ -2532,6 +2534,7 @@ static void ram_save_cleanup(void *opaque)
> > >       }
> > >       xbzrle_cleanup();
> > > +    flush_compressed_data(*rsp);
> > 
> > Could I ask why we need this, considering that we have
> > compress_threads_save_cleanup() right down there?
> 
> Dave asked this too. :(
> 
> "This is for the error condition: if any error occurs during live migration,
> there is no chance to call ram_save_complete(). After moving to the lockless
> multithreaded model, we assert that all requests have been handled before
> destroying the work threads."
> 
> That makes sure there is nothing left in the threads before
> compress_threads_save_cleanup() runs, matching the current behavior. For the
> lockless multithreaded model, we check that all requests are free before
> destroying them.

But why do we need to explicitly flush it here?  Now in
compress_threads_save_cleanup() we have qemu_fclose() on the buffers,
which logically will flush the data and clean up everything too.
Would that suffice?

> 
> > 
> > >       compress_threads_save_cleanup();
> > >       ram_state_cleanup(rsp);
> > >   }
> > > @@ -3203,6 +3206,17 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
> > >       ram_control_before_iterate(f, RAM_CONTROL_ROUND);
> > > +    /*
> > > +     * If the memory migration starts over, we will meet a dirtied page
> > > +     * which may still exist in a compression thread's ring, so we should
> > > +     * flush the compressed data to make sure the new page is not
> > > +     * overwritten by the old one on the destination.
> > > +     */
> > > +    if (ram_counters.dirty_sync_count != rs->dirty_sync_count) {
> > > +        rs->dirty_sync_count = ram_counters.dirty_sync_count;
> > > +        flush_compressed_data(rs);
> > > +    }
> > > +
> > >       t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> > >       i = 0;
> > >       while ((ret = qemu_file_rate_limit(f)) == 0 ||
> > > @@ -3235,7 +3249,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
> > >           }
> > >           i++;
> > >       }
> > > -    flush_compressed_data(rs);
> > 
> > This looks sane to me, but I'd like to see how other people would
> > think about it too...
> 
> Thank you a lot, Peter! :)

Welcome. :)

Regards,
Xiao Guangrong July 23, 2018, 8:53 a.m. UTC | #4
On 07/23/2018 04:35 PM, Peter Xu wrote:
> On Mon, Jul 23, 2018 at 04:05:21PM +0800, Xiao Guangrong wrote:
>>
>>
>> On 07/23/2018 01:49 PM, Peter Xu wrote:
>>> On Thu, Jul 19, 2018 at 08:15:20PM +0800, guangrong.xiao@gmail.com wrote:
>>>> From: Xiao Guangrong <xiaoguangrong@tencent.com>
>>>>
>>>> flush_compressed_data() needs to wait for all compression threads to
>>>> finish their work; after that, all threads are idle until the migration
>>>> feeds new requests to them. Reducing the number of calls improves
>>>> throughput and uses CPU resources more effectively.
>>>>
>>>> We do not need to flush all threads at the end of each iteration; the
>>>> data can be kept locally until the memory block changes or the memory
>>>> migration starts over. In the latter case we will meet a dirtied page
>>>> which may still exist in a compression thread's ring.
>>>>
>>>> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
>>>> ---
>>>>    migration/ram.c | 15 ++++++++++++++-
>>>>    1 file changed, 14 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/migration/ram.c b/migration/ram.c
>>>> index 89305c7af5..fdab13821d 100644
>>>> --- a/migration/ram.c
>>>> +++ b/migration/ram.c
>>>> @@ -315,6 +315,8 @@ struct RAMState {
>>>>        uint64_t iterations;
>>>>        /* number of dirty bits in the bitmap */
>>>>        uint64_t migration_dirty_pages;
>>>> +    /* last dirty_sync_count we have seen */
>>>> +    uint64_t dirty_sync_count;
>>>
>>> Better suffix it with "_prev" as well?  So that we can quickly
>>> identify that it's only a cache and it can be different from the one
>>> in the ram_counters.
>>
>> Indeed, will update it.
>>
>>>
>>>>        /* protects modification of the bitmap */
>>>>        QemuMutex bitmap_mutex;
>>>>        /* The RAMBlock used in the last src_page_requests */
>>>> @@ -2532,6 +2534,7 @@ static void ram_save_cleanup(void *opaque)
>>>>        }
>>>>        xbzrle_cleanup();
>>>> +    flush_compressed_data(*rsp);
>>>
>>> Could I ask why we need this, considering that we have
>>> compress_threads_save_cleanup() right down there?
>>
>> Dave asked this too. :(
>>
>> "This is for the error condition: if any error occurs during live migration,
>> there is no chance to call ram_save_complete(). After moving to the lockless
>> multithreaded model, we assert that all requests have been handled before
>> destroying the work threads."
>>
>> That makes sure there is nothing left in the threads before
>> compress_threads_save_cleanup() runs, matching the current behavior. For the
>> lockless multithreaded model, we check that all requests are free before
>> destroying them.
> 
> But why do we need to explicitly flush it here?  Now in
> compress_threads_save_cleanup() we have qemu_fclose() on the buffers,
> which logically will flush the data and clean up everything too.
> Would that suffice?
> 

Yes, it's sufficient for the current thread model; I will drop it for now
and add it back when the lockless multithreaded model is applied. :)
Peter Xu July 23, 2018, 9:01 a.m. UTC | #5
On Mon, Jul 23, 2018 at 04:53:11PM +0800, Xiao Guangrong wrote:
> 
> 
> On 07/23/2018 04:35 PM, Peter Xu wrote:
> > On Mon, Jul 23, 2018 at 04:05:21PM +0800, Xiao Guangrong wrote:
> > > 
> > > 
> > > On 07/23/2018 01:49 PM, Peter Xu wrote:
> > > > On Thu, Jul 19, 2018 at 08:15:20PM +0800, guangrong.xiao@gmail.com wrote:
> > > > > From: Xiao Guangrong <xiaoguangrong@tencent.com>
> > > > > 
> > > > > flush_compressed_data() needs to wait for all compression threads to
> > > > > finish their work; after that, all threads are idle until the migration
> > > > > feeds new requests to them. Reducing the number of calls improves
> > > > > throughput and uses CPU resources more effectively.
> > > > > 
> > > > > We do not need to flush all threads at the end of each iteration; the
> > > > > data can be kept locally until the memory block changes or the memory
> > > > > migration starts over. In the latter case we will meet a dirtied page
> > > > > which may still exist in a compression thread's ring.
> > > > > 
> > > > > Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
> > > > > ---
> > > > >    migration/ram.c | 15 ++++++++++++++-
> > > > >    1 file changed, 14 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/migration/ram.c b/migration/ram.c
> > > > > index 89305c7af5..fdab13821d 100644
> > > > > --- a/migration/ram.c
> > > > > +++ b/migration/ram.c
> > > > > @@ -315,6 +315,8 @@ struct RAMState {
> > > > >        uint64_t iterations;
> > > > >        /* number of dirty bits in the bitmap */
> > > > >        uint64_t migration_dirty_pages;
> > > > > +    /* last dirty_sync_count we have seen */
> > > > > +    uint64_t dirty_sync_count;
> > > > 
> > > > Better suffix it with "_prev" as well?  So that we can quickly
> > > > identify that it's only a cache and it can be different from the one
> > > > in the ram_counters.
> > > 
> > > Indeed, will update it.
> > > 
> > > > 
> > > > >        /* protects modification of the bitmap */
> > > > >        QemuMutex bitmap_mutex;
> > > > >        /* The RAMBlock used in the last src_page_requests */
> > > > > @@ -2532,6 +2534,7 @@ static void ram_save_cleanup(void *opaque)
> > > > >        }
> > > > >        xbzrle_cleanup();
> > > > > +    flush_compressed_data(*rsp);
> > > > 
> > > > Could I ask why we need this, considering that we have
> > > > compress_threads_save_cleanup() right down there?
> > > 
> > > Dave asked this too. :(
> > > 
> > > "This is for the error condition: if any error occurs during live migration,
> > > there is no chance to call ram_save_complete(). After moving to the lockless
> > > multithreaded model, we assert that all requests have been handled before
> > > destroying the work threads."
> > > 
> > > That makes sure there is nothing left in the threads before
> > > compress_threads_save_cleanup() runs, matching the current behavior. For the
> > > lockless multithreaded model, we check that all requests are free before
> > > destroying them.
> > 
> > But why do we need to explicitly flush it here?  Now in
> > compress_threads_save_cleanup() we have qemu_fclose() on the buffers,
> > which logically will flush the data and clean up everything too.
> > Would that suffice?
> > 
> 
> Yes, it's sufficient for the current thread model; I will drop it for now
> and add it back when the lockless multithreaded model is applied. :)

Ah, I think I see your point. Even so, I would think it better to do
any extra cleanup directly in compress_threads_save_cleanup() if
possible.

Regards,
Xiao Guangrong July 24, 2018, 7:29 a.m. UTC | #6
On 07/23/2018 05:01 PM, Peter Xu wrote:

>> Yes, it's sufficient for the current thread model; I will drop it for now
>> and add it back when the lockless multithreaded model is applied. :)
> 
> Ah, I think I see your point. Even so, I would think it better to do
> any extra cleanup directly in compress_threads_save_cleanup() if
> possible.
> 

Okay, got it.
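
With the review settled (the cached counter gains a "_prev" suffix and the
cleanup-path flush is dropped), the gating logic the patch adds reduces to
the small, self-contained model below. RAMCounters, RAMState and
maybe_flush() here are stubs invented for illustration; only the comparison
mirrors the patch.

#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t dirty_sync_count;          /* bumped on each bitmap sync */
} RAMCounters;

typedef struct {
    uint64_t dirty_sync_count_prev;     /* last sync round we flushed for */
} RAMState;

static RAMCounters ram_counters;

static void flush_compressed_data(RAMState *rs)
{
    (void)rs;
    printf("flush: wait for all workers, drain their buffers\n");
}

/* Prologue of each ram_save_iterate() call in the patched flow. */
static void maybe_flush(RAMState *rs)
{
    if (ram_counters.dirty_sync_count != rs->dirty_sync_count_prev) {
        rs->dirty_sync_count_prev = ram_counters.dirty_sync_count;
        flush_compressed_data(rs);
    }
}

int main(void)
{
    RAMState rs = { 0 };

    ram_counters.dirty_sync_count = 1;
    maybe_flush(&rs);                   /* flushes: a new round started */
    maybe_flush(&rs);                   /* silent: same round, no flush */
    ram_counters.dirty_sync_count = 2;
    maybe_flush(&rs);                   /* flushes again */
    return 0;
}

The check costs one integer comparison per ram_save_iterate() call, which is
what lets the patch drop the unconditional flush at the end of every
iteration.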

Patch

diff --git a/migration/ram.c b/migration/ram.c
index 89305c7af5..fdab13821d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -315,6 +315,8 @@  struct RAMState {
     uint64_t iterations;
     /* number of dirty bits in the bitmap */
     uint64_t migration_dirty_pages;
+    /* last dirty_sync_count we have seen */
+    uint64_t dirty_sync_count;
     /* protects modification of the bitmap */
     QemuMutex bitmap_mutex;
     /* The RAMBlock used in the last src_page_requests */
@@ -2532,6 +2534,7 @@  static void ram_save_cleanup(void *opaque)
     }
 
     xbzrle_cleanup();
+    flush_compressed_data(*rsp);
     compress_threads_save_cleanup();
     ram_state_cleanup(rsp);
 }
@@ -3203,6 +3206,17 @@  static int ram_save_iterate(QEMUFile *f, void *opaque)
 
     ram_control_before_iterate(f, RAM_CONTROL_ROUND);
 
+    /*
+     * If the memory migration starts over, we will meet a dirtied page
+     * which may still exist in a compression thread's ring, so we should
+     * flush the compressed data to make sure the new page is not
+     * overwritten by the old one on the destination.
+     */
+    if (ram_counters.dirty_sync_count != rs->dirty_sync_count) {
+        rs->dirty_sync_count = ram_counters.dirty_sync_count;
+        flush_compressed_data(rs);
+    }
+
     t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
     i = 0;
     while ((ret = qemu_file_rate_limit(f)) == 0 ||
@@ -3235,7 +3249,6 @@  static int ram_save_iterate(QEMUFile *f, void *opaque)
         }
         i++;
     }
-    flush_compressed_data(rs);
     rcu_read_unlock();
 
     /*