diff mbox series

[v3,3/3] migration/ram: Optimize ram_save_host_page()

Message ID 20210305075035.1852-4-jiangkunkun@huawei.com
State New
Series Some modifications about ram_save_host_page()

Commit Message

Kunkun Jiang March 5, 2021, 7:50 a.m. UTC
Starting from pss->page, ram_save_host_page() checks every page
and sends the dirty pages up to the end of the current host page
or the boundary of the block's used_length. If the host page is a
huge page, this per-page check takes a lot of time.

Use migration_bitmap_find_dirty() to skip over clean pages and
improve performance.

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---
 migration/ram.c | 39 +++++++++++++++++++--------------------
 1 file changed, 19 insertions(+), 20 deletions(-)
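
For context: migration_bitmap_find_dirty() is a thin wrapper around
find_next_bit() on the block's dirty bitmap, so it skips a whole run of
clean pages in one call instead of testing them one by one. A simplified
sketch of its shape (based on migration/ram.c of this period; details
such as the ignored-block check are omitted here):

    static inline unsigned long migration_bitmap_find_dirty(RAMState *rs,
                                                            RAMBlock *rb,
                                                            unsigned long start)
    {
        unsigned long size = rb->used_length >> TARGET_PAGE_BITS;

        /* Index of the first dirty target page >= start, or 'size' if none */
        return find_next_bit(rb->bmap, size, start);
    }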

Comments

Peter Xu March 5, 2021, 2:30 p.m. UTC | #1
On Fri, Mar 05, 2021 at 03:50:35PM +0800, Kunkun Jiang wrote:
> Starting from pss->page, ram_save_host_page() checks every page
> and sends the dirty pages up to the end of the current host page
> or the boundary of the block's used_length. If the host page is a
> huge page, this per-page check takes a lot of time.
> 
> Use migration_bitmap_find_dirty() to skip over clean pages and
> improve performance.

Has any measurement been done?

This looks like an optimization, but to me it seems to have changed a lot of
context that it doesn't need to... Do you think it'll also work to just look up
dirty again and update pss->page properly if migration_bitmap_clear_dirty()
returned zero?
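
A minimal sketch of that alternative (illustrative only, not code from the
thread; it keeps the rest of the loop as in the pre-patch code):

        /* Clean page: jump straight to the next dirty one instead of
         * stepping target page by target page */
        if (!migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
            pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
            continue;
        }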

Thanks,

> 
> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
> ---
>  migration/ram.c | 39 +++++++++++++++++++--------------------
>  1 file changed, 19 insertions(+), 20 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 9fc5b2997c..28215aefe4 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1991,6 +1991,8 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
>      int pages = 0;
>      size_t pagesize_bits =
>          qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
> +    unsigned long hostpage_boundary =
> +        QEMU_ALIGN_UP(pss->page + 1, pagesize_bits);
>      unsigned long start_page = pss->page;
>      int res;
>  
> @@ -2003,30 +2005,27 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
>          int pages_this_iteration = 0;
>  
>          /* Check if the page is dirty and send it if it is */
> -        if (!migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
> -            pss->page++;
> -            continue;
> -        }
> -
> -        pages_this_iteration = ram_save_target_page(rs, pss, last_stage);
> -        if (pages_this_iteration < 0) {
> -            return pages_this_iteration;
> -        }
> +        if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
> +            pages_this_iteration = ram_save_target_page(rs, pss, last_stage);
> +            if (pages_this_iteration < 0) {
> +                return pages_this_iteration;
> +            }
>  
> -        pages += pages_this_iteration;
> -        pss->page++;
> -        /*
> -         * Allow rate limiting to happen in the middle of huge pages if
> -         * something is sent in the current iteration.
> -         */
> -        if (pagesize_bits > 1 && pages_this_iteration > 0) {
> -            migration_rate_limit();
> +            pages += pages_this_iteration;
> +            /*
> +             * Allow rate limiting to happen in the middle of huge pages if
> +             * something is sent in the current iteration.
> +             */
> +            if (pagesize_bits > 1 && pages_this_iteration > 0) {
> +                migration_rate_limit();
> +            }
>          }
> -    } while ((pss->page & (pagesize_bits - 1)) &&
> +        pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
> +    } while ((pss->page < hostpage_boundary) &&
>               offset_in_ramblock(pss->block,
>                                  ((ram_addr_t)pss->page) << TARGET_PAGE_BITS));
> -    /* The offset we leave with is the last one we looked at */
> -    pss->page--;
> +    /* The offset we leave with is the min boundary of host page and block */
> +    pss->page = MIN(pss->page, hostpage_boundary) - 1;
>  
>      res = ram_save_release_protection(rs, pss, start_page);
>      return (res < 0 ? res : pages);
> -- 
> 2.23.0
>
Kunkun Jiang March 8, 2021, 1:58 p.m. UTC | #2
Hi,

On 2021/3/5 22:30, Peter Xu wrote:
> On Fri, Mar 05, 2021 at 03:50:35PM +0800, Kunkun Jiang wrote:
>> Starting from pss->page, ram_save_host_page() checks every page
>> and sends the dirty pages up to the end of the current host page
>> or the boundary of the block's used_length. If the host page is a
>> huge page, this per-page check takes a lot of time.
>>
>> Use migration_bitmap_find_dirty() to skip over clean pages and
>> improve performance.
> Has any measurement been done?
I tested it on a Kunpeng 920. VM params: 1 vCPU, 4G RAM (page size: 1G).
The time spent in ram_save_host_page() in the last round of RAM saving:
before the optimization: 9250us; after the optimization: 34us.
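
For reference, one hypothetical way to take this kind of measurement (the
thread does not say how it was done) is to bracket the call with a clock
pair, e.g. around the call site in ram_find_and_save_block():

    /* Hypothetical instrumentation, not part of the patch.
     * Needs "qemu/log.h" for qemu_log(). */
    int64_t t0 = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
    pages = ram_save_host_page(rs, &pss, last_stage);
    qemu_log("ram_save_host_page: %" PRId64 " us\n",
             qemu_clock_get_us(QEMU_CLOCK_REALTIME) - t0);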
> This looks like an optimization, but to me it seems to have changed a lot of
> context that it doesn't need to... Do you think it'll also work to just look up
> dirty again and update pss->page properly if migration_bitmap_clear_dirty()
> returned zero?
>
> Thanks,
This just inverts the body of the loop, as suggested by @David Edmondson.
Here is v2[1]. Do you mean to change it like this?

[1]: http://patchwork.ozlabs.org/project/qemu-devel/patch/20210301082132.1107-4-jiangkunkun@huawei.com/

Thanks,
Kunkun Jiang
>> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
>> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
>> ---
>>   migration/ram.c | 39 +++++++++++++++++++--------------------
>>   1 file changed, 19 insertions(+), 20 deletions(-)
>>
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 9fc5b2997c..28215aefe4 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -1991,6 +1991,8 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
>>       int pages = 0;
>>       size_t pagesize_bits =
>>           qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
>> +    unsigned long hostpage_boundary =
>> +        QEMU_ALIGN_UP(pss->page + 1, pagesize_bits);
>>       unsigned long start_page = pss->page;
>>       int res;
>>   
>> @@ -2003,30 +2005,27 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
>>           int pages_this_iteration = 0;
>>   
>>           /* Check if the page is dirty and send it if it is */
>> -        if (!migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
>> -            pss->page++;
>> -            continue;
>> -        }
>> -
>> -        pages_this_iteration = ram_save_target_page(rs, pss, last_stage);
>> -        if (pages_this_iteration < 0) {
>> -            return pages_this_iteration;
>> -        }
>> +        if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
>> +            pages_this_iteration = ram_save_target_page(rs, pss, last_stage);
>> +            if (pages_this_iteration < 0) {
>> +                return pages_this_iteration;
>> +            }
>>   
>> -        pages += pages_this_iteration;
>> -        pss->page++;
>> -        /*
>> -         * Allow rate limiting to happen in the middle of huge pages if
>> -         * something is sent in the current iteration.
>> -         */
>> -        if (pagesize_bits > 1 && pages_this_iteration > 0) {
>> -            migration_rate_limit();
>> +            pages += pages_this_iteration;
>> +            /*
>> +             * Allow rate limiting to happen in the middle of huge pages if
>> +             * something is sent in the current iteration.
>> +             */
>> +            if (pagesize_bits > 1 && pages_this_iteration > 0) {
>> +                migration_rate_limit();
>> +            }
>>           }
>> -    } while ((pss->page & (pagesize_bits - 1)) &&
>> +        pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
>> +    } while ((pss->page < hostpage_boundary) &&
>>                offset_in_ramblock(pss->block,
>>                                   ((ram_addr_t)pss->page) << TARGET_PAGE_BITS));
>> -    /* The offset we leave with is the last one we looked at */
>> -    pss->page--;
>> +    /* The offset we leave with is the min boundary of host page and block */
>> +    pss->page = MIN(pss->page, hostpage_boundary) - 1;
>>   
>>       res = ram_save_release_protection(rs, pss, start_page);
>>       return (res < 0 ? res : pages);
>> -- 
>> 2.23.0
>>
Peter Xu March 8, 2021, 9:36 p.m. UTC | #3
On Mon, Mar 08, 2021 at 09:58:02PM +0800, Kunkun Jiang wrote:
> Hi,
> 
> On 2021/3/5 22:30, Peter Xu wrote:
> > On Fri, Mar 05, 2021 at 03:50:35PM +0800, Kunkun Jiang wrote:
> > > Starting from pss->page, ram_save_host_page() checks every page
> > > and sends the dirty pages up to the end of the current host page
> > > or the boundary of the block's used_length. If the host page is a
> > > huge page, this per-page check takes a lot of time.
> > > 
> > > Use migration_bitmap_find_dirty() to skip over clean pages and
> > > improve performance.
> > Has any measurement been done?
> I tested it on a Kunpeng 920. VM params: 1 vCPU, 4G RAM (page size: 1G).
> The time spent in ram_save_host_page() in the last round of RAM saving:
> before the optimization: 9250us; after the optimization: 34us.

Looks like an idle VM, but this is still a great improvement. Would you mind
adding this to the commit message too?

> > This looks like an optimization, but to me it seems to have changed a lot of
> > context that it doesn't need to... Do you think it'll also work to just look up
> > dirty again and update pss->page properly if migration_bitmap_clear_dirty()
> > returned zero?
> > 
> > Thanks,
> This just inverts the body of the loop, as suggested by @David Edmondson.
> Here is v2[1]. Do you mean to change it like this?
> 
> [1]: http://patchwork.ozlabs.org/project/qemu-devel/patch/20210301082132.1107-4-jiangkunkun@huawei.com/

I see, then it's okay - but indeed I still prefer your previous version. :)

Thanks,
Kunkun Jiang March 9, 2021, 12:47 p.m. UTC | #4
Hi,

On 2021/3/9 5:36, Peter Xu wrote:
> On Mon, Mar 08, 2021 at 09:58:02PM +0800, Kunkun Jiang wrote:
>> Hi,
>>
>> On 2021/3/5 22:30, Peter Xu wrote:
>>> On Fri, Mar 05, 2021 at 03:50:35PM +0800, Kunkun Jiang wrote:
>>>> Starting from pss->page, ram_save_host_page() checks every page
>>>> and sends the dirty pages up to the end of the current host page
>>>> or the boundary of the block's used_length. If the host page is a
>>>> huge page, this per-page check takes a lot of time.
>>>>
>>>> Use migration_bitmap_find_dirty() to skip over clean pages and
>>>> improve performance.
>>> Has any measurement been done?
>> I tested it on a Kunpeng 920. VM params: 1 vCPU, 4G RAM (page size: 1G).
>> The time spent in ram_save_host_page() in the last round of RAM saving:
>> before the optimization: 9250us; after the optimization: 34us.
> Looks like an idle VM, but this is still a great improvement. Would you mind
> adding this to the commit message too?
OK, I will add it in the next version. 😉
>>> This looks like an optimization, but to me it seems to have changed a lot of
>>> context that it doesn't need to... Do you think it'll also work to just look up
>>> dirty again and update pss->page properly if migration_bitmap_clear_dirty()
>>> returned zero?
>>>
>>> Thanks,
>> This just inverts the body of the loop, as suggested by @David Edmondson.
>> Here is v2[1]. Do you mean to change it like this?
>>
>> [1]: http://patchwork.ozlabs.org/project/qemu-devel/patch/20210301082132.1107-4-jiangkunkun@huawei.com/
> I see, then it's okay - but indeed I still prefer your previous version. :)
>
> Thanks,
>
Both versions are fine by me. This version may make the final code
slightly cleaner, I think.

Thanks,

Kunkun Jiang

Patch

diff --git a/migration/ram.c b/migration/ram.c
index 9fc5b2997c..28215aefe4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1991,6 +1991,8 @@  static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
     int pages = 0;
     size_t pagesize_bits =
         qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
+    unsigned long hostpage_boundary =
+        QEMU_ALIGN_UP(pss->page + 1, pagesize_bits);
     unsigned long start_page = pss->page;
     int res;
 
@@ -2003,30 +2005,27 @@  static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
         int pages_this_iteration = 0;
 
         /* Check if the page is dirty and send it if it is */
-        if (!migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
-            pss->page++;
-            continue;
-        }
-
-        pages_this_iteration = ram_save_target_page(rs, pss, last_stage);
-        if (pages_this_iteration < 0) {
-            return pages_this_iteration;
-        }
+        if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
+            pages_this_iteration = ram_save_target_page(rs, pss, last_stage);
+            if (pages_this_iteration < 0) {
+                return pages_this_iteration;
+            }
 
-        pages += pages_this_iteration;
-        pss->page++;
-        /*
-         * Allow rate limiting to happen in the middle of huge pages if
-         * something is sent in the current iteration.
-         */
-        if (pagesize_bits > 1 && pages_this_iteration > 0) {
-            migration_rate_limit();
+            pages += pages_this_iteration;
+            /*
+             * Allow rate limiting to happen in the middle of huge pages if
+             * something is sent in the current iteration.
+             */
+            if (pagesize_bits > 1 && pages_this_iteration > 0) {
+                migration_rate_limit();
+            }
         }
-    } while ((pss->page & (pagesize_bits - 1)) &&
+        pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
+    } while ((pss->page < hostpage_boundary) &&
              offset_in_ramblock(pss->block,
                                 ((ram_addr_t)pss->page) << TARGET_PAGE_BITS));
-    /* The offset we leave with is the last one we looked at */
-    pss->page--;
+    /* The offset we leave with is the min boundary of host page and block */
+    pss->page = MIN(pss->page, hostpage_boundary) - 1;
 
     res = ram_save_release_protection(rs, pss, start_page);
     return (res < 0 ? res : pages);
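
As a worked example of the new boundary computation (assuming 4K target
pages, i.e. TARGET_PAGE_BITS = 12, and a 1G host page; other configurations
scale the same way):

    /*
     * pagesize_bits = (1G >> 12) = 262144 target pages per host page.
     * For pss->page == 262150 (inside the second host page):
     *   hostpage_boundary = QEMU_ALIGN_UP(262150 + 1, 262144) = 524288,
     * the first target-page index of the next host page.  When
     * migration_bitmap_find_dirty() jumps past the host-page edge, the
     * (pss->page < hostpage_boundary) test ends the loop, and
     *   pss->page = MIN(pss->page, hostpage_boundary) - 1;
     * clamps the exit offset back to the last page of this host page.
     */
    unsigned long hostpage_boundary =
        QEMU_ALIGN_UP(pss->page + 1, pagesize_bits);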