Message ID | 20210305075035.1852-4-jiangkunkun@huawei.com |
---|---|
State | New |
Series | Some modifications about ram_save_host_page() |
On Fri, Mar 05, 2021 at 03:50:35PM +0800, Kunkun Jiang wrote:
> Starting from pss->page, ram_save_host_page() will check every page
> and send the dirty pages up to the end of the current host page or
> the boundary of used_length of the block. If the host page size is
> a huge page, the "check" step will take a lot of time.
>
> Using migration_bitmap_find_dirty() will improve performance.

Have any measurements been done?

This looks like an optimization, but to me it seems to have changed a lot of
context that it doesn't need to... Do you think it'll also work to just look up
dirty again and update pss->page properly if migration_bitmap_clear_dirty()
returned zero?

Thanks,

>
> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
> ---
>  migration/ram.c | 39 +++++++++++++++++++--------------------
>  1 file changed, 19 insertions(+), 20 deletions(-)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index 9fc5b2997c..28215aefe4 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1991,6 +1991,8 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
>      int pages = 0;
>      size_t pagesize_bits =
>          qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
> +    unsigned long hostpage_boundary =
> +        QEMU_ALIGN_UP(pss->page + 1, pagesize_bits);
>      unsigned long start_page = pss->page;
>      int res;
>
> @@ -2003,30 +2005,27 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
>          int pages_this_iteration = 0;
>
>          /* Check if the page is dirty and send it if it is */
> -        if (!migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
> -            pss->page++;
> -            continue;
> -        }
> -
> -        pages_this_iteration = ram_save_target_page(rs, pss, last_stage);
> -        if (pages_this_iteration < 0) {
> -            return pages_this_iteration;
> -        }
> +        if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
> +            pages_this_iteration = ram_save_target_page(rs, pss, last_stage);
> +            if (pages_this_iteration < 0) {
> +                return pages_this_iteration;
> +            }
>
> -        pages += pages_this_iteration;
> -        pss->page++;
> -        /*
> -         * Allow rate limiting to happen in the middle of huge pages if
> -         * something is sent in the current iteration.
> -         */
> -        if (pagesize_bits > 1 && pages_this_iteration > 0) {
> -            migration_rate_limit();
> +            pages += pages_this_iteration;
> +            /*
> +             * Allow rate limiting to happen in the middle of huge pages if
> +             * something is sent in the current iteration.
> +             */
> +            if (pagesize_bits > 1 && pages_this_iteration > 0) {
> +                migration_rate_limit();
> +            }
>          }
> -    } while ((pss->page & (pagesize_bits - 1)) &&
> +        pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
> +    } while ((pss->page < hostpage_boundary) &&
>               offset_in_ramblock(pss->block,
>                                  ((ram_addr_t)pss->page) << TARGET_PAGE_BITS));
> -    /* The offset we leave with is the last one we looked at */
> -    pss->page--;
> +    /* The offset we leave with is the min boundary of host page and block */
> +    pss->page = MIN(pss->page, hostpage_boundary) - 1;
>
>      res = ram_save_release_protection(rs, pss, start_page);
>      return (res < 0 ? res : pages);
> --
> 2.23.0
>
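
The alternative raised in the reply above (keep the original loop shape, but look up the next dirty page when migration_bitmap_clear_dirty() returns zero) could look roughly like the toy model below. It is only a sketch under stated assumptions: toy_dirty, toy_clear_dirty(), toy_find_dirty() and toy_save_host_page() are hypothetical stand-ins, not QEMU functions, and clamping the jump to the host page boundary is one possible reading of "update pss->page properly".

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGES 32            /* toy block size, in target pages */

/* Hypothetical stand-in for the migration dirty bitmap. */
static bool toy_dirty[TOY_PAGES];

/* Toy analogue of migration_bitmap_clear_dirty(): test and clear one bit. */
static bool toy_clear_dirty(size_t page)
{
    bool was_dirty = toy_dirty[page];

    toy_dirty[page] = false;
    return was_dirty;
}

/*
 * Toy analogue of migration_bitmap_find_dirty(): first dirty page at or
 * after start, or TOY_PAGES when none is left.
 */
static size_t toy_find_dirty(size_t start)
{
    while (start < TOY_PAGES && !toy_dirty[start]) {
        start++;
    }
    return start;
}

/*
 * Walk one host page made of pagesize_bits target pages (a power of two),
 * starting at *page. Returns the number of pages "sent" and leaves *page
 * at the last page of the range that was covered.
 */
static int toy_save_host_page(size_t *page, size_t pagesize_bits)
{
    size_t boundary = (*page / pagesize_bits + 1) * pagesize_bits;
    int sent = 0;

    do {
        if (!toy_clear_dirty(*page)) {
            /* Clean page: jump to the next dirty one, clamped to boundary. */
            size_t next = toy_find_dirty(*page);

            *page = next < boundary ? next : boundary;
            continue;
        }
        sent++;                 /* stands in for ram_save_target_page() */
        (*page)++;
    } while ((*page & (pagesize_bits - 1)) && *page < TOY_PAGES);

    (*page)--;
    return sent;
}
```

With pages 2, 5 and 13 dirty and a host page of 8 target pages, a walk from page 0 sends two pages and stops at page 7, leaving page 13 for the next host page. Without the clamp, the jump to page 13 would still pass the `page & (pagesize_bits - 1)` test, and the loop would run on into the next host page.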
Hi,

On 2021/3/5 22:30, Peter Xu wrote:
> On Fri, Mar 05, 2021 at 03:50:35PM +0800, Kunkun Jiang wrote:
>> Starting from pss->page, ram_save_host_page() will check every page
>> and send the dirty pages up to the end of the current host page or
>> the boundary of used_length of the block. If the host page size is
>> a huge page, the "check" step will take a lot of time.
>>
>> Using migration_bitmap_find_dirty() will improve performance.
> Have any measurements been done?

I tested it on Kunpeng 920. VM params: 1U 4G (page size 1G).
The time of ram_save_host_page() in the last round of ram saving:
before optimization: 9250us
after optimization: 34us

> This looks like an optimization, but to me it seems to have changed a lot of
> context that it doesn't need to... Do you think it'll also work to just look up
> dirty again and update pss->page properly if migration_bitmap_clear_dirty()
> returned zero?
>
> Thanks,

This version just inverts the body of the loop, as suggested by @David Edmondson.
Here is the v2[1]. Do you mean to change it like this?

[1]: http://patchwork.ozlabs.org/project/qemu-devel/patch/20210301082132.1107-4-jiangkunkun@huawei.com/

Thanks,
Kunkun Jiang

>> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
>> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
>> ---
>>  migration/ram.c | 39 +++++++++++++++++++--------------------
>>  1 file changed, 19 insertions(+), 20 deletions(-)
>>
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 9fc5b2997c..28215aefe4 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -1991,6 +1991,8 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
>>      int pages = 0;
>>      size_t pagesize_bits =
>>          qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
>> +    unsigned long hostpage_boundary =
>> +        QEMU_ALIGN_UP(pss->page + 1, pagesize_bits);
>>      unsigned long start_page = pss->page;
>>      int res;
>>
>> @@ -2003,30 +2005,27 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
>>          int pages_this_iteration = 0;
>>
>>          /* Check if the page is dirty and send it if it is */
>> -        if (!migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
>> -            pss->page++;
>> -            continue;
>> -        }
>> -
>> -        pages_this_iteration = ram_save_target_page(rs, pss, last_stage);
>> -        if (pages_this_iteration < 0) {
>> -            return pages_this_iteration;
>> -        }
>> +        if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
>> +            pages_this_iteration = ram_save_target_page(rs, pss, last_stage);
>> +            if (pages_this_iteration < 0) {
>> +                return pages_this_iteration;
>> +            }
>>
>> -        pages += pages_this_iteration;
>> -        pss->page++;
>> -        /*
>> -         * Allow rate limiting to happen in the middle of huge pages if
>> -         * something is sent in the current iteration.
>> -         */
>> -        if (pagesize_bits > 1 && pages_this_iteration > 0) {
>> -            migration_rate_limit();
>> +            pages += pages_this_iteration;
>> +            /*
>> +             * Allow rate limiting to happen in the middle of huge pages if
>> +             * something is sent in the current iteration.
>> +             */
>> +            if (pagesize_bits > 1 && pages_this_iteration > 0) {
>> +                migration_rate_limit();
>> +            }
>>          }
>> -    } while ((pss->page & (pagesize_bits - 1)) &&
>> +        pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
>> +    } while ((pss->page < hostpage_boundary) &&
>>               offset_in_ramblock(pss->block,
>>                                  ((ram_addr_t)pss->page) << TARGET_PAGE_BITS));
>> -    /* The offset we leave with is the last one we looked at */
>> -    pss->page--;
>> +    /* The offset we leave with is the min boundary of host page and block */
>> +    pss->page = MIN(pss->page, hostpage_boundary) - 1;
>>
>>      res = ram_save_release_protection(rs, pss, start_page);
>>      return (res < 0 ? res : pages);
>> --
>> 2.23.0
>>
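
A plausible reason for the gap between the two timings reported above is that a bitmap search can skip clean pages a whole machine word at a time, while the old loop visits every target page of the huge page individually. The sketch below illustrates that idea only; toy_set_bit() and toy_find_next_bit() are hypothetical helpers, not QEMU's bitmap code, and the 4 KiB target page size behind TOY_PAGES_PER_HUGE_PAGE is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD 64

/* Assumed: 4 KiB target pages in one 1 GiB host page, i.e. 2^30 / 2^12. */
#define TOY_PAGES_PER_HUGE_PAGE ((size_t)1 << (30 - 12))

/* Mark one page dirty in the toy bitmap. */
static void toy_set_bit(uint64_t *map, size_t bit)
{
    map[bit / BITS_PER_WORD] |= (uint64_t)1 << (bit % BITS_PER_WORD);
}

/*
 * Return the index of the first set bit at or after start, or nbits if
 * there is none. A word with no set bits costs a single comparison.
 */
static size_t toy_find_next_bit(const uint64_t *map, size_t nbits,
                                size_t start)
{
    size_t word;
    uint64_t cur;

    if (start >= nbits) {
        return nbits;
    }

    word = start / BITS_PER_WORD;
    /* Ignore bits below start in the first word. */
    cur = map[word] & (~(uint64_t)0 << (start % BITS_PER_WORD));

    for (;;) {
        if (cur != 0) {
            size_t bit = word * BITS_PER_WORD;

            /* Locate the lowest set bit inside this word. */
            while (!(cur & 1)) {
                cur >>= 1;
                bit++;
            }
            return bit < nbits ? bit : nbits;
        }
        word++;
        if (word * BITS_PER_WORD >= nbits) {
            return nbits;
        }
        cur = map[word];
    }
}
```

Under these assumptions an all-clean 1 GiB host page costs at most 4096 word comparisons, against 262144 single-page checks in the page-by-page loop.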
On Mon, Mar 08, 2021 at 09:58:02PM +0800, Kunkun Jiang wrote:
> Hi,
>
> On 2021/3/5 22:30, Peter Xu wrote:
> > On Fri, Mar 05, 2021 at 03:50:35PM +0800, Kunkun Jiang wrote:
> > > Starting from pss->page, ram_save_host_page() will check every page
> > > and send the dirty pages up to the end of the current host page or
> > > the boundary of used_length of the block. If the host page size is
> > > a huge page, the "check" step will take a lot of time.
> > >
> > > Using migration_bitmap_find_dirty() will improve performance.
> > Have any measurements been done?
> I tested it on Kunpeng 920. VM params: 1U 4G (page size 1G).
> The time of ram_save_host_page() in the last round of ram saving:
> before optimization: 9250us
> after optimization: 34us

Looks like an idle VM, but still this is a great improvement. Would you mind
adding this to the commit message too?

> > This looks like an optimization, but to me it seems to have changed a lot of
> > context that it doesn't need to... Do you think it'll also work to just look up
> > dirty again and update pss->page properly if migration_bitmap_clear_dirty()
> > returned zero?
> >
> > Thanks,
> This version just inverts the body of the loop, as suggested by @David Edmondson.
> Here is the v2[1]. Do you mean to change it like this?
>
> [1]: http://patchwork.ozlabs.org/project/qemu-devel/patch/20210301082132.1107-4-jiangkunkun@huawei.com/

I see, then it's okay, but I still prefer your previous version. :)

Thanks,
Hi,

On 2021/3/9 5:36, Peter Xu wrote:
> On Mon, Mar 08, 2021 at 09:58:02PM +0800, Kunkun Jiang wrote:
>> Hi,
>>
>> On 2021/3/5 22:30, Peter Xu wrote:
>>> On Fri, Mar 05, 2021 at 03:50:35PM +0800, Kunkun Jiang wrote:
>>>> Starting from pss->page, ram_save_host_page() will check every page
>>>> and send the dirty pages up to the end of the current host page or
>>>> the boundary of used_length of the block. If the host page size is
>>>> a huge page, the "check" step will take a lot of time.
>>>>
>>>> Using migration_bitmap_find_dirty() will improve performance.
>>> Have any measurements been done?
>> I tested it on Kunpeng 920. VM params: 1U 4G (page size 1G).
>> The time of ram_save_host_page() in the last round of ram saving:
>> before optimization: 9250us
>> after optimization: 34us
> Looks like an idle VM, but still this is a great improvement. Would you mind
> adding this to the commit message too?

Ok, I will add it in the next version. 😉

>>> This looks like an optimization, but to me it seems to have changed a lot of
>>> context that it doesn't need to... Do you think it'll also work to just look up
>>> dirty again and update pss->page properly if migration_bitmap_clear_dirty()
>>> returned zero?
>>>
>>> Thanks,
>> This version just inverts the body of the loop, as suggested by @David Edmondson.
>> Here is the v2[1]. Do you mean to change it like this?
>>
>> [1]: http://patchwork.ozlabs.org/project/qemu-devel/patch/20210301082132.1107-4-jiangkunkun@huawei.com/
> I see, then it's okay, but I still prefer your previous version. :)
>
> Thanks,
>

Both versions are fine with me. This version may make the final code slightly
cleaner, I think.

Thanks,
Kunkun Jiang
diff --git a/migration/ram.c b/migration/ram.c
index 9fc5b2997c..28215aefe4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1991,6 +1991,8 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
     int pages = 0;
     size_t pagesize_bits =
         qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
+    unsigned long hostpage_boundary =
+        QEMU_ALIGN_UP(pss->page + 1, pagesize_bits);
     unsigned long start_page = pss->page;
     int res;
 
@@ -2003,30 +2005,27 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
         int pages_this_iteration = 0;
 
         /* Check if the page is dirty and send it if it is */
-        if (!migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
-            pss->page++;
-            continue;
-        }
-
-        pages_this_iteration = ram_save_target_page(rs, pss, last_stage);
-        if (pages_this_iteration < 0) {
-            return pages_this_iteration;
-        }
+        if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
+            pages_this_iteration = ram_save_target_page(rs, pss, last_stage);
+            if (pages_this_iteration < 0) {
+                return pages_this_iteration;
+            }
 
-        pages += pages_this_iteration;
-        pss->page++;
-        /*
-         * Allow rate limiting to happen in the middle of huge pages if
-         * something is sent in the current iteration.
-         */
-        if (pagesize_bits > 1 && pages_this_iteration > 0) {
-            migration_rate_limit();
+            pages += pages_this_iteration;
+            /*
+             * Allow rate limiting to happen in the middle of huge pages if
+             * something is sent in the current iteration.
+             */
+            if (pagesize_bits > 1 && pages_this_iteration > 0) {
+                migration_rate_limit();
+            }
         }
-    } while ((pss->page & (pagesize_bits - 1)) &&
+        pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
+    } while ((pss->page < hostpage_boundary) &&
              offset_in_ramblock(pss->block,
                                 ((ram_addr_t)pss->page) << TARGET_PAGE_BITS));
-    /* The offset we leave with is the last one we looked at */
-    pss->page--;
+    /* The offset we leave with is the min boundary of host page and block */
+    pss->page = MIN(pss->page, hostpage_boundary) - 1;
 
     res = ram_save_release_protection(rs, pss, start_page);
     return (res < 0 ? res : pages);
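
The boundary arithmetic in the patch can be checked in isolation. The sketch below is a standalone illustration, not QEMU code: TOY_ALIGN_UP and TOY_MIN are local macros assumed to behave like round-up-to-a-multiple and minimum, and toy_hostpage_boundary() and toy_exit_page() are hypothetical helpers mirroring the two expressions the patch adds.

```c
#include <stddef.h>

/* Round n up to the next multiple of m (assumed semantics of the align-up). */
#define TOY_ALIGN_UP(n, m) ((((n) + (m) - 1) / (m)) * (m))
#define TOY_MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * First target page beyond the host page that contains `page`. The "+ 1"
 * matters when `page` is already aligned: rounding `page` itself up would
 * return `page`, not the end of its host page.
 */
static unsigned long toy_hostpage_boundary(unsigned long page,
                                           unsigned long pagesize_bits)
{
    return TOY_ALIGN_UP(page + 1, pagesize_bits);
}

/*
 * Page index left behind on loop exit: one before whichever comes first,
 * the page returned by the search or the host page boundary.
 */
static unsigned long toy_exit_page(unsigned long found,
                                   unsigned long boundary)
{
    return TOY_MIN(found, boundary) - 1;
}
```

For a host page of 8 target pages, starting at page 5 gives a boundary of 8, while starting at the aligned page 8 gives 16. If the search then jumps to page 13 in the first case, the exit value is clamped to page 7, the last page of the host page that was actually processed.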