
[v6,09/10] Add the function of colo_bitmap_clear_dirty

Message ID 1617938456-315058-10-git-send-email-lei.rao@intel.com
State New
Series Fixed some bugs and optimized some codes for COLO

Commit Message

Lei Rao April 9, 2021, 3:20 a.m. UTC
From: "Rao, Lei" <lei.rao@intel.com>

When flushing the RAM cache on the secondary VM, we copy contiguous runs
of dirty pages in one go, so we can also clear the dirty bitmap for each
contiguous run at once. This reduces the VM stop time during a checkpoint.

The COLO performance test results are as follows:

Server configuration:
CPU: Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
MEM: 251G (type: DDR4, speed: 2666 MT/s)
SSD: Intel 730 and DC S35x0/3610/3700 Series SSDs

dirty pages:3189376  migration_bitmap_clear_dirty time consuming(ns):105194000
dirty pages:3189784  migration_bitmap_clear_dirty time consuming(ns):105297000
dirty pages:3190501  migration_bitmap_clear_dirty time consuming(ns):105410000
dirty pages:3188734  migration_bitmap_clear_dirty time consuming(ns):105138000
dirty pages:3189464  migration_bitmap_clear_dirty time consuming(ns):111736000
dirty pages:3188558  migration_bitmap_clear_dirty time consuming(ns):105079000
dirty pages:3239489  migration_bitmap_clear_dirty time consuming(ns):106761000

dirty pages:3190240  colo_bitmap_clear_dirty time consuming(ns):8369000
dirty pages:3189293  colo_bitmap_clear_dirty time consuming(ns):8388000
dirty pages:3189171  colo_bitmap_clear_dirty time consuming(ns):8641000
dirty pages:3189099  colo_bitmap_clear_dirty time consuming(ns):8280000
dirty pages:3189974  colo_bitmap_clear_dirty time consuming(ns):8352000
dirty pages:3189471  colo_bitmap_clear_dirty time consuming(ns):8348000
dirty pages:3189681  colo_bitmap_clear_dirty time consuming(ns):8426000

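The data show that colo_bitmap_clear_dirty is more efficient: clearing
about 3.19 million dirty pages takes about 8.4 ms on average, versus
about 106 ms with migration_bitmap_clear_dirty (roughly 12.7 times faster).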

Signed-off-by: Lei Rao <lei.rao@intel.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
---
 migration/ram.c | 36 +++++++++++++++++++++++++++++++-----
 1 file changed, 31 insertions(+), 5 deletions(-)
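
The averages behind the comparison in the commit message can be checked with a few lines of C. This is only a sketch of the arithmetic: the timing values are copied from the commit message, while the helper names (mean, speedup) are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Timings in nanoseconds, copied from the commit message. */
static const double migration_ns[] = {
    105194000, 105297000, 105410000, 105138000,
    111736000, 105079000, 106761000,
};
static const double colo_ns[] = {
    8369000, 8388000, 8641000, 8280000,
    8352000, 8348000, 8426000,
};

/* Arithmetic mean of n samples (hypothetical helper). */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++) {
        sum += v[i];
    }
    return sum / (double)n;
}

/* Ratio of the mean migration_bitmap_clear_dirty time to the mean
 * colo_bitmap_clear_dirty time (hypothetical helper). */
static double speedup(void)
{
    return mean(migration_ns, 7) / mean(colo_ns, 7);
}
```

The means come out at about 106.4 ms and 8.4 ms, a ratio of about 12.7, for a nearly identical number of dirty pages in every run.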

Comments

Lei Rao April 16, 2021, 7:56 a.m. UTC | #1
Hi, Dave

The performance data has been added to the commit message.
Do you have any other suggestions?

Thanks
Lei.

Lei Rao April 27, 2021, 2:32 a.m. UTC | #2
Hi, Dave

I think this patch set is beneficial to upstream. Please check the performance data. If you have any other ideas, please let me know.

Thanks
Lei.


Patch

diff --git a/migration/ram.c b/migration/ram.c
index 8661d82..11275cd 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -857,6 +857,36 @@  unsigned long colo_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
     return first;
 }
 
+/**
+ * colo_bitmap_clear_dirty: when we flush the RAM cache to RAM, we copy
+ * contiguous runs of dirty pages, so we can also clear the bitmap for each
+ * contiguous run of dirty pages at once.
+ */
+static inline bool colo_bitmap_clear_dirty(RAMState *rs,
+                                           RAMBlock *rb,
+                                           unsigned long start,
+                                           unsigned long num)
+{
+    bool ret = false;
+    unsigned long i = 0;
+
+    /*
+     * Flushing the RAM cache to RAM can only happen on the Secondary VM,
+     * and clear_bitmap is always NULL on the destination side.
+     * Therefore, there is no need to check whether clear_bitmap
+     * needs to be cleared.
+     */
+    QEMU_LOCK_GUARD(&rs->bitmap_mutex);
+    for (i = 0; i < num; i++) {
+        ret = test_and_clear_bit(start + i, rb->bmap);
+        if (ret) {
+            rs->migration_dirty_pages--;
+        }
+    }
+
+    return ret;
+}
+
 static inline bool migration_bitmap_clear_dirty(RAMState *rs,
                                                 RAMBlock *rb,
                                                 unsigned long page)
@@ -3774,11 +3804,7 @@  void colo_flush_ram_cache(void)
                 num = 0;
                 block = QLIST_NEXT_RCU(block, next);
             } else {
-                unsigned long i = 0;
-
-                for (i = 0; i < num; i++) {
-                    migration_bitmap_clear_dirty(ram_state, block, offset + i);
-                }
+                colo_bitmap_clear_dirty(ram_state, block, offset, num);
                 dst_host = block->host
                          + (((ram_addr_t)offset) << TARGET_PAGE_BITS);
                 src_host = block->colo_cache
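
For readers without the QEMU tree at hand, the idea of the patch can be sketched as a standalone C fragment: clear a contiguous run of bits while holding the lock once, instead of calling a per-page helper for every bit. Everything named sketch_* below is hypothetical. A pthread mutex stands in for QEMU_LOCK_GUARD, a plain non-atomic helper stands in for test_and_clear_bit, and the function returns a count rather than the bool used in the patch. The patch does not break down where the saving comes from; taking the lock once per run and skipping the clear_bitmap check are plausible contributors, not measured ones.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the dirty-tracking state (not QEMU's RAMState). */
struct sketch_state {
    pthread_mutex_t lock;       /* plays the role of bitmap_mutex */
    unsigned long dirty_pages;  /* plays the role of migration_dirty_pages */
};

/* Non-atomic test-and-clear of one bit; returns the previous bit value. */
static bool sketch_test_and_clear_bit(unsigned long nr, unsigned long *map)
{
    unsigned long mask = 1UL << (nr % SKETCH_BITS_PER_LONG);
    unsigned long *word = &map[nr / SKETCH_BITS_PER_LONG];
    bool was_set = (*word & mask) != 0;

    *word &= ~mask;
    return was_set;
}

/*
 * Clear bits [start, start + num) with a single lock acquisition.
 * Returns how many bits were actually set (a choice made for this
 * sketch; the patch returns a bool instead).
 */
static unsigned long sketch_clear_dirty_range(struct sketch_state *s,
                                              unsigned long *map,
                                              unsigned long start,
                                              unsigned long num)
{
    unsigned long cleared = 0;

    pthread_mutex_lock(&s->lock);
    for (unsigned long i = 0; i < num; i++) {
        if (sketch_test_and_clear_bit(start + i, map)) {
            s->dirty_pages--;
            cleared++;
        }
    }
    pthread_mutex_unlock(&s->lock);

    return cleared;
}
```

Returning a count sidesteps one quirk of the bool version, whose result reflects only the last bit tested in the run.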