
[COLO-Frame,v15,00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)

Message ID 20160301122554.GA3745@work-vm
State New

Commit Message

Dr. David Alan Gilbert March 1, 2016, 12:25 p.m. UTC
* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> On 2016/2/29 17:47, Dr. David Alan Gilbert wrote:
> >* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> >>On 2016/2/27 0:36, Dr. David Alan Gilbert wrote:
> >>>* Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> >>>>* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >>>>>From: root <root@localhost.localdomain>
> >>>>>
> >>>>>This is the 15th version of COLO (still only supports periodic checkpoints).
> >>>>>
> >>>>>This is only the COLO frame part; you can get the whole code from GitHub:
> >>>>>https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode
> >>>>>
> >>>>>There are few changes in this series except for the network-related part.
> >>>>
> >>>>I was looking at the time the guest is paused during COLO and
> >>>>was surprised to find one of the larger chunks was the time to reset
> >>>>the guest before loading each checkpoint;  I've traced it part way, the
> >>>>biggest contributors for my test VM seem to be:
> >>>>
> >>>>   3.8ms  pcibus_reset: VGA
> >>>>   1.8ms  pcibus_reset: virtio-net-pci
> >>>>   1.5ms  pcibus_reset: virtio-blk-pci
> >>>>   1.5ms  qemu_devices_reset: piix4_reset
> >>>>   1.1ms  pcibus_reset: piix3-ide
> >>>>   1.1ms  pcibus_reset: virtio-rng-pci
> >>>>
> >>>>I've not looked deeper yet, but some of these are very silly;
> >>>>I'm running with -nographic so why it's taking 3.8ms to reset VGA is
> >>>>going to be interesting.
> >>>>Also, my only block device is the virtio-blk, so while I understand the
> >>>>standard PC machine has the IDE controller, it's not clear why it takes
> >>>>over a ms to reset an unused device.
> >>>
> >>>OK, so I've dug a bit deeper, and it appears that it's the changes in
> >>>PCI BARs that actually take the time; every time we do a reset we
> >>>reset all the BARs, which causes a pci_update_mappings and
> >>>ends up doing a memory_region_del_subregion.
> >>>Then we load the config space of the PCI device as we do the vmstate_load,
> >>>and this recreates all the mappings again.
> >>>
> >>>I'm not sure what the fix is, but that sounds like it would
> >>>speed up the checkpoints usefully if we can avoid the map/remap when
> >>>they're the same.
> >>>
> >>
> >>Interesting, and thanks for your report.
> >>
> >>We already know qemu_system_reset() is a time-consuming function and we shouldn't
> >>call it here, but if we don't, there will be a bug, which we have
> >>reported before in the previous COLO series; below is a copy of the related
> >>patch comment:
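
(To spell out where the time goes: the per-BAR unmap/remap I traced is in
hw/pci/pci.c:pci_update_mappings(); very roughly, with registration checks and
tracing trimmed, it does this for every BAR on every reset and then again when
the config space is reloaded:

    /* sketch only; d is the PCIDevice being updated */
    for (int i = 0; i < PCI_NUM_REGIONS; i++) {
        PCIIORegion *r = &d->io_regions[i];
        pcibus_t new_addr = pci_bar_address(d, i, r->type, r->size);

        if (new_addr == r->addr) {
            continue;                    /* BAR unchanged, nothing to do */
        }
        if (r->addr != PCI_BAR_UNMAPPED) {
            memory_region_del_subregion(r->address_space, r->memory);
        }
        r->addr = new_addr;
        if (r->addr != PCI_BAR_UNMAPPED) {
            memory_region_add_subregion_overlap(r->address_space,
                                                r->addr, r->memory, 1);
        }
    }

Outside a memory_region transaction every one of those del/add calls forces a
rebuild of the flat memory map, so we pay for it once on reset and once again
on load, for each device.)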

Paolo suggested one fix, see the patch below; I'm not sure if it's safe
(in particular, what if the guest changed a BAR and the device code tried to access the memory
while loading the state?) - but it does seem to work and shaves ~10ms off the reset/load
times:

Dave

commit 7570b2984143860005ad9fe79f5394c75f294328
Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
Date:   Tue Mar 1 12:08:14 2016 +0000

    COLO: Lock memory map around reset/load
    
    Changing the memory map appears to be expensive; we see this
    particularly when loading a checkpoint, where we:
       a) reset the devices
          This causes the PCI BARs to be reset
       b) load the device states
          This causes the PCI BARs to be reloaded.
    
    Turning this all into a single memory_region_transaction saves
     ~10ms/checkpoint.
    
    TBD: What happens if the device code accesses the RAM during loading
    the checkpoint?
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
    Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
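
In outline (the full diff is in the Patch section at the bottom of this page),
the change just brackets the existing reset/load sequence in
colo_process_incoming_thread():

    memory_region_transaction_begin();   /* defer memory-map rebuilds          */
    qemu_system_reset(VMRESET_SILENT);   /* BAR clears are batched, not flushed */
    ...                                  /* timing/accounting lines unchanged  */
    ret = qemu_load_device_state(fb);    /* BARs restored from the vmstate     */
    memory_region_transaction_commit();  /* one rebuild for the whole lot      */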


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Comments

Zhanghailiang March 2, 2016, 1:01 p.m. UTC | #1
On 2016/3/1 20:25, Dr. David Alan Gilbert wrote:
> * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
>> On 2016/2/29 17:47, Dr. David Alan Gilbert wrote:
>>> * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
>>>> On 2016/2/27 0:36, Dr. David Alan Gilbert wrote:
>>>>> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
>>>>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>>>>> From: root <root@localhost.localdomain>
>>>>>>>
>>>>>>> This is the 15th version of COLO (still only supports periodic checkpoints).
>>>>>>>
>>>>>>> This is only the COLO frame part; you can get the whole code from GitHub:
>>>>>>> https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode
>>>>>>>
>>>>>>> There are few changes in this series except for the network-related part.
>>>>>>
>>>>>> I was looking at the time the guest is paused during COLO and
>>>>>> was surprised to find one of the larger chunks was the time to reset
>>>>>> the guest before loading each checkpoint;  I've traced it part way, the
>>>>>> biggest contributors for my test VM seem to be:
>>>>>>
>>>>>>    3.8ms  pcibus_reset: VGA
>>>>>>    1.8ms  pcibus_reset: virtio-net-pci
>>>>>>    1.5ms  pcibus_reset: virtio-blk-pci
>>>>>>    1.5ms  qemu_devices_reset: piix4_reset
>>>>>>    1.1ms  pcibus_reset: piix3-ide
>>>>>>    1.1ms  pcibus_reset: virtio-rng-pci
>>>>>>
>>>>>> I've not looked deeper yet, but some of these are very silly;
>>>>>> I'm running with -nographic so why it's taking 3.8ms to reset VGA is
>>>>>> going to be interesting.
>>>>>> Also, my only block device is the virtio-blk, so while I understand the
>>>>>> standard PC machine has the IDE controller, it's not clear why it takes
>>>>>> over a ms to reset an unused device.
>>>>>
>>>>> OK, so I've dug a bit deeper, and it appears that it's the changes in
>>>>> PCI BARs that actually take the time; every time we do a reset we
>>>>> reset all the BARs, which causes a pci_update_mappings and
>>>>> ends up doing a memory_region_del_subregion.
>>>>> Then we load the config space of the PCI device as we do the vmstate_load,
>>>>> and this recreates all the mappings again.
>>>>>
>>>>> I'm not sure what the fix is, but that sounds like it would
>>>>> speed up the checkpoints usefully if we can avoid the map/remap when
>>>>> they're the same.
>>>>>
>>>>
>>>> Interesting, and thanks for your report.
>>>>
>>>> We already know qemu_system_reset() is a time-consuming function and we shouldn't
>>>> call it here, but if we don't, there will be a bug, which we have
>>>> reported before in the previous COLO series; below is a copy of the related
>>>> patch comment:
>
> Paolo suggested one fix, see the patch below; I'm not sure if it's safe
> (in particular, what if the guest changed a BAR and the device code tried to access the memory
> while loading the state?) - but it does seem to work and shaves ~10ms off the reset/load
> times:
>

Nice work, I also tested it and it is a good improvement. I'm still wondering whether it is safe here,
but it should be safe to apply it to qemu_system_reset() independently (I tested that too,
and it shaves about 5ms off).
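
For the independent variant, the idea is roughly the following (sketch only;
the surrounding code in vl.c is unchanged and only the begin/commit pair is new):

    void qemu_system_reset(bool report)
    {
        ...
        memory_region_transaction_begin();
        qemu_devices_reset();      /* or the machine's reset hook; includes the PCI BAR clears */
        memory_region_transaction_commit();  /* single memory-map rebuild */
        ...
    }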

Hailiang

> Dave
>
> commit 7570b2984143860005ad9fe79f5394c75f294328
> Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Date:   Tue Mar 1 12:08:14 2016 +0000
>
>      COLO: Lock memory map around reset/load
>
>      Changing the memory map appears to be expensive; we see this
>      particularly when loading a checkpoint, where we:
>         a) reset the devices
>            This causes the PCI BARs to be reset
>         b) load the device states
>            This causes the PCI BARs to be reloaded.
>
>      Turning this all into a single memory_region_transaction saves
>       ~10ms/checkpoint.
>
>      TBD: What happens if the device code accesses the RAM during loading
>      the checkpoint?
>
>      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
>
> diff --git a/migration/colo.c b/migration/colo.c
> index 45c3432..c44fb2a 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -22,6 +22,7 @@
>   #include "net/colo-proxy.h"
>   #include "net/net.h"
>   #include "block/block_int.h"
> +#include "exec/memory.h"
>
>   static bool vmstate_loading;
>
> @@ -934,6 +935,7 @@ void *colo_process_incoming_thread(void *opaque)
>
>           stage_time_start = qemu_clock_get_us(QEMU_CLOCK_HOST);
>           qemu_mutex_lock_iothread();
> +        memory_region_transaction_begin();
>           qemu_system_reset(VMRESET_SILENT);
>           stage_time_end = qemu_clock_get_us(QEMU_CLOCK_HOST);
>           timed_average_account(&mis->colo_state.time_reset,
> @@ -947,6 +949,7 @@ void *colo_process_incoming_thread(void *opaque)
>                             stage_time_end - stage_time_start);
>           stage_time_start = stage_time_end;
>           ret = qemu_load_device_state(fb);
> +        memory_region_transaction_commit();
>           if (ret < 0) {
>               error_report("COLO: load device state failed\n");
>               vmstate_loading = false;
>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>
Dr. David Alan Gilbert March 3, 2016, 8:13 p.m. UTC | #2
* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> On 2016/3/1 20:25, Dr. David Alan Gilbert wrote:
> >* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> >>On 2016/2/29 17:47, Dr. David Alan Gilbert wrote:
> >>>* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> >>>>On 2016/2/27 0:36, Dr. David Alan Gilbert wrote:
> >>>>>* Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> >>>>>>* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >>>>>>>From: root <root@localhost.localdomain>
> >>>>>>>
> >>>>>>>This is the 15th version of COLO (still only supports periodic checkpoints).
> >>>>>>>
> >>>>>>>This is only the COLO frame part; you can get the whole code from GitHub:
> >>>>>>>https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode
> >>>>>>>
> >>>>>>>There are few changes in this series except for the network-related part.
> >>>>>>
> >>>>>>I was looking at the time the guest is paused during COLO and
> >>>>>>was surprised to find one of the larger chunks was the time to reset
> >>>>>>the guest before loading each checkpoint;  I've traced it part way, the
> >>>>>>biggest contributors for my test VM seem to be:
> >>>>>>
> >>>>>>   3.8ms  pcibus_reset: VGA
> >>>>>>   1.8ms  pcibus_reset: virtio-net-pci
> >>>>>>   1.5ms  pcibus_reset: virtio-blk-pci
> >>>>>>   1.5ms  qemu_devices_reset: piix4_reset
> >>>>>>   1.1ms  pcibus_reset: piix3-ide
> >>>>>>   1.1ms  pcibus_reset: virtio-rng-pci
> >>>>>>
> >>>>>>I've not looked deeper yet, but some of these are very silly;
> >>>>>>I'm running with -nographic so why it's taking 3.8ms to reset VGA is
> >>>>>>going to be interesting.
> >>>>>>Also, my only block device is the virtio-blk, so while I understand the
> >>>>>>standard PC machine has the IDE controller, it's not clear why it takes
> >>>>>>over a ms to reset an unused device.
> >>>>>
> >>>>>OK, so I've dug a bit deeper, and it appears that it's the changes in
> >>>>>PCI BARs that actually take the time; every time we do a reset we
> >>>>>reset all the BARs, which causes a pci_update_mappings and
> >>>>>ends up doing a memory_region_del_subregion.
> >>>>>Then we load the config space of the PCI device as we do the vmstate_load,
> >>>>>and this recreates all the mappings again.
> >>>>>
> >>>>>I'm not sure what the fix is, but that sounds like it would
> >>>>>speed up the checkpoints usefully if we can avoid the map/remap when
> >>>>>they're the same.
> >>>>>
> >>>>
> >>>>Interesting, and thanks for your report.
> >>>>
> >>>>We already know qemu_system_reset() is a time-consuming function and we shouldn't
> >>>>call it here, but if we don't, there will be a bug, which we have
> >>>>reported before in the previous COLO series; below is a copy of the related
> >>>>patch comment:
> >
> >Paolo suggested one fix, see the patch below; I'm not sure if it's safe
> >(in particular, what if the guest changed a BAR and the device code tried to access the memory
> >while loading the state?) - but it does seem to work and shaves ~10ms off the reset/load
> >times:
> >
> 
> Nice work, I also tested it and it is a good improvement. I'm still wondering whether it is safe here,
> but it should be safe to apply it to qemu_system_reset() independently (I tested that too,
> and it shaves about 5ms off).

Yes, it seems quite nice.
I did find one VM today that won't boot with COLO with that change; it's
an Ubuntu VM that has a delay in Grub, and when it takes the first
checkpoint while Grub is still being displayed, it gets an error from
the inbound migrate.

The error is "VQ 0 size 0x80 Guest index 0x2444 inconsistent with Host index 0x119e: delta 0x12a6"
from virtio-blk - so maybe virtio-blk is accessing the memory while loading.

Dave

> Hailiang
> 
> >Dave
> >
> >commit 7570b2984143860005ad9fe79f5394c75f294328
> >Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >Date:   Tue Mar 1 12:08:14 2016 +0000
> >
> >     COLO: Lock memory map around reset/load
> >
> >     Changing the memory map appears to be expensive; we see this
> >     particularly when loading a checkpoint, where we:
> >        a) reset the devices
> >           This causes the PCI BARs to be reset
> >        b) load the device states
> >           This causes the PCI BARs to be reloaded.
> >
> >     Turning this all into a single memory_region_transaction saves
> >      ~10ms/checkpoint.
> >
> >     TBD: What happens if the device code accesses the RAM during loading
> >     the checkpoint?
> >
> >     Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >     Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> >
> >diff --git a/migration/colo.c b/migration/colo.c
> >index 45c3432..c44fb2a 100644
> >--- a/migration/colo.c
> >+++ b/migration/colo.c
> >@@ -22,6 +22,7 @@
> >  #include "net/colo-proxy.h"
> >  #include "net/net.h"
> >  #include "block/block_int.h"
> >+#include "exec/memory.h"
> >
> >  static bool vmstate_loading;
> >
> >@@ -934,6 +935,7 @@ void *colo_process_incoming_thread(void *opaque)
> >
> >          stage_time_start = qemu_clock_get_us(QEMU_CLOCK_HOST);
> >          qemu_mutex_lock_iothread();
> >+        memory_region_transaction_begin();
> >          qemu_system_reset(VMRESET_SILENT);
> >          stage_time_end = qemu_clock_get_us(QEMU_CLOCK_HOST);
> >          timed_average_account(&mis->colo_state.time_reset,
> >@@ -947,6 +949,7 @@ void *colo_process_incoming_thread(void *opaque)
> >                            stage_time_end - stage_time_start);
> >          stage_time_start = stage_time_end;
> >          ret = qemu_load_device_state(fb);
> >+        memory_region_transaction_commit();
> >          if (ret < 0) {
> >              error_report("COLO: load device state failed\n");
> >              vmstate_loading = false;
> >
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >.
> >
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Patch

diff --git a/migration/colo.c b/migration/colo.c
index 45c3432..c44fb2a 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -22,6 +22,7 @@ 
 #include "net/colo-proxy.h"
 #include "net/net.h"
 #include "block/block_int.h"
+#include "exec/memory.h"
 
 static bool vmstate_loading;
 
@@ -934,6 +935,7 @@  void *colo_process_incoming_thread(void *opaque)
 
         stage_time_start = qemu_clock_get_us(QEMU_CLOCK_HOST);
         qemu_mutex_lock_iothread();
+        memory_region_transaction_begin();
         qemu_system_reset(VMRESET_SILENT);
         stage_time_end = qemu_clock_get_us(QEMU_CLOCK_HOST);
         timed_average_account(&mis->colo_state.time_reset,
@@ -947,6 +949,7 @@  void *colo_process_incoming_thread(void *opaque)
                           stage_time_end - stage_time_start);
         stage_time_start = stage_time_end;
         ret = qemu_load_device_state(fb);
+        memory_region_transaction_commit();
         if (ret < 0) {
             error_report("COLO: load device state failed\n");
             vmstate_loading = false;