Patchwork [RFC,10/16,v6] run dump at the background

Submitter Wen Congyang
Date Feb. 9, 2012, 3:28 a.m.
Message ID <4F333D77.8040104@cn.fujitsu.com>
Permalink /patch/140300/
State New

Comments

Wen Congyang - Feb. 9, 2012, 3:28 a.m.
The new monitor command dump may take a long time to finish, so we need to
run it in the background.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 dump.c |  155 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 136 insertions(+), 19 deletions(-)
Jan Kiszka - Feb. 14, 2012, 6:05 p.m.
On 2012-02-09 04:28, Wen Congyang wrote:
> The new monitor command dump may take long time to finish. So we need run it
> at the background.

How does it work? Like live migration, i.e. you retransmit (overwrite)
already written but then dirtied pages? Hmm... no.

What does background mean then? What is the use case? What if the user
decides to resume the vm?

Jan
Jan Kiszka - Feb. 14, 2012, 6:27 p.m.
On 2012-02-14 19:05, Jan Kiszka wrote:
> On 2012-02-09 04:28, Wen Congyang wrote:
>> The new monitor command dump may take long time to finish. So we need run it
>> at the background.
> 
> How does it work? Like live migration, i.e. you retransmit (overwrite)
> already written but then dirtied pages? Hmm... no.
> 
> What does background mean then? What is the use case? What if the user
> decides to resume the vm?

OK, that is addressed in patch 15! I would suggest merging it into this
patch. It makes sense to handle that case gracefully right from the
beginning.

OK, now I have another question: What is the point of rate-limiting
the dump? The guest is not running, thus not competing for bandwidth.

Jan
Wen Congyang - Feb. 15, 2012, 3:47 a.m.
At 02/15/2012 02:27 AM, Jan Kiszka Wrote:
> On 2012-02-14 19:05, Jan Kiszka wrote:
>> On 2012-02-09 04:28, Wen Congyang wrote:
>>> The new monitor command dump may take long time to finish. So we need run it
>>> at the background.
>>
>> How does it work? Like live migration, i.e. you retransmit (overwrite)
>> already written but then dirtied pages? Hmm... no.
>>
>> What does background mean then? What is the use case? What if the user
>> decides to resume the vm?
> 
> OK, that is addressed in patch 15! I would suggest merging it into this
> patch. It makes sense to handle that case gracefully right from the
> beginning.

OK, I will merge it.

> 
> OK, now I have some other question: What is the point of rate-limiting
> the dump? The guest is not running, thus not competing for bandwidth.

I use the bandwidth setting to try to control the write speed. If we write
the vmcore to disk at high speed, it may affect other applications that use
the same disk.

Thanks
Wen Congyang

> 
> Jan
>
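
The throttle discussed above can be made concrete with a little arithmetic. The sketch below only restates two values from the patch (DEFAULT_THROTTLE and the division by 10 in dump_iterate()); the helper names and the 4 KiB page size are assumptions made for illustration and are not part of the patch.

```c
#include <stdint.h>

/* Values taken from the patch. */
#define DEFAULT_THROTTLE (32 << 20)   /* bytes per second (32 MiB/s) */
#define TICKS_PER_SECOND 10           /* s->bandwidth / 10 in dump_iterate() */

/* Assumption for illustration only: a 4 KiB target page. */
#define PAGE_SIZE_ASSUMED 4096

/* Hypothetical helper: byte budget for one 100 ms tick. */
static int64_t bytes_per_tick(int64_t bandwidth)
{
    return bandwidth / TICKS_PER_SECOND;
}

/* Hypothetical helper: whole pages that fit into one tick's budget. */
static int64_t full_pages_per_tick(int64_t bandwidth)
{
    return bytes_per_tick(bandwidth) / PAGE_SIZE_ASSUMED;
}

/* Hypothetical helper: bytes left over after the whole pages. */
static int64_t tail_bytes_per_tick(int64_t bandwidth)
{
    return bytes_per_tick(bandwidth) % PAGE_SIZE_ASSUMED;
}
```

With the default value, bytes_per_tick(DEFAULT_THROTTLE) is 3355443. Under the 4 KiB assumption that is 819 full pages plus an 819-byte partial page, so the per-tick budget is not page aligned.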
Jan Kiszka - Feb. 15, 2012, 9:07 a.m.
On 2012-02-15 04:47, Wen Congyang wrote:
> At 02/15/2012 02:27 AM, Jan Kiszka Wrote:
>> On 2012-02-14 19:05, Jan Kiszka wrote:
>>> On 2012-02-09 04:28, Wen Congyang wrote:
>>>> The new monitor command dump may take long time to finish. So we need run it
>>>> at the background.
>>>
>>> How does it work? Like live migration, i.e. you retransmit (overwrite)
>>> already written but then dirtied pages? Hmm... no.
>>>
>>> What does background mean then? What is the use case? What if the user
>>> decides to resume the vm?
>>
>> OK, that is addressed in patch 15! I would suggest merging it into this
>> patch. It makes sense to handle that case gracefully right from the
>> beginning.
> 
> OK, I will merge it.
> 
>>
>> OK, now I have some other question: What is the point of rate-limiting
>> the dump? The guest is not running, thus not competing for bandwidth.
> 
> I use bandwidth to try to control the writing speed. If we write the vmcore
> to disk in a high speed, it may affect some other appilications which use
> the same disk too.

Just like the guest of that particular VM can do. I don't think we need
this level of control here, it will be provided (if required) at a
different level, affecting the whole QEMU process. Removing the vmcore
bandwidth control will simplify code and user interface.

Jan
Jan Kiszka - Feb. 15, 2012, 9:21 a.m.
On 2012-02-15 10:22, Wen Congyang wrote:
> At 02/15/2012 05:07 PM, Jan Kiszka Wrote:
>> On 2012-02-15 04:47, Wen Congyang wrote:
>>> At 02/15/2012 02:27 AM, Jan Kiszka Wrote:
>>>> On 2012-02-14 19:05, Jan Kiszka wrote:
>>>>> On 2012-02-09 04:28, Wen Congyang wrote:
>>>>>> The new monitor command dump may take long time to finish. So we need run it
>>>>>> at the background.
>>>>>
>>>>> How does it work? Like live migration, i.e. you retransmit (overwrite)
>>>>> already written but then dirtied pages? Hmm... no.
>>>>>
>>>>> What does background mean then? What is the use case? What if the user
>>>>> decides to resume the vm?
>>>>
>>>> OK, that is addressed in patch 15! I would suggest merging it into this
>>>> patch. It makes sense to handle that case gracefully right from the
>>>> beginning.
>>>
>>> OK, I will merge it.
>>>
>>>>
>>>> OK, now I have some other question: What is the point of rate-limiting
>>>> the dump? The guest is not running, thus not competing for bandwidth.
>>>
>>> I use bandwidth to try to control the writing speed. If we write the vmcore
>>> to disk in a high speed, it may affect some other appilications which use
>>> the same disk too.
>>
>> Just like the guest of that particular VM can do. I don't think we need
>> this level of control here, it will be provided (if required) at a
>> different level, affecting the whole QEMU process. Removing the vmcore
>> bandwidth control will simplify code and user interface.
> 
> OK. I will implementing it like this:
> 1. write 100ms
> 2. sleep 100ms(allow qemu to do the other things)
> 3. goto 1

Why? Just write as fast as possible.

Jan
Wen Congyang - Feb. 15, 2012, 9:22 a.m.
At 02/15/2012 05:07 PM, Jan Kiszka Wrote:
> On 2012-02-15 04:47, Wen Congyang wrote:
>> At 02/15/2012 02:27 AM, Jan Kiszka Wrote:
>>> On 2012-02-14 19:05, Jan Kiszka wrote:
>>>> On 2012-02-09 04:28, Wen Congyang wrote:
>>>>> The new monitor command dump may take long time to finish. So we need run it
>>>>> at the background.
>>>>
>>>> How does it work? Like live migration, i.e. you retransmit (overwrite)
>>>> already written but then dirtied pages? Hmm... no.
>>>>
>>>> What does background mean then? What is the use case? What if the user
>>>> decides to resume the vm?
>>>
>>> OK, that is addressed in patch 15! I would suggest merging it into this
>>> patch. It makes sense to handle that case gracefully right from the
>>> beginning.
>>
>> OK, I will merge it.
>>
>>>
>>> OK, now I have some other question: What is the point of rate-limiting
>>> the dump? The guest is not running, thus not competing for bandwidth.
>>
>> I use bandwidth to try to control the writing speed. If we write the vmcore
>> to disk in a high speed, it may affect some other appilications which use
>> the same disk too.
> 
> Just like the guest of that particular VM can do. I don't think we need
> this level of control here, it will be provided (if required) at a
> different level, affecting the whole QEMU process. Removing the vmcore
> bandwidth control will simplify code and user interface.

OK. I will implement it like this:
1. write for 100ms
2. sleep for 100ms (allow QEMU to do other things)
3. goto 1

Thanks
Wen Congyang
> 
> Jan
>
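
The three steps proposed in this message can be sketched as a plain blocking loop. This is not the patch's code: the patch arms a QEMU timer instead of sleeping. The helper names write_in_slices and now_ms, the chunk parameter, and the use of POSIX clock_gettime/nanosleep are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Current monotonic time in milliseconds. */
static long long now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Hypothetical helper (not from the patch): write len bytes to fd in
 * chunk-sized pieces, pausing for pause_ms whenever slice_ms of writing
 * time has elapsed. Returns 0 on success, -1 on error.
 */
static int write_in_slices(int fd, const char *buf, size_t len, size_t chunk,
                           long slice_ms, long pause_ms)
{
    size_t done = 0;

    if (chunk == 0) {
        return -1;                      /* would never make progress */
    }

    while (done < len) {
        long long deadline = now_ms() + slice_ms;   /* step 1: write */

        while (done < len && now_ms() < deadline) {
            size_t n = len - done < chunk ? len - done : chunk;
            ssize_t w = write(fd, buf + done, n);

            if (w < 0) {
                return -1;              /* sketch: EINTR is not retried */
            }
            done += (size_t)w;
        }

        if (done < len) {                           /* step 2: sleep */
            struct timespec pause = { pause_ms / 1000,
                                      (pause_ms % 1000) * 1000000L };
            nanosleep(&pause, NULL);
        }
    }                                               /* step 3: goto 1 */

    return 0;
}
```

A call such as write_in_slices(fd, buf, len, 4096, 100, 100) would match the 100ms/100ms numbers in the message. Note that the sleep blocks the calling thread, which is one reason an event-loop timer is the more natural fit inside QEMU.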
Wen Congyang - Feb. 15, 2012, 9:35 a.m.
At 02/15/2012 05:21 PM, Jan Kiszka Wrote:
> On 2012-02-15 10:22, Wen Congyang wrote:
>> At 02/15/2012 05:07 PM, Jan Kiszka Wrote:
>>> On 2012-02-15 04:47, Wen Congyang wrote:
>>>> At 02/15/2012 02:27 AM, Jan Kiszka Wrote:
>>>>> On 2012-02-14 19:05, Jan Kiszka wrote:
>>>>>> On 2012-02-09 04:28, Wen Congyang wrote:
>>>>>>> The new monitor command dump may take long time to finish. So we need run it
>>>>>>> at the background.
>>>>>>
>>>>>> How does it work? Like live migration, i.e. you retransmit (overwrite)
>>>>>> already written but then dirtied pages? Hmm... no.
>>>>>>
>>>>>> What does background mean then? What is the use case? What if the user
>>>>>> decides to resume the vm?
>>>>>
>>>>> OK, that is addressed in patch 15! I would suggest merging it into this
>>>>> patch. It makes sense to handle that case gracefully right from the
>>>>> beginning.
>>>>
>>>> OK, I will merge it.
>>>>
>>>>>
>>>>> OK, now I have some other question: What is the point of rate-limiting
>>>>> the dump? The guest is not running, thus not competing for bandwidth.
>>>>
>>>> I use bandwidth to try to control the writing speed. If we write the vmcore
>>>> to disk in a high speed, it may affect some other appilications which use
>>>> the same disk too.
>>>
>>> Just like the guest of that particular VM can do. I don't think we need
>>> this level of control here, it will be provided (if required) at a
>>> different level, affecting the whole QEMU process. Removing the vmcore
>>> bandwidth control will simplify code and user interface.
>>
>> OK. I will implementing it like this:
>> 1. write 100ms
>> 2. sleep 100ms(allow qemu to do the other things)
>> 3. goto 1
> 
> Why? Just write as fast as possible.

If the memory is very large, the command will take a very long time.
Eric said:
  It sounds like it is long-running, which
  means it probably needs to be asynchronous, as well as issue an event
  upon completion, so that other monitor commands can be issued in the
  meantime.

Thanks
Wen Congyang
> 
> Jan
>
Jan Kiszka - Feb. 15, 2012, 10:16 a.m.
On 2012-02-15 10:35, Wen Congyang wrote:
> At 02/15/2012 05:21 PM, Jan Kiszka Wrote:
>> On 2012-02-15 10:22, Wen Congyang wrote:
>>> At 02/15/2012 05:07 PM, Jan Kiszka Wrote:
>>>> On 2012-02-15 04:47, Wen Congyang wrote:
>>>>> At 02/15/2012 02:27 AM, Jan Kiszka Wrote:
>>>>>> On 2012-02-14 19:05, Jan Kiszka wrote:
>>>>>>> On 2012-02-09 04:28, Wen Congyang wrote:
>>>>>>>> The new monitor command dump may take long time to finish. So we need run it
>>>>>>>> at the background.
>>>>>>>
>>>>>>> How does it work? Like live migration, i.e. you retransmit (overwrite)
>>>>>>> already written but then dirtied pages? Hmm... no.
>>>>>>>
>>>>>>> What does background mean then? What is the use case? What if the user
>>>>>>> decides to resume the vm?
>>>>>>
>>>>>> OK, that is addressed in patch 15! I would suggest merging it into this
>>>>>> patch. It makes sense to handle that case gracefully right from the
>>>>>> beginning.
>>>>>
>>>>> OK, I will merge it.
>>>>>
>>>>>>
>>>>>> OK, now I have some other question: What is the point of rate-limiting
>>>>>> the dump? The guest is not running, thus not competing for bandwidth.
>>>>>
>>>>> I use bandwidth to try to control the writing speed. If we write the vmcore
>>>>> to disk in a high speed, it may affect some other appilications which use
>>>>> the same disk too.
>>>>
>>>> Just like the guest of that particular VM can do. I don't think we need
>>>> this level of control here, it will be provided (if required) at a
>>>> different level, affecting the whole QEMU process. Removing the vmcore
>>>> bandwidth control will simplify code and user interface.
>>>
>>> OK. I will implementing it like this:
>>> 1. write 100ms
>>> 2. sleep 100ms(allow qemu to do the other things)
>>> 3. goto 1
>>
>> Why? Just write as fast as possible.
> 
> If the memory is too big, the command will take too long time. 
> Eric said:
>   It sounds like it is long-running, which
>   means it probably needs to be asynchronous, as well as issue an event
>   upon completion, so that other monitor commands can be issued in the
>   meantime.

Asynchronous doesn't mean throttled. It means not waiting for
potentially long-running I/O in the context of the monitor, but becoming
interactive again.

Jan
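
The distinction drawn here (asynchronous, but not throttled) can be illustrated with a toy model. This is a sketch under stated assumptions, not QEMU code: DumpJob, dump_job_step and run_loop are hypothetical names, the "write" is simulated by decrementing a counter, and the event loop is reduced to alternating one dump step with one queued monitor command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a dump job; none of these names come from QEMU. */
typedef struct DumpJob {
    size_t remaining;   /* bytes still to write */
    size_t slice;       /* most bytes written per callback */
    int steps;          /* callbacks run so far */
} DumpJob;

/*
 * One callback: write at most one slice, then return to the event loop.
 * Returns true once the dump is complete.
 */
static bool dump_job_step(DumpJob *job)
{
    size_t n = job->remaining < job->slice ? job->remaining : job->slice;

    job->remaining -= n;    /* stand-in for the actual write */
    job->steps++;
    return job->remaining == 0;
}

/*
 * Toy event loop: alternate one dump step with one queued monitor command.
 * The job is re-armed immediately after each step, never delayed by a
 * timer. Returns the number of monitor commands served before the dump
 * completed.
 */
static int run_loop(DumpJob *job, int pending_commands)
{
    int served = 0;
    bool done = false;

    assert(job->slice > 0);     /* a zero slice would never finish */

    while (!done) {
        done = dump_job_step(job);
        if (!done && pending_commands > 0) {
            pending_commands--;
            served++;
        }
    }
    return served;
}
```

In this model the monitor stays responsive because each callback is bounded, while the dump still proceeds as fast as the loop allows; no sleep or bandwidth limit is involved.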

Patch

diff --git a/dump.c b/dump.c
index a0e8b86..cb33495 100644
--- a/dump.c
+++ b/dump.c
@@ -77,12 +77,20 @@  typedef struct DumpState {
     char *error;
     int fd;
     target_phys_addr_t memory_offset;
+    int64_t bandwidth;
+    RAMBlock *block;
+    ram_addr_t start;
+    target_phys_addr_t offset;
+    QEMUTimer *timer;
 } DumpState;
 
+#define DEFAULT_THROTTLE  (32 << 20)      /* Default dump speed throttling */
+
 static DumpState *dump_get_current(void)
 {
     static DumpState current_dump = {
         .state = DUMP_STATE_SETUP,
+        .bandwidth = DEFAULT_THROTTLE,
     };
 
     return &current_dump;
@@ -93,11 +101,19 @@  static int dump_cleanup(DumpState *s)
     int ret = 0;
 
     free_memory_mapping_list(&s->list);
+
     if (s->fd != -1) {
         close(s->fd);
         s->fd = -1;
     }
 
+    if (s->timer) {
+        qemu_del_timer(s->timer);
+        qemu_free_timer(s->timer);
+    }
+
+    qemu_resume_monitor();
+
     return ret;
 }
 
@@ -332,25 +348,40 @@  static int write_data(DumpState *s, void *buf, int length,
 }
 
 /* write the memroy to vmcore. 1 page per I/O. */
-static int write_memory(DumpState *s, RAMBlock *block,
-                        target_phys_addr_t *offset)
+static int write_memory(DumpState *s, RAMBlock *block, ram_addr_t start,
+                        target_phys_addr_t *offset, int64_t *size,
+                        int64_t deadline)
 {
     int i, ret;
+    int64_t writen_size = 0;
+    int64_t time;
 
-    for (i = 0; i < block->length / TARGET_PAGE_SIZE; i++) {
-        ret = write_data(s, block->host + i * TARGET_PAGE_SIZE,
+    for (i = 0; i < *size / TARGET_PAGE_SIZE; i++) {
+        ret = write_data(s, block->host + start + i * TARGET_PAGE_SIZE,
                          TARGET_PAGE_SIZE, offset);
         if (ret < 0) {
             return -1;
         }
+        writen_size += TARGET_PAGE_SIZE;
+        time = qemu_get_clock_ms(rt_clock);
+        if (time >= deadline) {
+            /* time out */
+            *size = writen_size;
+            return 1;
+        }
     }
 
-    if ((block->length % TARGET_PAGE_SIZE) != 0) {
-        ret = write_data(s, block->host + i * TARGET_PAGE_SIZE,
-                         block->length % TARGET_PAGE_SIZE, offset);
+    if ((*size % TARGET_PAGE_SIZE) != 0) {
+        ret = write_data(s, block->host + start + i * TARGET_PAGE_SIZE,
+                         *size % TARGET_PAGE_SIZE, offset);
         if (ret < 0) {
             return -1;
         }
+        time = qemu_get_clock_ms(rt_clock);
+        if (time >= deadline) {
+            /* time out */
+            return 1;
+        }
     }
 
     return 0;
@@ -379,6 +410,7 @@  static DumpState *dump_init(int fd, Error **errp)
     CPUState *env;
     DumpState *s = dump_get_current();
     int ret;
+    const char *msg = NULL;
 
     vm_stop(RUN_STATE_PAUSED);
     s->state = DUMP_STATE_SETUP;
@@ -387,6 +419,9 @@  static DumpState *dump_init(int fd, Error **errp)
         s->error = NULL;
     }
     s->fd = fd;
+    s->block = QLIST_FIRST(&ram_list.blocks);
+    s->start = 0;
+    s->timer = NULL;
 
     /*
      * get dump info: endian, class and architecture.
@@ -429,6 +464,9 @@  static DumpState *dump_init(int fd, Error **errp)
         s->phdr_num += s->list.num;
     }
 
+    msg = "terminal does not allow synchronous dumping, continuing detached\n";
+    qemu_suspend_monitor("%s", msg);
+
     return s;
 }
 
@@ -486,6 +524,7 @@  static int dump_begin(DumpState *s)
     }
 
     s->memory_offset = offset;
+    s->offset = offset;
     return 0;
 }
 
@@ -513,38 +552,116 @@  static int dump_completed(DumpState *s)
     return 0;
 }
 
-/* write all memory to vmcore */
-static int dump_iterate(DumpState *s)
+/*
+ * write memory to vmcore.
+ *
+ * this function has three return values:
+ *  -1 : an error occurred
+ *   0 : we haven't finished, the caller has to call again
+ *   1 : we have finished and can go to the complete phase
+ */
+static int dump_iterate(DumpState *s, int64_t deadline)
 {
-    RAMBlock *block;
-    target_phys_addr_t offset = s->memory_offset;
+    RAMBlock *block = s->block;
+    target_phys_addr_t offset = s->offset;
+    int64_t size, remain, writen_size;
+    int64_t total = s->bandwidth / 10;
     int ret;
 
-    /* write all memory to vmcore */
-    QLIST_FOREACH(block, &ram_list.blocks, next) {
-        ret = write_memory(s, block, &offset);
+    if ((block->length - s->start) >= total) {
+        size = total;
+    } else {
+        size = block->length - s->start;
+    }
+
+    ret = write_memory(s, block, s->start, &offset, &size, deadline);
+    if (ret < 0) {
+        return -1;
+    }
+
+    if (size == total || ret == 1) {
+        if ((size + s->start) == block->length) {
+            s->block = QLIST_NEXT(block, next);
+            s->start = 0;
+        } else {
+            s->start += size;
+        }
+        goto end;
+    }
+
+    while (size < total) {
+        block = QLIST_NEXT(block, next);
+        if (!block) {
+            /* we have finished */
+            return 1;
+        }
+
+        remain = total - size;
+        if (remain >= block->length) {
+            writen_size = block->length;
+        } else {
+            writen_size = remain;
+        }
+        ret = write_memory(s, block, 0, &offset, &writen_size, deadline);
         if (ret < 0) {
             return -1;
+        } else if (ret == 1) {
+            break;
         }
+        size += writen_size;
+    }
+    if (writen_size == block->length) {
+        s->block = QLIST_NEXT(block, next);
+        s->start = 0;
+    } else {
+        s->block = block;
+        s->start = writen_size;
+    }
+
+end:
+    s->offset = offset;
+    if (!s->block) {
+        /* we have finished */
+        return 1;
     }
 
-    return dump_completed(s);
+    return 0;
 }
 
-static int create_vmcore(DumpState *s)
+static void dump_rate_tick(void *opaque)
 {
+    DumpState *s = opaque;
+    int64_t begin, end;
     int ret;
 
-    ret = dump_begin(s);
+    begin = qemu_get_clock_ms(rt_clock);
+    ret = dump_iterate(s, begin + 100);
     if (ret < 0) {
-        return -1;
+        return;
+    } else if (ret == 1) {
+        dump_completed(s);
+        return;
     }
+    end = qemu_get_clock_ms(rt_clock);
+    if (end - begin >= 100) {
+        qemu_mod_timer(s->timer, end + 10);
+    } else {
+        qemu_mod_timer(s->timer, begin + 100);
+    }
+}
 
-    ret = dump_iterate(s);
+static int create_vmcore(DumpState *s)
+{
+    int ret;
+
+    ret = dump_begin(s);
     if (ret < 0) {
         return -1;
     }
 
+    s->timer = qemu_new_timer_ms(rt_clock, dump_rate_tick, s);
+    qemu_mod_timer(s->timer, qemu_get_clock_ms(rt_clock) + 100);
+
     return 0;
 }
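
The core idea of the reworked dump_iterate() is a resumable cursor (s->block, s->start) that advances by at most a fixed budget per call. The following is a simplified, self-contained model of that idea, not a copy of the patch's logic: RAM blocks are reduced to an array of lengths, no data is written, there is no deadline, and the names Cursor and iterate are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the (s->block, s->start) pair in DumpState. */
typedef struct Cursor {
    size_t block;     /* index of the block currently being written */
    int64_t start;    /* bytes of that block already written */
} Cursor;

/*
 * Consume up to budget bytes from lengths[0..nblocks), starting at *c.
 * Returns 1 when every block has been fully consumed and 0 if the caller
 * has to call again (the same convention the patch documents for
 * dump_iterate). *written receives the bytes consumed by this call.
 */
static int iterate(const int64_t *lengths, size_t nblocks, Cursor *c,
                   int64_t budget, int64_t *written)
{
    *written = 0;

    while (c->block < nblocks && budget > 0) {
        int64_t left = lengths[c->block] - c->start;
        int64_t n = left < budget ? left : budget;

        /* a real implementation would write n bytes here */
        c->start += n;
        budget -= n;
        *written += n;

        if (c->start == lengths[c->block]) {
            /* block finished: move the cursor to the next one */
            c->block++;
            c->start = 0;
        }
    }

    return c->block == nblocks;
}
```

For example, with block lengths {5, 3, 10} and a budget of 4 bytes per call, the cursor crosses from the first block into the second and third during the second call, and the fifth call returns 1 after a total of 18 bytes.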