Message ID | 20171205065307.21853-1-peterx@redhat.com |
---|---|
State | New |
Headers | show |
On Tue, Dec 05, 2017 at 02:52:39PM +0800, Peter Xu wrote: > Tree is pushed here for better reference and testing (online tree > includes monitor OOB series): > > https://github.com/xzpeter/qemu/tree/postcopy-recover-all Sorry, now this series is depending on the OOB series. Hello, Patchew, hope you see this soon enough: Based-on: <20171205055200.16305-1-peterx@redhat.com>
* Peter Xu (peterx@redhat.com) wrote: > Tree is pushed here for better reference and testing (online tree > includes monitor OOB series): > > https://github.com/xzpeter/qemu/tree/postcopy-recover-all > > This version removed quite a few patches related to migrate-incoming, > instead I introduced a new command "migrate-recover" to trigger the > recovery channel on destination side to simplify the code. > > To test this two series altogether, please checkout above tree and > build. Note: to test on small and single host, one need to disable > full bandwidth postcopy migration otherwise it'll complete very fast. > Basically a simple patch like this would help: > > diff --git a/migration/migration.c b/migration/migration.c > index 4de3b551fe..c0206023d7 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -1904,7 +1904,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running) > * will notice we're in POSTCOPY_ACTIVE and not actually > * wrap their state up here > */ > - qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX); > + // qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX); > if (migrate_postcopy_ram()) { > /* Ping just for debugging, helps line traces up */ > qemu_savevm_send_ping(ms->to_dst_file, 2); > > This patch is included already in above github tree. Please feel free > to drop this patch when want to test on big machines and between real > hosts. > > Detailed Test Procedures (QMP only) > =================================== > > 1. start source QEMU. > > $qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \ > -smp 4 -m 1G -qmp stdio \ > -name peter-vm,debug-threads=on \ > -netdev user,id=net0 \ > -device e1000,netdev=net0 \ > -global migration.x-max-bandwidth=4096 \ > -global migration.x-postcopy-ram=on \ > /images/fedora-25.qcow2 I suspect -snapshot isn't doing the right thing to the storage when combined with the migration - I'm assuming the destination isn't using the same temporary file. (Also any reason for specifying split irqchip?) > 2. start destination QEMU. > > $qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \ > -smp 4 -m 1G -qmp stdio \ > -name peter-vm,debug-threads=on \ > -netdev user,id=net0 \ > -device e1000,netdev=net0 \ > -global migration.x-max-bandwidth=4096 \ > -global migration.x-postcopy-ram=on \ > -incoming tcp:0.0.0.0:5555 \ > /images/fedora-25.qcow2 > > 3. On source, do QMP handshake as normal: > > {"execute": "qmp_capabilities"} > {"return": {}} > > 4. On destination, do QMP handshake to enable OOB: > > {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } } > {"return": {}} > > 5. On source, trigger initial migrate command, switch to postcopy: > > {"execute": "migrate", "arguments": { "uri": "tcp:localhost:5555" } } > {"return": {}} > {"execute": "query-migrate"} > {"return": {"expected-downtime": 300, "status": "active", ...}} > {"execute": "migrate-start-postcopy"} > {"return": {}} > {"timestamp": {"seconds": 1512454728, "microseconds": 768096}, "event": "STOP"} > {"execute": "query-migrate"} > {"return": {"expected-downtime": 44472, "status": "postcopy-active", ...}} > > 6. On source, manually trigger a "fake network down" using > "migrate-cancel" command: > > {"execute": "migrate_cancel"} > {"return": {}} > > During postcopy, it'll not really cancel the migration, but pause > it. On both sides, we should see this on stderr: > > qemu-system-x86_64: Detected IO failure for postcopy. Migration paused. > > It means now both sides are in postcopy-pause state. > > 7. (Optional) On destination side, let's try to hang the main thread > using the new x-oob-test command, providing a "lock=true" param: > > {"execute": "x-oob-test", "id": "lock-dispatcher-cmd", > "arguments": { "lock": true } } > > After sending this command, we should not see any "return", because > main thread is blocked already. But we can still use the monitor > since the monitor now has dedicated IOThread. > > 8. On destination side, provide a new incoming port using the new > command "migrate-recover" (note that if step 7 is carried out, we > _must_ use OOB form, otherwise the command will hang. With OOB, > this command will return immediately): > > {"execute": "migrate-recover", "id": "recover-cmd", > "arguments": { "uri": "tcp:localhost:5556" }, > "control": { "run-oob": true } } > {"timestamp": {"seconds": 1512454976, "microseconds": 186053}, > "event": "MIGRATION", "data": {"status": "setup"}} > {"return": {}, "id": "recover-cmd"} > > We can see that the command will success even if main thread is > locked up. > > 9. (Optional) This step is only needed if step 7 is carried out. On > destination, let's unlock the main thread before resuming the > migration, this time with "lock=false" to unlock the main thread > (since system running needs the main thread). Note that we _must_ > use OOB command here too: > > {"execute": "x-oob-test", "id": "unlock-dispatcher", > "arguments": { "lock": false }, "control": { "run-oob": true } } > {"return": {}, "id": "unlock-dispatcher"} > {"return": {}, "id": "lock-dispatcher-cmd"} > > Here the first "return" is the reply to the unlock command, the > second "return" is the reply to the lock command. After this > command, main thread is released. > > 10. On source, resume the postcopy migration: > > {"execute": "migrate", "arguments": { "uri": "tcp:localhost:5556", "resume": true }} > {"return": {}} > {"execute": "query-migrate"} > {"return": {"status": "completed", ...}} The use of x-oob-test to lock things is a bit different to reality and that means the ordering is different. When the destination is blocked by a page request, that page won't become unstuck until sometime after (10) happens and delivers the page to the target. You could try an 'info cpu' on the destination at (7) - although it's not guaranteed to lock, depending whether the page needed has arrived. Dave > Here's the change log: > > v5: > - add some more r-bs > - fix error path in ram_load_postcopy to always check on "ret" [Dave] > - move init/destroy of three new sems into migration object > init/finalize functions > - dropped patch "migration: delay the postcopy-active state switch", > meanwhile touch up patch 6 to check against > POSTCOPY_INCOMING_RUNNING state when trying to switch to > postcopy-pause state. [Dave] > - drop two patches that introduce qmp/hmp of migrate-pause, instead > re-use migrate-cancel to do manual trigger of postcopy recovery. > - add a new patch to let migrate_cancel to pause migration if it's > already in postcopy phase. > - add a new command "migrate-recover" to re-assign the incoming port, > instead of reusing migrate-incoming. > - since now I used migrate-recover command instead of migrate-incoming > itself, I dropped quite a few patches that are not really relevant > now, so the series got smaller: > migration: return incoming task tag for sockets > migration: return incoming task tag for exec > migration: return incoming task tag for fd > migration: store listen task tag > migration: allow migrate_incoming for paused VM > > v4: > - fix two compile errors that patchew reported > - for QMP: do s/2.11/2.12/g > - fix migrate-incoming logic to be more strict > > v3: > - add r-bs correspondingly > - in ram_load_postcopy() capture error if postcopy_place_page() failed > [Dave] > - remove "break" if there is a "goto" before that [Dave] > - ram_dirty_bitmap_reload(): use PRIx64 where needed, add some more > print sizes [Dave] > - remove RAMState.ramblock_to_sync, instead use local counter [Dave] > - init tag in tcp_start_incoming_migration() [Dave] > - more traces when transmiting the recv bitmap [Dave] > - postcopy_pause_incoming(): do shutdown before taking rp lock [Dave] > - add one more patch to postpone the state switch of postcopy-active [Dave] > - refactor the migrate_incoming handling according to the email > discussion [Dave] > - add manual trigger to pause postcopy (two new patches added to > introduce "migrate-pause" command for QMP/HMP). [Dave] > > v2: > - rebased to alexey's received bitmap v9 > - add Dave's r-bs for patches: 2/5/6/8/9/13/14/15/16/20/21 > - patch 1: use target page size to calc bitmap [Dave] > - patch 3: move trace_*() after EINTR check [Dave] > - patch 4: dropped since I can use bitmap_complement() [Dave] > - patch 7: check file error right after data is read in both > qemu_loadvm_section_start_full() and qemu_loadvm_section_part_end(), > meanwhile also check in check_section_footer() [Dave] > - patch 8/9: fix error_report/commit message in both patches [Dave] > - patch 10: dropped (new parameter "x-postcopy-fast") > - patch 11: split the "postcopy-paused" patch into two, one to > introduce the new state, the other to implement the logic. Also, > print something when paused [Dave] > - patch 17: removed do_resume label, introduced migration_prepare() > [Dave] > - patch 18: removed do_pause label using a new loop [Dave] > - patch 20: removed incorrect comment [Dave] > - patch 21: use 256B buffer in qemu_savevm_send_recv_bitmap(), add > trace in loadvm_handle_recv_bitmap() [Dave] > - patch 22: fix MIG_RP_MSG_RECV_BITMAP for (1) endianess (2) 32/64bit > machines. More info in the commit message update. > - patch 23: add one check on migration state [Dave] > - patch 24: use macro instead of magic 1 [Dave] > - patch 26: use more trace_*() instead of one, and use one sem to > replace mutex+cond. [Dave] > - move sem init/destroy into migration_instance_init() and > migration_instance_finalize (new function after rebase). > - patch 29: squashed this patch most into: > "migration: implement "postcopy-pause" src logic" [Dave] > - split the two fix patches out of the series > - fixed two places where I misused "wake/woke/woken". [Dave] > - add new patch "bitmap: provide to_le/from_le helpers" to solve the > bitmap endianess issue [Dave] > - appended migrate_incoming series to this series, since that one is > depending on the paused state. Using explicit g_source_remove() for > listening ports [Dan] > > FUTURE TODO LIST > - support migrate_cancel during PAUSED/RECOVER state > - when anything wrong happens during PAUSED/RECOVER, switching back to > PAUSED state on both sides > > As we all know that postcopy migration has a potential risk to lost > the VM if the network is broken during the migration. This series > tries to solve the problem by allowing the migration to pause at the > failure point, and do recovery after the link is reconnected. > > There was existing work on this issue from Md Haris Iqbal: > > https://lists.nongnu.org/archive/html/qemu-devel/2016-08/msg03468.html > > This series is a totally re-work of the issue, based on Alexey > Perevalov's recved bitmap v8 series: > > https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06401.html > > Two new status are added to support the migration (used on both > sides): > > MIGRATION_STATUS_POSTCOPY_PAUSED > MIGRATION_STATUS_POSTCOPY_RECOVER > > The MIGRATION_STATUS_POSTCOPY_PAUSED state will be set when the > network failure is detected. It is a phase that we'll be in for a long > time as long as the failure is detected, and we'll be there until a > recovery is triggered. In this state, all the threads (on source: > send thread, return-path thread; destination: ram-load thread, > page-fault thread) will be halted. > > The MIGRATION_STATUS_POSTCOPY_RECOVER state is short. If we triggered > a recovery, both source/destination VM will jump into this stage, do > whatever it needs to prepare the recovery (e.g., currently the most > important thing is to synchronize the dirty bitmap, please see commit > messages for more information). After the preparation is ready, the > source will do the final handshake with destination, then both sides > will switch back to MIGRATION_STATUS_POSTCOPY_ACTIVE again. > > New commands/messages are defined as well to satisfy the need: > > MIG_CMD_RECV_BITMAP & MIG_RP_MSG_RECV_BITMAP are introduced for > delivering received bitmaps > > MIG_CMD_RESUME & MIG_RP_MSG_RESUME_ACK are introduced to do the final > handshake of postcopy recovery. > > Here's some more details on how the whole failure/recovery routine is > happened: > > - start migration > - ... (switch from precopy to postcopy) > - both sides are in "postcopy-active" state > - ... (failure happened, e.g., network unplugged) > - both sides switch to "postcopy-paused" state > - all the migration threads are stopped on both sides > - ... (both VMs hanged) > - ... (user triggers recovery using "migrate -r -d tcp:HOST:PORT" on > source side, "-r" means "recover") > - both sides switch to "postcopy-recover" state > - on source: send-thread, return-path-thread will be waked up > - on dest: ram-load-thread waked up, fault-thread still paused > - source calls new savevmhandler hook resume_prepare() (currently, > only ram is providing the hook): > - ram_resume_prepare(): for each ramblock, fetch recved bitmap by: > - src sends MIG_CMD_RECV_BITMAP to dst > - dst replies MIG_RP_MSG_RECV_BITMAP to src, with bitmap data > - src uses the recved bitmap to rebuild dirty bitmap > - source do final handshake with destination > - src sends MIG_CMD_RESUME to dst, telling "src is ready" > - when dst receives the command, fault thread will be waked up, > meanwhile, dst switch back to "postcopy-active" > - dst sends MIG_RP_MSG_RESUME_ACK to src, telling "dst is ready" > - when src receives the ack, state switch to "postcopy-active" > - postcopy migration continued > > Testing: > > As I said, it's still an extremely simple test. I used socat to create > a socket bridge: > > socat tcp-listen:6666 tcp-connect:localhost:5555 & > > Then do the migration via the bridge. I emulated the network failure > by killing the socat process (bridge down), then tries to recover the > migration using the other channel (default dst channel). It looks > like: > > port:6666 +------------------+ > +----------> | socat bridge [1] |-------+ > | +------------------+ | > | (Original channel) | > | | port: 5555 > +---------+ (Recovery channel) +--->+---------+ > | src VM |------------------------------------>| dst VM | > +---------+ +---------+ > > Known issues/notes: > > - currently destination listening port still cannot change. E.g., the > recovery should be using the same port on destination for > simplicity. (on source, we can specify new URL) > > - the patch: "migration: let dst listen on port always" is still > hacky, it just kept the incoming accept open forever for now... > > - some migration numbers might still be inaccurate, like total > migration time, etc. (But I don't really think that matters much > now) > > - the patches are very lightly tested. > > - Dave reported one problem that may hang destination main loop thread > (one vcpu thread holds the BQL) and the rest. I haven't encountered > it yet, but it does not mean this series can survive with it. > > - other potential issues that I may have forgotten or unnoticed... > > Anyway, the work is still in preliminary stage. Any suggestions and > comments are greatly welcomed. Thanks. > > Peter Xu (28): > migration: better error handling with QEMUFile > migration: reuse mis->userfault_quit_fd > migration: provide postcopy_fault_thread_notify() > migration: new postcopy-pause state > migration: implement "postcopy-pause" src logic > migration: allow dst vm pause on postcopy > migration: allow src return path to pause > migration: allow send_rq to fail > migration: allow fault thread to pause > qmp: hmp: add migrate "resume" option > migration: pass MigrationState to migrate_init() > migration: rebuild channel on source > migration: new state "postcopy-recover" > migration: wakeup dst ram-load-thread for recover > migration: new cmd MIG_CMD_RECV_BITMAP > migration: new message MIG_RP_MSG_RECV_BITMAP > migration: new cmd MIG_CMD_POSTCOPY_RESUME > migration: new message MIG_RP_MSG_RESUME_ACK > migration: introduce SaveVMHandlers.resume_prepare > migration: synchronize dirty bitmap for resume > migration: setup ramstate for resume > migration: final handshake for the resume > migration: free SocketAddress where allocated > migration: init dst in migration_object_init too > io: let watcher of the channel run in same ctx > migration: allow migrate_cancel to pause postcopy > qmp/migration: new command migrate-recover > hmp/migration: add migrate_recover command > > hmp-commands.hx | 28 ++- > hmp.c | 14 +- > hmp.h | 1 + > include/migration/register.h | 2 + > io/channel.c | 2 +- > migration/migration.c | 549 ++++++++++++++++++++++++++++++++++++++----- > migration/migration.h | 24 +- > migration/postcopy-ram.c | 110 +++++++-- > migration/postcopy-ram.h | 2 + > migration/ram.c | 247 ++++++++++++++++++- > migration/ram.h | 3 + > migration/savevm.c | 233 +++++++++++++++++- > migration/savevm.h | 3 + > migration/socket.c | 4 +- > migration/trace-events | 21 ++ > qapi/migration.json | 35 ++- > 16 files changed, 1172 insertions(+), 106 deletions(-) > > -- > 2.14.3 > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Tue, Dec 05, 2017 at 06:43:42PM +0000, Dr. David Alan Gilbert wrote: > * Peter Xu (peterx@redhat.com) wrote: > > Tree is pushed here for better reference and testing (online tree > > includes monitor OOB series): > > > > https://github.com/xzpeter/qemu/tree/postcopy-recover-all > > > > This version removed quite a few patches related to migrate-incoming, > > instead I introduced a new command "migrate-recover" to trigger the > > recovery channel on destination side to simplify the code. > > > > To test this two series altogether, please checkout above tree and > > build. Note: to test on small and single host, one need to disable > > full bandwidth postcopy migration otherwise it'll complete very fast. > > Basically a simple patch like this would help: > > > > diff --git a/migration/migration.c b/migration/migration.c > > index 4de3b551fe..c0206023d7 100644 > > --- a/migration/migration.c > > +++ b/migration/migration.c > > @@ -1904,7 +1904,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running) > > * will notice we're in POSTCOPY_ACTIVE and not actually > > * wrap their state up here > > */ > > - qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX); > > + // qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX); > > if (migrate_postcopy_ram()) { > > /* Ping just for debugging, helps line traces up */ > > qemu_savevm_send_ping(ms->to_dst_file, 2); > > > > This patch is included already in above github tree. Please feel free > > to drop this patch when want to test on big machines and between real > > hosts. > > > > Detailed Test Procedures (QMP only) > > =================================== > > > > 1. start source QEMU. > > > > $qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \ > > -smp 4 -m 1G -qmp stdio \ > > -name peter-vm,debug-threads=on \ > > -netdev user,id=net0 \ > > -device e1000,netdev=net0 \ > > -global migration.x-max-bandwidth=4096 \ > > -global migration.x-postcopy-ram=on \ > > /images/fedora-25.qcow2 > > I suspect -snapshot isn't doing the right thing to the storage when > combined with the migration - I'm assuming the destination isn't using > the same temporary file. > (Also any reason for specifying split irqchip?) Ah yes. Sorry we should not use "-snapshot" here. Please remove it. I think my smoke test just didn't try to fetch anything on that temp storage so nothing went wrong. And, no reason for split irqchip - I just fetched this command line somewhere where I was testing IOMMUs. :-) Please feel free to remove it too if you want. (so basically I was just pasting my smoke test command lines, not really command line required to run the tests) > > > 2. start destination QEMU. > > > > $qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \ > > -smp 4 -m 1G -qmp stdio \ > > -name peter-vm,debug-threads=on \ > > -netdev user,id=net0 \ > > -device e1000,netdev=net0 \ > > -global migration.x-max-bandwidth=4096 \ > > -global migration.x-postcopy-ram=on \ > > -incoming tcp:0.0.0.0:5555 \ > > /images/fedora-25.qcow2 > > > > 3. On source, do QMP handshake as normal: > > > > {"execute": "qmp_capabilities"} > > {"return": {}} > > > > 4. On destination, do QMP handshake to enable OOB: > > > > {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } } > > {"return": {}} > > > > 5. On source, trigger initial migrate command, switch to postcopy: > > > > {"execute": "migrate", "arguments": { "uri": "tcp:localhost:5555" } } > > {"return": {}} > > {"execute": "query-migrate"} > > {"return": {"expected-downtime": 300, "status": "active", ...}} > > {"execute": "migrate-start-postcopy"} > > {"return": {}} > > {"timestamp": {"seconds": 1512454728, "microseconds": 768096}, "event": "STOP"} > > {"execute": "query-migrate"} > > {"return": {"expected-downtime": 44472, "status": "postcopy-active", ...}} > > > > 6. On source, manually trigger a "fake network down" using > > "migrate-cancel" command: > > > > {"execute": "migrate_cancel"} > > {"return": {}} > > > > During postcopy, it'll not really cancel the migration, but pause > > it. On both sides, we should see this on stderr: > > > > qemu-system-x86_64: Detected IO failure for postcopy. Migration paused. > > > > It means now both sides are in postcopy-pause state. > > > > 7. (Optional) On destination side, let's try to hang the main thread > > using the new x-oob-test command, providing a "lock=true" param: > > > > {"execute": "x-oob-test", "id": "lock-dispatcher-cmd", > > "arguments": { "lock": true } } > > > > After sending this command, we should not see any "return", because > > main thread is blocked already. But we can still use the monitor > > since the monitor now has dedicated IOThread. > > > > 8. On destination side, provide a new incoming port using the new > > command "migrate-recover" (note that if step 7 is carried out, we > > _must_ use OOB form, otherwise the command will hang. With OOB, > > this command will return immediately): > > > > {"execute": "migrate-recover", "id": "recover-cmd", > > "arguments": { "uri": "tcp:localhost:5556" }, > > "control": { "run-oob": true } } > > {"timestamp": {"seconds": 1512454976, "microseconds": 186053}, > > "event": "MIGRATION", "data": {"status": "setup"}} > > {"return": {}, "id": "recover-cmd"} > > > > We can see that the command will success even if main thread is > > locked up. > > > > 9. (Optional) This step is only needed if step 7 is carried out. On > > destination, let's unlock the main thread before resuming the > > migration, this time with "lock=false" to unlock the main thread > > (since system running needs the main thread). Note that we _must_ > > use OOB command here too: > > > > {"execute": "x-oob-test", "id": "unlock-dispatcher", > > "arguments": { "lock": false }, "control": { "run-oob": true } } > > {"return": {}, "id": "unlock-dispatcher"} > > {"return": {}, "id": "lock-dispatcher-cmd"} > > > > Here the first "return" is the reply to the unlock command, the > > second "return" is the reply to the lock command. After this > > command, main thread is released. > > > > 10. On source, resume the postcopy migration: > > > > {"execute": "migrate", "arguments": { "uri": "tcp:localhost:5556", "resume": true }} > > {"return": {}} > > {"execute": "query-migrate"} > > {"return": {"status": "completed", ...}} > > The use of x-oob-test to lock things is a bit different to reality > and that means the ordering is different. > When the destination is blocked by a page request, that page won't > become unstuck until sometime after (10) happens and delivers the page > to the target. > > You could try an 'info cpu' on the destination at (7) - although it's > not guaranteed to lock, depending whether the page needed has arrived. Yes info cpus (or say "query-cpus", in QMP) would work too. The "return" will be delayed until sending the resuming command, but it's the same thing - here I just want to make sure main thread is totally hang death, so I can know whether the new accept() port and the whole workflow will work even with that. Btw, IMHO "info cpus" should guarantee a block, if not, we just do something in guest to make sure guest hangs, then at least one vcpu must be waiting for a page. Thanks!
* Peter Xu (peterx@redhat.com) wrote: > Tree is pushed here for better reference and testing (online tree > includes monitor OOB series): > > https://github.com/xzpeter/qemu/tree/postcopy-recover-all > > This version removed quite a few patches related to migrate-incoming, > instead I introduced a new command "migrate-recover" to trigger the > recovery channel on destination side to simplify the code. I've got this setup on a couple of my test hosts, and I'm using iptables to try breaking the connection. See below for where I got stuck. > To test this two series altogether, please checkout above tree and > build. Note: to test on small and single host, one need to disable > full bandwidth postcopy migration otherwise it'll complete very fast. > Basically a simple patch like this would help: > > diff --git a/migration/migration.c b/migration/migration.c > index 4de3b551fe..c0206023d7 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -1904,7 +1904,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running) > * will notice we're in POSTCOPY_ACTIVE and not actually > * wrap their state up here > */ > - qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX); > + // qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX); > if (migrate_postcopy_ram()) { > /* Ping just for debugging, helps line traces up */ > qemu_savevm_send_ping(ms->to_dst_file, 2); > > This patch is included already in above github tree. Please feel free > to drop this patch when want to test on big machines and between real > hosts. > > Detailed Test Procedures (QMP only) > =================================== > > 1. start source QEMU. > > $qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \ > -smp 4 -m 1G -qmp stdio \ > -name peter-vm,debug-threads=on \ > -netdev user,id=net0 \ > -device e1000,netdev=net0 \ > -global migration.x-max-bandwidth=4096 \ > -global migration.x-postcopy-ram=on \ > /images/fedora-25.qcow2 > > 2. start destination QEMU. > > $qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \ > -smp 4 -m 1G -qmp stdio \ > -name peter-vm,debug-threads=on \ > -netdev user,id=net0 \ > -device e1000,netdev=net0 \ > -global migration.x-max-bandwidth=4096 \ > -global migration.x-postcopy-ram=on \ > -incoming tcp:0.0.0.0:5555 \ > /images/fedora-25.qcow2 I'm using: ./x86_64-softmmu/qemu-system-x86_64 -nographic -M pc,accel=kvm -smp 4 -m 16G -drive file=/home/vms/rhel71.qcow2,id=d,cache=none,if=none -device virtio-blk,drive=d -vnc 0:0 -incoming tcp:0:8888 -chardev socket,port=4000,host=0,id=mon,server,nowait,telnet -mon chardev=mon,id=mon,mode=control -nographic -chardev stdio,mux=on,id=monh -mon chardev=monh,mode=readline --device isa-serial,chardev=monh and I've got both the HMP on the stdio, and the QMP via a telnet > > 3. On source, do QMP handshake as normal: > > {"execute": "qmp_capabilities"} > {"return": {}} > > 4. On destination, do QMP handshake to enable OOB: > > {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } } > {"return": {}} > > 5. On source, trigger initial migrate command, switch to postcopy: > > {"execute": "migrate", "arguments": { "uri": "tcp:localhost:5555" } } > {"return": {}} > {"execute": "query-migrate"} > {"return": {"expected-downtime": 300, "status": "active", ...}} > {"execute": "migrate-start-postcopy"} > {"return": {}} > {"timestamp": {"seconds": 1512454728, "microseconds": 768096}, "event": "STOP"} > {"execute": "query-migrate"} > {"return": {"expected-downtime": 44472, "status": "postcopy-active", ...}} > > 6. On source, manually trigger a "fake network down" using > "migrate-cancel" command: > > {"execute": "migrate_cancel"} > {"return": {}} Before I do that, I'm breaking the network connection by running on the source: iptables -A INPUT -p tcp --source-port 8888 -j DROP iptables -A INPUT -p tcp --destination-port 8888 -j DROP > During postcopy, it'll not really cancel the migration, but pause > it. On both sides, we should see this on stderr: > > qemu-system-x86_64: Detected IO failure for postcopy. Migration paused. > > It means now both sides are in postcopy-pause state. Now, here we start to have a problem; I do the migrate-cancel on the source, that works and goes into pause; but remember the network is broken, so the destination hasn't received the news. > 7. (Optional) On destination side, let's try to hang the main thread > using the new x-oob-test command, providing a "lock=true" param: > > {"execute": "x-oob-test", "id": "lock-dispatcher-cmd", > "arguments": { "lock": true } } > > After sending this command, we should not see any "return", because > main thread is blocked already. But we can still use the monitor > since the monitor now has dedicated IOThread. > > 8. On destination side, provide a new incoming port using the new > command "migrate-recover" (note that if step 7 is carried out, we > _must_ use OOB form, otherwise the command will hang. With OOB, > this command will return immediately): > > {"execute": "migrate-recover", "id": "recover-cmd", > "arguments": { "uri": "tcp:localhost:5556" }, > "control": { "run-oob": true } } > {"timestamp": {"seconds": 1512454976, "microseconds": 186053}, > "event": "MIGRATION", "data": {"status": "setup"}} > {"return": {}, "id": "recover-cmd"} > > We can see that the command will success even if main thread is > locked up. Because the destination didn't get the news of the pause, I get: {"id": "recover-cmd", "error": {"class": "GenericError", "desc": "Migrate recover can only be run when postcopy is paused."}} and I can't explicitly cause a cancel on the destination: {"id": "cancel-cmd", "error": {"class": "GenericError", "desc": "The command migrate_cancel does not support OOB"}} So I think we need a way out of this on the destination. Dave > 9. (Optional) This step is only needed if step 7 is carried out. On > destination, let's unlock the main thread before resuming the > migration, this time with "lock=false" to unlock the main thread > (since system running needs the main thread). Note that we _must_ > use OOB command here too: > > {"execute": "x-oob-test", "id": "unlock-dispatcher", > "arguments": { "lock": false }, "control": { "run-oob": true } } > {"return": {}, "id": "unlock-dispatcher"} > {"return": {}, "id": "lock-dispatcher-cmd"} > > Here the first "return" is the reply to the unlock command, the > second "return" is the reply to the lock command. After this > command, main thread is released. > > 10. On source, resume the postcopy migration: > > {"execute": "migrate", "arguments": { "uri": "tcp:localhost:5556", "resume": true }} > {"return": {}} > {"execute": "query-migrate"} > {"return": {"status": "completed", ...}} > > Here's the change log: > > v5: > - add some more r-bs > - fix error path in ram_load_postcopy to always check on "ret" [Dave] > - move init/destroy of three new sems into migration object > init/finalize functions > - dropped patch "migration: delay the postcopy-active state switch", > meanwhile touch up patch 6 to check against > POSTCOPY_INCOMING_RUNNING state when trying to switch to > postcopy-pause state. [Dave] > - drop two patches that introduce qmp/hmp of migrate-pause, instead > re-use migrate-cancel to do manual trigger of postcopy recovery. > - add a new patch to let migrate_cancel to pause migration if it's > already in postcopy phase. > - add a new command "migrate-recover" to re-assign the incoming port, > instead of reusing migrate-incoming. > - since now I used migrate-recover command instead of migrate-incoming > itself, I dropped quite a few patches that are not really relevant > now, so the series got smaller: > migration: return incoming task tag for sockets > migration: return incoming task tag for exec > migration: return incoming task tag for fd > migration: store listen task tag > migration: allow migrate_incoming for paused VM > > v4: > - fix two compile errors that patchew reported > - for QMP: do s/2.11/2.12/g > - fix migrate-incoming logic to be more strict > > v3: > - add r-bs correspondingly > - in ram_load_postcopy() capture error if postcopy_place_page() failed > [Dave] > - remove "break" if there is a "goto" before that [Dave] > - ram_dirty_bitmap_reload(): use PRIx64 where needed, add some more > print sizes [Dave] > - remove RAMState.ramblock_to_sync, instead use local counter [Dave] > - init tag in tcp_start_incoming_migration() [Dave] > - more traces when transmiting the recv bitmap [Dave] > - postcopy_pause_incoming(): do shutdown before taking rp lock [Dave] > - add one more patch to postpone the state switch of postcopy-active [Dave] > - refactor the migrate_incoming handling according to the email > discussion [Dave] > - add manual trigger to pause postcopy (two new patches added to > introduce "migrate-pause" command for QMP/HMP). [Dave] > > v2: > - rebased to alexey's received bitmap v9 > - add Dave's r-bs for patches: 2/5/6/8/9/13/14/15/16/20/21 > - patch 1: use target page size to calc bitmap [Dave] > - patch 3: move trace_*() after EINTR check [Dave] > - patch 4: dropped since I can use bitmap_complement() [Dave] > - patch 7: check file error right after data is read in both > qemu_loadvm_section_start_full() and qemu_loadvm_section_part_end(), > meanwhile also check in check_section_footer() [Dave] > - patch 8/9: fix error_report/commit message in both patches [Dave] > - patch 10: dropped (new parameter "x-postcopy-fast") > - patch 11: split the "postcopy-paused" patch into two, one to > introduce the new state, the other to implement the logic. Also, > print something when paused [Dave] > - patch 17: removed do_resume label, introduced migration_prepare() > [Dave] > - patch 18: removed do_pause label using a new loop [Dave] > - patch 20: removed incorrect comment [Dave] > - patch 21: use 256B buffer in qemu_savevm_send_recv_bitmap(), add > trace in loadvm_handle_recv_bitmap() [Dave] > - patch 22: fix MIG_RP_MSG_RECV_BITMAP for (1) endianess (2) 32/64bit > machines. More info in the commit message update. > - patch 23: add one check on migration state [Dave] > - patch 24: use macro instead of magic 1 [Dave] > - patch 26: use more trace_*() instead of one, and use one sem to > replace mutex+cond. [Dave] > - move sem init/destroy into migration_instance_init() and > migration_instance_finalize (new function after rebase). > - patch 29: squashed this patch most into: > "migration: implement "postcopy-pause" src logic" [Dave] > - split the two fix patches out of the series > - fixed two places where I misused "wake/woke/woken". [Dave] > - add new patch "bitmap: provide to_le/from_le helpers" to solve the > bitmap endianess issue [Dave] > - appended migrate_incoming series to this series, since that one is > depending on the paused state. Using explicit g_source_remove() for > listening ports [Dan] > > FUTURE TODO LIST > - support migrate_cancel during PAUSED/RECOVER state > - when anything wrong happens during PAUSED/RECOVER, switching back to > PAUSED state on both sides > > As we all know that postcopy migration has a potential risk to lost > the VM if the network is broken during the migration. This series > tries to solve the problem by allowing the migration to pause at the > failure point, and do recovery after the link is reconnected. > > There was existing work on this issue from Md Haris Iqbal: > > https://lists.nongnu.org/archive/html/qemu-devel/2016-08/msg03468.html > > This series is a totally re-work of the issue, based on Alexey > Perevalov's recved bitmap v8 series: > > https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06401.html > > Two new status are added to support the migration (used on both > sides): > > MIGRATION_STATUS_POSTCOPY_PAUSED > MIGRATION_STATUS_POSTCOPY_RECOVER > > The MIGRATION_STATUS_POSTCOPY_PAUSED state will be set when the > network failure is detected. It is a phase that we'll be in for a long > time as long as the failure is detected, and we'll be there until a > recovery is triggered. In this state, all the threads (on source: > send thread, return-path thread; destination: ram-load thread, > page-fault thread) will be halted. > > The MIGRATION_STATUS_POSTCOPY_RECOVER state is short. If we triggered > a recovery, both source/destination VM will jump into this stage, do > whatever it needs to prepare the recovery (e.g., currently the most > important thing is to synchronize the dirty bitmap, please see commit > messages for more information). After the preparation is ready, the > source will do the final handshake with destination, then both sides > will switch back to MIGRATION_STATUS_POSTCOPY_ACTIVE again. > > New commands/messages are defined as well to satisfy the need: > > MIG_CMD_RECV_BITMAP & MIG_RP_MSG_RECV_BITMAP are introduced for > delivering received bitmaps > > MIG_CMD_RESUME & MIG_RP_MSG_RESUME_ACK are introduced to do the final > handshake of postcopy recovery. > > Here's some more details on how the whole failure/recovery routine is > happened: > > - start migration > - ... (switch from precopy to postcopy) > - both sides are in "postcopy-active" state > - ... (failure happened, e.g., network unplugged) > - both sides switch to "postcopy-paused" state > - all the migration threads are stopped on both sides > - ... (both VMs hanged) > - ... (user triggers recovery using "migrate -r -d tcp:HOST:PORT" on > source side, "-r" means "recover") > - both sides switch to "postcopy-recover" state > - on source: send-thread, return-path-thread will be waked up > - on dest: ram-load-thread waked up, fault-thread still paused > - source calls new savevmhandler hook resume_prepare() (currently, > only ram is providing the hook): > - ram_resume_prepare(): for each ramblock, fetch recved bitmap by: > - src sends MIG_CMD_RECV_BITMAP to dst > - dst replies MIG_RP_MSG_RECV_BITMAP to src, with bitmap data > - src uses the recved bitmap to rebuild dirty bitmap > - source do final handshake with destination > - src sends MIG_CMD_RESUME to dst, telling "src is ready" > - when dst receives the command, fault thread will be waked up, > meanwhile, dst switch back to "postcopy-active" > - dst sends MIG_RP_MSG_RESUME_ACK to src, telling "dst is ready" > - when src receives the ack, state switch to "postcopy-active" > - postcopy migration continued > > Testing: > > As I said, it's still an extremely simple test. I used socat to create > a socket bridge: > > socat tcp-listen:6666 tcp-connect:localhost:5555 & > > Then do the migration via the bridge. I emulated the network failure > by killing the socat process (bridge down), then tries to recover the > migration using the other channel (default dst channel). It looks > like: > > port:6666 +------------------+ > +----------> | socat bridge [1] |-------+ > | +------------------+ | > | (Original channel) | > | | port: 5555 > +---------+ (Recovery channel) +--->+---------+ > | src VM |------------------------------------>| dst VM | > +---------+ +---------+ > > Known issues/notes: > > - currently destination listening port still cannot change. E.g., the > recovery should be using the same port on destination for > simplicity. (on source, we can specify new URL) > > - the patch: "migration: let dst listen on port always" is still > hacky, it just kept the incoming accept open forever for now... > > - some migration numbers might still be inaccurate, like total > migration time, etc. (But I don't really think that matters much > now) > > - the patches are very lightly tested. > > - Dave reported one problem that may hang destination main loop thread > (one vcpu thread holds the BQL) and the rest. I haven't encountered > it yet, but it does not mean this series can survive with it. > > - other potential issues that I may have forgotten or unnoticed... > > Anyway, the work is still in preliminary stage. Any suggestions and > comments are greatly welcomed. Thanks. > > Peter Xu (28): > migration: better error handling with QEMUFile > migration: reuse mis->userfault_quit_fd > migration: provide postcopy_fault_thread_notify() > migration: new postcopy-pause state > migration: implement "postcopy-pause" src logic > migration: allow dst vm pause on postcopy > migration: allow src return path to pause > migration: allow send_rq to fail > migration: allow fault thread to pause > qmp: hmp: add migrate "resume" option > migration: pass MigrationState to migrate_init() > migration: rebuild channel on source > migration: new state "postcopy-recover" > migration: wakeup dst ram-load-thread for recover > migration: new cmd MIG_CMD_RECV_BITMAP > migration: new message MIG_RP_MSG_RECV_BITMAP > migration: new cmd MIG_CMD_POSTCOPY_RESUME > migration: new message MIG_RP_MSG_RESUME_ACK > migration: introduce SaveVMHandlers.resume_prepare > migration: synchronize dirty bitmap for resume > migration: setup ramstate for resume > migration: final handshake for the resume > migration: free SocketAddress where allocated > migration: init dst in migration_object_init too > io: let watcher of the channel run in same ctx > migration: allow migrate_cancel to pause postcopy > qmp/migration: new command migrate-recover > hmp/migration: add migrate_recover command > > hmp-commands.hx | 28 ++- > hmp.c | 14 +- > hmp.h | 1 + > include/migration/register.h | 2 + > io/channel.c | 2 +- > migration/migration.c | 549 ++++++++++++++++++++++++++++++++++++++----- > migration/migration.h | 24 +- > migration/postcopy-ram.c | 110 +++++++-- > migration/postcopy-ram.h | 2 + > migration/ram.c | 247 ++++++++++++++++++- > migration/ram.h | 3 + > migration/savevm.c | 233 +++++++++++++++++- > migration/savevm.h | 3 + > migration/socket.c | 4 +- > migration/trace-events | 21 ++ > qapi/migration.json | 35 ++- > 16 files changed, 1172 insertions(+), 106 deletions(-) > > -- > 2.14.3 > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Thu, Jan 11, 2018 at 04:59:32PM +0000, Dr. David Alan Gilbert wrote: > * Peter Xu (peterx@redhat.com) wrote: > > Tree is pushed here for better reference and testing (online tree > > includes monitor OOB series): > > > > https://github.com/xzpeter/qemu/tree/postcopy-recover-all > > > > This version removed quite a few patches related to migrate-incoming, > > instead I introduced a new command "migrate-recover" to trigger the > > recovery channel on destination side to simplify the code. > > I've got this setup on a couple of my test hosts, and I'm using > iptables to try breaking the connection. > > See below for where I got stuck. > > > To test this two series altogether, please checkout above tree and > > build. Note: to test on small and single host, one need to disable > > full bandwidth postcopy migration otherwise it'll complete very fast. > > Basically a simple patch like this would help: > > > > diff --git a/migration/migration.c b/migration/migration.c > > index 4de3b551fe..c0206023d7 100644 > > --- a/migration/migration.c > > +++ b/migration/migration.c > > @@ -1904,7 +1904,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running) > > * will notice we're in POSTCOPY_ACTIVE and not actually > > * wrap their state up here > > */ > > - qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX); > > + // qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX); > > if (migrate_postcopy_ram()) { > > /* Ping just for debugging, helps line traces up */ > > qemu_savevm_send_ping(ms->to_dst_file, 2); > > > > This patch is included already in above github tree. Please feel free > > to drop this patch when want to test on big machines and between real > > hosts. > > > > Detailed Test Procedures (QMP only) > > =================================== > > > > 1. start source QEMU. > > > > $qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \ > > -smp 4 -m 1G -qmp stdio \ > > -name peter-vm,debug-threads=on \ > > -netdev user,id=net0 \ > > -device e1000,netdev=net0 \ > > -global migration.x-max-bandwidth=4096 \ > > -global migration.x-postcopy-ram=on \ > > /images/fedora-25.qcow2 > > > > 2. start destination QEMU. > > > > $qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \ > > -smp 4 -m 1G -qmp stdio \ > > -name peter-vm,debug-threads=on \ > > -netdev user,id=net0 \ > > -device e1000,netdev=net0 \ > > -global migration.x-max-bandwidth=4096 \ > > -global migration.x-postcopy-ram=on \ > > -incoming tcp:0.0.0.0:5555 \ > > /images/fedora-25.qcow2 > > I'm using: > ./x86_64-softmmu/qemu-system-x86_64 -nographic -M pc,accel=kvm -smp 4 -m 16G -drive file=/home/vms/rhel71.qcow2,id=d,cache=none,if=none -device virtio-blk,drive=d -vnc 0:0 -incoming tcp:0:8888 -chardev socket,port=4000,host=0,id=mon,server,nowait,telnet -mon chardev=mon,id=mon,mode=control -nographic -chardev stdio,mux=on,id=monh -mon chardev=monh,mode=readline --device isa-serial,chardev=monh > and I've got both the HMP on the stdio, and the QMP via a telnet > > > > > 3. On source, do QMP handshake as normal: > > > > {"execute": "qmp_capabilities"} > > {"return": {}} > > > > 4. On destination, do QMP handshake to enable OOB: > > > > {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } } > > {"return": {}} > > > > 5. On source, trigger initial migrate command, switch to postcopy: > > > > {"execute": "migrate", "arguments": { "uri": "tcp:localhost:5555" } } > > {"return": {}} > > {"execute": "query-migrate"} > > {"return": {"expected-downtime": 300, "status": "active", ...}} > > {"execute": "migrate-start-postcopy"} > > {"return": {}} > > {"timestamp": {"seconds": 1512454728, "microseconds": 768096}, "event": "STOP"} > > {"execute": "query-migrate"} > > {"return": {"expected-downtime": 44472, "status": "postcopy-active", ...}} > > > > 6. On source, manually trigger a "fake network down" using > > "migrate-cancel" command: > > > > {"execute": "migrate_cancel"} > > {"return": {}} > > Before I do that, I'm breaking the network connection by running on the > source: > iptables -A INPUT -p tcp --source-port 8888 -j DROP > iptables -A INPUT -p tcp --destination-port 8888 -j DROP This is tricky... I think tcp keepalive may help, but for sure I think we do need a way to cancel the migration on both side. Please see below comment. > > > During postcopy, it'll not really cancel the migration, but pause > > it. On both sides, we should see this on stderr: > > > > qemu-system-x86_64: Detected IO failure for postcopy. Migration paused. > > > > It means now both sides are in postcopy-pause state. > > Now, here we start to have a problem; I do the migrate-cancel on the > source, that works and goes into pause; but remember the network is > broken, so the destination hasn't received the news. > > > 7. (Optional) On destination side, let's try to hang the main thread > > using the new x-oob-test command, providing a "lock=true" param: > > > > {"execute": "x-oob-test", "id": "lock-dispatcher-cmd", > > "arguments": { "lock": true } } > > > > After sending this command, we should not see any "return", because > > main thread is blocked already. But we can still use the monitor > > since the monitor now has dedicated IOThread. > > > > 8. On destination side, provide a new incoming port using the new > > command "migrate-recover" (note that if step 7 is carried out, we > > _must_ use OOB form, otherwise the command will hang. With OOB, > > this command will return immediately): > > > > {"execute": "migrate-recover", "id": "recover-cmd", > > "arguments": { "uri": "tcp:localhost:5556" }, > > "control": { "run-oob": true } } > > {"timestamp": {"seconds": 1512454976, "microseconds": 186053}, > > "event": "MIGRATION", "data": {"status": "setup"}} > > {"return": {}, "id": "recover-cmd"} > > > > We can see that the command will success even if main thread is > > locked up. > > Because the destination didn't get the news of the pause, I get: > {"id": "recover-cmd", "error": {"class": "GenericError", "desc": "Migrate recover can only be run when postcopy is paused."}} This is normal since we didn't fail on destination, while... > > and I can't explicitly cause a cancel on the destination: > {"id": "cancel-cmd", "error": {"class": "GenericError", "desc": "The command migrate_cancel does not support OOB"}} ... this is not normal. I have two questions: 1. Have you provided "control": {"run-oob": true} field when sending command "migrate_cancel"? Just to mention that we shouldn't do it in oob way for migrate_cancel. Or it can be a monitor-oob bug. 2. Do we need to support "migrate_cancel" on destination? For (2), I think we need it, but for now it only works on source for sure. So I think maybe I should add that support. > > So I think we need a way out of this on the destination. So that's my 2nd question. How about we do this: migrate_cancel will cancel incoming migration if: a. there is one incoming migration in progress, and b. postcopy is enabled Thanks,
* Peter Xu (peterx@redhat.com) wrote: > On Thu, Jan 11, 2018 at 04:59:32PM +0000, Dr. David Alan Gilbert wrote: > > * Peter Xu (peterx@redhat.com) wrote: > > > Tree is pushed here for better reference and testing (online tree > > > includes monitor OOB series): > > > > > > https://github.com/xzpeter/qemu/tree/postcopy-recover-all > > > > > > This version removed quite a few patches related to migrate-incoming, > > > instead I introduced a new command "migrate-recover" to trigger the > > > recovery channel on destination side to simplify the code. > > > > I've got this setup on a couple of my test hosts, and I'm using > > iptables to try breaking the connection. > > > > See below for where I got stuck. > > > > > To test this two series altogether, please checkout above tree and > > > build. Note: to test on small and single host, one need to disable > > > full bandwidth postcopy migration otherwise it'll complete very fast. > > > Basically a simple patch like this would help: > > > > > > diff --git a/migration/migration.c b/migration/migration.c > > > index 4de3b551fe..c0206023d7 100644 > > > --- a/migration/migration.c > > > +++ b/migration/migration.c > > > @@ -1904,7 +1904,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running) > > > * will notice we're in POSTCOPY_ACTIVE and not actually > > > * wrap their state up here > > > */ > > > - qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX); > > > + // qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX); > > > if (migrate_postcopy_ram()) { > > > /* Ping just for debugging, helps line traces up */ > > > qemu_savevm_send_ping(ms->to_dst_file, 2); > > > > > > This patch is included already in above github tree. Please feel free > > > to drop this patch when want to test on big machines and between real > > > hosts. > > > > > > Detailed Test Procedures (QMP only) > > > =================================== > > > > > > 1. start source QEMU. > > > > > > $qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \ > > > -smp 4 -m 1G -qmp stdio \ > > > -name peter-vm,debug-threads=on \ > > > -netdev user,id=net0 \ > > > -device e1000,netdev=net0 \ > > > -global migration.x-max-bandwidth=4096 \ > > > -global migration.x-postcopy-ram=on \ > > > /images/fedora-25.qcow2 > > > > > > 2. start destination QEMU. > > > > > > $qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \ > > > -smp 4 -m 1G -qmp stdio \ > > > -name peter-vm,debug-threads=on \ > > > -netdev user,id=net0 \ > > > -device e1000,netdev=net0 \ > > > -global migration.x-max-bandwidth=4096 \ > > > -global migration.x-postcopy-ram=on \ > > > -incoming tcp:0.0.0.0:5555 \ > > > /images/fedora-25.qcow2 > > > > I'm using: > > ./x86_64-softmmu/qemu-system-x86_64 -nographic -M pc,accel=kvm -smp 4 -m 16G -drive file=/home/vms/rhel71.qcow2,id=d,cache=none,if=none -device virtio-blk,drive=d -vnc 0:0 -incoming tcp:0:8888 -chardev socket,port=4000,host=0,id=mon,server,nowait,telnet -mon chardev=mon,id=mon,mode=control -nographic -chardev stdio,mux=on,id=monh -mon chardev=monh,mode=readline --device isa-serial,chardev=monh > > and I've got both the HMP on the stdio, and the QMP via a telnet > > > > > > > > 3. On source, do QMP handshake as normal: > > > > > > {"execute": "qmp_capabilities"} > > > {"return": {}} > > > > > > 4. On destination, do QMP handshake to enable OOB: > > > > > > {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } } > > > {"return": {}} > > > > > > 5. On source, trigger initial migrate command, switch to postcopy: > > > > > > {"execute": "migrate", "arguments": { "uri": "tcp:localhost:5555" } } > > > {"return": {}} > > > {"execute": "query-migrate"} > > > {"return": {"expected-downtime": 300, "status": "active", ...}} > > > {"execute": "migrate-start-postcopy"} > > > {"return": {}} > > > {"timestamp": {"seconds": 1512454728, "microseconds": 768096}, "event": "STOP"} > > > {"execute": "query-migrate"} > > > {"return": {"expected-downtime": 44472, "status": "postcopy-active", ...}} > > > > > > 6. On source, manually trigger a "fake network down" using > > > "migrate-cancel" command: > > > > > > {"execute": "migrate_cancel"} > > > {"return": {}} > > > > Before I do that, I'm breaking the network connection by running on the > > source: > > iptables -A INPUT -p tcp --source-port 8888 -j DROP > > iptables -A INPUT -p tcp --destination-port 8888 -j DROP > > This is tricky... I think tcp keepalive may help, but for sure I > think we do need a way to cancel the migration on both side. Please > see below comment. > > > > > > During postcopy, it'll not really cancel the migration, but pause > > > it. On both sides, we should see this on stderr: > > > > > > qemu-system-x86_64: Detected IO failure for postcopy. Migration paused. > > > > > > It means now both sides are in postcopy-pause state. > > > > Now, here we start to have a problem; I do the migrate-cancel on the > > source, that works and goes into pause; but remember the network is > > broken, so the destination hasn't received the news. > > > > > 7. (Optional) On destination side, let's try to hang the main thread > > > using the new x-oob-test command, providing a "lock=true" param: > > > > > > {"execute": "x-oob-test", "id": "lock-dispatcher-cmd", > > > "arguments": { "lock": true } } > > > > > > After sending this command, we should not see any "return", because > > > main thread is blocked already. But we can still use the monitor > > > since the monitor now has dedicated IOThread. > > > > > > 8. On destination side, provide a new incoming port using the new > > > command "migrate-recover" (note that if step 7 is carried out, we > > > _must_ use OOB form, otherwise the command will hang. With OOB, > > > this command will return immediately): > > > > > > {"execute": "migrate-recover", "id": "recover-cmd", > > > "arguments": { "uri": "tcp:localhost:5556" }, > > > "control": { "run-oob": true } } > > > {"timestamp": {"seconds": 1512454976, "microseconds": 186053}, > > > "event": "MIGRATION", "data": {"status": "setup"}} > > > {"return": {}, "id": "recover-cmd"} > > > > > > We can see that the command will success even if main thread is > > > locked up. > > > > Because the destination didn't get the news of the pause, I get: > > {"id": "recover-cmd", "error": {"class": "GenericError", "desc": "Migrate recover can only be run when postcopy is paused."}} > > This is normal since we didn't fail on destination, while... > > > > > and I can't explicitly cause a cancel on the destination: > > {"id": "cancel-cmd", "error": {"class": "GenericError", "desc": "The command migrate_cancel does not support OOB"}} > > ... this is not normal. I have two questions: > > 1. Have you provided > > "control": {"run-oob": true} > > field when sending command "migrate_cancel"? Just to mention that > we shouldn't do it in oob way for migrate_cancel. Or it can be a > monitor-oob bug. Yes, I probably did and probably shouldn't have. > 2. Do we need to support "migrate_cancel" on destination? > > For (2), I think we need it, but for now it only works on source for > sure. So I think maybe I should add that support. > > > > > So I think we need a way out of this on the destination. > > So that's my 2nd question. How about we do this: migrate_cancel will > cancel incoming migration if: > > a. there is one incoming migration in progress, and > b. postcopy is enabled Yes, I think that should work; but it should only 'cancel' in the same way that it causes it to go to 'paused' mode. One other problem I've hit is that it seems easy to 'upset' the OOB monitor; for example if I do: {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } } and repeat it: {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } } it gives me an error, that's OK but then if I discounnect and reconnect the monitor a few times it's really upset; I've had it: a) Disconnect immediately when the telnet connects b) I've also had it not respond to any commands c) I've also seen a hang at system_powerdown where: the main thread is in: #0 0x00007f37aa4d3ef7 in pthread_join (threadid=139876803868416, thread_return=thread_return@entry=0x7ffc174367b0) at pthread_join.c:92 #1 0x000055644e5c1f5f in qemu_thread_join (thread=<optimized out>) at /home/dgilbert/peter/qemu/util/qemu-thread-posix.c:547 #2 0x000055644e30c688 in iothread_stop (iothread=<optimized out>) at /home/dgilbert/peter/qemu/iothread.c:91 #3 0x000055644e21f122 in monitor_cleanup () at /home/dgilbert/peter/qemu/monitor.c:4517 #4 0x000055644e1e1925 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/dgilbert/peter/qemu/vl.c:4924 and the monitor thread is in: #0 0x00007fdd93de871f in accept4 (fd=fd@entry=10, addr=..., addr@entry=..., addr_len=addr_len@entry=0x7fdd80004430, flags=flags@entry=524288) at ../sysdeps/unix/sysv/linux/accept4.c:37 #1 0x000055645f42d9ec in qemu_accept (s=10, addr=addr@entry=0x7fdd800043b0, addrlen=addrlen@entry=0x7fdd80004430) at /home/dgilbert/peter/qemu/util/osdep.c:431 #2 0x000055645f3ea7a1 in qio_channel_socket_accept (ioc=0x556460610f10, errp=errp@entry=0x0) at /home/dgilbert/peter/qemu/io/channel-socket.c:340 #3 0x000055645f3db6aa in tcp_chr_accept (channel=0x556460610f10, cond=<optimized out>, opaque=<optimized out>) at /home/dgilbert/peter/qemu/chardev/char-socket.c:746 #4 0x00007fdd94b2479a in g_main_context_dispatch () at /lib64/libglib-2.0.so.0 #5 0x00007fdd94b24ae8 in g_main_context_iterate.isra.24 () at /lib64/libglib-2.0.so.0 #6 0x00007fdd94b24dba in g_main_loop_run () at /lib64/libglib-2.0.so.0 #7 0x000055645f17f516 in iothread_run (opaque=0x55646063e8c0) at /home/dgilbert/peter/qemu/iothread.c:69 #8 0x00007fdd9c0fcdc5 in start_thread (arg=0x7fdd8cf79700) at pthread_create.c:308 Dave > Thanks, > > -- > Peter Xu -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Fri, Jan 12, 2018 at 12:27:42PM +0000, Dr. David Alan Gilbert wrote: > * Peter Xu (peterx@redhat.com) wrote: > > On Thu, Jan 11, 2018 at 04:59:32PM +0000, Dr. David Alan Gilbert wrote: > > > * Peter Xu (peterx@redhat.com) wrote: > > > > Tree is pushed here for better reference and testing (online tree > > > > includes monitor OOB series): > > > > > > > > https://github.com/xzpeter/qemu/tree/postcopy-recover-all > > > > > > > > This version removed quite a few patches related to migrate-incoming, > > > > instead I introduced a new command "migrate-recover" to trigger the > > > > recovery channel on destination side to simplify the code. > > > > > > I've got this setup on a couple of my test hosts, and I'm using > > > iptables to try breaking the connection. > > > > > > See below for where I got stuck. > > > > > > > To test this two series altogether, please checkout above tree and > > > > build. Note: to test on small and single host, one need to disable > > > > full bandwidth postcopy migration otherwise it'll complete very fast. > > > > Basically a simple patch like this would help: > > > > > > > > diff --git a/migration/migration.c b/migration/migration.c > > > > index 4de3b551fe..c0206023d7 100644 > > > > --- a/migration/migration.c > > > > +++ b/migration/migration.c > > > > @@ -1904,7 +1904,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running) > > > > * will notice we're in POSTCOPY_ACTIVE and not actually > > > > * wrap their state up here > > > > */ > > > > - qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX); > > > > + // qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX); > > > > if (migrate_postcopy_ram()) { > > > > /* Ping just for debugging, helps line traces up */ > > > > qemu_savevm_send_ping(ms->to_dst_file, 2); > > > > > > > > This patch is included already in above github tree. Please feel free > > > > to drop this patch when want to test on big machines and between real > > > > hosts. > > > > > > > > Detailed Test Procedures (QMP only) > > > > =================================== > > > > > > > > 1. start source QEMU. > > > > > > > > $qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \ > > > > -smp 4 -m 1G -qmp stdio \ > > > > -name peter-vm,debug-threads=on \ > > > > -netdev user,id=net0 \ > > > > -device e1000,netdev=net0 \ > > > > -global migration.x-max-bandwidth=4096 \ > > > > -global migration.x-postcopy-ram=on \ > > > > /images/fedora-25.qcow2 > > > > > > > > 2. start destination QEMU. > > > > > > > > $qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \ > > > > -smp 4 -m 1G -qmp stdio \ > > > > -name peter-vm,debug-threads=on \ > > > > -netdev user,id=net0 \ > > > > -device e1000,netdev=net0 \ > > > > -global migration.x-max-bandwidth=4096 \ > > > > -global migration.x-postcopy-ram=on \ > > > > -incoming tcp:0.0.0.0:5555 \ > > > > /images/fedora-25.qcow2 > > > > > > I'm using: > > > ./x86_64-softmmu/qemu-system-x86_64 -nographic -M pc,accel=kvm -smp 4 -m 16G -drive file=/home/vms/rhel71.qcow2,id=d,cache=none,if=none -device virtio-blk,drive=d -vnc 0:0 -incoming tcp:0:8888 -chardev socket,port=4000,host=0,id=mon,server,nowait,telnet -mon chardev=mon,id=mon,mode=control -nographic -chardev stdio,mux=on,id=monh -mon chardev=monh,mode=readline --device isa-serial,chardev=monh > > > and I've got both the HMP on the stdio, and the QMP via a telnet > > > > > > > > > > > 3. On source, do QMP handshake as normal: > > > > > > > > {"execute": "qmp_capabilities"} > > > > {"return": {}} > > > > > > > > 4. On destination, do QMP handshake to enable OOB: > > > > > > > > {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } } > > > > {"return": {}} > > > > > > > > 5. On source, trigger initial migrate command, switch to postcopy: > > > > > > > > {"execute": "migrate", "arguments": { "uri": "tcp:localhost:5555" } } > > > > {"return": {}} > > > > {"execute": "query-migrate"} > > > > {"return": {"expected-downtime": 300, "status": "active", ...}} > > > > {"execute": "migrate-start-postcopy"} > > > > {"return": {}} > > > > {"timestamp": {"seconds": 1512454728, "microseconds": 768096}, "event": "STOP"} > > > > {"execute": "query-migrate"} > > > > {"return": {"expected-downtime": 44472, "status": "postcopy-active", ...}} > > > > > > > > 6. On source, manually trigger a "fake network down" using > > > > "migrate-cancel" command: > > > > > > > > {"execute": "migrate_cancel"} > > > > {"return": {}} > > > > > > Before I do that, I'm breaking the network connection by running on the > > > source: > > > iptables -A INPUT -p tcp --source-port 8888 -j DROP > > > iptables -A INPUT -p tcp --destination-port 8888 -j DROP > > > > This is tricky... I think tcp keepalive may help, but for sure I > > think we do need a way to cancel the migration on both side. Please > > see below comment. > > > > > > > > > During postcopy, it'll not really cancel the migration, but pause > > > > it. On both sides, we should see this on stderr: > > > > > > > > qemu-system-x86_64: Detected IO failure for postcopy. Migration paused. > > > > > > > > It means now both sides are in postcopy-pause state. > > > > > > Now, here we start to have a problem; I do the migrate-cancel on the > > > source, that works and goes into pause; but remember the network is > > > broken, so the destination hasn't received the news. > > > > > > > 7. (Optional) On destination side, let's try to hang the main thread > > > > using the new x-oob-test command, providing a "lock=true" param: > > > > > > > > {"execute": "x-oob-test", "id": "lock-dispatcher-cmd", > > > > "arguments": { "lock": true } } > > > > > > > > After sending this command, we should not see any "return", because > > > > main thread is blocked already. But we can still use the monitor > > > > since the monitor now has dedicated IOThread. > > > > > > > > 8. On destination side, provide a new incoming port using the new > > > > command "migrate-recover" (note that if step 7 is carried out, we > > > > _must_ use OOB form, otherwise the command will hang. With OOB, > > > > this command will return immediately): > > > > > > > > {"execute": "migrate-recover", "id": "recover-cmd", > > > > "arguments": { "uri": "tcp:localhost:5556" }, > > > > "control": { "run-oob": true } } > > > > {"timestamp": {"seconds": 1512454976, "microseconds": 186053}, > > > > "event": "MIGRATION", "data": {"status": "setup"}} > > > > {"return": {}, "id": "recover-cmd"} > > > > > > > > We can see that the command will success even if main thread is > > > > locked up. > > > > > > Because the destination didn't get the news of the pause, I get: > > > {"id": "recover-cmd", "error": {"class": "GenericError", "desc": "Migrate recover can only be run when postcopy is paused."}} > > > > This is normal since we didn't fail on destination, while... > > > > > > > > and I can't explicitly cause a cancel on the destination: > > > {"id": "cancel-cmd", "error": {"class": "GenericError", "desc": "The command migrate_cancel does not support OOB"}} > > > > ... this is not normal. I have two questions: > > > > 1. Have you provided > > > > "control": {"run-oob": true} > > > > field when sending command "migrate_cancel"? Just to mention that > > we shouldn't do it in oob way for migrate_cancel. Or it can be a > > monitor-oob bug. > > Yes, I probably did and probably shouldn't have. > > > 2. Do we need to support "migrate_cancel" on destination? > > > > For (2), I think we need it, but for now it only works on source for > > sure. So I think maybe I should add that support. > > > > > > > > So I think we need a way out of this on the destination. > > > > So that's my 2nd question. How about we do this: migrate_cancel will > > cancel incoming migration if: > > > > a. there is one incoming migration in progress, and > > b. postcopy is enabled > > Yes, I think that should work; but it should only 'cancel' in the same > way that it causes it to go to 'paused' mode. Yes. > > One other problem I've hit is that it seems easy to 'upset' the OOB > monitor; for example if I do: > > {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } } > and repeat it: > {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } } > > it gives me an error, Is the error like this? {"id": 1, "error": {"class": "CommandNotFound", "desc": "Capabilities negotiation is already complete, command ignored"}} I think an error is by design? Say, we only allow the QMP negociation to happen once for a session IMHO. > that's OK but then if I discounnect and reconnect the monitor a few > times it's really upset; I've had it: > a) Disconnect immediately when the telnet connects > b) I've also had it not respond to any commands > c) I've also seen a hang at system_powerdown where: > > the main thread is in: > #0 0x00007f37aa4d3ef7 in pthread_join (threadid=139876803868416, thread_return=thread_return@entry=0x7ffc174367b0) at pthread_join.c:92 > #1 0x000055644e5c1f5f in qemu_thread_join (thread=<optimized out>) at /home/dgilbert/peter/qemu/util/qemu-thread-posix.c:547 > #2 0x000055644e30c688 in iothread_stop (iothread=<optimized out>) at /home/dgilbert/peter/qemu/iothread.c:91 > #3 0x000055644e21f122 in monitor_cleanup () at /home/dgilbert/peter/qemu/monitor.c:4517 > #4 0x000055644e1e1925 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/dgilbert/peter/qemu/vl.c:4924 > > and the monitor thread is in: > #0 0x00007fdd93de871f in accept4 (fd=fd@entry=10, addr=..., addr@entry=..., addr_len=addr_len@entry=0x7fdd80004430, flags=flags@entry=524288) > at ../sysdeps/unix/sysv/linux/accept4.c:37 > #1 0x000055645f42d9ec in qemu_accept (s=10, addr=addr@entry=0x7fdd800043b0, addrlen=addrlen@entry=0x7fdd80004430) > at /home/dgilbert/peter/qemu/util/osdep.c:431 > #2 0x000055645f3ea7a1 in qio_channel_socket_accept (ioc=0x556460610f10, errp=errp@entry=0x0) at /home/dgilbert/peter/qemu/io/channel-socket.c:340 > #3 0x000055645f3db6aa in tcp_chr_accept (channel=0x556460610f10, cond=<optimized out>, opaque=<optimized out>) > at /home/dgilbert/peter/qemu/chardev/char-socket.c:746 > #4 0x00007fdd94b2479a in g_main_context_dispatch () at /lib64/libglib-2.0.so.0 > #5 0x00007fdd94b24ae8 in g_main_context_iterate.isra.24 () at /lib64/libglib-2.0.so.0 > #6 0x00007fdd94b24dba in g_main_loop_run () at /lib64/libglib-2.0.so.0 > #7 0x000055645f17f516 in iothread_run (opaque=0x55646063e8c0) at /home/dgilbert/peter/qemu/iothread.c:69 > #8 0x00007fdd9c0fcdc5 in start_thread (arg=0x7fdd8cf79700) at pthread_create.c:308 Hmm, this seems to be another more general problem on how we do accept(). It seems that we are doing accept() synchronouslyly now even in a GMainLoop, assuming that we will always return fast enough since we have been notified of a read event of the listening port. But that can be untrue if the client disconnects very quickly, I guess. I think doing async accept() might help? Maybe Dan would know better. Thanks,
* Peter Xu (peterx@redhat.com) wrote: > On Fri, Jan 12, 2018 at 12:27:42PM +0000, Dr. David Alan Gilbert wrote: > > * Peter Xu (peterx@redhat.com) wrote: > > > On Thu, Jan 11, 2018 at 04:59:32PM +0000, Dr. David Alan Gilbert wrote: > > > > * Peter Xu (peterx@redhat.com) wrote: > > > > > Tree is pushed here for better reference and testing (online tree > > > > > includes monitor OOB series): > > > > > > > > > > https://github.com/xzpeter/qemu/tree/postcopy-recover-all > > > > > > > > > > This version removed quite a few patches related to migrate-incoming, > > > > > instead I introduced a new command "migrate-recover" to trigger the > > > > > recovery channel on destination side to simplify the code. > > > > > > > > I've got this setup on a couple of my test hosts, and I'm using > > > > iptables to try breaking the connection. > > > > > > > > See below for where I got stuck. > > > > > > > > > To test this two series altogether, please checkout above tree and > > > > > build. Note: to test on small and single host, one need to disable > > > > > full bandwidth postcopy migration otherwise it'll complete very fast. > > > > > Basically a simple patch like this would help: > > > > > > > > > > diff --git a/migration/migration.c b/migration/migration.c > > > > > index 4de3b551fe..c0206023d7 100644 > > > > > --- a/migration/migration.c > > > > > +++ b/migration/migration.c > > > > > @@ -1904,7 +1904,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running) > > > > > * will notice we're in POSTCOPY_ACTIVE and not actually > > > > > * wrap their state up here > > > > > */ > > > > > - qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX); > > > > > + // qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX); > > > > > if (migrate_postcopy_ram()) { > > > > > /* Ping just for debugging, helps line traces up */ > > > > > qemu_savevm_send_ping(ms->to_dst_file, 2); > > > > > > > > > > This patch is included already in above github tree. Please feel free > > > > > to drop this patch when want to test on big machines and between real > > > > > hosts. > > > > > > > > > > Detailed Test Procedures (QMP only) > > > > > =================================== > > > > > > > > > > 1. start source QEMU. > > > > > > > > > > $qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \ > > > > > -smp 4 -m 1G -qmp stdio \ > > > > > -name peter-vm,debug-threads=on \ > > > > > -netdev user,id=net0 \ > > > > > -device e1000,netdev=net0 \ > > > > > -global migration.x-max-bandwidth=4096 \ > > > > > -global migration.x-postcopy-ram=on \ > > > > > /images/fedora-25.qcow2 > > > > > > > > > > 2. start destination QEMU. > > > > > > > > > > $qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \ > > > > > -smp 4 -m 1G -qmp stdio \ > > > > > -name peter-vm,debug-threads=on \ > > > > > -netdev user,id=net0 \ > > > > > -device e1000,netdev=net0 \ > > > > > -global migration.x-max-bandwidth=4096 \ > > > > > -global migration.x-postcopy-ram=on \ > > > > > -incoming tcp:0.0.0.0:5555 \ > > > > > /images/fedora-25.qcow2 > > > > > > > > I'm using: > > > > ./x86_64-softmmu/qemu-system-x86_64 -nographic -M pc,accel=kvm -smp 4 -m 16G -drive file=/home/vms/rhel71.qcow2,id=d,cache=none,if=none -device virtio-blk,drive=d -vnc 0:0 -incoming tcp:0:8888 -chardev socket,port=4000,host=0,id=mon,server,nowait,telnet -mon chardev=mon,id=mon,mode=control -nographic -chardev stdio,mux=on,id=monh -mon chardev=monh,mode=readline --device isa-serial,chardev=monh > > > > and I've got both the HMP on the stdio, and the QMP via a telnet > > > > > > > > > > > > > > 3. On source, do QMP handshake as normal: > > > > > > > > > > {"execute": "qmp_capabilities"} > > > > > {"return": {}} > > > > > > > > > > 4. On destination, do QMP handshake to enable OOB: > > > > > > > > > > {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } } > > > > > {"return": {}} > > > > > > > > > > 5. On source, trigger initial migrate command, switch to postcopy: > > > > > > > > > > {"execute": "migrate", "arguments": { "uri": "tcp:localhost:5555" } } > > > > > {"return": {}} > > > > > {"execute": "query-migrate"} > > > > > {"return": {"expected-downtime": 300, "status": "active", ...}} > > > > > {"execute": "migrate-start-postcopy"} > > > > > {"return": {}} > > > > > {"timestamp": {"seconds": 1512454728, "microseconds": 768096}, "event": "STOP"} > > > > > {"execute": "query-migrate"} > > > > > {"return": {"expected-downtime": 44472, "status": "postcopy-active", ...}} > > > > > > > > > > 6. On source, manually trigger a "fake network down" using > > > > > "migrate-cancel" command: > > > > > > > > > > {"execute": "migrate_cancel"} > > > > > {"return": {}} > > > > > > > > Before I do that, I'm breaking the network connection by running on the > > > > source: > > > > iptables -A INPUT -p tcp --source-port 8888 -j DROP > > > > iptables -A INPUT -p tcp --destination-port 8888 -j DROP > > > > > > This is tricky... I think tcp keepalive may help, but for sure I > > > think we do need a way to cancel the migration on both side. Please > > > see below comment. > > > > > > > > > > > > During postcopy, it'll not really cancel the migration, but pause > > > > > it. On both sides, we should see this on stderr: > > > > > > > > > > qemu-system-x86_64: Detected IO failure for postcopy. Migration paused. > > > > > > > > > > It means now both sides are in postcopy-pause state. > > > > > > > > Now, here we start to have a problem; I do the migrate-cancel on the > > > > source, that works and goes into pause; but remember the network is > > > > broken, so the destination hasn't received the news. > > > > > > > > > 7. (Optional) On destination side, let's try to hang the main thread > > > > > using the new x-oob-test command, providing a "lock=true" param: > > > > > > > > > > {"execute": "x-oob-test", "id": "lock-dispatcher-cmd", > > > > > "arguments": { "lock": true } } > > > > > > > > > > After sending this command, we should not see any "return", because > > > > > main thread is blocked already. But we can still use the monitor > > > > > since the monitor now has dedicated IOThread. > > > > > > > > > > 8. On destination side, provide a new incoming port using the new > > > > > command "migrate-recover" (note that if step 7 is carried out, we > > > > > _must_ use OOB form, otherwise the command will hang. With OOB, > > > > > this command will return immediately): > > > > > > > > > > {"execute": "migrate-recover", "id": "recover-cmd", > > > > > "arguments": { "uri": "tcp:localhost:5556" }, > > > > > "control": { "run-oob": true } } > > > > > {"timestamp": {"seconds": 1512454976, "microseconds": 186053}, > > > > > "event": "MIGRATION", "data": {"status": "setup"}} > > > > > {"return": {}, "id": "recover-cmd"} > > > > > > > > > > We can see that the command will success even if main thread is > > > > > locked up. > > > > > > > > Because the destination didn't get the news of the pause, I get: > > > > {"id": "recover-cmd", "error": {"class": "GenericError", "desc": "Migrate recover can only be run when postcopy is paused."}} > > > > > > This is normal since we didn't fail on destination, while... > > > > > > > > > > > and I can't explicitly cause a cancel on the destination: > > > > {"id": "cancel-cmd", "error": {"class": "GenericError", "desc": "The command migrate_cancel does not support OOB"}} > > > > > > ... this is not normal. I have two questions: > > > > > > 1. Have you provided > > > > > > "control": {"run-oob": true} > > > > > > field when sending command "migrate_cancel"? Just to mention that > > > we shouldn't do it in oob way for migrate_cancel. Or it can be a > > > monitor-oob bug. > > > > Yes, I probably did and probably shouldn't have. > > > > > 2. Do we need to support "migrate_cancel" on destination? > > > > > > For (2), I think we need it, but for now it only works on source for > > > sure. So I think maybe I should add that support. > > > > > > > > > > > So I think we need a way out of this on the destination. > > > > > > So that's my 2nd question. How about we do this: migrate_cancel will > > > cancel incoming migration if: > > > > > > a. there is one incoming migration in progress, and > > > b. postcopy is enabled > > > > Yes, I think that should work; but it should only 'cancel' in the same > > way that it causes it to go to 'paused' mode. > > Yes. > > > > > One other problem I've hit is that it seems easy to 'upset' the OOB > > monitor; for example if I do: > > > > {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } } > > and repeat it: > > {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } } > > > > it gives me an error, > > Is the error like this? > > {"id": 1, "error": {"class": "CommandNotFound", > "desc": "Capabilities negotiation is already complete, command ignored"}} > > I think an error is by design? Say, we only allow the QMP negociation > to happen once for a session IMHO. I can't remember which error it was, but yes I'm OK with that error happening, it's what happens next that's the problem (teh a,b,c below) Dave > > that's OK but then if I discounnect and reconnect the monitor a few > > times it's really upset; I've had it: > > a) Disconnect immediately when the telnet connects > > b) I've also had it not respond to any commands > > c) I've also seen a hang at system_powerdown where: > > > > the main thread is in: > > #0 0x00007f37aa4d3ef7 in pthread_join (threadid=139876803868416, thread_return=thread_return@entry=0x7ffc174367b0) at pthread_join.c:92 > > #1 0x000055644e5c1f5f in qemu_thread_join (thread=<optimized out>) at /home/dgilbert/peter/qemu/util/qemu-thread-posix.c:547 > > #2 0x000055644e30c688 in iothread_stop (iothread=<optimized out>) at /home/dgilbert/peter/qemu/iothread.c:91 > > #3 0x000055644e21f122 in monitor_cleanup () at /home/dgilbert/peter/qemu/monitor.c:4517 > > #4 0x000055644e1e1925 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/dgilbert/peter/qemu/vl.c:4924 > > > > and the monitor thread is in: > > #0 0x00007fdd93de871f in accept4 (fd=fd@entry=10, addr=..., addr@entry=..., addr_len=addr_len@entry=0x7fdd80004430, flags=flags@entry=524288) > > at ../sysdeps/unix/sysv/linux/accept4.c:37 > > #1 0x000055645f42d9ec in qemu_accept (s=10, addr=addr@entry=0x7fdd800043b0, addrlen=addrlen@entry=0x7fdd80004430) > > at /home/dgilbert/peter/qemu/util/osdep.c:431 > > #2 0x000055645f3ea7a1 in qio_channel_socket_accept (ioc=0x556460610f10, errp=errp@entry=0x0) at /home/dgilbert/peter/qemu/io/channel-socket.c:340 > > #3 0x000055645f3db6aa in tcp_chr_accept (channel=0x556460610f10, cond=<optimized out>, opaque=<optimized out>) > > at /home/dgilbert/peter/qemu/chardev/char-socket.c:746 > > #4 0x00007fdd94b2479a in g_main_context_dispatch () at /lib64/libglib-2.0.so.0 > > #5 0x00007fdd94b24ae8 in g_main_context_iterate.isra.24 () at /lib64/libglib-2.0.so.0 > > #6 0x00007fdd94b24dba in g_main_loop_run () at /lib64/libglib-2.0.so.0 > > #7 0x000055645f17f516 in iothread_run (opaque=0x55646063e8c0) at /home/dgilbert/peter/qemu/iothread.c:69 > > #8 0x00007fdd9c0fcdc5 in start_thread (arg=0x7fdd8cf79700) at pthread_create.c:308 > > Hmm, this seems to be another more general problem on how we do > accept(). It seems that we are doing accept() synchronouslyly now > even in a GMainLoop, assuming that we will always return fast enough > since we have been notified of a read event of the listening port. > But that can be untrue if the client disconnects very quickly, I > guess. > I think doing async accept() might help? Maybe Dan would know > better. > > Thanks, > > -- > Peter Xu -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
diff --git a/migration/migration.c b/migration/migration.c index 4de3b551fe..c0206023d7 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1904,7 +1904,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running) * will notice we're in POSTCOPY_ACTIVE and not actually * wrap their state up here */ - qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX); + // qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX); if (migrate_postcopy_ram()) { /* Ping just for debugging, helps line traces up */ qemu_savevm_send_ping(ms->to_dst_file, 2);