Show a cover letter.

GET /api/covers/807500/?format=api
HTTP 200 OK
Allow: GET, HEAD, OPTIONS
Content-Type: application/json
Vary: Accept

{
    "id": 807500,
    "url": "http://patchwork.ozlabs.org/api/covers/807500/?format=api",
    "web_url": "http://patchwork.ozlabs.org/project/qemu-devel/cover/1504081950-2528-1-git-send-email-peterx@redhat.com/",
    "project": {
        "id": 14,
        "url": "http://patchwork.ozlabs.org/api/projects/14/?format=api",
        "name": "QEMU Development",
        "link_name": "qemu-devel",
        "list_id": "qemu-devel.nongnu.org",
        "list_email": "qemu-devel@nongnu.org",
        "web_url": "",
        "scm_url": "",
        "webscm_url": "",
        "list_archive_url": "",
        "list_archive_url_format": "",
        "commit_url_format": ""
    },
    "msgid": "<1504081950-2528-1-git-send-email-peterx@redhat.com>",
    "list_archive_url": null,
    "date": "2017-08-30T08:31:57",
    "name": "[RFC,v2,00/33] Migration: postcopy failure recovery",
    "submitter": {
        "id": 67717,
        "url": "http://patchwork.ozlabs.org/api/people/67717/?format=api",
        "name": "Peter Xu",
        "email": "peterx@redhat.com"
    },
    "mbox": "http://patchwork.ozlabs.org/project/qemu-devel/cover/1504081950-2528-1-git-send-email-peterx@redhat.com/mbox/",
    "series": [
        {
            "id": 552,
            "url": "http://patchwork.ozlabs.org/api/series/552/?format=api",
            "web_url": "http://patchwork.ozlabs.org/project/qemu-devel/list/?series=552",
            "date": "2017-08-30T08:31:59",
            "name": "Migration: postcopy failure recovery",
            "version": 2,
            "mbox": "http://patchwork.ozlabs.org/series/552/mbox/"
        }
    ],
    "comments": "http://patchwork.ozlabs.org/api/covers/807500/comments/",
    "headers": {
        "Return-Path": "<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>",
        "X-Original-To": "incoming@patchwork.ozlabs.org",
        "Delivered-To": "patchwork-incoming@bilbo.ozlabs.org",
        "Authentication-Results": [
            "ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)",
            "ext-mx05.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com",
            "ext-mx05.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=peterx@redhat.com"
        ],
        "Received": [
            "from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xhzZB0N5Dz9t2Q\n\tfor <incoming@patchwork.ozlabs.org>;\n\tWed, 30 Aug 2017 18:44:02 +1000 (AEST)",
            "from localhost ([::1]:49059 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dmybP-0007Tx-NY\n\tfor incoming@patchwork.ozlabs.org; Wed, 30 Aug 2017 04:43:59 -0400",
            "from eggs.gnu.org ([2001:4830:134:3::10]:33870)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <peterx@redhat.com>) id 1dmyQY-0006h7-9x\n\tfor qemu-devel@nongnu.org; Wed, 30 Aug 2017 04:32:48 -0400",
            "from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <peterx@redhat.com>) id 1dmyQT-00035Y-PC\n\tfor qemu-devel@nongnu.org; Wed, 30 Aug 2017 04:32:46 -0400",
            "from mx1.redhat.com ([209.132.183.28]:46958)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <peterx@redhat.com>) id 1dmyQT-00035C-Fh\n\tfor qemu-devel@nongnu.org; Wed, 30 Aug 2017 04:32:41 -0400",
            "from smtp.corp.redhat.com\n\t(int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id 5789334C0;\n\tWed, 30 Aug 2017 08:32:40 +0000 (UTC)",
            "from pxdev.xzpeter.org.com (dhcp-14-103.nay.redhat.com\n\t[10.66.14.103])\n\tby smtp.corp.redhat.com (Postfix) with ESMTP id 7C233871E6;\n\tWed, 30 Aug 2017 08:32:35 +0000 (UTC)"
        ],
        "DMARC-Filter": "OpenDMARC Filter v1.3.2 mx1.redhat.com 5789334C0",
        "From": "Peter Xu <peterx@redhat.com>",
        "To": "qemu-devel@nongnu.org",
        "Date": "Wed, 30 Aug 2017 16:31:57 +0800",
        "Message-Id": "<1504081950-2528-1-git-send-email-peterx@redhat.com>",
        "X-Scanned-By": "MIMEDefang 2.79 on 10.5.11.15",
        "X-Greylist": "Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.29]);\n\tWed, 30 Aug 2017 08:32:40 +0000 (UTC)",
        "X-detected-operating-system": "by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]",
        "X-Received-From": "209.132.183.28",
        "Subject": "[Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery",
        "X-BeenThere": "qemu-devel@nongnu.org",
        "X-Mailman-Version": "2.1.21",
        "Precedence": "list",
        "List-Id": "<qemu-devel.nongnu.org>",
        "List-Unsubscribe": "<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>",
        "List-Archive": "<http://lists.nongnu.org/archive/html/qemu-devel/>",
        "List-Post": "<mailto:qemu-devel@nongnu.org>",
        "List-Help": "<mailto:qemu-devel-request@nongnu.org?subject=help>",
        "List-Subscribe": "<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>",
        "Cc": "Laurent Vivier <lvivier@redhat.com>,\n\tAndrea Arcangeli <aarcange@redhat.com>, \n\tJuan Quintela <quintela@redhat.com>,\n\tAlexey Perevalov <a.perevalov@samsung.com>, peterx@redhat.com,\n\t\"Dr . David Alan Gilbert\" <dgilbert@redhat.com>",
        "Errors-To": "qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org",
        "Sender": "\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"
    },
    "content": "v2 note (the coarse-grained changelog):\n\n- I appended the migrate-incoming re-use series into this one, since\n  that one depends on this one, and it's really for the recovery\n\n- I haven't yet added (actually I just added them but removed) the\n  per-monitor thread related patches into this one, basically to setup\n  \"need-bql\"=\"false\" patches - the solution for the monitor hang issue\n  is still during discussion in the other thread.  I'll add them in\n  when settled.\n\n- Quite a lot of other changes and additions regarding to v1 review\n  comments.  I think I settled all the comments, but the God knows\n  better.\n\nFeel free to skip this ugly longer changelog (it's too long to be\nmeaningful I'm afraid).\n\nv2:\n- rebased to alexey's received bitmap v9\n- add Dave's r-bs for patches: 2/5/6/8/9/13/14/15/16/20/21\n- patch 1: use target page size to calc bitmap [Dave]\n- patch 3: move trace_*() after EINTR check [Dave]\n- patch 4: dropped since I can use bitmap_complement() [Dave]\n- patch 7: check file error right after data is read in both\n  qemu_loadvm_section_start_full() and qemu_loadvm_section_part_end(),\n  meanwhile also check in check_section_footer() [Dave]\n- patch 8/9: fix error_report/commit message in both patches [Dave]\n- patch 10: dropped (new parameter \"x-postcopy-fast\")\n- patch 11: split the \"postcopy-paused\" patch into two, one to\n  introduce the new state, the other to implement the logic. Also,\n  print something when paused [Dave]\n- patch 17: removed do_resume label, introduced migration_prepare()\n  [Dave]\n- patch 18: removed do_pause label using a new loop [Dave]\n- patch 20: removed incorrect comment [Dave]\n- patch 21: use 256B buffer in qemu_savevm_send_recv_bitmap(), add\n  trace in loadvm_handle_recv_bitmap() [Dave]\n- patch 22: fix MIG_RP_MSG_RECV_BITMAP for (1) endianess (2) 32/64bit\n  machines. More info in the commit message update.\n- patch 23: add one check on migration state [Dave]\n- patch 24: use macro instead of magic 1 [Dave]\n- patch 26: use more trace_*() instead of one, and use one sem to\n  replace mutex+cond. [Dave]\n- move sem init/destroy into migration_instance_init() and\n  migration_instance_finalize (new function after rebase).\n- patch 29: squashed this patch most into:\n  \"migration: implement \"postcopy-pause\" src logic\" [Dave]\n- split the two fix patches out of the series\n- fixed two places where I misused \"wake/woke/woken\". [Dave]\n- add new patch \"bitmap: provide to_le/from_le helpers\" to solve the\n  bitmap endianess issue [Dave]\n- appended migrate_incoming series to this series, since that one is\n  depending on the paused state.  Using explicit g_source_remove() for\n  listening ports [Dan]\n\nFUTURE TODO LIST\n- support manual switch source into PAUSED state\n- support migrate_cancel during PAUSED/RECOVER state\n- when anything wrong happens during PAUSED/RECOVER, switching back to\n  PAUSED state on both sides\n\nAs we all know that postcopy migration has a potential risk to lost\nthe VM if the network is broken during the migration. This series\ntries to solve the problem by allowing the migration to pause at the\nfailure point, and do recovery after the link is reconnected.\n\nThere was existing work on this issue from Md Haris Iqbal:\n\nhttps://lists.nongnu.org/archive/html/qemu-devel/2016-08/msg03468.html\n\nThis series is a totally re-work of the issue, based on Alexey\nPerevalov's recved bitmap v8 series:\n\nhttps://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06401.html\n\nTwo new status are added to support the migration (used on both\nsides):\n\n  MIGRATION_STATUS_POSTCOPY_PAUSED\n  MIGRATION_STATUS_POSTCOPY_RECOVER\n\nThe MIGRATION_STATUS_POSTCOPY_PAUSED state will be set when the\nnetwork failure is detected. It is a phase that we'll be in for a long\ntime as long as the failure is detected, and we'll be there until a\nrecovery is triggered.  In this state, all the threads (on source:\nsend thread, return-path thread; destination: ram-load thread,\npage-fault thread) will be halted.\n\nThe MIGRATION_STATUS_POSTCOPY_RECOVER state is short. If we triggered\na recovery, both source/destination VM will jump into this stage, do\nwhatever it needs to prepare the recovery (e.g., currently the most\nimportant thing is to synchronize the dirty bitmap, please see commit\nmessages for more information). After the preparation is ready, the\nsource will do the final handshake with destination, then both sides\nwill switch back to MIGRATION_STATUS_POSTCOPY_ACTIVE again.\n\nNew commands/messages are defined as well to satisfy the need:\n\nMIG_CMD_RECV_BITMAP & MIG_RP_MSG_RECV_BITMAP are introduced for\ndelivering received bitmaps\n\nMIG_CMD_RESUME & MIG_RP_MSG_RESUME_ACK are introduced to do the final\nhandshake of postcopy recovery.\n\nHere's some more details on how the whole failure/recovery routine is\nhappened:\n\n- start migration\n- ... (switch from precopy to postcopy)\n- both sides are in \"postcopy-active\" state\n- ... (failure happened, e.g., network unplugged)\n- both sides switch to \"postcopy-paused\" state\n  - all the migration threads are stopped on both sides\n- ... (both VMs hanged)\n- ... (user triggers recovery using \"migrate -r -d tcp:HOST:PORT\" on\n  source side, \"-r\" means \"recover\")\n- both sides switch to \"postcopy-recover\" state\n  - on source: send-thread, return-path-thread will be waked up\n  - on dest: ram-load-thread waked up, fault-thread still paused\n- source calls new savevmhandler hook resume_prepare() (currently,\n  only ram is providing the hook):\n  - ram_resume_prepare(): for each ramblock, fetch recved bitmap by:\n    - src sends MIG_CMD_RECV_BITMAP to dst\n    - dst replies MIG_RP_MSG_RECV_BITMAP to src, with bitmap data\n      - src uses the recved bitmap to rebuild dirty bitmap\n- source do final handshake with destination\n  - src sends MIG_CMD_RESUME to dst, telling \"src is ready\"\n    - when dst receives the command, fault thread will be waked up,\n      meanwhile, dst switch back to \"postcopy-active\"\n  - dst sends MIG_RP_MSG_RESUME_ACK to src, telling \"dst is ready\"\n    - when src receives the ack, state switch to \"postcopy-active\"\n- postcopy migration continued\n\nTesting:\n\nAs I said, it's still an extremely simple test. I used socat to create\na socket bridge:\n\n  socat tcp-listen:6666 tcp-connect:localhost:5555 &\n\nThen do the migration via the bridge. I emulated the network failure\nby killing the socat process (bridge down), then tries to recover the\nmigration using the other channel (default dst channel). It looks\nlike:\n\n        port:6666    +------------------+\n        +----------> | socat bridge [1] |-------+\n        |            +------------------+       |\n        |         (Original channel)            |\n        |                                       | port: 5555\n     +---------+  (Recovery channel)            +--->+---------+\n     | src VM  |------------------------------------>| dst VM  |\n     +---------+                                     +---------+\n\nKnown issues/notes:\n\n- currently destination listening port still cannot change. E.g., the\n  recovery should be using the same port on destination for\n  simplicity. (on source, we can specify new URL)\n\n- the patch: \"migration: let dst listen on port always\" is still\n  hacky, it just kept the incoming accept open forever for now...\n\n- some migration numbers might still be inaccurate, like total\n  migration time, etc. (But I don't really think that matters much\n  now)\n\n- the patches are very lightly tested.\n\n- Dave reported one problem that may hang destination main loop thread\n  (one vcpu thread holds the BQL) and the rest. I haven't encountered\n  it yet, but it does not mean this series can survive with it.\n\n- other potential issues that I may have forgotten or unnoticed...\n\nAnyway, the work is still in preliminary stage. Any suggestions and\ncomments are greatly welcomed.  Thanks.\n\nPeter Xu (33):\n  bitmap: remove BITOP_WORD()\n  bitmap: introduce bitmap_count_one()\n  bitmap: provide to_le/from_le helpers\n  migration: dump str in migrate_set_state trace\n  migration: better error handling with QEMUFile\n  migration: reuse mis->userfault_quit_fd\n  migration: provide postcopy_fault_thread_notify()\n  migration: new postcopy-pause state\n  migration: implement \"postcopy-pause\" src logic\n  migration: allow dst vm pause on postcopy\n  migration: allow src return path to pause\n  migration: allow send_rq to fail\n  migration: allow fault thread to pause\n  qmp: hmp: add migrate \"resume\" option\n  migration: pass MigrationState to migrate_init()\n  migration: rebuild channel on source\n  migration: new state \"postcopy-recover\"\n  migration: wakeup dst ram-load-thread for recover\n  migration: new cmd MIG_CMD_RECV_BITMAP\n  migration: new message MIG_RP_MSG_RECV_BITMAP\n  migration: new cmd MIG_CMD_POSTCOPY_RESUME\n  migration: new message MIG_RP_MSG_RESUME_ACK\n  migration: introduce SaveVMHandlers.resume_prepare\n  migration: synchronize dirty bitmap for resume\n  migration: setup ramstate for resume\n  migration: final handshake for the resume\n  migration: free SocketAddress where allocated\n  migration: return incoming task tag for sockets\n  migration: return incoming task tag for exec\n  migration: return incoming task tag for fd\n  migration: store listen task tag\n  migration: allow migrate_incoming for paused VM\n  migration: init dst in migration_object_init too\n\n hmp-commands.hx              |   7 +-\n hmp.c                        |   4 +-\n include/migration/register.h |   2 +\n include/qemu/bitmap.h        |  17 ++\n migration/exec.c             |  20 +-\n migration/exec.h             |   2 +-\n migration/fd.c               |  20 +-\n migration/fd.h               |   2 +-\n migration/migration.c        | 578 ++++++++++++++++++++++++++++++++++++++-----\n migration/migration.h        |  26 +-\n migration/postcopy-ram.c     | 107 ++++++--\n migration/postcopy-ram.h     |   2 +\n migration/ram.c              | 265 +++++++++++++++++++-\n migration/ram.h              |   3 +\n migration/savevm.c           | 229 ++++++++++++++++-\n migration/savevm.h           |   3 +\n migration/socket.c           |  42 ++--\n migration/socket.h           |   4 +-\n migration/trace-events       |  21 +-\n qapi-schema.json             |  12 +-\n util/bitmap.c                |  47 ++++\n util/bitops.c                |   6 +-\n 22 files changed, 1266 insertions(+), 153 deletions(-)"
}
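The cover letter resource above can also be consumed programmatically. The following is a minimal Python sketch, not part of Patchwork itself. `fetch_cover()` requests the endpoint with an `Accept: application/json` header, so the server should return plain JSON rather than the browsable `?format=api` view. `parse_subject_prefix()` splits the normalized `name` field into its bracketed parts. Both function names are hypothetical. The prefix rules are assumptions inferred from the `name` value shown here, not a documented Patchwork contract: a `vN` token is the version, an `NN/MM` token is the patch counter, anything else is a tag, and a missing `vN` means version 1.

```python
import json
import re
import urllib.request


def fetch_cover(base_url, cover_id, timeout=10.0):
    """Fetch one cover letter as a dict (hypothetical helper).

    base_url is assumed to look like "http://patchwork.ozlabs.org".
    """
    url = f"{base_url.rstrip('/')}/api/covers/{cover_id}/"
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


_VERSION = re.compile(r"v(\d+)", re.IGNORECASE)
_COUNTER = re.compile(r"(\d+)/(\d+)")


def parse_subject_prefix(name):
    """Split "[RFC,v2,00/33] Title" into tags, version, index, total and title."""
    m = re.match(r"\[([^\]]*)\]\s*(.*)", name)
    if m is None:
        # No bracketed prefix at all: assume an unversioned (v1) title.
        return {"tags": [], "version": 1, "index": None, "total": None, "title": name}
    tags, version, index, total = [], 1, None, None
    for token in m.group(1).split(","):
        token = token.strip()
        if not token:
            continue
        v = _VERSION.fullmatch(token)
        c = _COUNTER.fullmatch(token)
        if v:
            version = int(v.group(1))
        elif c:
            index, total = int(c.group(1)), int(c.group(2))
        else:
            tags.append(token)
    return {"tags": tags, "version": version, "index": index,
            "total": total, "title": m.group(2)}
```

Applied to this resource's `name`, the parser yields the tag `RFC`, version 2, index 0 and total 33. Index 0 is what marks the message as the cover letter of a 33-patch series, which agrees with the `"version": 2` series entry and the "Peter Xu (33)" shortlog.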
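The cover letter describes a small state machine shared by both sides. A network failure moves "postcopy-active" to "postcopy-paused". A user-triggered recovery (`migrate -r -d tcp:HOST:PORT` on the source) moves "postcopy-paused" to "postcopy-recover". Completing the final handshake moves "postcopy-recover" back to "postcopy-active". The sketch below encodes only those transitions as a lookup table. It is an illustration, not QEMU's C implementation, and the event names (`network_failure`, `user_recover`, `handshake_done`) are hypothetical. Falling back from RECOVER to PAUSED is listed in the letter only as future work, so this table rejects it.

```python
from enum import Enum


class PostcopyState(Enum):
    ACTIVE = "postcopy-active"
    PAUSED = "postcopy-paused"
    RECOVER = "postcopy-recover"


# (current state, hypothetical event name) -> next state, as described in the
# cover letter. RECOVER -> PAUSED on error is only a TODO there, so it is absent.
_TRANSITIONS = {
    (PostcopyState.ACTIVE, "network_failure"): PostcopyState.PAUSED,
    (PostcopyState.PAUSED, "user_recover"): PostcopyState.RECOVER,
    (PostcopyState.RECOVER, "handshake_done"): PostcopyState.ACTIVE,
}


def next_state(state, event):
    """Return the state reached from `state` on `event`, or raise ValueError."""
    try:
        return _TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in {state.value}") from None
```

A table keeps the allowed transitions in one place, so adding the TODO items later (for example a `(RECOVER, "network_failure")` entry) would be a one-line change rather than new branching logic.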
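During recovery the source fetches each RAMBlock's received bitmap (MIG_CMD_RECV_BITMAP / MIG_RP_MSG_RECV_BITMAP) and rebuilds its dirty bitmap from it. The changelog mentions `bitmap_complement()` for the rebuild and new to_le/from_le helpers so the bitmap survives endianness and 32/64-bit differences. The Python below only illustrates that idea and is not QEMU's code. It assumes the bitmap travels as an array of 64-bit little-endian words, one bit per target page, and that every page the destination has not received must be treated as dirty. The framing (no length header or end marker) and all function names are hypothetical.

```python
import struct

WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1


def words_to_le(words):
    """Serialize 64-bit bitmap words with a fixed little-endian layout."""
    return struct.pack(f"<{len(words)}Q", *words)


def le_to_words(data):
    """Decode little-endian 64-bit words back into a list of ints."""
    if len(data) % 8:
        raise ValueError("bitmap payload must be a whole number of 64-bit words")
    return list(struct.unpack(f"<{len(data) // 8}Q", data))


def dirty_from_received(received_words, npages):
    """Complement the received bitmap, keeping only bits for real pages."""
    dirty = []
    for i, word in enumerate(received_words):
        start = i * WORD_BITS
        valid = min(WORD_BITS, max(0, npages - start))  # bits that map to pages
        dirty.append(~word & WORD_MASK & ((1 << valid) - 1))
    return dirty


def dirty_pages(words):
    """List the page indices whose bit is set."""
    return [i * WORD_BITS + b
            for i, w in enumerate(words)
            for b in range(WORD_BITS) if (w >> b) & 1]
```

Fixing the byte order and word size on the wire is what makes the payload independent of the host, which is the concern behind the patch 22 note. Masking the tail bits matters because the last word usually covers fewer pages than it has bits, and an unmasked complement would mark nonexistent pages dirty.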
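The final recovery handshake can be modeled with two in-process queues standing in for the main migration channel and the return path. The source sends MIG_CMD_RESUME. On receipt the destination wakes its fault thread, switches back to "postcopy-active" and replies with MIG_RP_MSG_RESUME_ACK. The source switches to "postcopy-active" when the ack arrives. The message names come from the cover letter. The queue transport, string messages and `run_resume_handshake()` are hypothetical, and the sketch ignores failures during the handshake, which the letter lists as future work.

```python
import queue
import threading


def run_resume_handshake(timeout=5.0):
    """Toy model of the MIG_CMD_RESUME / MIG_RP_MSG_RESUME_ACK exchange."""
    to_dst = queue.Queue()  # stands in for the main channel (src -> dst)
    to_src = queue.Queue()  # stands in for the return path (dst -> src)
    log = []
    states = {"src": "postcopy-recover", "dst": "postcopy-recover"}

    def destination():
        if to_dst.get(timeout=timeout) != "MIG_CMD_RESUME":
            return
        # Source is ready: wake the fault thread and go back to active.
        log.append("dst: got MIG_CMD_RESUME, waking fault thread")
        states["dst"] = "postcopy-active"
        to_src.put("MIG_RP_MSG_RESUME_ACK")

    dst = threading.Thread(target=destination)
    dst.start()
    log.append("src: sending MIG_CMD_RESUME")
    to_dst.put("MIG_CMD_RESUME")
    if to_src.get(timeout=timeout) == "MIG_RP_MSG_RESUME_ACK":
        # Destination is ready: the source resumes postcopy as well.
        log.append("src: got MIG_RP_MSG_RESUME_ACK")
        states["src"] = "postcopy-active"
    dst.join()
    return states, log
```

Because each side appends to the log before sending its message, the log order is deterministic: the source's send, then the destination's wakeup, then the source's receipt of the ack.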