Message ID | 20220707185342.26794-1-peterx@redhat.com |
---|---|
Series | migration: Postcopy Preemption |
* Peter Xu (peterx@redhat.com) wrote:
> Based on:
>   [PATCH] tests: migration-test: Allow test to run without uffd
>   https://lore.kernel.org/qemu-devel/20220707184600.24164-1-peterx@redhat.com/
>
> This is v8 of postcopy preempt series. It can also be found here:
>   https://github.com/xzpeter/qemu/tree/postcopy-preempt
>
> RFC: https://lore.kernel.org/qemu-devel/20220119080929.39485-1-peterx@redhat.com
> V1:  https://lore.kernel.org/qemu-devel/20220216062809.57179-1-peterx@redhat.com
> V2:  https://lore.kernel.org/qemu-devel/20220301083925.33483-1-peterx@redhat.com
> V3:  https://lore.kernel.org/qemu-devel/20220330213908.26608-1-peterx@redhat.com
> V4:  https://lore.kernel.org/qemu-devel/20220331150857.74406-1-peterx@redhat.com
> V5:  https://lore.kernel.org/qemu-devel/20220425233847.10393-1-peterx@redhat.com
> V6:  https://lore.kernel.org/qemu-devel/20220517195730.32312-1-peterx@redhat.com
> V7:  https://lore.kernel.org/qemu-devel/20220524221151.18225-1-peterx@redhat.com
> V8:  https://lore.kernel.org/qemu-devel/20220622204920.79061-1-peterx@redhat.com
>
> v9:
> - Rebase upon latest master (plus the test fix above on "tests:
>   migration-test: Allow test to run without uffd")
> - Added missing R-bs in v7

Queued, took some minor rework in the tests

> Abstract
> ========
>
> This series added a new migration capability called "postcopy-preempt". It can
> be enabled when postcopy is enabled, and it'll simply (but greatly) speed up
> postcopy page requests handling process.
>
> Below are some initial postcopy page request latency measurements after the
> new series applied.
>
> For each page size, I measured page request latency for three cases:
>
>   (a) Vanilla:                the old postcopy
>   (b) Preempt no-break-huge:  preempt enabled, x-postcopy-preempt-break-huge=off
>   (c) Preempt full:           preempt enabled, x-postcopy-preempt-break-huge=on
>                               (this is the default option when preempt enabled)
>
> Here x-postcopy-preempt-break-huge parameter is just added in v2 so as to
> conditionally disable the behavior to break sending a precopy huge page for
> debugging purpose. So when it's off, postcopy will not preempt precopy
> sending a huge page, but still postcopy will use its own channel.
>
> I tested it separately to give a rough idea on which part of the change
> helped how much of it. The overall benefit should be the comparison
> between case (a) and (c).
>
>   |-----------+---------+-----------------------+--------------|
>   | Page size | Vanilla | Preempt no-break-huge | Preempt full |
>   |-----------+---------+-----------------------+--------------|
>   | 4K        |   10.68 |               N/A [*] |         0.57 |
>   | 2M        |   10.58 |                  5.49 |         5.02 |
>   | 1G        | 2046.65 |               933.185 |      649.445 |
>   |-----------+---------+-----------------------+--------------|
>
>   [*]: This case is N/A because 4K page does not contain huge page at all
>
> [1] https://github.com/xzpeter/small-stuffs/blob/master/tools/huge_vm/uffd-latency.bpf
>
> TODO List
> =========
>
> Avoid precopy write() blocks postcopy
> -------------------------------------
>
> I didn't prove this, but I always think the write() syscalls being blocked
> for precopy pages can affect postcopy services. If we can solve this
> problem then my wild guess is we can further reduce the average page
> latency.
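
The stall being guessed at here can be reproduced in miniature. Once a
socket's send buffer is full, a blocking write() would park the sending
thread until the peer drains it, whereas a non-blocking socket reports the
condition and lets the caller do something else. The sketch below is plain
Python on a socketpair, not QEMU code; the function name, chunk size and
safety cap are all made up for illustration.

```python
import socket


def bytes_accepted_before_stall(chunk_size=4096, cap=1 << 28):
    """Write into a socket that nobody reads from.

    Returns (bytes_sent, stalled). 'stalled' is True when the kernel refused
    more data, which is the point where a blocking write() would have slept.
    'cap' is only a safety limit so the loop always terminates.
    """
    sender, receiver = socket.socketpair()
    try:
        sender.setblocking(False)          # hypothetical NON_BLOCK write side
        chunk = b"\0" * chunk_size
        sent = 0
        while sent < cap:
            try:
                sent += sender.send(chunk)
            except BlockingIOError:
                # Send buffer is full: a blocking sender would be stuck here,
                # unable to look at any urgent page request.
                return sent, True
        return sent, False
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    sent, stalled = bytes_accepted_before_stall()
    print(f"kernel accepted {sent} bytes; stalled={stalled}")
```

How many bytes are accepted before the stall depends on the platform's
socket buffer sizes, so the number itself means nothing; the point is only
that the sender regains control instead of sleeping.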
>
> Two solutions at least in mind: (1) we could have made the write side of
> the migration channel NON_BLOCK too, or (2) multi-threads on send side,
> just like multifd, but we may use lock to protect which page to send too
> (e.g., the core idea is we should _never_ rely anything on the main thread,
> multifd has that dependency on queuing pages only on main thread).
>
> That can definitely be done and thought about later.
>
> Multi-channel for preemption threads
> ------------------------------------
>
> Currently the postcopy preempt feature uses only one extra channel and one
> extra thread on dest (no new thread on src QEMU). It should be mostly good
> enough for major use cases, but when the postcopy queue is long enough
> (e.g. hundreds of vCPUs faulted on different pages) logically we could
> still observe more delays in average. Whether growing threads/channels can
> solve it is debatable, but sounds worthwhile a try. That's yet another
> thing we can think about after this patchset lands.
>
> Logically the design provides space for that - the receiving postcopy
> preempt thread can understand all ram-layer migration protocol, and for
> multi channel and multi threads we could simply grow that into multiple
> threads handling the same protocol (with multiple PostcopyTmpPage). The
> source needs more thoughts on synchronizations, though, but it shouldn't
> affect the whole protocol layer, so should be easy to keep compatible.
>
> Please review, thanks.
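
For readers trying to picture what x-postcopy-preempt-break-huge changes on
the single source-side sender, here is a toy model. It treats a huge page as
a run of small chunks and shows where one urgent page request ends up in the
send order. This is only a sketch of the idea as described in the cover
letter: it ignores the separate channel entirely, and the function name and
parameters are invented for illustration rather than taken from the patches.

```python
def send_sequence(huge_chunks, arrival, break_huge):
    """Order in which one sender emits data.

    huge_chunks: number of small-page-sized chunks in the precopy huge page
    arrival:     index of the chunk in flight when the urgent request arrives
    break_huge:  True  -> the urgent page may interrupt the huge page
                 False -> the huge page is finished first
    """
    seq = []
    urgent_pending = False
    for i in range(huge_chunks):
        if urgent_pending and break_huge:
            # Preempt: serve the faulting page before the next bulk chunk.
            seq.append("urgent")
            urgent_pending = False
        seq.append(f"bulk[{i}]")
        if i == arrival:
            urgent_pending = True  # request shows up while chunk i is sent
    if urgent_pending:
        # Either break_huge is off, or the request arrived on the last chunk.
        seq.append("urgent")
    return seq


if __name__ == "__main__":
    for mode in (True, False):
        seq = send_sequence(huge_chunks=4, arrival=1, break_huge=mode)
        print(f"break_huge={mode}: {seq}")
```

With break_huge on, the urgent page waits for at most the chunk already in
flight; with it off, the wait grows with the size of the huge page, which is
at least consistent with the direction of the 1G numbers in the table.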
>
> Peter Xu (14):
>   migration: Add postcopy-preempt capability
>   migration: Postcopy preemption preparation on channel creation
>   migration: Postcopy preemption enablement
>   migration: Postcopy recover with preempt enabled
>   migration: Create the postcopy preempt channel asynchronously
>   migration: Add property x-postcopy-preempt-break-huge
>   migration: Add helpers to detect TLS capability
>   migration: Export tls-[creds|hostname|authz] params to cmdline too
>   migration: Enable TLS for preempt channel
>   migration: Respect postcopy request order in preemption mode
>   tests: Move MigrateCommon upper
>   tests: Add postcopy tls migration test
>   tests: Add postcopy tls recovery migration test
>   tests: Add postcopy preempt tests
>
>  migration/channel.c          |   9 +-
>  migration/migration.c        | 134 ++++++++++++--
>  migration/migration.h        |  44 ++++-
>  migration/multifd.c          |   4 +-
>  migration/postcopy-ram.c     | 186 +++++++++++++++++++-
>  migration/postcopy-ram.h     |  11 ++
>  migration/qemu-file.c        |  27 +++
>  migration/qemu-file.h        |   1 +
>  migration/ram.c              | 326 +++++++++++++++++++++++++++++++++--
>  migration/ram.h              |   4 +-
>  migration/savevm.c           |  46 +++--
>  migration/socket.c           |  22 ++-
>  migration/socket.h           |   1 +
>  migration/tls.c              |   9 +
>  migration/tls.h              |   4 +
>  migration/trace-events       |  15 +-
>  qapi/migration.json          |   7 +-
>  tests/qtest/migration-test.c | 286 +++++++++++++++++++++---------
>  18 files changed, 990 insertions(+), 146 deletions(-)
>
> --
> 2.32.0
>
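
For anyone wanting to try the queued series: the cover letter says the new
capability is called "postcopy-preempt" and is enabled together with
postcopy. Assuming it is toggled like other migration capabilities, through
the existing QMP command migrate-set-capabilities alongside postcopy-ram,
the request would look roughly like what the sketch below prints. The helper
name is made up, and nothing here talks to a real QEMU; it only builds the
JSON.

```python
import json


def preempt_capability_request():
    """Build a QMP request enabling postcopy plus the new preempt capability.

    Assumption: "postcopy-preempt" is set via migrate-set-capabilities in the
    same way as the existing "postcopy-ram" capability.
    """
    capabilities = [
        {"capability": "postcopy-ram", "state": True},
        {"capability": "postcopy-preempt", "state": True},
    ]
    return {
        "execute": "migrate-set-capabilities",
        "arguments": {"capabilities": capabilities},
    }


if __name__ == "__main__":
    print(json.dumps(preempt_capability_request(), indent=2))
```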