From patchwork Fri Apr 14 13:17:18 2017
X-Patchwork-Submitter: Alexey Perevalov
X-Patchwork-Id: 750852
From: Alexey Perevalov <a.perevalov@samsung.com>
To: dgilbert@redhat.com, qemu-devel@nongnu.org
Date: Fri, 14 Apr 2017 16:17:18 +0300
Message-id: <1492175840-5021-5-git-send-email-a.perevalov@samsung.com>
X-Mailer: git-send-email 1.9.1
In-reply-to: <1492175840-5021-1-git-send-email-a.perevalov@samsung.com>
References: <1492175840-5021-1-git-send-email-a.perevalov@samsung.com>
Subject: [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side
Cc: i.maximets@samsung.com, a.perevalov@samsung.com

This patch provides downtime calculation per vCPU, both as a per-vCPU
summary and as an overlapped value across all vCPUs.

The approach keeps a tree with the page fault address as the key; each
node stores the t1-t2 interval between the page fault time and the page
copy time, together with a bit mask of the affected vCPUs. For more
implementation details please see the comment above the
get_postcopy_total_downtime function.
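For illustration, the overlap rule described above can be tried in
isolation. The sketch below is not part of the patch: the Point struct,
total_overlap_ms() and NCPUS are invented names, and it uses a simple
per-vCPU fault-counter sweep instead of the GTree/bit-mask machinery the
patch adds. Total downtime accumulates only while every vCPU has at least
one outstanding fault.

/* Minimal sketch (assumed names, not QEMU code): sweep sorted fault
 * begin/end points and sum the spans where all vCPUs are blocked. */
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

#define NCPUS 3

typedef struct {
    int64_t tp;     /* point in time, ms */
    int is_end;     /* 0 = page fault (S), 1 = page copied (E) */
    unsigned cpus;  /* bit mask of vCPUs blocked by this fault */
} Point;

static int cmp_point(const void *a, const void *b)
{
    const Point *pa = a, *pb = b;
    return (pa->tp > pb->tp) - (pa->tp < pb->tp);
}

static int64_t total_overlap_ms(Point *p, size_t n)
{
    int blocked[NCPUS] = { 0 };         /* outstanding faults per vCPU */
    int64_t total = 0, prev = 0;
    int all_blocked = 0;
    size_t i;
    int c;

    qsort(p, n, sizeof(*p), cmp_point);
    for (i = 0; i < n; i++) {
        if (all_blocked) {
            /* every vCPU was faulted during [prev, p[i].tp) */
            total += p[i].tp - prev;
        }
        for (c = 0; c < NCPUS; c++) {
            if (p[i].cpus & (1u << c)) {
                blocked[c] += p[i].is_end ? -1 : 1;
            }
        }
        all_blocked = 1;
        for (c = 0; c < NCPUS; c++) {
            all_blocked &= blocked[c] > 0;
        }
        prev = p[i].tp;
    }
    return total;
}

int main(void)
{
    /* Shape of the 3-vCPU example from the comment in
     * get_postcopy_total_downtime(): only the span where all three
     * vCPUs are faulted at once (the "xxx" region) contributes. */
    Point p[] = {
        { 100, 0, 0x1 }, { 130, 1, 0x1 },   /* CPU1: S1..E1 */
        { 120, 0, 0x2 }, { 180, 1, 0x2 },   /* CPU2: S2..E2 */
        { 160, 0, 0x4 }, { 220, 1, 0x4 },   /* CPU3: S3..E3 */
        { 170, 0, 0x1 }, { 240, 1, 0x1 },   /* CPU1 faults again */
    };
    /* prints 10: all vCPUs overlap only during [170, 180) */
    printf("overlapped downtime: %" PRId64 " ms\n",
           total_overlap_ms(p, sizeof(p) / sizeof(p[0])));
    return 0;
}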
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 include/migration/migration.h |  14 +++
 migration/migration.c         | 280 +++++++++++++++++++++++++++++++++++++++++-
 migration/postcopy-ram.c      |  24 +++-
 migration/qemu-file.c         |   1 -
 migration/trace-events        |   9 +-
 5 files changed, 323 insertions(+), 5 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 5720c88..5d2c628 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -123,10 +123,24 @@ struct MigrationIncomingState {
 
     /* See savevm.c */
     LoadStateEntry_Head loadvm_handlers;
+
+    /*
+     * Tree for keeping postcopy downtime; it is necessary to
+     * calculate the correct downtime across multiple vm suspends.
+     * It keeps the host page address as the key and a
+     * DowntimeDuration as the data.
+     * NULL means the kernel could not provide the process thread id,
+     * so QEMU could not identify which vCPU raised the page fault.
+     */
+    GTree *postcopy_downtime;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
 void migration_incoming_state_destroy(void);
+void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
+void mark_postcopy_downtime_end(uint64_t addr);
+uint64_t get_postcopy_total_downtime(void);
+void destroy_downtime_duration(gpointer data);
 
 /*
  * An outstanding page request, on the source, having been received
diff --git a/migration/migration.c b/migration/migration.c
index 79f6425..5bac434 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -38,6 +38,8 @@
 #include "io/channel-tls.h"
 #include "migration/colo.h"
 
+#define DEBUG_VCPU_DOWNTIME 1
+
 #define MAX_THROTTLE  (32 << 20)      /* Migration transfer speed throttling */
 
 /* Amount of time to allocate to each "chunk" of bandwidth-throttled
@@ -77,6 +79,19 @@ static NotifierList migration_state_notifiers =
 
 static bool deferred_incoming;
 
+typedef struct {
+    int64_t begin;
+    int64_t end;
+    uint64_t *cpus; /* cpus bit mask array, QEMU bit functions support
+                       bit operations on memory regions, but don't check
+                       out of range */
+} DowntimeDuration;
+
+typedef struct {
+    int64_t tp; /* point in time */
+    bool is_end;
+    uint64_t *cpus;
+} OverlapDowntime;
+
 /*
  * Current state of incoming postcopy; note this is not part of
  * MigrationIncomingState since it's state is used during cleanup
@@ -117,6 +132,13 @@ MigrationState *migrate_get_current(void)
     return &current_migration;
 }
 
+void destroy_downtime_duration(gpointer data)
+{
+    DowntimeDuration *dd = (DowntimeDuration *)data;
+    g_free(dd->cpus);
+    g_free(data);
+}
+
 MigrationIncomingState *migration_incoming_get_current(void)
 {
     static bool once;
@@ -138,10 +160,13 @@ void migration_incoming_state_destroy(void)
     struct MigrationIncomingState *mis = migration_incoming_get_current();
 
     qemu_event_destroy(&mis->main_thread_load_event);
+    if (mis->postcopy_downtime) {
+        g_tree_destroy(mis->postcopy_downtime);
+        mis->postcopy_downtime = NULL;
+    }
     loadvm_free_handlers(mis);
 }
 
-
 typedef struct {
     bool optional;
     uint32_t size;
@@ -1754,7 +1779,6 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
      */
     ms->postcopy_after_devices = true;
     notifier_list_notify(&migration_state_notifiers, ms);
-
     ms->downtime = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - time_at_stop;
 
     qemu_mutex_unlock_iothread();
@@ -2117,3 +2141,255 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
     return atomic_xchg(&incoming_postcopy_state, new_state);
 }
 
+#define SIZE_TO_KEEP_CPUBITS (1 + smp_cpus/sizeof(guint64))
+
+void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    DowntimeDuration *dd;
+    if (!mis->postcopy_downtime) {
+        return;
+    }
+
+    dd = g_tree_lookup(mis->postcopy_downtime, (gpointer)addr); /* !!! cast */
+    if (!dd) {
+        dd = (DowntimeDuration *)g_new0(DowntimeDuration, 1);
+        dd->cpus = g_new0(guint64, SIZE_TO_KEEP_CPUBITS);
+        g_tree_insert(mis->postcopy_downtime, (gpointer)addr, (gpointer)dd);
+    }
+
+    if (cpu < 0) {
+        /* assume in this situation all vCPUs are sleeping */
+        int i;
+        for (i = 0; i < SIZE_TO_KEEP_CPUBITS; i++) {
+            dd->cpus[i] = ~(uint64_t)0u;
+        }
+    } else
+        set_bit(cpu, dd->cpus);
+
+    /*
+     * overwrite the previously set dd->begin if that page was already
+     * faulted on another cpu
+     */
+    dd->begin = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    trace_mark_postcopy_downtime_begin(addr, dd, dd->begin, cpu);
+}
+
+void mark_postcopy_downtime_end(uint64_t addr)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    DowntimeDuration *dd;
+    if (!mis->postcopy_downtime) {
+        return;
+    }
+
+    dd = g_tree_lookup(mis->postcopy_downtime, (gpointer)addr);
+    if (!dd) {
+        /* error_report("Could not populate downtime duration completion time \n\
+           There is no downtime duration for 0x%"PRIx64, addr); */
+        return;
+    }
+
+    dd->end = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    trace_mark_postcopy_downtime_end(addr, dd, dd->end);
+}
+
+struct downtime_overlay_cxt {
+    GPtrArray *downtime_points;
+    size_t number_of_points;
+};
+/*
+ * This function splits each DowntimeDuration, which is represented
+ * as a start/end pair, into points, and fills the array with them
+ * so that it can be sorted later.
+ */
+static gboolean split_duration_and_fill_points(gpointer key, gpointer value,
+                                               gpointer data)
+{
+    struct downtime_overlay_cxt *ctx = (struct downtime_overlay_cxt *)data;
+    DowntimeDuration *dd = (DowntimeDuration *)value;
+    GPtrArray *interval = ctx->downtime_points;
+    if (dd->begin) {
+        OverlapDowntime *od_begin = g_new0(OverlapDowntime, 1);
+        od_begin->cpus = g_memdup(dd->cpus, sizeof(uint64_t) * SIZE_TO_KEEP_CPUBITS);
+        od_begin->tp = dd->begin;
+        od_begin->is_end = false;
+        g_ptr_array_add(interval, od_begin);
+        ctx->number_of_points += 1;
+    }
+
+    if (dd->end) {
+        OverlapDowntime *od_end = g_new0(OverlapDowntime, 1);
+        od_end->cpus = g_memdup(dd->cpus, sizeof(uint64_t) * SIZE_TO_KEEP_CPUBITS);
+        od_end->tp = dd->end;
+        od_end->is_end = true;
+        g_ptr_array_add(interval, od_end);
+        ctx->number_of_points += 1;
+    }
+
+    if (dd->end && dd->begin)
+        trace_split_duration_and_fill_points(dd->end - dd->begin, (uint64_t)key);
+    return FALSE;
+}
+
+#ifdef DEBUG_VCPU_DOWNTIME
+static gboolean calculate_per_cpu(gpointer key, gpointer value,
+                                  gpointer data)
+{
+    int *downtime_cpu = (int *)data;
+    DowntimeDuration *dd = (DowntimeDuration *)value;
+    int cpu_iter;
+    for (cpu_iter = 0; cpu_iter < smp_cpus; cpu_iter++) {
+        if (test_bit(cpu_iter, dd->cpus) && dd->end && dd->begin)
+            downtime_cpu[cpu_iter] += dd->end - dd->begin;
+    }
+    return FALSE;
+}
+#endif /* DEBUG_VCPU_DOWNTIME */
+
+static gint compare_downtime(gconstpointer a, gconstpointer b)
+{
+    DowntimeDuration *dda = (DowntimeDuration *)a;
+    DowntimeDuration *ddb = (DowntimeDuration *)b;
+    return dda->begin - ddb->begin;
+}
+
+static void destroy_overlap_downtime(gpointer data)
+{
+    OverlapDowntime *od = (OverlapDowntime *)data;
+    g_free(od->cpus);
+    g_free(data);
+}
+
+static int check_overlap(uint64_t *b)
+{
+    unsigned long zero_bit = find_first_zero_bit(b, BITS_PER_LONG * SIZE_TO_KEEP_CPUBITS);
+    return zero_bit >= smp_cpus;
+}
+
+/*
+ * This function calculates downtime per cpu and traces it
+ *
+ * It also calculates the total downtime as an overlap of the intervals
+ * of all vCPUs.
+ *
+ * The approach is the following:
+ * Initially the intervals are kept in a tree where the key is the
+ * pagefault address and the values are:
+ *  begin - page fault time
+ *  end   - page load time
+ *  cpus  - bit mask of the affected cpus
+ *
+ * To calculate the overlap on all cpus, the intervals are converted
+ * into an array of points in time (downtime_points); the size of the
+ * array is 2 * the number of nodes in the tree of intervals (2 array
+ * elements per interval).
+ * Each element is marked as the end (E) or the start (S) of an interval.
+ * The overlap downtime is calculated for an SE pair, but only when
+ * there is a sequence S(0..N)E(M) that covers every vCPU.
+ *
+ * As an example we have 3 CPUs:
+ *
+ *      S1        E1           S1               E1
+ * -----***********------------xxx***************------------------------> CPU1
+ *
+ *             S2                E2
+ * ------------****************xxx---------------------------------------> CPU2
+ *
+ *                         S3            E3
+ * ------------------------****xxx********-------------------------------> CPU3
+ *
+ * We have the sequence S1,S2,E1,S3,S1,E2,E3,E1
+ * S2,E1 - doesn't match the condition, because the sequence S1,S2,E1 doesn't include CPU3
+ * S3,S1,E2 - the sequence includes all CPUs, in this case the overlap is S1,E2
+ * Legend of the picture:  * - downtime per vCPU
+ *                         x - overlapped downtime
+ */
+uint64_t get_postcopy_total_downtime(void)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    uint64_t total_downtime = 0; /* for total overlapped downtime */
+    const int intervals = g_tree_nnodes(mis->postcopy_downtime);
+    int point_iter, start_point_iter, i;
+    struct downtime_overlay_cxt dp_ctx = { 0 };
+    /*
+     * the array will contain 2 * intervals points or fewer, if page fault
+     * finalization did not happen for some pages; the real count is in
+     * dp_ctx.number_of_points
+     */
+    dp_ctx.downtime_points = g_ptr_array_new_full(2 * intervals,
+                                                  destroy_overlap_downtime);
+    if (!mis->postcopy_downtime) {
+        goto out;
+    }
+
+#ifdef DEBUG_VCPU_DOWNTIME
+    {
+        gint *downtime_cpu = g_new0(int, smp_cpus);
+        g_tree_foreach(mis->postcopy_downtime, calculate_per_cpu, downtime_cpu);
+        for (point_iter = 0; point_iter < smp_cpus; point_iter++)
+        {
+            trace_downtime_per_cpu(point_iter, downtime_cpu[point_iter]);
+        }
+        g_free(downtime_cpu);
+    }
+#endif /* DEBUG_VCPU_DOWNTIME */
+
+    /* make downtime points S/E from intervals */
+    g_tree_foreach(mis->postcopy_downtime, split_duration_and_fill_points,
+                   &dp_ctx);
+    g_ptr_array_sort(dp_ctx.downtime_points, compare_downtime);
+
+    for (point_iter = 1; point_iter < dp_ctx.number_of_points;
+         point_iter++) {
+        OverlapDowntime *od = g_ptr_array_index(dp_ctx.downtime_points,
+                                                point_iter);
+        uint64_t *cur_cpus;
+        int smp_cpus_i = smp_cpus;
+        OverlapDowntime *prev_od = g_ptr_array_index(dp_ctx.downtime_points,
+                                                     point_iter - 1);
+        if (!od || !prev_od)
+            continue;
+        /* we need sequence SE */
+        if (!od->is_end || prev_od->is_end)
+            continue;
+
+        cur_cpus = g_memdup(od->cpus, sizeof(uint64_t) * SIZE_TO_KEEP_CPUBITS);
+        for (start_point_iter = point_iter - 1;
+             start_point_iter >= 0 && smp_cpus_i;
+             start_point_iter--, smp_cpus_i--) {
+            OverlapDowntime *t_od = g_ptr_array_index(dp_ctx.downtime_points,
+                                                      start_point_iter);
+            if (!t_od)
+                break;
+            /* should be S */
+            if (t_od->is_end)
+                break;
+
+            /* the points were sorted; it's possible that the end has not
+             * occurred for a page, but such points were omitted in
+             * split_duration_and_fill_points */
+            if (od->tp <= prev_od->tp) {
+                break;
+            }
+
+            for (i = 0; i < SIZE_TO_KEEP_CPUBITS; i++) {
+                cur_cpus[i] |= t_od->cpus[i];
+            }
+
+            /* check_overlap - just counts the number of bits in cur_cpus
+             * and compares it with smp_cpus */
+            if (check_overlap(cur_cpus)) {
+                total_downtime += od->tp - prev_od->tp;
+                /* the situation when one S point represents all vCPUs is possible */
+                break;
+            }
+        }
+        g_free(cur_cpus);
+    }
+    trace_get_postcopy_total_downtime(g_tree_nnodes(mis->postcopy_downtime),
+                                      total_downtime);
+out:
+    g_ptr_array_free(dp_ctx.downtime_points, TRUE);
+    return total_downtime;
+}
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 70f0480..ea89f4e 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -23,8 +23,10 @@
 #include "migration/postcopy-ram.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/balloon.h"
+#include
 #include "qemu/error-report.h"
 #include "trace.h"
+#include "glib/glib-helper.h"
 
 /* Arbitrary limit on size of each discard command,
  * keeps them around ~200 bytes
@@ -81,6 +83,11 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
         return false;
     }
 
+    if (mis && UFFD_FEATURE_THREAD_ID & api_struct.features) {
+        mis->postcopy_downtime = g_tree_new_full(g_int_cmp64,
+                                                 NULL, NULL, destroy_downtime_duration);
+    }
+
     if (getpagesize() != ram_pagesize_summary()) {
         bool have_hp = false;
         /* We've got a huge page */
@@ -404,6 +411,18 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
     return 0;
 }
 
+static int get_mem_fault_cpu_index(uint32_t pid)
+{
+    CPUState *cpu_iter;
+
+    CPU_FOREACH(cpu_iter) {
+        if (cpu_iter->thread_id == pid)
+            return cpu_iter->cpu_index;
+    }
+    trace_get_mem_fault_cpu_index(pid);
+    return -1;
+}
+
 /*
  * Handle faults detected by the USERFAULT markings
  */
@@ -481,8 +500,10 @@ static void *postcopy_ram_fault_thread(void *opaque)
         rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
         trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
                                                 qemu_ram_get_idstr(rb),
-                                                rb_offset);
+                                                rb_offset, msg.arg.pagefault.feat.ptid);
+        mark_postcopy_downtime_begin(msg.arg.pagefault.address,
+                get_mem_fault_cpu_index(msg.arg.pagefault.feat.ptid));
 
         /*
          * Send the request to the source - we want to request one
          * of our host page sizes (which is >= TPS)
@@ -577,6 +598,7 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
         return -e;
     }
 
+    mark_postcopy_downtime_end((uint64_t)host);
    trace_postcopy_place_page(host);
     return 0;
 
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 195fa94..c9f3e47 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -547,7 +547,6 @@ size_t qemu_get_buffer_in_place(QEMUFile *f, uint8_t **buf, size_t size)
 int qemu_peek_byte(QEMUFile *f, int offset)
 {
     int index = f->buf_index + offset;
-
     assert(!qemu_file_is_writable(f));
     assert(offset < IO_BUF_SIZE);
 
diff --git a/migration/trace-events b/migration/trace-events
index 7372ce2..ab2e1e4 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -110,6 +110,12 @@ process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
 process_incoming_migration_co_postcopy_end_main(void) ""
 migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
 migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname) "ioc=%p ioctype=%s hostname=%s"
+mark_postcopy_downtime_begin(uint64_t addr, void *dd, int64_t time, int cpu) "addr 0x%" PRIx64 " dd %p time %" PRId64 " cpu %d"
+mark_postcopy_downtime_end(uint64_t addr, void *dd, int64_t time) "addr 0x%" PRIx64 " dd %p time %" PRId64
+get_postcopy_total_downtime(int num, uint64_t total) "faults %d, total downtime %" PRIu64
+split_duration_and_fill_points(int64_t downtime, uint64_t addr) "downtime %" PRId64 " addr 0x%" PRIx64
+downtime_per_cpu(int cpu_index, int downtime) "downtime cpu[%d]=%d"
+source_return_path_thread_downtime(uint64_t downtime) "downtime %" PRIu64
 
 # migration/rdma.c
 qemu_rdma_accept_incoming_migration(void) ""
@@ -186,7 +192,7 @@ postcopy_ram_enable_notify(void) ""
 postcopy_ram_fault_thread_entry(void) ""
 postcopy_ram_fault_thread_exit(void) ""
 postcopy_ram_fault_thread_quit(void) ""
-postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=%" PRIx64 " rb=%s offset=%zx"
+postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset, int pid) "Request for HVA=%" PRIx64 " rb=%s offset=%zx %d"
 postcopy_ram_incoming_cleanup_closeuf(void) ""
 postcopy_ram_incoming_cleanup_entry(void) ""
 postcopy_ram_incoming_cleanup_exit(void) ""
@@ -195,6 +201,7 @@ save_xbzrle_page_skipping(void) ""
 save_xbzrle_page_overflow(void) ""
 ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
 ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
+get_mem_fault_cpu_index(uint32_t pid) "pid %u is not vCPU"
 
 # migration/exec.c
 migration_exec_outgoing(const char *cmd) "cmd=%s"