diff mbox series

[1/1] migration: calculate expected_downtime considering redirtied ram

Message ID 20190122150543.16889-2-bala24@linux.vnet.ibm.com
State New
Headers show
Series migration: calculate expected_downtime considering redirtied ram | expand

Commit Message

Balamuruhan S Jan. 22, 2019, 3:05 p.m. UTC
From: Balamuruhan S <bala24@linux.vnet.ibm.com>

currently we calculate expected_downtime by time taken to transfer
remaining ram, but during the time we had transferred remaining ram
few pages of ram might be redirtied and we need to retransfer it,
so it is better to consider them for calculating expected_downtime
for getting more accurate values.

Total ram to be transferred = remaining ram + (redirtied ram at the
                                               time when the remaining
                                               ram gets transferred)

redirtied ram = dirty_pages_rate * time taken to transfer remaining ram

redirtied ram = dirty_pages_rate * (remaining ram / bandwidth)

expected_downtime = (remaining ram + redirtied ram) / bandwidth

Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
---
 migration/migration.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Juan Quintela Jan. 23, 2019, 4:35 p.m. UTC | #1
bala24@linux.vnet.ibm.com wrote:
> From: Balamuruhan S <bala24@linux.vnet.ibm.com>
>
> currently we calculate expected_downtime by time taken to transfer
> remaining ram, but during the time we had transferred remaining ram
> few pages of ram might be redirtied and we need to retransfer it,
> so it is better to consider them for calculating expected_downtime
> for getting more accurate values.
>
> Total ram to be transferred = remaining ram + (redirtied ram at the
>                                                time when the remaining
>                                                ram gets transferred)
>
> redirtied ram = dirty_pages_rate * time taken to transfer remaining ram
>
> redirtied ram = dirty_pages_rate * (remaining ram / bandwidth)
>
> expected_downtime = (remaining ram + redirtied ram) / bandwidth
>
> Suggested-by: David Gibson <david@gibson.dropbear.id.au>
> Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> ---
>  migration/migration.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index ffc4d9e556..dc38e9a380 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2903,7 +2903,13 @@ static void migration_update_counters(MigrationState *s,
>       * recalculate. 10000 is a small enough number for our purposes
>       */
>      if (ram_counters.dirty_pages_rate && transferred > 10000) {
> -        s->expected_downtime = ram_counters.remaining / bandwidth;
> +        /* Time required to transfer remaining ram */
> +        remaining_ram_transfer_time = ram_counters.remaining / bandwidth

missing semicolon

> +
> +        /* redirty of ram at the time remaining ram gets transferred*/
> +        newly_dirtied_ram = ram_counters.dirty_pages_rate * remaining_ram_transfer_time

the same.

Declaration of the new variables is also missing.

> +        s->expected_downtime = (ram_counters.remaining + newly_dirtied_ram) / bandwidth;
>      }
>  
>      qemu_file_reset_rate_limit(s->to_dst_file);

About the numbers, I am not against it.  It is an heuristic.  Without
numbers (and it is very load dependent) it is not clear that this one is
going to be much worse/better than previous one (this should be a bit
better, though).

Thanks, Juan.
Peter Xu Jan. 24, 2019, 7:18 a.m. UTC | #2
On Wed, Jan 23, 2019 at 05:35:03PM +0100, Juan Quintela wrote:
> bala24@linux.vnet.ibm.com wrote:
> > From: Balamuruhan S <bala24@linux.vnet.ibm.com>
> >
> > currently we calculate expected_downtime by time taken to transfer
> > remaining ram, but during the time we had transferred remaining ram
> > few pages of ram might be redirtied and we need to retransfer it,
> > so it is better to consider them for calculating expected_downtime
> > for getting more accurate values.
> >
> > Total ram to be transferred = remaining ram + (redirtied ram at the
> >                                                time when the remaining
> >                                                ram gets transferred)
> >
> > redirtied ram = dirty_pages_rate * time taken to transfer remaining ram
> >
> > redirtied ram = dirty_pages_rate * (remaining ram / bandwidth)
> >
> > expected_downtime = (remaining ram + redirtied ram) / bandwidth
> >
> > Suggested-by: David Gibson <david@gibson.dropbear.id.au>
> > Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> > ---
> >  migration/migration.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index ffc4d9e556..dc38e9a380 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -2903,7 +2903,13 @@ static void migration_update_counters(MigrationState *s,
> >       * recalculate. 10000 is a small enough number for our purposes
> >       */
> >      if (ram_counters.dirty_pages_rate && transferred > 10000) {
> > -        s->expected_downtime = ram_counters.remaining / bandwidth;
> > +        /* Time required to transfer remaining ram */
> > +        remaining_ram_transfer_time = ram_counters.remaining / bandwidth
> 
> missing semicolon
> 
> > +
> > +        /* redirty of ram at the time remaining ram gets transferred*/
> > +        newly_dirtied_ram = ram_counters.dirty_pages_rate * remaining_ram_transfer_time
> 
> the same.
> 
> Declaration of the new variables is also missing.
> 
> > +        s->expected_downtime = (ram_counters.remaining + newly_dirtied_ram) / bandwidth;
> >      }
> >  
> >      qemu_file_reset_rate_limit(s->to_dst_file);
> 
> About the numbers, I am not against it.  It is an heuristic.  Without
> numbers (and it is very load dependent) it is not clear that this one is
> going to be much worse/better than previous one (this should be a bit
> better, though).

Actually I have had a question on how expected_downtime is defined and
how it will be used by users.

My understanding is that the expected_downtime is defined as: how long
time the guest will be down if we stop the VM right now and migrate
all the rest of pages.

This definition makes sense in that it helps the customer to
dynamically decide whether it's a good point to go into the last phase
of migration.  Currently we should be able to achieve that by setting
a very high target downtime.

And if that definition is the thing we want, the current calculation
seems exactly the number we want, since if we stop the VM right now
then there won't be any more data to be dirtied as well.

Regards,
Balamuruhan S Jan. 30, 2019, 9:16 a.m. UTC | #3
On Thu, Jan 24, 2019 at 03:18:22PM +0800, Peter Xu wrote:
> On Wed, Jan 23, 2019 at 05:35:03PM +0100, Juan Quintela wrote:
> > bala24@linux.vnet.ibm.com wrote:
> > > From: Balamuruhan S <bala24@linux.vnet.ibm.com>
> > >
> > > currently we calculate expected_downtime by time taken to transfer
> > > remaining ram, but during the time we had transferred remaining ram
> > > few pages of ram might be redirtied and we need to retransfer it,
> > > so it is better to consider them for calculating expected_downtime
> > > for getting more accurate values.
> > >
> > > Total ram to be transferred = remaining ram + (redirtied ram at the
> > >                                                time when the remaining
> > >                                                ram gets transferred)
> > >
> > > redirtied ram = dirty_pages_rate * time taken to transfer remaining ram
> > >
> > > redirtied ram = dirty_pages_rate * (remaining ram / bandwidth)
> > >
> > > expected_downtime = (remaining ram + redirtied ram) / bandwidth
> > >
> > > Suggested-by: David Gibson <david@gibson.dropbear.id.au>
> > > Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> > > ---
> > >  migration/migration.c | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index ffc4d9e556..dc38e9a380 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -2903,7 +2903,13 @@ static void migration_update_counters(MigrationState *s,
> > >       * recalculate. 10000 is a small enough number for our purposes
> > >       */
> > >      if (ram_counters.dirty_pages_rate && transferred > 10000) {
> > > -        s->expected_downtime = ram_counters.remaining / bandwidth;
> > > +        /* Time required to transfer remaining ram */
> > > +        remaining_ram_transfer_time = ram_counters.remaining / bandwidth
> > 
> > missing semicolon
> > 
> > > +
> > > +        /* redirty of ram at the time remaining ram gets transferred*/
> > > +        newly_dirtied_ram = ram_counters.dirty_pages_rate * remaining_ram_transfer_time
> > 
> > the same.
> > 
> > Declaration of the new variables is also missing.
> > 
> > > +        s->expected_downtime = (ram_counters.remaining + newly_dirtied_ram) / bandwidth;
> > >      }
> > >  
> > >      qemu_file_reset_rate_limit(s->to_dst_file);
> > 
> > About the numbers, I am not against it.  It is an heuristic.  Without
> > numbers (and it is very load dependent) it is not clear that this one is
> > going to be much worse/better than previous one (this should be a bit
> > better, though).
> 
> Actually I have had a question on how expected_downtime is defined and
> how it will be used by users.
> 
> My understanding is that the expected_downtime is defined as: how long
> time the guest will be down if we stop the VM right now and migrate
> all the rest of pages.
> 
> This definition makes sense in that it helps the customer to
> dynamically decide whether it's a good point to go into the last phase
> of migration.  Currently we should be able to achieve that by setting
> a very high target downtime.
> 
> And if that definition is the thing we want, the current calculation
> seems exactly the number we want, since if we stop the VM right now
> then there won't be any more data to be dirtied as well.

Thank you Peter, I thought about your definition and it makes sense
with your definition that existing calculation is appropriate and
correct.

-- Bala
> 
> Regards,
> 
> -- 
> Peter Xu
>
diff mbox series

Patch

diff --git a/migration/migration.c b/migration/migration.c
index ffc4d9e556..dc38e9a380 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2903,7 +2903,13 @@  static void migration_update_counters(MigrationState *s,
      * recalculate. 10000 is a small enough number for our purposes
      */
     if (ram_counters.dirty_pages_rate && transferred > 10000) {
-        s->expected_downtime = ram_counters.remaining / bandwidth;
+        /* Time required to transfer remaining ram */
+        remaining_ram_transfer_time = ram_counters.remaining / bandwidth
+
+        /* redirty of ram at the time remaining ram gets transferred*/
+        newly_dirtied_ram = ram_counters.dirty_pages_rate * remaining_ram_transfer_time
+
+        s->expected_downtime = (ram_counters.remaining + newly_dirtied_ram) / bandwidth;
     }
 
     qemu_file_reset_rate_limit(s->to_dst_file);