[RFC,v2,11/12] mc: introduce new capabilities to control micro-checkpointing

Message ID 1392713429-18201-12-git-send-email-mrhines@linux.vnet.ibm.com
State New
Commit Message

mrhines@linux.vnet.ibm.com Feb. 18, 2014, 8:50 a.m. UTC
From: "Michael R. Hines" <mrhines@us.ibm.com>

New capabilities include the use of RDMA acceleration,
use of network buffering, and keepalive support, as documented
in patch #1.

Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
---
 qapi-schema.json | 36 +++++++++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

Comments

Eric Blake March 11, 2014, 9:57 p.m. UTC | #1
On 02/18/2014 01:50 AM, mrhines@linux.vnet.ibm.com wrote:
> From: "Michael R. Hines" <mrhines@us.ibm.com>
> 
> New capabilities include the use of RDMA acceleration,
> use of network buffering, and keepalive support, as documented
> in patch #1.
> 
> Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
> ---
>  qapi-schema.json | 36 +++++++++++++++++++++++++++++++++++-
>  1 file changed, 35 insertions(+), 1 deletion(-)
> 

> +#          Only for performance testing. (Since 2.x)
> +#
> +# @mc-rdma-copy: MC requires creating a local-memory checkpoint before
> +#          transmission to the destination. This requires heavy use of 
> +#          memcpy() which dominates the processor pipeline. This option 
> +#          makes use of *local* RDMA to perform the copy instead of the CPU.
> +#          Enabled by default only if the migration transport is RDMA.
> +#          Disabled by default otherwise. (Since 2.x)

How does that work?  If I query migration capabilities before requesting
a migration, what state am I going to read?  Is there coupling where I
would observe the state of this flag change merely because I did some
other action?  And if so, then how do I know that explicitly setting
this flag won't be undone by similar coupling?

It sounds like you are describing a tri-state option (unspecified so
default to migration transport, explicitly disabled, explicitly
enabled); but that doesn't work for something that only lists boolean
capabilities.  The only way around that is to have 2 separate
capabilities (one on whether to base decision on transport or to honor
override, and the other to provide the override value which is ignored
when defaulting by transport).
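
To make the two-capability idea concrete, here is a rough sketch in Python of how the effective setting could be resolved. The capability name `mc-rdma-copy-override` and the helper `effective_rdma_copy` are made up for illustration; neither is part of this patch or of QEMU.

```python
def effective_rdma_copy(caps, transport_is_rdma):
    """Resolve the effective local-RDMA-copy setting from two booleans.

    caps is a dict of capability name -> bool, as a management tool
    might build it from a capabilities query. Both capability names are
    hypothetical placeholders for the two-capability scheme.
    """
    # Capability 1 (hypothetical): honor the explicit override or not.
    if caps.get("mc-rdma-copy-override", False):
        # Capability 2: the override value, ignored unless the
        # override capability above is enabled.
        return caps.get("mc-rdma-copy", False)
    # No override requested: fall back to the transport-based default.
    return transport_is_rdma


# Unspecified: follows the transport.
print(effective_rdma_copy({}, transport_is_rdma=True))
# Explicitly disabled, even on an RDMA transport.
print(effective_rdma_copy({"mc-rdma-copy-override": True,
                           "mc-rdma-copy": False}, transport_is_rdma=True))
```

With this shape, both flags read back exactly as they were set, and no other action changes them behind the caller's back.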

> +#
> +# @rdma-keepalive: RDMA connections do not timeout by themselves if a peer
> +#         has disconnected prematurely or failed. User-level keepalives
> +#         allow the migration to abort cleanly if there is a problem with the
> +#         destination host. For debugging, this can be problematic as
> +#         the keepalive may cause the peer to abort prematurely if we are
> +#         at a GDB breakpoint, for example.
> +#         Enabled by default. (Since 2.x)

Enabled-by-default is an interesting choice, but I suppose it is okay.


> +#
>  # Since: 1.2
>  ##
>  { 'enum': 'MigrationCapability',
> -  'data': ['xbzrle', 'x-rdma-pin-all', 'auto-converge', 'zero-blocks'] }
> +  'data': ['xbzrle', 
> +           'rdma-pin-all', 
> +           'auto-converge', 
> +           'zero-blocks',
> +           'mc', 
> +           'mc-net-disable',
> +           'mc-rdma-copy',
> +           'rdma-keepalive'
> +          ] }
>  
>  ##
>  # @MigrationCapabilityStatus
>
Juan Quintela March 11, 2014, 10:02 p.m. UTC | #2
mrhines@linux.vnet.ibm.com wrote:
> From: "Michael R. Hines" <mrhines@us.ibm.com>
>
> New capabilities include the use of RDMA acceleration,
> use of network buffering, and keepalive support, as documented
> in patch #1.
>
> Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
> ---
>  qapi-schema.json | 36 +++++++++++++++++++++++++++++++++++-
>  1 file changed, 35 insertions(+), 1 deletion(-)
>
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 98abdac..1fdf208 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -720,10 +720,44 @@
>  # @auto-converge: If enabled, QEMU will automatically throttle down the guest
>  #          to speed up convergence of RAM migration. (since 1.6)
>  #
> +# @mc: The migration will never end, and the VM will instead be continuously
> +#          micro-checkpointed (MC). Use the command migrate-set-mc-delay to 
> +#          control the frequency at which the checkpoints occur. 
> +#          Disabled by default. (Since 2.x)
> +#
> +# @mc-net-disable: Deactivate network buffering against outbound network 
> +#          traffic while Micro-Checkpointing (@mc) is active.
> +#          Enabled by default. Disabling will make the MC protocol inconsistent
> +#          and potentially break network connections upon an actual failure.
> +#          Only for performance testing. (Since 2.x)

If it is dangerous, can we put dangerous/unsafe in the name?  Having an option that
can corrupt things makes me nervous.

> +#
> +# @mc-rdma-copy: MC requires creating a local-memory checkpoint before
> +#          transmission to the destination. This requires heavy use of 
> +#          memcpy() which dominates the processor pipeline. This option 
> +#          makes use of *local* RDMA to perform the copy instead of the CPU.
> +#          Enabled by default only if the migration transport is RDMA.
> +#          Disabled by default otherwise. (Since 2.x)
> +#
> +# @rdma-keepalive: RDMA connections do not timeout by themselves if a peer
> +#         has disconnected prematurely or failed. User-level keepalives
> +#         allow the migration to abort cleanly if there is a problem with the
> +#         destination host. For debugging, this can be problematic as
> +#         the keepalive may cause the peer to abort prematurely if we are
> +#         at a GDB breakpoint, for example.
> +#         Enabled by default. (Since 2.x)
> +#
>  # Since: 1.2
>  ##
>  { 'enum': 'MigrationCapability',
> -  'data': ['xbzrle', 'x-rdma-pin-all', 'auto-converge', 'zero-blocks'] }
> +  'data': ['xbzrle', 
> +           'rdma-pin-all', 
> +           'auto-converge', 
> +           'zero-blocks',
> +           'mc', 
> +           'mc-net-disable',
> +           'mc-rdma-copy',
> +           'rdma-keepalive'
> +          ] }
>  
>  ##
>  # @MigrationCapabilityStatus

Thanks, Juan.
Eric Blake March 11, 2014, 10:07 p.m. UTC | #3
On 03/11/2014 04:02 PM, Juan Quintela wrote:
> mrhines@linux.vnet.ibm.com wrote:
>> From: "Michael R. Hines" <mrhines@us.ibm.com>
>>

>> +# @mc-net-disable: Deactivate network buffering against outbound network 
>> +#          traffic while Micro-Checkpointing (@mc) is active.
>> +#          Enabled by default. Disabling will make the MC protocol inconsistent
>> +#          and potentially break network connections upon an actual failure.
>> +#          Only for performance testing. (Since 2.x)
> 
> If it is dangerous, can we put dangerous/unsafe on the name?  Having an option that
> can corrupt things make me nervous.

Or even name it x-mc-net-disable, so that we reserve the right to remove
it, as well as make it obvious that management must not try to tune it,
only developers.
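
As a sketch of what that buys management software: a tool can skip anything carrying the experimental prefix when deciding what to tune. The helper name `tunable_capabilities` is hypothetical, and the list below simply mirrors the enum in this patch with the proposed rename applied.

```python
def tunable_capabilities(names):
    """Return only the capabilities a management tool should touch.

    Names starting with "x-" are treated as experimental and
    developer-only, so they are filtered out.
    """
    return [name for name in names if not name.startswith("x-")]


# Enum values from the patch, with mc-net-disable renamed as proposed.
capabilities = [
    "xbzrle",
    "rdma-pin-all",
    "auto-converge",
    "zero-blocks",
    "mc",
    "x-mc-net-disable",
    "mc-rdma-copy",
    "rdma-keepalive",
]

print(tunable_capabilities(capabilities))
```

Note that `xbzrle` stays in the result: only the literal `x-` prefix marks a capability as experimental.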
mrhines@linux.vnet.ibm.com April 4, 2014, 3:38 a.m. UTC | #4
On 03/12/2014 05:57 AM, Eric Blake wrote:
> ---
>   qapi-schema.json | 36 +++++++++++++++++++++++++++++++++++-
>   1 file changed, 35 insertions(+), 1 deletion(-)
>
>> +#          Only for performance testing. (Since 2.x)
>> +#
>> +# @mc-rdma-copy: MC requires creating a local-memory checkpoint before
>> +#          transmission to the destination. This requires heavy use of
>> +#          memcpy() which dominates the processor pipeline. This option
>> +#          makes use of *local* RDMA to perform the copy instead of the CPU.
>> +#          Enabled by default only if the migration transport is RDMA.
>> +#          Disabled by default otherwise. (Since 2.x)
> How does that work?  If I query migration capabilities before requesting
> a migration, what state am I going to read?  Is there coupling where I
> would observe the state of this flag change merely because I did some
> other action?  And if so, then how do I know that explicitly setting
> this flag won't be undone by similar coupling?
>
> It sounds like you are describing a tri-state option (unspecified so
> default to migration transport, explicitly disabled, explicitly
> enabled); but that doesn't work for something that only lists boolean
> capabilities.  The only way around that is to have 2 separate
> capabilities (one on whether to base decision on transport or to honor
> override, and the other to provide the override value which is ignored
> when defaulting by transport).

Yes, now that I think about it, this 'tri-state' possibility is indeed
confusing to the management software. I'll stop this behavior
and instead require that it be manually enabled when needed.

>> +#
>> +# @rdma-keepalive: RDMA connections do not timeout by themselves if a peer
>> +#         has disconnected prematurely or failed. User-level keepalives
>> +#         allow the migration to abort cleanly if there is a problem with the
>> +#         destination host. For debugging, this can be problematic as
>> +#         the keepalive may cause the peer to abort prematurely if we are
>> +#         at a GDB breakpoint, for example.
>> +#         Enabled by default. (Since 2.x)
> Enabled-by-default is an interesting choice, but I suppose it is okay.

I'll rename the command to "rdma-disable-keepalive" and change
the default to "disabled".

- Michael
mrhines@linux.vnet.ibm.com April 4, 2014, 3:56 a.m. UTC | #5
On 03/12/2014 06:02 AM, Juan Quintela wrote:
> mrhines@linux.vnet.ibm.com wrote:
>> From: "Michael R. Hines" <mrhines@us.ibm.com>
>>
>> New capabilities include the use of RDMA acceleration,
>> use of network buffering, and keepalive support, as documented
>> in patch #1.
>>
>> Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
>> ---
>>   qapi-schema.json | 36 +++++++++++++++++++++++++++++++++++-
>>   1 file changed, 35 insertions(+), 1 deletion(-)
>>
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index 98abdac..1fdf208 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -720,10 +720,44 @@
>>   # @auto-converge: If enabled, QEMU will automatically throttle down the guest
>>   #          to speed up convergence of RAM migration. (since 1.6)
>>   #
>> +# @mc: The migration will never end, and the VM will instead be continuously
>> +#          micro-checkpointed (MC). Use the command migrate-set-mc-delay to
>> +#          control the frequency at which the checkpoints occur.
>> +#          Disabled by default. (Since 2.x)
>> +#
>> +# @mc-net-disable: Deactivate network buffering against outbound network
>> +#          traffic while Micro-Checkpointing (@mc) is active.
>> +#          Enabled by default. Disabling will make the MC protocol inconsistent
>> +#          and potentially break network connections upon an actual failure.
>> +#          Only for performance testing. (Since 2.x)
> If it is dangerous, can we put dangerous/unsafe on the name?  Having an option that
> can corrupt things make me nervous.

You got it =)

- Michael
mrhines@linux.vnet.ibm.com April 4, 2014, 3:57 a.m. UTC | #6
On 03/12/2014 06:07 AM, Eric Blake wrote:
> On 03/11/2014 04:02 PM, Juan Quintela wrote:
>> mrhines@linux.vnet.ibm.com wrote:
>>> From: "Michael R. Hines" <mrhines@us.ibm.com>
>>>
>>> +# @mc-net-disable: Deactivate network buffering against outbound network
>>> +#          traffic while Micro-Checkpointing (@mc) is active.
>>> +#          Enabled by default. Disabling will make the MC protocol inconsistent
>>> +#          and potentially break network connections upon an actual failure.
>>> +#          Only for performance testing. (Since 2.x)
>> If it is dangerous, can we put dangerous/unsafe on the name?  Having an option that
>> can corrupt things make me nervous.
> Or even name it x-mc-net-disable, so that we reserve the right to remove
> it, as well as make it obvious that management must not try to tune it,
> only developers.
>

Good idea, will do.

- Michael
Eric Blake April 4, 2014, 4:25 a.m. UTC | #7
On 04/03/2014 09:38 PM, Michael R. Hines wrote:

>>> +# @rdma-keepalive: RDMA connections do not timeout by themselves if
>>> a peer
>>> +#         has disconnected prematurely or failed. User-level keepalives
>>> +#         allow the migration to abort cleanly if there is a problem
>>> with the
>>> +#         destination host. For debugging, this can be problematic as
>>> +#         the keepalive may cause the peer to abort prematurely if
>>> we are
>>> +#         at a GDB breakpoint, for example.
>>> +#         Enabled by default. (Since 2.x)
>> Enabled-by-default is an interesting choice, but I suppose it is okay.
> 
> I'll rename the command to "rdma-disable-keepalive" and change
> the default to "disabled".

Hopefully this doesn't lead to awkward double-negative interpretation
questions.
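
To spell out the concern, here is a small Python sketch of how a caller would have to read the inverted flag. The helper `keepalive_active` is hypothetical; only the name "rdma-disable-keepalive" and its "disabled" default come from the proposal above.

```python
def keepalive_active(caps):
    """Return True if RDMA keepalives would actually be sent.

    caps maps capability names to booleans. The proposed capability is
    a negative one that defaults to off, so keepalives are active
    exactly when the capability is NOT enabled.
    """
    return not caps.get("rdma-disable-keepalive", False)


# Capability off (the proposed default): keepalives are on.
print(keepalive_active({}))
# Capability on: keepalives are off.
print(keepalive_active({"rdma-disable-keepalive": True}))
```

Every reader has to apply that negation mentally, which is where the double-negative confusion would come from.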
Patch

diff --git a/qapi-schema.json b/qapi-schema.json
index 98abdac..1fdf208 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -720,10 +720,44 @@ 
 # @auto-converge: If enabled, QEMU will automatically throttle down the guest
 #          to speed up convergence of RAM migration. (since 1.6)
 #
+# @mc: The migration will never end, and the VM will instead be continuously
+#          micro-checkpointed (MC). Use the command migrate-set-mc-delay to 
+#          control the frequency at which the checkpoints occur. 
+#          Disabled by default. (Since 2.x)
+#
+# @mc-net-disable: Deactivate network buffering against outbound network 
+#          traffic while Micro-Checkpointing (@mc) is active.
+#          Enabled by default. Disabling will make the MC protocol inconsistent
+#          and potentially break network connections upon an actual failure.
+#          Only for performance testing. (Since 2.x)
+#
+# @mc-rdma-copy: MC requires creating a local-memory checkpoint before
+#          transmission to the destination. This requires heavy use of 
+#          memcpy() which dominates the processor pipeline. This option 
+#          makes use of *local* RDMA to perform the copy instead of the CPU.
+#          Enabled by default only if the migration transport is RDMA.
+#          Disabled by default otherwise. (Since 2.x)
+#
+# @rdma-keepalive: RDMA connections do not timeout by themselves if a peer
+#         has disconnected prematurely or failed. User-level keepalives
+#         allow the migration to abort cleanly if there is a problem with the
+#         destination host. For debugging, this can be problematic as
+#         the keepalive may cause the peer to abort prematurely if we are
+#         at a GDB breakpoint, for example.
+#         Enabled by default. (Since 2.x)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
-  'data': ['xbzrle', 'x-rdma-pin-all', 'auto-converge', 'zero-blocks'] }
+  'data': ['xbzrle', 
+           'rdma-pin-all', 
+           'auto-converge', 
+           'zero-blocks',
+           'mc', 
+           'mc-net-disable',
+           'mc-rdma-copy',
+           'rdma-keepalive'
+          ] }
 
 ##
 # @MigrationCapabilityStatus