Patchwork [PULL,v4,07/11] rdma: introduce capability for chunk registration

login
register
mail settings
Submitter mrhines@linux.vnet.ibm.com
Date April 17, 2013, 11:07 p.m.
Message ID <1366240040-10730-8-git-send-email-mrhines@linux.vnet.ibm.com>
Download mbox | patch
Permalink /patch/237380/
State New
Headers show

Comments

mrhines@linux.vnet.ibm.com - April 17, 2013, 11:07 p.m.
From: "Michael R. Hines" <mrhines@us.ibm.com>

This capability allows you to disable dynamic chunk registration
for better throughput on high-performance links.

It is enabled by default.

Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
---
 migration.c      |   10 ++++++++++
 qapi-schema.json |    8 +++++++-
 2 files changed, 17 insertions(+), 1 deletion(-)
Eric Blake - April 18, 2013, 10:07 p.m.
On 04/17/2013 05:07 PM, mrhines@linux.vnet.ibm.com wrote:
> From: "Michael R. Hines" <mrhines@us.ibm.com>
> 
> This capability allows you to disable dynamic chunk registration
> for better throughput on high-performance links.
> 
> It is enabled by default.
> 
> Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
> ---
>  migration.c      |   10 ++++++++++
>  qapi-schema.json |    8 +++++++-
>  2 files changed, 17 insertions(+), 1 deletion(-)

>  #
> +# @x-chunk-register-destination: (since 1.5) RDMA option which controls whether
> +#          or not the entire VM memory footprint is mlock() on demand or all at once.
> +#          Refer to docs/rdma.txt for more advice on when to take advantage option.

s/take advantage/use this/

> +#          Enabled by default, and will be renamed to 'chunk-register-destination' 
> +#          after experimental testing is complete.

I wouldn't promise a rename - after all, testing may prove that we can
settle on enough heuristics to set this appropriately without needing a
user option, even for the workloads where it makes a difference.  Thus,
I think better wording might be:

Enabled by default.  Experimental: may be renamed or removed after
further testing is complete.

Sorry for not thinking about this earlier, but typically you want a
capability bit to default to 0 - it's much easier to assume that a bit
not present behaves the same as a bit that is present and 0.  Or put
another way, a older management app that asks for all enabled
capabilities on a newer qemu has an easier time ignoring 0 bits that it
doesn't recognize (oh, some new feature I don't know about, but it isn't
on, so it can't hurt) than it does ignoring 1 bits (oh, a feature I
don't recognize, but it's enabled - will it mess up my migration?).
Since this is a bool, I would much rather can we rename the capability
to express the opposite sense, and default it to 0.  I'm not even sure
from your description here whether 'true' means 'mlock() on demand' or
'all at once', just that I'm supposed to read rdma.txt to decide if I
want to move away from the default.

/me reads patch 11 again... and wonders why the docs came last instead
of first in the series...

I guess the opposite sense could be named 'x-rdma-pin-all'; default
false means to do chunk registration and release, true means to pin all
memory up front.
mrhines@linux.vnet.ibm.com - April 19, 2013, 12:34 a.m.
On 04/18/2013 06:07 PM, Eric Blake wrote:
> On 04/17/2013 05:07 PM, mrhines@linux.vnet.ibm.com wrote:
>> From: "Michael R. Hines" <mrhines@us.ibm.com>
>>
>> This capability allows you to disable dynamic chunk registration
>> for better throughput on high-performance links.
>>
>> It is enabled by default.
>>
>> Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
>> ---
>>   migration.c      |   10 ++++++++++
>>   qapi-schema.json |    8 +++++++-
>>   2 files changed, 17 insertions(+), 1 deletion(-)
>>   #
>> +# @x-chunk-register-destination: (since 1.5) RDMA option which controls whether
>> +#          or not the entire VM memory footprint is mlock() on demand or all at once.
>> +#          Refer to docs/rdma.txt for more advice on when to take advantage option.
> s/take advantage/use this/
>
>> +#          Enabled by default, and will be renamed to 'chunk-register-destination'
>> +#          after experimental testing is complete.
> I wouldn't promise a rename - after all, testing may prove that we can
> settle on enough heuristics to set this appropriately without needing a
> user option, even for the workloads where it makes a difference.  Thus,
> I think better wording might be:
>
> Enabled by default.  Experimental: may be renamed or removed after
> further testing is complete.

Acknowledged.

> Sorry for not thinking about this earlier, but typically you want a
> capability bit to default to 0 - it's much easier to assume that a bit
> not present behaves the same as a bit that is present and 0.  Or put
> another way, a older management app that asks for all enabled
> capabilities on a newer qemu has an easier time ignoring 0 bits that it
> doesn't recognize (oh, some new feature I don't know about, but it isn't
> on, so it can't hurt) than it does ignoring 1 bits (oh, a feature I
> don't recognize, but it's enabled - will it mess up my migration?).
> Since this is a bool, I would much rather can we rename the capability
> to express the opposite sense, and default it to 0.  I'm not even sure
> from your description here whether 'true' means 'mlock() on demand' or
> 'all at once', just that I'm supposed to read rdma.txt to decide if I
> want to move away from the default.
> /me reads patch 11 again... and wonders why the docs came last instead
> of first in the series...
I'll move patch 11 to the top next time. No problem.
> I guess the opposite sense could be named 'x-rdma-pin-all'; default
> false means to do chunk registration and release, true means to pin all
> memory up front.
>
Yes, I would be happy to rename the capability and default to false.

Is everybody else comfortable with that?

- Michael
Michael S. Tsirkin - April 20, 2013, 5:02 p.m.
On Thu, Apr 18, 2013 at 04:07:24PM -0600, Eric Blake wrote:
> On 04/17/2013 05:07 PM, mrhines@linux.vnet.ibm.com wrote:
> > From: "Michael R. Hines" <mrhines@us.ibm.com>
> > 
> > This capability allows you to disable dynamic chunk registration
> > for better throughput on high-performance links.
> > 
> > It is enabled by default.
> > 
> > Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
> > ---
> >  migration.c      |   10 ++++++++++
> >  qapi-schema.json |    8 +++++++-
> >  2 files changed, 17 insertions(+), 1 deletion(-)
> 
> >  #
> > +# @x-chunk-register-destination: (since 1.5) RDMA option which controls whether
> > +#          or not the entire VM memory footprint is mlock() on demand or all at once.
> > +#          Refer to docs/rdma.txt for more advice on when to take advantage option.
> 
> s/take advantage/use this/
> 
> > +#          Enabled by default, and will be renamed to 'chunk-register-destination' 
> > +#          after experimental testing is complete.
> 
> I wouldn't promise a rename - after all, testing may prove that we can
> settle on enough heuristics to set this appropriately without needing a
> user option, even for the workloads where it makes a difference.  Thus,
> I think better wording might be:
> 
> Enabled by default.  Experimental: may be renamed or removed after
> further testing is complete.
> 
> Sorry for not thinking about this earlier, but typically you want a
> capability bit to default to 0 - it's much easier to assume that a bit
> not present behaves the same as a bit that is present and 0.  Or put
> another way, a older management app that asks for all enabled
> capabilities on a newer qemu has an easier time ignoring 0 bits that it
> doesn't recognize (oh, some new feature I don't know about, but it isn't
> on, so it can't hurt) than it does ignoring 1 bits (oh, a feature I
> don't recognize, but it's enabled - will it mess up my migration?).
> Since this is a bool, I would much rather can we rename the capability
> to express the opposite sense, and default it to 0.  I'm not even sure
> from your description here whether 'true' means 'mlock() on demand' or
> 'all at once', just that I'm supposed to read rdma.txt to decide if I
> want to move away from the default.
> 
> /me reads patch 11 again... and wonders why the docs came last instead
> of first in the series...
> 
> I guess the opposite sense could be named 'x-rdma-pin-all'; default
> false means to do chunk registration and release,

chunk release only happens after migration is complete unfortunately.
This means that eventually all initialized memory is pinned, regardless
of the setting (this is fixable but there's no plan to fix this, at this
point). So pin-all might be misleading to some.

I agree 'chunk' is unnecessarily low level though.
The only difference ATM is pinning of uninitialized memory so I think a
better name would be 'x-rdma-pin-uninitialized' or some such.

> true means to pin all
> memory up front.
> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
Paolo Bonzini - April 21, 2013, 1:19 p.m.
Il 20/04/2013 19:02, Michael S. Tsirkin ha scritto:
>> > I guess the opposite sense could be named 'x-rdma-pin-all'; default
>> > false means to do chunk registration and release,
> chunk release only happens after migration is complete unfortunately.
> This means that eventually all initialized memory is pinned, regardless
> of the setting (this is fixable but there's no plan to fix this, at this
> point). So pin-all might be misleading to some.
> 
> I agree 'chunk' is unnecessarily low level though.
> The only difference ATM is pinning of uninitialized memory so I think a
> better name would be 'x-rdma-pin-uninitialized' or some such.
> 

x-rdma-pin-all is a better choice.  x-rdma-pin-uninitialized is also too
low level.

Since this series is likely to miss 1.5 at this point, we could
implement the unregistration part of the protocol in the destination.
This way, any heuristic we add to the source will not break backwards
compatibility.

Paolo
Michael S. Tsirkin - April 21, 2013, 2:17 p.m.
On Sun, Apr 21, 2013 at 03:19:21PM +0200, Paolo Bonzini wrote:
> Il 20/04/2013 19:02, Michael S. Tsirkin ha scritto:
> >> > I guess the opposite sense could be named 'x-rdma-pin-all'; default
> >> > false means to do chunk registration and release,
> > chunk release only happens after migration is complete unfortunately.
> > This means that eventually all initialized memory is pinned, regardless
> > of the setting (this is fixable but there's no plan to fix this, at this
> > point). So pin-all might be misleading to some.
> > 
> > I agree 'chunk' is unnecessarily low level though.
> > The only difference ATM is pinning of uninitialized memory so I think a
> > better name would be 'x-rdma-pin-uninitialized' or some such.
> > 
> 
> x-rdma-pin-all is a better choice.  x-rdma-pin-uninitialized is also too
> low level.
> 
> Since this series is likely to miss 1.5 at this point, we could
> implement the unregistration part of the protocol in the destination.
> This way, any heuristic we add to the source will not break backwards
> compatibility.
> 
> Paolo

To test, you'll have to implement it in the source too.
That's probably a good idea anyway, though doing this
efficiently might need more thought, and some of
the tricks I described earlier (pipelining,
registration cache) might be needed.
Though I'm curious what the performance impact would be
even without these tricks.
mrhines@linux.vnet.ibm.com - April 21, 2013, 4:05 p.m.
On 04/21/2013 09:19 AM, Paolo Bonzini wrote:
> Il 20/04/2013 19:02, Michael S. Tsirkin ha scritto:
>>>> I guess the opposite sense could be named 'x-rdma-pin-all'; default
>>>> false means to do chunk registration and release,
>> chunk release only happens after migration is complete unfortunately.
>> This means that eventually all initialized memory is pinned, regardless
>> of the setting (this is fixable but there's no plan to fix this, at this
>> point). So pin-all might be misleading to some.
>>
>> I agree 'chunk' is unnecessarily low level though.
>> The only difference ATM is pinning of uninitialized memory so I think a
>> better name would be 'x-rdma-pin-uninitialized' or some such.
>>
> x-rdma-pin-all is a better choice.  x-rdma-pin-uninitialized is also too
> low level.
>
> Since this series is likely to miss 1.5 at this point, we could
> implement the unregistration part of the protocol in the destination.
> This way, any heuristic we add to the source will not break backwards
> compatibility.
>
> Paolo
>

The release cycles are relatively fast, according to the website, so
I don't have any problem with missing 1.5 to make sure that everybody
is happy and have had a chance to test the software.

But: Let me repeat: we have already discussed in previous emails that
ibv_reg_mr() => error + notify source + aborted migration would
be an adequate solution for merging.

Also: let me repeat: We have no intention nor plans (at least not
from IBM research) to promise to develop nor even explore the
effects of unregistration in the RDMA protocol as we have zero data
to show that it does not adversely affect the performance of the solution.

Unless someone puts in the man-hours to show hard data (even with
micro-benchmark) that migration throughput and migration latency
performance and migration convergence are not unaffected by such a
change in the protocol, then such a change would have to be implemented
as a patch by another member of the community and an option be clearly made
in the QEMU monitor so that it could be disabled if the user chose to do so.

- Michael
mrhines@linux.vnet.ibm.com - April 21, 2013, 4:06 p.m.
On 04/21/2013 09:19 AM, Paolo Bonzini wrote:
> Il 20/04/2013 19:02, Michael S. Tsirkin ha scritto:
>>>> I guess the opposite sense could be named 'x-rdma-pin-all'; default
>>>> false means to do chunk registration and release,
>> chunk release only happens after migration is complete unfortunately.
>> This means that eventually all initialized memory is pinned, regardless
>> of the setting (this is fixable but there's no plan to fix this, at this
>> point). So pin-all might be misleading to some.
>>
>> I agree 'chunk' is unnecessarily low level though.
>> The only difference ATM is pinning of uninitialized memory so I think a
>> better name would be 'x-rdma-pin-uninitialized' or some such.
>>
> x-rdma-pin-all is a better choice.  x-rdma-pin-uninitialized is also too
> low level.
>
> Since this series is likely to miss 1.5 at this point, we could
> implement the unregistration part of the protocol in the destination.
> This way, any heuristic we add to the source will not break backwards
> compatibility.
>
> Paolo
>

OK, I will rename the capability....

- Michael
mrhines@linux.vnet.ibm.com - April 21, 2013, 5:19 p.m.
On 04/21/2013 10:17 AM, Michael S. Tsirkin wrote:
> On Sun, Apr 21, 2013 at 03:19:21PM +0200, Paolo Bonzini wrote:
>> Il 20/04/2013 19:02, Michael S. Tsirkin ha scritto:
>>>>> I guess the opposite sense could be named 'x-rdma-pin-all'; default
>>>>> false means to do chunk registration and release,
>>> chunk release only happens after migration is complete unfortunately.
>>> This means that eventually all initialized memory is pinned, regardless
>>> of the setting (this is fixable but there's no plan to fix this, at this
>>> point). So pin-all might be misleading to some.
>>>
>>> I agree 'chunk' is unnecessarily low level though.
>>> The only difference ATM is pinning of uninitialized memory so I think a
>>> better name would be 'x-rdma-pin-uninitialized' or some such.
>>>
>> x-rdma-pin-all is a better choice.  x-rdma-pin-uninitialized is also too
>> low level.
>>
>> Since this series is likely to miss 1.5 at this point, we could
>> implement the unregistration part of the protocol in the destination.
>> This way, any heuristic we add to the source will not break backwards
>> compatibility.
>>
>> Paolo
> To test, you'll have to implement it in the source too.
> That's probably a good idea anyway, though doing this
> efficiently might need more thought, and some of
> the tricks I described earlier (pipelining,
> registration cache) might be needed.
> Though I'm curious what the performance impact would be
> even without these tricks.
>

We already had a agreement to merge with ulimit -l + ibv_reg_mr() + 
ERROR + abort migration.

We (IBM Research) will not commit to implementing this unless someone
provides hard data showing it not to adversely effect migration performance.

- Michael
Michael S. Tsirkin - April 21, 2013, 6:59 p.m.
On Sun, Apr 21, 2013 at 12:05:08PM -0400, Michael R. Hines wrote:
> On 04/21/2013 09:19 AM, Paolo Bonzini wrote:
> >Il 20/04/2013 19:02, Michael S. Tsirkin ha scritto:
> >>>>I guess the opposite sense could be named 'x-rdma-pin-all'; default
> >>>>false means to do chunk registration and release,
> >>chunk release only happens after migration is complete unfortunately.
> >>This means that eventually all initialized memory is pinned, regardless
> >>of the setting (this is fixable but there's no plan to fix this, at this
> >>point). So pin-all might be misleading to some.
> >>
> >>I agree 'chunk' is unnecessarily low level though.
> >>The only difference ATM is pinning of uninitialized memory so I think a
> >>better name would be 'x-rdma-pin-uninitialized' or some such.
> >>
> >x-rdma-pin-all is a better choice.  x-rdma-pin-uninitialized is also too
> >low level.
> >
> >Since this series is likely to miss 1.5 at this point, we could
> >implement the unregistration part of the protocol in the destination.
> >This way, any heuristic we add to the source will not break backwards
> >compatibility.
> >
> >Paolo
> >
> 
> The release cycles are relatively fast, according to the website, so
> I don't have any problem with missing 1.5 to make sure that everybody
> is happy and have had a chance to test the software.
>
> But: Let me repeat: we have already discussed in previous emails that
> ibv_reg_mr() => error + notify source + aborted migration would
> be an adequate solution for merging.
>
> Also: let me repeat: We have no intention nor plans (at least not
> from IBM research) to promise to develop nor even explore the
> effects of unregistration in the RDMA protocol as we have zero data
> to show that it does not adversely affect the performance of the solution.
>
> Unless someone puts in the man-hours to show hard data (even with
> micro-benchmark) that migration throughput and migration latency
> performance and migration convergence are not unaffected by such a
> change in the protocol, then such a change would have to be implemented
> as a patch by another member of the community and an option be clearly made
> in the QEMU monitor so that it could be disabled if the user chose to do so.
> 
> - Michael

My interest was in the protocol used, so as long as you don't
intend to enhance the protocol any further, please drop me from the
Cc list on future version of these patches.

I don't take any position on whether your patches should be merged
in their current form.
Michael S. Tsirkin - April 21, 2013, 7:13 p.m.
On Sun, Apr 21, 2013 at 01:19:17PM -0400, Michael R. Hines wrote:
> On 04/21/2013 10:17 AM, Michael S. Tsirkin wrote:
> >On Sun, Apr 21, 2013 at 03:19:21PM +0200, Paolo Bonzini wrote:
> >>Il 20/04/2013 19:02, Michael S. Tsirkin ha scritto:
> >>>>>I guess the opposite sense could be named 'x-rdma-pin-all'; default
> >>>>>false means to do chunk registration and release,
> >>>chunk release only happens after migration is complete unfortunately.
> >>>This means that eventually all initialized memory is pinned, regardless
> >>>of the setting (this is fixable but there's no plan to fix this, at this
> >>>point). So pin-all might be misleading to some.
> >>>
> >>>I agree 'chunk' is unnecessarily low level though.
> >>>The only difference ATM is pinning of uninitialized memory so I think a
> >>>better name would be 'x-rdma-pin-uninitialized' or some such.
> >>>
> >>x-rdma-pin-all is a better choice.  x-rdma-pin-uninitialized is also too
> >>low level.
> >>
> >>Since this series is likely to miss 1.5 at this point, we could
> >>implement the unregistration part of the protocol in the destination.
> >>This way, any heuristic we add to the source will not break backwards
> >>compatibility.
> >>
> >>Paolo
> >To test, you'll have to implement it in the source too.
> >That's probably a good idea anyway, though doing this
> >efficiently might need more thought, and some of
> >the tricks I described earlier (pipelining,
> >registration cache) might be needed.
> >Though I'm curious what the performance impact would be
> >even without these tricks.
> >
> 
> We already had a agreement to merge with ulimit -l + ibv_reg_mr() +
> ERROR + abort migration.

Not sure who is 'we' here, but for the record, I did not agree to merge
these patches, and I'm the wrong person to ask to merge them.
Juan is probably the right person to ask about merging them.

> We (IBM Research) will not commit to implementing this unless someone
> provides hard data showing it not to adversely effect migration performance.
> 
> - Michael
mrhines@linux.vnet.ibm.com - April 21, 2013, 7:55 p.m.
On 04/21/2013 02:59 PM, Michael S. Tsirkin wrote:
> On Sun, Apr 21, 2013 at 12:05:08PM -0400, Michael R. Hines wrote:
>> On 04/21/2013 09:19 AM, Paolo Bonzini wrote:
>>> Il 20/04/2013 19:02, Michael S. Tsirkin ha scritto:
>>>>>> I guess the opposite sense could be named 'x-rdma-pin-all'; default
>>>>>> false means to do chunk registration and release,
>>>> chunk release only happens after migration is complete unfortunately.
>>>> This means that eventually all initialized memory is pinned, regardless
>>>> of the setting (this is fixable but there's no plan to fix this, at this
>>>> point). So pin-all might be misleading to some.
>>>>
>>>> I agree 'chunk' is unnecessarily low level though.
>>>> The only difference ATM is pinning of uninitialized memory so I think a
>>>> better name would be 'x-rdma-pin-uninitialized' or some such.
>>>>
>>> x-rdma-pin-all is a better choice.  x-rdma-pin-uninitialized is also too
>>> low level.
>>>
>>> Since this series is likely to miss 1.5 at this point, we could
>>> implement the unregistration part of the protocol in the destination.
>>> This way, any heuristic we add to the source will not break backwards
>>> compatibility.
>>>
>>> Paolo
>>>
>> The release cycles are relatively fast, according to the website, so
>> I don't have any problem with missing 1.5 to make sure that everybody
>> is happy and have had a chance to test the software.
>>
>> But: Let me repeat: we have already discussed in previous emails that
>> ibv_reg_mr() => error + notify source + aborted migration would
>> be an adequate solution for merging.
>>
>> Also: let me repeat: We have no intention nor plans (at least not
>> from IBM research) to promise to develop nor even explore the
>> effects of unregistration in the RDMA protocol as we have zero data
>> to show that it does not adversely affect the performance of the solution.
>>
>> Unless someone puts in the man-hours to show hard data (even with
>> micro-benchmark) that migration throughput and migration latency
>> performance and migration convergence are not unaffected by such a
>> change in the protocol, then such a change would have to be implemented
>> as a patch by another member of the community and an option be clearly made
>> in the QEMU monitor so that it could be disabled if the user chose to do so.
>>
>> - Michael
> My interest was in the protocol used, so as long as you don't
> intend to enhance the protocol any further, please drop me from the
> Cc list on future version of these patches.
>
> I don't take any position on whether your patches should be merged
> in their current form.
>

Acknowledged.

Patch

diff --git a/migration.c b/migration.c
index 3b4b467..5afd9b8 100644
--- a/migration.c
+++ b/migration.c
@@ -66,6 +66,7 @@  MigrationState *migrate_get_current(void)
         .state = MIG_STATE_SETUP,
         .bandwidth_limit = MAX_THROTTLE,
         .xbzrle_cache_size = DEFAULT_MIGRATE_CACHE_SIZE,
+        .enabled_capabilities[MIGRATION_CAPABILITY_X_CHUNK_REGISTER_DESTINATION] = true,
     };
 
     return &current_migration;
@@ -474,6 +475,15 @@  void qmp_migrate_set_downtime(double value, Error **errp)
     max_downtime = (uint64_t)value;
 }
 
+bool migrate_chunk_register_destination(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_X_CHUNK_REGISTER_DESTINATION];
+}
+
 int migrate_use_xbzrle(void)
 {
     MigrationState *s;
diff --git a/qapi-schema.json b/qapi-schema.json
index 751d3c2..297707e 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -599,10 +599,16 @@ 
 #          This feature allows us to minimize migration traffic for certain work
 #          loads, by sending compressed difference of the pages
 #
+# @x-chunk-register-destination: (since 1.5) RDMA option which controls whether
+#          or not the entire VM memory footprint is mlock() on demand or all at once.
+#          Refer to docs/rdma.txt for more advice on when to take advantage option.
+#          Enabled by default, and will be renamed to 'chunk-register-destination' 
+#          after experimental testing is complete.
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
-  'data': ['xbzrle'] }
+  'data': ['xbzrle', 'x-chunk-register-destination'] }
 
 ##
 # @MigrationCapabilityStatus