diff mbox

RDMA: Reduce restriction on block length match

Message ID 1436365567-27164-1-git-send-email-dgilbert@redhat.com
State New
Headers show

Commit Message

Dr. David Alan Gilbert July 8, 2015, 2:26 p.m. UTC
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

My e4d633207 patch has an over zealous sanity check that checked
the lengths of the RAM Blocks on source/destination were the same.  This
isn't true because of the 'used_length' trick for RAM blocks like the
ACPI table that vary in size.

Prior to that patch RDMA would also fail in this case, but it should
now work with the changes in the set e4d633207 is in.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Fixes: e4d633207c129dc5b7d145240ac4a1997ef3902f
---
 migration/rdma.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

Comments

Juan Quintela July 8, 2015, 2:36 p.m. UTC | #1
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> My e4d633207 patch has an over zealous sanity check that checked
> the lengths of the RAM Blocks on source/destination were the same.  This
> isn't true because of the 'used_length' trick for RAM blocks like the
> ACPI table that vary in size.
>
> Prior to that patch RDMA would also fail in this case, but it should
> now work with the changes in the set e4d633207 is in.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
> Fixes: e4d633207c129dc5b7d145240ac4a1997ef3902f
> ---
>  migration/rdma.c | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index f106b2a..1d094b0 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -3338,14 +3338,15 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>          for (i = 0; i < nb_dest_blocks; i++) {
>              network_to_dest_block(&rdma->dest_blocks[i]);
>  
> -            /* We require that the blocks are in the same order */
> +            /* We require that the blocks are in the same order,
> +             * but the used_length trick for acpi blocks means that
> +             * the destination can validly be larger than the source
> +             */
>              if (rdma->dest_blocks[i].length != local->block[i].length) {

Should we change the check to be that destination is bigger or equal
than source?

With your change, we only remove the check?

Thanks, Juan.


> -                ERROR(errp, "Block %s/%d has a different length %" PRIu64
> -                            "vs %" PRIu64, local->block[i].block_name, i,
> -                            local->block[i].length,
> +                fprintf(stderr, "INFO: Block %s/%d has a different length %"
> +                            PRIu64 "vs %" PRIu64, local->block[i].block_name,
> +                            i, local->block[i].length,
>                              rdma->dest_blocks[i].length);
> -                rdma->error_state = -EINVAL;
> -                return -EINVAL;
>              }
>              local->block[i].remote_host_addr =
>                      rdma->dest_blocks[i].remote_host_addr;
Dr. David Alan Gilbert July 8, 2015, 4:55 p.m. UTC | #2
* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > My e4d633207 patch has an over zealous sanity check that checked
> > the lengths of the RAM Blocks on source/destination were the same.  This
> > isn't true because of the 'used_length' trick for RAM blocks like the
> > ACPI table that vary in size.
> >
> > Prior to that patch RDMA would also fail in this case, but it should
> > now work with the changes in the set e4d633207 is in.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >
> > Fixes: e4d633207c129dc5b7d145240ac4a1997ef3902f
> > ---
> >  migration/rdma.c | 13 +++++++------
> >  1 file changed, 7 insertions(+), 6 deletions(-)
> >
> > diff --git a/migration/rdma.c b/migration/rdma.c
> > index f106b2a..1d094b0 100644
> > --- a/migration/rdma.c
> > +++ b/migration/rdma.c
> > @@ -3338,14 +3338,15 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
> >          for (i = 0; i < nb_dest_blocks; i++) {
> >              network_to_dest_block(&rdma->dest_blocks[i]);
> >  
> > -            /* We require that the blocks are in the same order */
> > +            /* We require that the blocks are in the same order,
> > +             * but the used_length trick for acpi blocks means that
> > +             * the destination can validly be larger than the source
> > +             */
> >              if (rdma->dest_blocks[i].length != local->block[i].length) {
> 
> Should we change the check to be that destination is bigger or equal
> than source?
> 
> With your change, we only remove the check?

I'm actually going to drop this change; so keep the error if they're
different.

My argument works like this (I've not yet found a good way to test it):

   1) The source sends to the destination a list of RAM blocks in the qemu-file stream
   2) The destination performs a resize on the RAM blocks to match the source
       so at this point the destination's block sizes should match.
   3) The source sends a series of RDMA block registration requests for the RAM
   4) The destination sends a list of RAM registrations back to the source
   5) This check is checking that this destination list matches the local list
   6) As long as (4) happens after (2) then the size that the destination sees
      should always match the source.
   7) I think 4 is after 2 due to a qemu_fflush

So keeping this check guards against 7 not really being true and/or
the destination populating it's list of blocks prior to (2) - which I have
a sneaky feeling might be happening, but am not sure yet.

Dave

> 
> Thanks, Juan.
> 
> 
> > -                ERROR(errp, "Block %s/%d has a different length %" PRIu64
> > -                            "vs %" PRIu64, local->block[i].block_name, i,
> > -                            local->block[i].length,
> > +                fprintf(stderr, "INFO: Block %s/%d has a different length %"
> > +                            PRIu64 "vs %" PRIu64, local->block[i].block_name,
> > +                            i, local->block[i].length,
> >                              rdma->dest_blocks[i].length);
> > -                rdma->error_state = -EINVAL;
> > -                return -EINVAL;
> >              }
> >              local->block[i].remote_host_addr =
> >                      rdma->dest_blocks[i].remote_host_addr;
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Juan Quintela July 8, 2015, 7:06 p.m. UTC | #3
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> * Juan Quintela (quintela@redhat.com) wrote:
>> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
>> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>> >
>> > My e4d633207 patch has an over zealous sanity check that checked
>> > the lengths of the RAM Blocks on source/destination were the same.  This
>> > isn't true because of the 'used_length' trick for RAM blocks like the
>> > ACPI table that vary in size.
>> >
>> > Prior to that patch RDMA would also fail in this case, but it should
>> > now work with the changes in the set e4d633207 is in.
>> >
>> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> >
>> > Fixes: e4d633207c129dc5b7d145240ac4a1997ef3902f
>> > ---
>> >  migration/rdma.c | 13 +++++++------
>> >  1 file changed, 7 insertions(+), 6 deletions(-)
>> >
>> > diff --git a/migration/rdma.c b/migration/rdma.c
>> > index f106b2a..1d094b0 100644
>> > --- a/migration/rdma.c
>> > +++ b/migration/rdma.c
>> > @@ -3338,14 +3338,15 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>> >          for (i = 0; i < nb_dest_blocks; i++) {
>> >              network_to_dest_block(&rdma->dest_blocks[i]);
>> >  
>> > -            /* We require that the blocks are in the same order */
>> > +            /* We require that the blocks are in the same order,
>> > +             * but the used_length trick for acpi blocks means that
>> > +             * the destination can validly be larger than the source
>> > +             */
>> >              if (rdma->dest_blocks[i].length != local->block[i].length) {
>> 
>> Should we change the check to be that destination is bigger or equal
>> than source?
>> 
>> With your change, we only remove the check?
>
> I'm actually going to drop this change; so keep the error if they're
> different.
>
> My argument works like this (I've not yet found a good way to test it):
>
>    1) The source sends to the destination a list of RAM blocks in the qemu-file stream
>    2) The destination performs a resize on the RAM blocks to match the source
>        so at this point the destination's block sizes should match.

Humm, I *thought* that what the resize does is getting it bigger, but if
destination is bigger, it does nothing, no?


>    3) The source sends a series of RDMA block registration requests for the RAM
>    4) The destination sends a list of RAM registrations back to the source
>    5) This check is checking that this destination list matches the local list
>    6) As long as (4) happens after (2) then the size that the destination sees
>       should always match the source.
>    7) I think 4 is after 2 due to a qemu_fflush
>
> So keeping this check guards against 7 not really being true and/or
> the destination populating it's list of blocks prior to (2) - which I have
> a sneaky feeling might be happening, but am not sure yet.



>
> Dave
>
>> 
>> Thanks, Juan.
>> 
>> 
>> > -                ERROR(errp, "Block %s/%d has a different length %" PRIu64
>> > -                            "vs %" PRIu64, local->block[i].block_name, i,
>> > -                            local->block[i].length,
>> > +                fprintf(stderr, "INFO: Block %s/%d has a different length %"
>> > +                            PRIu64 "vs %" PRIu64, local->block[i].block_name,
>> > +                            i, local->block[i].length,
>> >                              rdma->dest_blocks[i].length);
>> > -                rdma->error_state = -EINVAL;
>> > -                return -EINVAL;
>> >              }
>> >              local->block[i].remote_host_addr =
>> >                      rdma->dest_blocks[i].remote_host_addr;
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Dr. David Alan Gilbert July 9, 2015, 8:08 a.m. UTC | #4
* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > * Juan Quintela (quintela@redhat.com) wrote:
> >> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> >> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >> >
> >> > My e4d633207 patch has an over zealous sanity check that checked
> >> > the lengths of the RAM Blocks on source/destination were the same.  This
> >> > isn't true because of the 'used_length' trick for RAM blocks like the
> >> > ACPI table that vary in size.
> >> >
> >> > Prior to that patch RDMA would also fail in this case, but it should
> >> > now work with the changes in the set e4d633207 is in.
> >> >
> >> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >> >
> >> > Fixes: e4d633207c129dc5b7d145240ac4a1997ef3902f
> >> > ---
> >> >  migration/rdma.c | 13 +++++++------
> >> >  1 file changed, 7 insertions(+), 6 deletions(-)
> >> >
> >> > diff --git a/migration/rdma.c b/migration/rdma.c
> >> > index f106b2a..1d094b0 100644
> >> > --- a/migration/rdma.c
> >> > +++ b/migration/rdma.c
> >> > @@ -3338,14 +3338,15 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
> >> >          for (i = 0; i < nb_dest_blocks; i++) {
> >> >              network_to_dest_block(&rdma->dest_blocks[i]);
> >> >  
> >> > -            /* We require that the blocks are in the same order */
> >> > +            /* We require that the blocks are in the same order,
> >> > +             * but the used_length trick for acpi blocks means that
> >> > +             * the destination can validly be larger than the source
> >> > +             */
> >> >              if (rdma->dest_blocks[i].length != local->block[i].length) {
> >> 
> >> Should we change the check to be that destination is bigger or equal
> >> than source?
> >> 
> >> With your change, we only remove the check?
> >
> > I'm actually going to drop this change; so keep the error if they're
> > different.
> >
> > My argument works like this (I've not yet found a good way to test it):
> >
> >    1) The source sends to the destination a list of RAM blocks in the qemu-file stream
> >    2) The destination performs a resize on the RAM blocks to match the source
> >        so at this point the destination's block sizes should match.
> 
> Humm, I *thought* that what the resize does is getting it bigger, but if
> destination is bigger, it does nothing, no?

The code in migration/ram.c calls qemu_ram_resize  for length != block->used_length
and similalry qemu_ram_resize always seems to set block->used_length.
I'm going to have to have more of a dig into this and figure out what's going on.

Dave

> 
> 
> >    3) The source sends a series of RDMA block registration requests for the RAM
> >    4) The destination sends a list of RAM registrations back to the source
> >    5) This check is checking that this destination list matches the local list
> >    6) As long as (4) happens after (2) then the size that the destination sees
> >       should always match the source.
> >    7) I think 4 is after 2 due to a qemu_fflush
> >
> > So keeping this check guards against 7 not really being true and/or
> > the destination populating it's list of blocks prior to (2) - which I have
> > a sneaky feeling might be happening, but am not sure yet.
> 
> 
> 
> >
> > Dave
> >
> >> 
> >> Thanks, Juan.
> >> 
> >> 
> >> > -                ERROR(errp, "Block %s/%d has a different length %" PRIu64
> >> > -                            "vs %" PRIu64, local->block[i].block_name, i,
> >> > -                            local->block[i].length,
> >> > +                fprintf(stderr, "INFO: Block %s/%d has a different length %"
> >> > +                            PRIu64 "vs %" PRIu64, local->block[i].block_name,
> >> > +                            i, local->block[i].length,
> >> >                              rdma->dest_blocks[i].length);
> >> > -                rdma->error_state = -EINVAL;
> >> > -                return -EINVAL;
> >> >              }
> >> >              local->block[i].remote_host_addr =
> >> >                      rdma->dest_blocks[i].remote_host_addr;
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
diff mbox

Patch

diff --git a/migration/rdma.c b/migration/rdma.c
index f106b2a..1d094b0 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3338,14 +3338,15 @@  static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
         for (i = 0; i < nb_dest_blocks; i++) {
             network_to_dest_block(&rdma->dest_blocks[i]);
 
-            /* We require that the blocks are in the same order */
+            /* We require that the blocks are in the same order,
+             * but the used_length trick for acpi blocks means that
+             * the destination can validly be larger than the source
+             */
             if (rdma->dest_blocks[i].length != local->block[i].length) {
-                ERROR(errp, "Block %s/%d has a different length %" PRIu64
-                            "vs %" PRIu64, local->block[i].block_name, i,
-                            local->block[i].length,
+                fprintf(stderr, "INFO: Block %s/%d has a different length %"
+                            PRIu64 "vs %" PRIu64, local->block[i].block_name,
+                            i, local->block[i].length,
                             rdma->dest_blocks[i].length);
-                rdma->error_state = -EINVAL;
-                return -EINVAL;
             }
             local->block[i].remote_host_addr =
                     rdma->dest_blocks[i].remote_host_addr;