
Fix off-by-1 error in RAM migration code

Message ID 1351654988-13165-1-git-send-email-david@gibson.dropbear.id.au
State New

Commit Message

David Gibson Oct. 31, 2012, 3:43 a.m. UTC
The code for migrating (or savevm-ing) memory pages starts off by creating
a dirty bitmap and filling it with 1s.  Except, actually, because bit
addresses are 0-based it fills every bit except bit 0 with 1s and puts an
extra 1 beyond the end of the bitmap, potentially corrupting unrelated
memory.  Oops.  This patch fixes it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch_init.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
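
For reference, bitmap_set() takes a starting bit index and a count, not a
fill value.  A minimal sketch of the semantics this fix relies on (simplified;
not QEMU's actual implementation in bitmap.c):

    #define BITS_PER_LONG (sizeof(unsigned long) * 8)

    /* Set 'nr' bits starting at bit index 'start'. */
    static void bitmap_set_sketch(unsigned long *map, long start, long nr)
    {
        long i;
        for (i = start; i < start + nr; i++) {
            map[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
        }
    }

    /* bitmap_set(map, 1, ram_pages) therefore sets bits 1..ram_pages: bit 0
     * (page 0) stays clear, and bit 'ram_pages' lies one past the last valid
     * index of a bitmap sized for ram_pages bits.  bitmap_set(map, 0,
     * ram_pages) sets exactly bits 0..ram_pages-1. */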

Comments

Orit Wasserman Oct. 31, 2012, 11:08 a.m. UTC | #1
On 10/31/2012 05:43 AM, David Gibson wrote:
> The code for migrating (or savevm-ing) memory pages starts off by creating
> a dirty bitmap and filling it with 1s.  Except, actually, because bit
> addresses are 0-based it fills every bit except bit 0 with 1s and puts an
> extra 1 beyond the end of the bitmap, potentially corrupting unrelated
> memory.  Oops.  This patch fixes it.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  arch_init.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index e6effe8..b75a4c5 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -568,7 +568,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>      int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
>  
>      migration_bitmap = bitmap_new(ram_pages);
> -    bitmap_set(migration_bitmap, 1, ram_pages);
> +    bitmap_set(migration_bitmap, 0, ram_pages);
>      migration_dirty_pages = ram_pages;
>  
>      bytes_transferred = 0;
> 
You are correct, good catch.
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
David Gibson Nov. 2, 2012, 3:15 a.m. UTC | #2
On Wed, Oct 31, 2012 at 01:08:16PM +0200, Orit Wasserman wrote:
> On 10/31/2012 05:43 AM, David Gibson wrote:
> > The code for migrating (or savevm-ing) memory pages starts off by creating
> > a dirty bitmap and filling it with 1s.  Except, actually, because bit
> > addresses are 0-based it fills every bit except bit 0 with 1s and puts an
> > extra 1 beyond the end of the bitmap, potentially corrupting unrelated
> > memory.  Oops.  This patch fixes it.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  arch_init.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch_init.c b/arch_init.c
> > index e6effe8..b75a4c5 100644
> > --- a/arch_init.c
> > +++ b/arch_init.c
> > @@ -568,7 +568,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
> >      int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
> >  
> >      migration_bitmap = bitmap_new(ram_pages);
> > -    bitmap_set(migration_bitmap, 1, ram_pages);
> > +    bitmap_set(migration_bitmap, 0, ram_pages);
> >      migration_dirty_pages = ram_pages;
> >  
> >      bytes_transferred = 0;
> > 
> You are correct, good catch.
> Reviewed-by: Orit Wasserman <owasserm@redhat.com>

Juan,

Sorry, I forgot to CC you on the original mailing here, which I should
have done.  This is a serious bug in the migration code and we should
apply it to mainline ASAP.
Juan Quintela Nov. 2, 2012, 10:58 a.m. UTC | #3
David Gibson <dwg@au1.ibm.com> wrote:
> On Wed, Oct 31, 2012 at 01:08:16PM +0200, Orit Wasserman wrote:
>> On 10/31/2012 05:43 AM, David Gibson wrote:
>> > The code for migrating (or savevm-ing) memory pages starts off by creating
>> > a dirty bitmap and filling it with 1s.  Except, actually, because bit
>> > addresses are 0-based it fills every bit except bit 0 with 1s and puts an
>> > extra 1 beyond the end of the bitmap, potentially corrupting unrelated
>> > memory.  Oops.  This patch fixes it.
>> > 
>> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>> > ---
>> >  arch_init.c |    2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> > 
>> > diff --git a/arch_init.c b/arch_init.c
>> > index e6effe8..b75a4c5 100644
>> > --- a/arch_init.c
>> > +++ b/arch_init.c
>> > @@ -568,7 +568,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>> >      int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
>> >  
>> >      migration_bitmap = bitmap_new(ram_pages);
>> > -    bitmap_set(migration_bitmap, 1, ram_pages);
>> > +    bitmap_set(migration_bitmap, 0, ram_pages);
>> >      migration_dirty_pages = ram_pages;
>> >  
>> >      bytes_transferred = 0;
>> > 
>> You are correct, good catch.
>> Reviewed-by: Orit Wasserman <owasserm@redhat.com>
>
> Juan,
>
> Sorry, I forgot to CC you on the original mailing here, which I should
> have done.  This is a serious bug in the migration code and we should
> apply it to mainline ASAP.

Reviewed-by: Juan Quintela <quintela@redhat.com> 

Good catch, I misunderstood the function when fixing a different bug,
and never understood why it fixed it.

Thanks, Juan.
David Gibson Nov. 3, 2012, 3 p.m. UTC | #4
On Fri, Nov 02, 2012 at 11:58:32AM +0100, Juan Quintela wrote:
> David Gibson <dwg@au1.ibm.com> wrote:
> > On Wed, Oct 31, 2012 at 01:08:16PM +0200, Orit Wasserman wrote:
> >> On 10/31/2012 05:43 AM, David Gibson wrote:
> >> > The code for migrating (or savevm-ing) memory pages starts off by creating
> >> > a dirty bitmap and filling it with 1s.  Except, actually, because bit
> >> > addresses are 0-based it fills every bit except bit 0 with 1s and puts an
> >> > extra 1 beyond the end of the bitmap, potentially corrupting unrelated
> >> > memory.  Oops.  This patch fixes it.
> >> > 
> >> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> >> > ---
> >> >  arch_init.c |    2 +-
> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >> > 
> >> > diff --git a/arch_init.c b/arch_init.c
> >> > index e6effe8..b75a4c5 100644
> >> > --- a/arch_init.c
> >> > +++ b/arch_init.c
> >> > @@ -568,7 +568,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
> >> >      int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
> >> >  
> >> >      migration_bitmap = bitmap_new(ram_pages);
> >> > -    bitmap_set(migration_bitmap, 1, ram_pages);
> >> > +    bitmap_set(migration_bitmap, 0, ram_pages);
> >> >      migration_dirty_pages = ram_pages;
> >> >  
> >> >      bytes_transferred = 0;
> >> > 
> >> You are correct, good catch.
> >> Reviewed-by: Orit Wasserman <owasserm@redhat.com>
> >
> > Juan,
> >
> > Sorry, I forgot to CC you on the original mailing here, which I should
> > have done.  This is a serious bug in the migration code and we should
> > apply it to mainline ASAP.
> 
> Reviewed-by: Juan Quintela <quintela@redhat.com> 
> 
> Good catch, I misunderstood the function when fixing a different bug,
> and never understood why it fixed it.

Actually.. it just occurred to me that I think there has to be another
bug here somewhere..

I haven't actually observed any effects from the memory corruption -
though it's certainly a real bug.  I found this because another effect
of this bug is that migration_dirty_pages count was set to 1 more than
the actual number of dirty bits in the bitmap.  That meant the dirty
pages count was never reaching zero and so the migration/savevm never
terminated.
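
To make the accounting mismatch concrete (illustrative numbers only, not the
actual QEMU loop):

    #include <stdio.h>

    int main(void)
    {
        long ram_pages = 1024;                 /* example guest size in pages */
        long in_range_dirty = ram_pages - 1;   /* buggy init sets only bits 1..ram_pages-1 in range */
        long migration_dirty_pages = ram_pages;

        /* Each transmitted page clears one in-range bit and decrements the
         * counter once, so after every in-range dirty bit has been cleared... */
        migration_dirty_pages -= in_range_dirty;

        /* ...the counter is stuck at 1 with no set bit left to find. */
        printf("remaining count = %ld\n", migration_dirty_pages);
        return 0;
    }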

Except.. that every so often the migration *did* terminate (maybe 1
time in 5).  Also I kind of hope somebody would have noticed this
earlier if migrations never terminated on x86 too.  But as far as I
can tell, if initially mismatched like this it ought to be impossible
for the dirty page count to ever reach zero.  Which suggests there is
another bug with the dirty count tracking :(.

It's possible the memory corruption could account for this, of course
- since that in theory at least, could have almost any strange effect
on the program's behavior.  But that doesn't seem particularly likely
to me.
Juan Quintela Nov. 4, 2012, 7:17 p.m. UTC | #5
David Gibson <dwg@au1.ibm.com> wrote:
> On Fri, Nov 02, 2012 at 11:58:32AM +0100, Juan Quintela wrote:
>> David Gibson <dwg@au1.ibm.com> wrote:
>> > On Wed, Oct 31, 2012 at 01:08:16PM +0200, Orit Wasserman wrote:
>> >> On 10/31/2012 05:43 AM, David Gibson wrote:
>> 
>> Reviewed-by: Juan Quintela <quintela@redhat.com> 
>> 
>> Good catch, I misunderstood the function when fixing a different bug,
>> and never understood why it fixed it.
>
> Actually.. it just occurred to me that I think there has to be another
> bug here somewhere..

I am at KVM Forum and LinuxCon for this week, so I can't test anything.

For some reason, I misunderstood bitmap_set() and thought this was the
value that we "initialized" the bitmap with.  So, I changed from 0 to 1,
and ..... I was sending half the pages over the wire.  Yes, that is
right, just half of them.  So clearly we have some bug somewhere else :-(

>
> I haven't actually observed any effects from the memory corruption -
> though it's certainly a real bug.  I found this because another effect
> of this bug is that migration_dirty_pages count was set to 1 more than
> the actual number of dirty bits in the bitmap.  That meant the dirty
> pages count was never reaching zero and so the migration/savevm never
> terminated.

I wonder what is on page 0 on an x86, probably some BIOS data that
never changes?  No clue about pseries.

> Except.. that every so often the migration *did* terminate (maybe 1
> time in 5).  Also I kind of hope somebody would have noticed this
> earlier if migrations never terminated on x86 too.  But as far as I
> can tell, if initially mismatched like this it ought to be impossible
> for the dirty page count to ever reach zero.  Which suggests there is
> another bug with the dirty count tracking :(.

We use the dirty bitmap count to know how many pages are dirty, but once
the number is low enough, we just send "the rest" of the pages.
So, it would always converge (or not) independently of that off-by-one bug.
We never test for zero dirty pages; we test "are we able to send this
many pages" within "max_downtime".  So this explains why it works for you
sometimes.
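
In rough pseudo-C, the completion test described above looks something like
this (illustrative names and structure, not the exact code in QEMU's
migration loop):

    #include <stdint.h>
    #include <stdbool.h>

    /* Finish the live phase once the remaining dirty data could be sent
     * within the allowed downtime; the dirty count never has to reach zero. */
    static bool migration_can_complete(uint64_t dirty_pages, uint64_t page_size,
                                       uint64_t bandwidth_bytes_per_sec,
                                       uint64_t max_downtime_ns)
    {
        uint64_t remaining_bytes = dirty_pages * page_size;
        uint64_t max_size = bandwidth_bytes_per_sec * max_downtime_ns / 1000000000ULL;

        return remaining_bytes <= max_size;
    }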

>
> It's possible the memory corruption could account for this, of course
> - since that in theory at least, could have almost any strange effect
> on the program's behavior.  But that doesn't seem particularly likely
> to me.

This depends on _what_ is on page zero.  If that is different from
whatever we put there during boot, and if we ever wrote to that page
again, we would mark that page dirty anyway, so I would consider that
"corruption" problem highly improbable.  Not that we shouldn't fix
the bug, but I doubt that you are getting memory corruption due to this
bug.

The only way that you can get memory corruption is if you write to that
page just before you do migration, and then never write to it again.
What is on hardware page zero on pseries?  Or is it just a normal page?

Later, Juan.
David Gibson Nov. 5, 2012, 12:17 a.m. UTC | #6
On Sun, Nov 04, 2012 at 08:17:29PM +0100, Juan Quintela wrote:
> David Gibson <dwg@au1.ibm.com> wrote:
> > On Fri, Nov 02, 2012 at 11:58:32AM +0100, Juan Quintela wrote:
> >> David Gibson <dwg@au1.ibm.com> wrote:
> >> > On Wed, Oct 31, 2012 at 01:08:16PM +0200, Orit Wasserman wrote:
> >> >> On 10/31/2012 05:43 AM, David Gibson wrote:
> >> 
> >> Reviewed-by: Juan Quintela <quintela@redhat.com> 
> >> 
> >> Good catch, I misunderstood the function when fixing a different bug,
> >> and never understood why it fixed it.
> >
> > Actually.. it just occurred to me that I think there has to be another
> > bug here somewhere..
> 
> I am at KVM Forum and LinuxCon for this week, so I can't test anything.
> 
> For some reason, I misunderstood bitmap_set() and thought this was the
> value that we "initialized" the bitmap with.  So, I changed from 0 to 1,
> and ..... I was sending half the pages over the wire.  Yes, that is
> right, just half of them.  So clearly we have some bug somewhere
> else :-(

Ah.

> > I haven't actually observed any effects from the memory corruption -
> > though it's certainly a real bug.  I found this because another effect
> > of this bug is that migration_dirty_pages count was set to 1 more than
> > the actual number of dirty bits in the bitmap.  That meant the dirty
> > pages count was never reaching zero and so the migration/savevm never
> > terminated.
> 
> I wonder what is on page 0 on an x86, probably some BIOS data that
> never changes?  No clue about pseries.

On pseries it will be exception vectors.  So it will change on the
firmware->kernel transition.  And I think there may be a very few
special variables in there, so it will change later.  Note that the
migration_dirty_pages count will remain mismatched, whether or not
page 0 gets touched.

> > Except.. that every so often the migration *did* terminate (maybe 1
> > time in 5).  Also I kind of hope somebody would have noticed this
> > earlier if migrations never terminated on x86 too.  But as far as I
> > can tell, if initially mismatched like this it ought to be impossible
> > for the dirty page count to ever reach zero.  Which suggests there is
> > another bug with the dirty count tracking :(.
> 
> We use the dirty bitmap count to know how many pages are dirty, but once
> the number is low enough, we just send "the rest" of the pages.
> So, it would always converge (or not) independently of that off-by-one bug.
> We never test for zero dirty pages; we test "are we able to send this
> many pages" within "max_downtime".  So this explains why it works for you
> sometimes.

Hrm.  That explains why it works sometimes (phew, the bug I thought must
be there needn't exist).  It doesn't explain why it mostly didn't - I
actually saw it spinning in ram_save_iterate(), not completing, with
one dirty page remaining.

> > It's possible the memory corruption could account for this, of course
> > - since that in theory at least, could have almost any strange effect
> > on the program's behavior.  But that doesn't seem particularly likely
> > to me.
> 
> This depends on _what_ is on page zero.  If that is different from
> whatever we put there during boot, and if we ever wrote to that page
> again, we would mark that page dirty anyway, so I would consider that
> "corruption" problem highly improbable.  Not that we shouldn't fix
> the bug, but I doubt that you are getting memory corruption due to this
> bug.
> 
> The only way that you can get memory corruption is if you write to that
> page just before you do migration, and then never write to it again.
> What is on hardware page zero on pseries?  Or is it just a normal page?

You misunderstand.  I wasn't talking about potential corruption of
guest memory because we fail to transmit page 0 (though that is also a
bug).  I'm talking about corruption of qemu internal memory because
the bitmap_set() writes one 1 bit beyond the end of the space
allocated for the migration bitmap - it has no kind of bounds checking.
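
To spell out the overrun (a sketch assuming, as QEMU does, that bitmap_new()
allocates just enough unsigned longs to hold the requested number of bits):

    #include <stdlib.h>

    #define BITS_PER_LONG (sizeof(unsigned long) * 8)

    static unsigned long *bitmap_new_sketch(long nbits)
    {
        /* Round up to whole words, zero-filled: bits [0, nbits) are valid. */
        return calloc((nbits + BITS_PER_LONG - 1) / BITS_PER_LONG,
                      sizeof(unsigned long));
    }

    /* When ram_pages is a multiple of BITS_PER_LONG, the buffer is exactly
     * ram_pages/BITS_PER_LONG words, so bit index 'ram_pages' falls in the
     * word after the allocation and bitmap_set(map, 1, ram_pages) scribbles
     * on whatever the allocator placed there.  When it is not a multiple,
     * the stray bit lands in the unused padding of the last word, which is
     * why the overrun is not always visible. */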
David Gibson Nov. 21, 2012, 12:43 a.m. UTC | #7
On Fri, Nov 02, 2012 at 11:58:32AM +0100, Juan Quintela wrote:
> David Gibson <dwg@au1.ibm.com> wrote:
> > On Wed, Oct 31, 2012 at 01:08:16PM +0200, Orit Wasserman wrote:
> >> On 10/31/2012 05:43 AM, David Gibson wrote:
> >> > The code for migrating (or savevm-ing) memory pages starts off by creating
> >> > a dirty bitmap and filling it with 1s.  Except, actually, because bit
> >> > addresses are 0-based it fills every bit except bit 0 with 1s and puts an
> >> > extra 1 beyond the end of the bitmap, potentially corrupting unrelated
> >> > memory.  Oops.  This patch fixes it.
> >> > 
> >> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> >> > ---
> >> >  arch_init.c |    2 +-
> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >> > 
> >> > diff --git a/arch_init.c b/arch_init.c
> >> > index e6effe8..b75a4c5 100644
> >> > --- a/arch_init.c
> >> > +++ b/arch_init.c
> >> > @@ -568,7 +568,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
> >> >      int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
> >> >  
> >> >      migration_bitmap = bitmap_new(ram_pages);
> >> > -    bitmap_set(migration_bitmap, 1, ram_pages);
> >> > +    bitmap_set(migration_bitmap, 0, ram_pages);
> >> >      migration_dirty_pages = ram_pages;
> >> >  
> >> >      bytes_transferred = 0;
> >> > 
> >> You are correct, good catch.
> >> Reviewed-by: Orit Wasserman <owasserm@redhat.com>
> >
> > Juan,
> >
> > Sorry, I forgot to CC you on the original mailing here, which I should
> > have done.  This is a serious bug in the migration code and we should
> > apply it to mainline ASAP.
> 
> Reviewed-by: Juan Quintela <quintela@redhat.com> 
> 
> Good catch, I misunderstood the function when fixing a different bug,
> and never understood why it fixed it.

This still hasn't been merged...

This is a fix for a serious bug potentially causing guest memory
corruption, and definitely causing qemu memory corruption; it really
wants to be in the 1.3 release.

Patch

diff --git a/arch_init.c b/arch_init.c
index e6effe8..b75a4c5 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -568,7 +568,7 @@  static int ram_save_setup(QEMUFile *f, void *opaque)
     int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
 
     migration_bitmap = bitmap_new(ram_pages);
-    bitmap_set(migration_bitmap, 1, ram_pages);
+    bitmap_set(migration_bitmap, 0, ram_pages);
     migration_dirty_pages = ram_pages;
 
     bytes_transferred = 0;