diff mbox

[6/6] kvm: Fix dirty tracking with large kernel page size

Message ID 1330043012-30556-7-git-send-email-david@gibson.dropbear.id.au
State New
Headers show

Commit Message

David Gibson Feb. 24, 2012, 12:23 a.m. UTC
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>

If the kernel page size is larger than TARGET_PAGE_SIZE, which
happens for example on ppc64 with kernels compiled for 64K pages,
the dirty tracking doesn't work.

Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tossatti <mtossatti@redhat.com>

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 kvm-all.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

Comments

Benjamin Herrenschmidt Feb. 24, 2012, 1:03 a.m. UTC | #1
> diff --git a/kvm-all.c b/kvm-all.c
> index 5e188bf..3f8cfd9 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -348,10 +348,11 @@ static int kvm_set_migration_log(int enable)
>  static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section,
>                                           unsigned long *bitmap)
>  {
> -    unsigned int i, j;
> +  unsigned int i, j;

That hunk looks like accidental damage, you should remove it :-)

Cheers,
Ben.
Blue Swirl Feb. 26, 2012, 9:41 p.m. UTC | #2
On Fri, Feb 24, 2012 at 00:23, David Gibson <david@gibson.dropbear.id.au> wrote:
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>
> If the kernel page size is larger than TARGET_PAGE_SIZE, which
> happens for example on ppc64 with kernels compiled for 64K pages,
> the dirty tracking doesn't work.

I think a better solution would be to push this to memory API and
underlying exec.c dirty tracking so that they use the same page size
as kernel (only in this KVM case, in general dirty tracking should
match TARGET_PAGE_SIZE granularity).

> Cc: Avi Kivity <avi@redhat.com>
> Cc: Marcelo Tossatti <mtossatti@redhat.com>
>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  kvm-all.c |    7 ++++---
>  1 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/kvm-all.c b/kvm-all.c
> index 5e188bf..3f8cfd9 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -348,10 +348,11 @@ static int kvm_set_migration_log(int enable)
>  static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section,
>                                          unsigned long *bitmap)
>  {
> -    unsigned int i, j;
> +  unsigned int i, j;
>     unsigned long page_number, c;
>     target_phys_addr_t addr, addr1;
>     unsigned int len = ((section->size / TARGET_PAGE_SIZE) + HOST_LONG_BITS - 1) / HOST_LONG_BITS;
> +    unsigned long hpratio = getpagesize() / TARGET_PAGE_SIZE;
>
>     /*
>      * bitmap-traveling is faster than memory-traveling (for addr...)
> @@ -363,10 +364,10 @@ static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section,
>             do {
>                 j = ffsl(c) - 1;
>                 c &= ~(1ul << j);
> -                page_number = i * HOST_LONG_BITS + j;
> +                page_number = (i * HOST_LONG_BITS + j) * hpratio;
>                 addr1 = page_number * TARGET_PAGE_SIZE;
>                 addr = section->offset_within_region + addr1;
> -                memory_region_set_dirty(section->mr, addr, TARGET_PAGE_SIZE);
> +                memory_region_set_dirty(section->mr, addr, TARGET_PAGE_SIZE * hpratio);
>             } while (c != 0);
>         }
>     }
> --
> 1.7.9
>
>
David Gibson Feb. 27, 2012, 12:16 a.m. UTC | #3
On Sun, Feb 26, 2012 at 09:41:17PM +0000, Blue Swirl wrote:
> On Fri, Feb 24, 2012 at 00:23, David Gibson <david@gibson.dropbear.id.au> wrote:
> > From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> >
> > If the kernel page size is larger than TARGET_PAGE_SIZE, which
> > happens for example on ppc64 with kernels compiled for 64K pages,
> > the dirty tracking doesn't work.
> 
> I think a better solution would be to push this to memory API and
> underlying exec.c dirty tracking so that they use the same page size
> as kernel (only in this KVM case, in general dirty tracking should
> match TARGET_PAGE_SIZE granularity).

I'm having trouble reconciling the two parts of this comment.  If it
should be in terms of TARGET_PAGE_SIZE generally, why _not_ keep it
that way always, and just do a fixup when we have to send the data to
the host kernel in terms of host kernel page size?
Benjamin Herrenschmidt Feb. 27, 2012, 12:25 a.m. UTC | #4
On Mon, 2012-02-27 at 11:16 +1100, David Gibson wrote:
> > > If the kernel page size is larger than TARGET_PAGE_SIZE, which
> > > happens for example on ppc64 with kernels compiled for 64K pages,
> > > the dirty tracking doesn't work.
> > 
> > I think a better solution would be to push this to memory API and
> > underlying exec.c dirty tracking so that they use the same page size
> > as kernel (only in this KVM case, in general dirty tracking should
> > match TARGET_PAGE_SIZE granularity). 

That sounds horrible... you propose a -MUCH- more invasive change to a
nasty & complex core piece of code to deal with what is fixed by a
2-liner patch ?

Cheers,
Ben.
Alexander Graf Feb. 27, 2012, 12:36 a.m. UTC | #5
On 26.02.2012, at 22:41, Blue Swirl wrote:

> On Fri, Feb 24, 2012 at 00:23, David Gibson <david@gibson.dropbear.id.au> wrote:
>> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> 
>> If the kernel page size is larger than TARGET_PAGE_SIZE, which
>> happens for example on ppc64 with kernels compiled for 64K pages,
>> the dirty tracking doesn't work.
> 
> I think a better solution would be to push this to memory API and
> underlying exec.c dirty tracking so that they use the same page size
> as kernel (only in this KVM case, in general dirty tracking should
> match TARGET_PAGE_SIZE granularity).

Yeah, that would allow us to make sure we only align MMIO regions where we can, but I don't think it's an easy change to make. And this way the common page size throughout QEMU is TARGET_PAGE_SIZE, which other pieces of code rely on. Also, dynamically changing TARGET_PAGE_SIZE has unknown performance implications.

So for the time being, I definitely think this is the right approach. It's easy and isolated :).


Alex
Avi Kivity Feb. 28, 2012, 12:32 p.m. UTC | #6
On 02/24/2012 02:23 AM, David Gibson wrote:
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>
> If the kernel page size is larger than TARGET_PAGE_SIZE, which
> happens for example on ppc64 with kernels compiled for 64K pages,
> the dirty tracking doesn't work.
>
> Cc: Avi Kivity <avi@redhat.com>
> Cc: Marcelo Tossatti <mtossatti@redhat.com>
>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  kvm-all.c |    7 ++++---
>  1 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/kvm-all.c b/kvm-all.c
> index 5e188bf..3f8cfd9 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -348,10 +348,11 @@ static int kvm_set_migration_log(int enable)
>  static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section,
>                                           unsigned long *bitmap)
>  {
> -    unsigned int i, j;
> +  unsigned int i, j;
>      unsigned long page_number, c;
>      target_phys_addr_t addr, addr1;
>      unsigned int len = ((section->size / TARGET_PAGE_SIZE) + HOST_LONG_BITS - 1) / HOST_LONG_BITS;
> +    unsigned long hpratio = getpagesize() / TARGET_PAGE_SIZE;
>  
>

What if TARGET_PAGE_SIZE > getpagesize()?  Or is that impossible?
Benjamin Herrenschmidt Feb. 28, 2012, 9:48 p.m. UTC | #7
On Tue, 2012-02-28 at 14:32 +0200, Avi Kivity wrote:

> What if TARGET_PAGE_SIZE > getpagesize()?  Or is that impossible?

We have yet to encounter such a case. It's not currently possible on
power (some old embedded chips could do 1K and 2K page sizes in the TLB
iirc but we never supported that in Linux and it's being phased out in
HW).

I suggest that gets dealt with when/if it needs to, which means probably
never :-)

Cheers,
Ben.
Alexander Graf Feb. 28, 2012, 11:32 p.m. UTC | #8
On 24.02.2012, at 01:23, David Gibson wrote:

> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> 
> If the kernel page size is larger than TARGET_PAGE_SIZE, which
> happens for example on ppc64 with kernels compiled for 64K pages,
> the dirty tracking doesn't work.
> 
> Cc: Avi Kivity <avi@redhat.com>
> Cc: Marcelo Tossatti <mtossatti@redhat.com>
> 
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> kvm-all.c |    7 ++++---
> 1 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/kvm-all.c b/kvm-all.c
> index 5e188bf..3f8cfd9 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -348,10 +348,11 @@ static int kvm_set_migration_log(int enable)
> static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section,
>                                          unsigned long *bitmap)
> {
> -    unsigned int i, j;
> +  unsigned int i, j;
>     unsigned long page_number, c;
>     target_phys_addr_t addr, addr1;
>     unsigned int len = ((section->size / TARGET_PAGE_SIZE) + HOST_LONG_BITS - 1) / HOST_LONG_BITS;
> +    unsigned long hpratio = getpagesize() / TARGET_PAGE_SIZE;

Actually, looking at this, don't we rather want to cache hpratio? The way this is implemented, it would mean we'd do a sysctl for every call, right?


Alex
David Gibson Feb. 29, 2012, 12:22 a.m. UTC | #9
On Wed, Feb 29, 2012 at 12:32:51AM +0100, Alexander Graf wrote:
> 
> On 24.02.2012, at 01:23, David Gibson wrote:
> 
> > From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > 
> > If the kernel page size is larger than TARGET_PAGE_SIZE, which
> > happens for example on ppc64 with kernels compiled for 64K pages,
> > the dirty tracking doesn't work.
> > 
> > Cc: Avi Kivity <avi@redhat.com>
> > Cc: Marcelo Tossatti <mtossatti@redhat.com>
> > 
> > Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> > kvm-all.c |    7 ++++---
> > 1 files changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/kvm-all.c b/kvm-all.c
> > index 5e188bf..3f8cfd9 100644
> > --- a/kvm-all.c
> > +++ b/kvm-all.c
> > @@ -348,10 +348,11 @@ static int kvm_set_migration_log(int enable)
> > static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section,
> >                                          unsigned long *bitmap)
> > {
> > -    unsigned int i, j;
> > +  unsigned int i, j;
> >     unsigned long page_number, c;
> >     target_phys_addr_t addr, addr1;
> >     unsigned int len = ((section->size / TARGET_PAGE_SIZE) + HOST_LONG_BITS - 1) / HOST_LONG_BITS;
> > +    unsigned long hpratio = getpagesize() / TARGET_PAGE_SIZE;
> 
> Actually, looking at this, don't we rather want to cache hpratio?
> The way this is implemented, it would mean we'd do a sysctl for
> every call, right?

I think libc already caches this.
Blue Swirl March 3, 2012, 3:29 p.m. UTC | #10
On Mon, Feb 27, 2012 at 00:36, Alexander Graf <agraf@suse.de> wrote:
>
> On 26.02.2012, at 22:41, Blue Swirl wrote:
>
>> On Fri, Feb 24, 2012 at 00:23, David Gibson <david@gibson.dropbear.id.au> wrote:
>>> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>>
>>> If the kernel page size is larger than TARGET_PAGE_SIZE, which
>>> happens for example on ppc64 with kernels compiled for 64K pages,
>>> the dirty tracking doesn't work.
>>
>> I think a better solution would be to push this to memory API and
>> underlying exec.c dirty tracking so that they use the same page size
>> as kernel (only in this KVM case, in general dirty tracking should
>> match TARGET_PAGE_SIZE granularity).
>
> Yeah, that would allow us to make sure we only align MMIO regions where we can, but I don't think it's an easy change to make. And this way the common page size throughout QEMU is TARGET_PAGE_SIZE, which other pieces of code rely on. Also, dynamically changing TARGET_PAGE_SIZE has unknown performance implications.
>
> So for the time being, I definitely think this is the right approach. It's easy and isolated :).

Laziness ;-) This could be the initial approach though and someone
could improve upon it later.

For better performance, KVM and QEMU could even share the bit maps (if
the page sizes are kept in synch) so there would be no need to
synchronize. But if MMU page sizes are not constant (huge pages, huge
gaps), also bit map approach could be suboptimal and some other way
could be needed.

>
>
> Alex
>
Avi Kivity March 4, 2012, 10:49 a.m. UTC | #11
On 02/28/2012 11:48 PM, Benjamin Herrenschmidt wrote:
> On Tue, 2012-02-28 at 14:32 +0200, Avi Kivity wrote:
>
> > What if TARGET_PAGE_SIZE > getpagesize()?  Or is that impossible?
>
> We have yet to encounter such a case. It's not currently possible on
> power (some old embedded chips could do 1K and 2K page sizes in the TLB
> iirc but we never supported that in Linux and it's being phased out in
> HW).
>
> I suggest that gets dealt with when/if it needs to, which means probably
> never :-)

Doesn't ppc support both 4k and 64k pages?  Suppose you run a 4k guest
on a 64k host?

Maybe I'm misremembering or misunderstanding something.
Benjamin Herrenschmidt March 4, 2012, 11:53 a.m. UTC | #12
On Sun, 2012-03-04 at 12:49 +0200, Avi Kivity wrote:
> On 02/28/2012 11:48 PM, Benjamin Herrenschmidt wrote:
> > On Tue, 2012-02-28 at 14:32 +0200, Avi Kivity wrote:
> >
> > > What if TARGET_PAGE_SIZE > getpagesize()?  Or is that impossible?
> >
> > We have yet to encounter such a case. It's not currently possible on
> > power (some old embedded chips could do 1K and 2K page sizes in the TLB
> > iirc but we never supported that in Linux and it's being phased out in
> > HW).
> >
> > I suggest that gets dealt with when/if it needs to, which means probably
> > never :-)
> 
> Doesn't ppc support both 4k and 64k pages?  Suppose you run a 4k guest
> on a 64k host?
> 
> Maybe I'm misremembering or misunderstanding something.

TARGET_PAGE_SIZE in qemu is always 4k for powerpc, it's a compile time
#define.

The host kernel exposes the dirty bit map with a bit per -host- kernel
PAGE_SIZE (which is what getpagesize() gets you in qemu).

My patch makes thus makes things work when the host uses 64K page sizes.
In all cases, the guest page size is irrelevant, this is purely a
problem between the host kernel and qemu.

Cheers,
Ben.
Avi Kivity March 4, 2012, 12:18 p.m. UTC | #13
On 03/04/2012 01:53 PM, Benjamin Herrenschmidt wrote:
> On Sun, 2012-03-04 at 12:49 +0200, Avi Kivity wrote:
> > On 02/28/2012 11:48 PM, Benjamin Herrenschmidt wrote:
> > > On Tue, 2012-02-28 at 14:32 +0200, Avi Kivity wrote:
> > >
> > > > What if TARGET_PAGE_SIZE > getpagesize()?  Or is that impossible?
> > >
> > > We have yet to encounter such a case. It's not currently possible on
> > > power (some old embedded chips could do 1K and 2K page sizes in the TLB
> > > iirc but we never supported that in Linux and it's being phased out in
> > > HW).
> > >
> > > I suggest that gets dealt with when/if it needs to, which means probably
> > > never :-)
> > 
> > Doesn't ppc support both 4k and 64k pages?  Suppose you run a 4k guest
> > on a 64k host?
> > 
> > Maybe I'm misremembering or misunderstanding something.
>
> TARGET_PAGE_SIZE in qemu is always 4k for powerpc, it's a compile time
> #define.
>
> The host kernel exposes the dirty bit map with a bit per -host- kernel
> PAGE_SIZE (which is what getpagesize() gets you in qemu).
>
> My patch makes thus makes things work when the host uses 64K page sizes.
> In all cases, the guest page size is irrelevant, this is purely a
> problem between the host kernel and qemu.

Right (and I actually knew all this stuff before :( ).
Andreas Färber March 4, 2012, 4:46 p.m. UTC | #14
Am 04.03.2012 12:53, schrieb Benjamin Herrenschmidt:
> On Sun, 2012-03-04 at 12:49 +0200, Avi Kivity wrote:
>> On 02/28/2012 11:48 PM, Benjamin Herrenschmidt wrote:
>>> On Tue, 2012-02-28 at 14:32 +0200, Avi Kivity wrote:
>>>
>>>> What if TARGET_PAGE_SIZE > getpagesize()?  Or is that impossible?
>>>
>>> We have yet to encounter such a case. It's not currently possible on
>>> power (some old embedded chips could do 1K and 2K page sizes in the TLB
>>> iirc but we never supported that in Linux and it's being phased out in
>>> HW).
>>>
>>> I suggest that gets dealt with when/if it needs to, which means probably
>>> never :-)
>>
>> Doesn't ppc support both 4k and 64k pages?  Suppose you run a 4k guest
>> on a 64k host?
>>
>> Maybe I'm misremembering or misunderstanding something.

> TARGET_PAGE_SIZE in qemu is always 4k for powerpc, it's a compile time
> #define.

Except for ppcemb-softmmu (1k), which is irrelevant for KVM AFAIU.

Maybe just add an assert and be done with it?

For the record, I am hoping to get rid of some of those cpu.h defines in
a later stage of QOM'ification.

Andreas
Alexander Graf March 4, 2012, 6:46 p.m. UTC | #15
On 04.03.2012, at 17:46, Andreas Färber <afaerber@suse.de> wrote:

> Am 04.03.2012 12:53, schrieb Benjamin Herrenschmidt:
>> On Sun, 2012-03-04 at 12:49 +0200, Avi Kivity wrote:
>>> On 02/28/2012 11:48 PM, Benjamin Herrenschmidt wrote:
>>>> On Tue, 2012-02-28 at 14:32 +0200, Avi Kivity wrote:
>>>> 
>>>>> What if TARGET_PAGE_SIZE > getpagesize()?  Or is that impossible?
>>>> 
>>>> We have yet to encounter such a case. It's not currently possible on
>>>> power (some old embedded chips could do 1K and 2K page sizes in the TLB
>>>> iirc but we never supported that in Linux and it's being phased out in
>>>> HW).
>>>> 
>>>> I suggest that gets dealt with when/if it needs to, which means probably
>>>> never :-)
>>> 
>>> Doesn't ppc support both 4k and 64k pages?  Suppose you run a 4k guest
>>> on a 64k host?
>>> 
>>> Maybe I'm misremembering or misunderstanding something.
> 
>> TARGET_PAGE_SIZE in qemu is always 4k for powerpc, it's a compile time
>> #define.
> 
> Except for ppcemb-softmmu (1k), which is irrelevant for KVM AFAIU.
> 
> Maybe just add an assert and be done with it?

Assert for what? Linux page size of 64k is something perfectly normal on ppc. The hardware can always do at least 4k maps however.

Alex

> 
> For the record, I am hoping to get rid of some of those cpu.h defines in
> a later stage of QOM'ification.
> 
> Andreas
> 
> -- 
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Andreas Färber March 4, 2012, 8:21 p.m. UTC | #16
Am 04.03.2012 19:46, schrieb Alexander Graf:
> 
> 
> On 04.03.2012, at 17:46, Andreas Färber <afaerber@suse.de> wrote:
> 
>> Am 04.03.2012 12:53, schrieb Benjamin Herrenschmidt:
>>> On Sun, 2012-03-04 at 12:49 +0200, Avi Kivity wrote:
>>>> On 02/28/2012 11:48 PM, Benjamin Herrenschmidt wrote:
>>>>> On Tue, 2012-02-28 at 14:32 +0200, Avi Kivity wrote:
>>>>>
>>>>>> What if TARGET_PAGE_SIZE > getpagesize()?  Or is that impossible?
>>>>>
>>>>> We have yet to encounter such a case. It's not currently possible on
>>>>> power (some old embedded chips could do 1K and 2K page sizes in the TLB
>>>>> iirc but we never supported that in Linux and it's being phased out in
>>>>> HW).
>>>>>
>>>>> I suggest that gets dealt with when/if it needs to, which means probably
>>>>> never :-)
>>>>
>>>> Doesn't ppc support both 4k and 64k pages?  Suppose you run a 4k guest
>>>> on a 64k host?
>>>>
>>>> Maybe I'm misremembering or misunderstanding something.
>>
>>> TARGET_PAGE_SIZE in qemu is always 4k for powerpc, it's a compile time
>>> #define.
>>
>> Except for ppcemb-softmmu (1k), which is irrelevant for KVM AFAIU.
>>
>> Maybe just add an assert and be done with it?
> 
> Assert for what? Linux page size of 64k is something perfectly normal on ppc. The hardware can always do at least 4k maps however.

g_assert(TARGET_PAGE_SIZE <= getpagesize())

Just declare the above case as unsupported and abort if we encounter it.

Andreas
Alexander Graf March 4, 2012, 8:25 p.m. UTC | #17
On 04.03.2012, at 21:21, Andreas Färber <afaerber@suse.de> wrote:

> Am 04.03.2012 19:46, schrieb Alexander Graf:
>> 
>> 
>> On 04.03.2012, at 17:46, Andreas Färber <afaerber@suse.de> wrote:
>> 
>>> Am 04.03.2012 12:53, schrieb Benjamin Herrenschmidt:
>>>> On Sun, 2012-03-04 at 12:49 +0200, Avi Kivity wrote:
>>>>> On 02/28/2012 11:48 PM, Benjamin Herrenschmidt wrote:
>>>>>> On Tue, 2012-02-28 at 14:32 +0200, Avi Kivity wrote:
>>>>>> 
>>>>>>> What if TARGET_PAGE_SIZE > getpagesize()?  Or is that impossible?
>>>>>> 
>>>>>> We have yet to encounter such a case. It's not currently possible on
>>>>>> power (some old embedded chips could do 1K and 2K page sizes in the TLB
>>>>>> iirc but we never supported that in Linux and it's being phased out in
>>>>>> HW).
>>>>>> 
>>>>>> I suggest that gets dealt with when/if it needs to, which means probably
>>>>>> never :-)
>>>>> 
>>>>> Doesn't ppc support both 4k and 64k pages?  Suppose you run a 4k guest
>>>>> on a 64k host?
>>>>> 
>>>>> Maybe I'm misremembering or misunderstanding something.
>>> 
>>>> TARGET_PAGE_SIZE in qemu is always 4k for powerpc, it's a compile time
>>>> #define.
>>> 
>>> Except for ppcemb-softmmu (1k), which is irrelevant for KVM AFAIU.
>>> 
>>> Maybe just add an assert and be done with it?
>> 
>> Assert for what? Linux page size of 64k is something perfectly normal on ppc. The hardware can always do at least 4k maps however.
> 
> g_assert(TARGET_PAGE_SIZE <= getpagesize())
> 
> Just declare the above case as unsupported and abort if we encounter it.

What I'm trying to tell you is that it's the default case on book3s ppc! ;)

Alex
Andreas Färber March 4, 2012, 8:31 p.m. UTC | #18
Am 04.03.2012 21:25, schrieb Alexander Graf:
> 
> 
> On 04.03.2012, at 21:21, Andreas Färber <afaerber@suse.de> wrote:
> 
>> Am 04.03.2012 19:46, schrieb Alexander Graf:
>>>
>>>
>>> On 04.03.2012, at 17:46, Andreas Färber <afaerber@suse.de> wrote:
>>>
>>>> Am 04.03.2012 12:53, schrieb Benjamin Herrenschmidt:
>>>>> On Sun, 2012-03-04 at 12:49 +0200, Avi Kivity wrote:
>>>>>> On 02/28/2012 11:48 PM, Benjamin Herrenschmidt wrote:
>>>>>>> On Tue, 2012-02-28 at 14:32 +0200, Avi Kivity wrote:
>>>>>>>
>>>>>>>> What if TARGET_PAGE_SIZE > getpagesize()?  Or is that impossible?
>>>>>>>
>>>>>>> We have yet to encounter such a case. It's not currently possible on
>>>>>>> power (some old embedded chips could do 1K and 2K page sizes in the TLB
>>>>>>> iirc but we never supported that in Linux and it's being phased out in
>>>>>>> HW).
>>>>>>>
>>>>>>> I suggest that gets dealt with when/if it needs to, which means probably
>>>>>>> never :-)
>>>>>>
>>>>>> Doesn't ppc support both 4k and 64k pages?  Suppose you run a 4k guest
>>>>>> on a 64k host?
>>>>>>
>>>>>> Maybe I'm misremembering or misunderstanding something.
>>>>
>>>>> TARGET_PAGE_SIZE in qemu is always 4k for powerpc, it's a compile time
>>>>> #define.
>>>>
>>>> Except for ppcemb-softmmu (1k), which is irrelevant for KVM AFAIU.
>>>>
>>>> Maybe just add an assert and be done with it?
>>>
>>> Assert for what? Linux page size of 64k is something perfectly normal on ppc. The hardware can always do at least 4k maps however.
>>
>> g_assert(TARGET_PAGE_SIZE <= getpagesize())
>>
>> Just declare the above case as unsupported and abort if we encounter it.
> 
> What I'm trying to tell you is that it's the default case on book3s ppc! ;)

Exactly, which is why I'm saying just ignore the weird embedded case. :)

Andreas
Alexander Graf March 4, 2012, 8:59 p.m. UTC | #19
On 04.03.2012, at 21:31, Andreas Färber <afaerber@suse.de> wrote:

> Am 04.03.2012 21:25, schrieb Alexander Graf:
>> 
>> 
>> On 04.03.2012, at 21:21, Andreas Färber <afaerber@suse.de> wrote:
>> 
>>> Am 04.03.2012 19:46, schrieb Alexander Graf:
>>>> 
>>>> 
>>>> On 04.03.2012, at 17:46, Andreas Färber <afaerber@suse.de> wrote:
>>>> 
>>>>> Am 04.03.2012 12:53, schrieb Benjamin Herrenschmidt:
>>>>>> On Sun, 2012-03-04 at 12:49 +0200, Avi Kivity wrote:
>>>>>>> On 02/28/2012 11:48 PM, Benjamin Herrenschmidt wrote:
>>>>>>>> On Tue, 2012-02-28 at 14:32 +0200, Avi Kivity wrote:
>>>>>>>> 
>>>>>>>>> What if TARGET_PAGE_SIZE > getpagesize()?  Or is that impossible?
>>>>>>>> 
>>>>>>>> We have yet to encounter such a case. It's not currently possible on
>>>>>>>> power (some old embedded chips could do 1K and 2K page sizes in the TLB
>>>>>>>> iirc but we never supported that in Linux and it's being phased out in
>>>>>>>> HW).
>>>>>>>> 
>>>>>>>> I suggest that gets dealt with when/if it needs to, which means probably
>>>>>>>> never :-)
>>>>>>> 
>>>>>>> Doesn't ppc support both 4k and 64k pages?  Suppose you run a 4k guest
>>>>>>> on a 64k host?
>>>>>>> 
>>>>>>> Maybe I'm misremembering or misunderstanding something.
>>>>> 
>>>>>> TARGET_PAGE_SIZE in qemu is always 4k for powerpc, it's a compile time
>>>>>> #define.
>>>>> 
>>>>> Except for ppcemb-softmmu (1k), which is irrelevant for KVM AFAIU.
>>>>> 
>>>>> Maybe just add an assert and be done with it?
>>>> 
>>>> Assert for what? Linux page size of 64k is something perfectly normal on ppc. The hardware can always do at least 4k maps however.
>>> 
>>> g_assert(TARGET_PAGE_SIZE <= getpagesize())
>>> 
>>> Just declare the above case as unsupported and abort if we encounter it.
>> 
>> What I'm trying to tell you is that it's the default case on book3s ppc! ;)
> 
> Exactly, which is why I'm saying just ignore the weird embedded case. :)

Ugh. Sorry, apparently I can't read :). So you're saying 'break for ppcemb'. Hrm. Not sure that'd be all that great for 440, since there host pagesize is still 4k, but T_P_S is 1k.

Alex


> 
> Andreas
> 
> -- 
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Benjamin Herrenschmidt March 4, 2012, 9:17 p.m. UTC | #20
On Sun, 2012-03-04 at 17:46 +0100, Andreas Färber wrote:
> Except for ppcemb-softmmu (1k), which is irrelevant for KVM AFAIU.
> 
> Maybe just add an assert and be done with it?

Well, my patch should work anyway, as long as getpagesize() returns a
higher power of two.

The case that wouldn't work (but also doesn't exist) is getpagesize() <
TARGET_PAGE_SIZE.

> For the record, I am hoping to get rid of some of those cpu.h defines
> in a later stage of QOM'ification.

Cheers,
Ben.
Benjamin Herrenschmidt March 4, 2012, 9:19 p.m. UTC | #21
On Sun, 2012-03-04 at 21:59 +0100, Alexander Graf wrote:
> >>> g_assert(TARGET_PAGE_SIZE <= getpagesize())
> >>> 
> >>> Just declare the above case as unsupported and abort if we
> encounter it.
> >> 
> >> What I'm trying to tell you is that it's the default case on book3s
> ppc! ;)
> > 
> > Exactly, which is why I'm saying just ignore the weird embedded
> case. :)
> 
> Ugh. Sorry, apparently I can't read :). So you're saying 'break for
> ppcemb'. Hrm. Not sure that'd be all that great for 440, since there
> host pagesize is still 4k, but T_P_S is 1k.

No, Alex, you are reading Andreas assert backward :-)

What he suggests is "break if TARGET_PAGE_SIZE > getpagesize()" which
currently cannot happen. Even embedded 1k TARGET_PAGE_SIZE is fine. It's
getpagesize() 1k that wouldn't be but it also never happens.

Cheers,
Ben.
Andreas Färber March 4, 2012, 9:21 p.m. UTC | #22
Am 04.03.2012 21:59, schrieb Alexander Graf:
> 
> 
> On 04.03.2012, at 21:31, Andreas Färber <afaerber@suse.de> wrote:
> 
>> Am 04.03.2012 21:25, schrieb Alexander Graf:
>>>
>>>
>>> On 04.03.2012, at 21:21, Andreas Färber <afaerber@suse.de> wrote:
>>>
>>>> Am 04.03.2012 19:46, schrieb Alexander Graf:
>>>>>
>>>>>
>>>>> On 04.03.2012, at 17:46, Andreas Färber <afaerber@suse.de> wrote:
>>>>>
>>>>>> Am 04.03.2012 12:53, schrieb Benjamin Herrenschmidt:
>>>>>>> On Sun, 2012-03-04 at 12:49 +0200, Avi Kivity wrote:
>>>>>>>> On 02/28/2012 11:48 PM, Benjamin Herrenschmidt wrote:
>>>>>>>>> On Tue, 2012-02-28 at 14:32 +0200, Avi Kivity wrote:
>>>>>>>>>
>>>>>>>>>> What if TARGET_PAGE_SIZE > getpagesize()?  Or is that impossible?
>>>>>>>>>
>>>>>>>>> We have yet to encounter such a case. It's not currently possible on
>>>>>>>>> power (some old embedded chips could do 1K and 2K page sizes in the TLB
>>>>>>>>> iirc but we never supported that in Linux and it's being phased out in
>>>>>>>>> HW).
>>>>>>>>>
>>>>>>>>> I suggest that gets dealt with when/if it needs to, which means probably
>>>>>>>>> never :-)
>>>>>>>>
>>>>>>>> Doesn't ppc support both 4k and 64k pages?  Suppose you run a 4k guest
>>>>>>>> on a 64k host?
>>>>>>>>
>>>>>>>> Maybe I'm misremembering or misunderstanding something.
>>>>>>
>>>>>>> TARGET_PAGE_SIZE in qemu is always 4k for powerpc, it's a compile time
>>>>>>> #define.
>>>>>>
>>>>>> Except for ppcemb-softmmu (1k), which is irrelevant for KVM AFAIU.
>>>>>>
>>>>>> Maybe just add an assert and be done with it?
>>>>>
>>>>> Assert for what? Linux page size of 64k is something perfectly normal on ppc. The hardware can always do at least 4k maps however.
>>>>
>>>> g_assert(TARGET_PAGE_SIZE <= getpagesize())
>>>>
>>>> Just declare the above case as unsupported and abort if we encounter it.
>>>
>>> What I'm trying to tell you is that it's the default case on book3s ppc! ;)
>>
>> Exactly, which is why I'm saying just ignore the weird embedded case. :)
> 
> [...] So you're saying 'break for ppcemb'. Hrm. Not sure that'd be all that great for 440, since there host pagesize is still 4k, but T_P_S is 1k.

Err, 1k <= 4k would still be supported. The way I see it, the only cases
breaking would be ppc with host page size < 4k, and ppcemb with host
page size < 1k (which I'm not aware of).

Is it realistic to expect virtualizing a Mac or pSeries to work on a
1k/2k bamboo? TCG would be unaffected AFAICT.

Andreas
Alexander Graf March 4, 2012, 9:45 p.m. UTC | #23
On 04.03.2012, at 22:19, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Sun, 2012-03-04 at 21:59 +0100, Alexander Graf wrote:
>>>>> g_assert(TARGET_PAGE_SIZE <= getpagesize())
>>>>> 
>>>>> Just declare the above case as unsupported and abort if we
>> encounter it.
>>>> 
>>>> What I'm trying to tell you is that it's the default case on book3s
>> ppc! ;)
>>> 
>>> Exactly, which is why I'm saying just ignore the weird embedded
>> case. :)
>> 
>> Ugh. Sorry, apparently I can't read :). So you're saying 'break for
>> ppcemb'. Hrm. Not sure that'd be all that great for 440, since there
>> host pagesize is still 4k, but T_P_S is 1k.
> 
> No, Alex, you are reading Andreas assert backward :-)
> 
> What he suggests is "break if TARGET_PAGE_SIZE > getpagesize()" which
> currently cannot happen. Even embedded 1k TARGET_PAGE_SIZE is fine. It's
> getpagesize() 1k that wouldn't be but it also never happens.

O_o. Ok, that would obviously work. We never get hostpagesize <4k and T_P_S is always 1k/4k, never bigger.

Alex

> 
> Cheers,
> Ben.
> 
>
diff mbox

Patch

diff --git a/kvm-all.c b/kvm-all.c
index 5e188bf..3f8cfd9 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -348,10 +348,11 @@  static int kvm_set_migration_log(int enable)
 static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section,
                                          unsigned long *bitmap)
 {
-    unsigned int i, j;
+  unsigned int i, j;
     unsigned long page_number, c;
     target_phys_addr_t addr, addr1;
     unsigned int len = ((section->size / TARGET_PAGE_SIZE) + HOST_LONG_BITS - 1) / HOST_LONG_BITS;
+    unsigned long hpratio = getpagesize() / TARGET_PAGE_SIZE;
 
     /*
      * bitmap-traveling is faster than memory-traveling (for addr...)
@@ -363,10 +364,10 @@  static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section,
             do {
                 j = ffsl(c) - 1;
                 c &= ~(1ul << j);
-                page_number = i * HOST_LONG_BITS + j;
+                page_number = (i * HOST_LONG_BITS + j) * hpratio;
                 addr1 = page_number * TARGET_PAGE_SIZE;
                 addr = section->offset_within_region + addr1;
-                memory_region_set_dirty(section->mr, addr, TARGET_PAGE_SIZE);
+                memory_region_set_dirty(section->mr, addr, TARGET_PAGE_SIZE * hpratio);
             } while (c != 0);
         }
     }