Call MADV_HUGEPAGE for guest RAM allocations

Message ID 20121005164758.4808b2d1@doriath.home
State New

Commit Message

Luiz Capitulino Oct. 5, 2012, 7:47 p.m. UTC
This makes it possible for QEMU to use transparent huge pages (THP)
when transparent_hugepage/enabled=madvise. Otherwise THP is only
used when it's enabled system wide.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
---
 exec.c  | 1 +
 osdep.h | 5 +++++
 2 files changed, 6 insertions(+)
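
Whether this patch changes anything on a given host depends on the mode in /sys/kernel/mm/transparent_hugepage/enabled, which the kernel reports as a line such as "always [madvise] never" with the active mode in brackets. The following is a minimal sketch, not part of the patch, of how that mode could be read; the function names thp_active_mode and thp_host_mode are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy the bracketed token of a line such as "always [madvise] never"
 * into out. Returns 0 on success, -1 if there is no bracketed token or
 * it does not fit in outlen bytes. (Hypothetical helper, not QEMU code.)
 */
static int thp_active_mode(const char *line, char *out, size_t outlen)
{
    const char *open, *close;
    size_t n;

    assert(line != NULL && out != NULL);
    open = strchr(line, '[');
    close = open ? strchr(open, ']') : NULL;
    if (!close) {
        return -1;
    }
    n = (size_t)(close - open - 1);
    if (n + 1 > outlen) {
        return -1;
    }
    memcpy(out, open + 1, n);
    out[n] = '\0';
    return 0;
}

/* Read the host-wide setting; returns -1 if the sysfs file is missing. */
static int thp_host_mode(char *out, size_t outlen)
{
    char line[128];
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");

    if (!f) {
        return -1;
    }
    if (!fgets(line, sizeof(line), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return thp_active_mode(line, out, outlen);
}
```

If thp_host_mode() yields "madvise", only regions that were explicitly marked with MADV_HUGEPAGE are eligible for THP, which is the case the commit message describes.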

Comments

Luiz Capitulino Oct. 15, 2012, 6:57 p.m. UTC | #1
On Fri, 5 Oct 2012 16:47:57 -0300
Luiz Capitulino <lcapitulino@redhat.com> wrote:

> This makes it possible for QEMU to use transparent huge pages (THP)
> when transparent_hugepage/enabled=madvise. Otherwise THP is only
> used when it's enabled system wide.
> 
> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>

ping?

> ---
>  exec.c  | 1 +
>  osdep.h | 5 +++++
>  2 files changed, 6 insertions(+)
> 
> diff --git a/exec.c b/exec.c
> index 1114a09..7504909 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2584,6 +2584,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
>      cpu_physical_memory_set_dirty_range(new_block->offset, size, 0xff);
>  
>      qemu_ram_setup_dump(new_block->host, size);
> +    qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE);
>  
>      if (kvm_enabled())
>          kvm_setup_guest_memory(new_block->host, size);
> diff --git a/osdep.h b/osdep.h
> index cb213e0..c5fd3d9 100644
> --- a/osdep.h
> +++ b/osdep.h
> @@ -108,6 +108,11 @@ void qemu_vfree(void *ptr);
>  #else
>  #define QEMU_MADV_DONTDUMP QEMU_MADV_INVALID
>  #endif
> +#ifdef MADV_HUGEPAGE
> +#define QEMU_MADV_HUGEPAGE MADV_HUGEPAGE
> +#else
> +#define QEMU_MADV_HUGEPAGE QEMU_MADV_INVALID
> +#endif
>  
>  #elif defined(CONFIG_POSIX_MADVISE)
>

Michael Tokarev Oct. 15, 2012, 9:14 p.m. UTC | #2
On 15.10.2012 22:57, Luiz Capitulino wrote:
> On Fri, 5 Oct 2012 16:47:57 -0300
> Luiz Capitulino <lcapitulino@redhat.com> wrote:
> 
>> This makes it possible for QEMU to use transparent huge pages (THP)
>> when transparent_hugepage/enabled=madvise. Otherwise THP is only
>> used when it's enabled system wide.
>>
>> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> 
> ping?
> 
>> ---
>>  exec.c  | 1 +
>>  osdep.h | 5 +++++
>>  2 files changed, 6 insertions(+)
>>
>> diff --git a/exec.c b/exec.c
>> index 1114a09..7504909 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -2584,6 +2584,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
>>      cpu_physical_memory_set_dirty_range(new_block->offset, size, 0xff);
>>  
>>      qemu_ram_setup_dump(new_block->host, size);
>> +    qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE);

FWIW, there was another attempt to do something like this:

https://lists.gnu.org/archive/html/qemu-devel/2012-08/msg02870.html

I'm not sure whether it is right or not; I just tried to guess, and that
one also went unanswered.  Maybe you will have better luck.

Thanks,

/mjt

Aurelien Jarno Oct. 21, 2012, 3:46 a.m. UTC | #3
On Mon, Oct 15, 2012 at 03:57:54PM -0300, Luiz Capitulino wrote:
> On Fri, 5 Oct 2012 16:47:57 -0300
> Luiz Capitulino <lcapitulino@redhat.com> wrote:
> 
> > This makes it possible for QEMU to use transparent huge pages (THP)
> > when transparent_hugepage/enabled=madvise. Otherwise THP is only
> > used when it's enabled system wide.
> > 
> > Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> 
> ping?
> 
> > ---
> >  exec.c  | 1 +
> >  osdep.h | 5 +++++
> >  2 files changed, 6 insertions(+)
> > 
> > diff --git a/exec.c b/exec.c
> > index 1114a09..7504909 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -2584,6 +2584,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
> >      cpu_physical_memory_set_dirty_range(new_block->offset, size, 0xff);
> >  
> >      qemu_ram_setup_dump(new_block->host, size);
> > +    qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE);
> >  
> >      if (kvm_enabled())
> >          kvm_setup_guest_memory(new_block->host, size);
> > diff --git a/osdep.h b/osdep.h
> > index cb213e0..c5fd3d9 100644
> > --- a/osdep.h
> > +++ b/osdep.h
> > @@ -108,6 +108,11 @@ void qemu_vfree(void *ptr);
> >  #else
> >  #define QEMU_MADV_DONTDUMP QEMU_MADV_INVALID
> >  #endif
> > +#ifdef MADV_HUGEPAGE
> > +#define QEMU_MADV_HUGEPAGE MADV_HUGEPAGE
> > +#else
> > +#define QEMU_MADV_HUGEPAGE QEMU_MADV_INVALID
> > +#endif
> >  
> >  #elif defined(CONFIG_POSIX_MADVISE)
> >  
> 

I don't know this part of QEMU very well, so I tried to compare with how
it was done for KSM. I found two main differences:
- In the case of -mem-path QEMU doesn't try to mark the pages as
  mergeable.
- An option (-machine mem-merge=false/true) is provided to enable KSM,
  defaulting to true.

I am not sure if it makes sense for hugepages, but providing a
mem-huge=false/true defaulting to true might be a good idea.
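
To make the suggestion concrete, here is a rough sketch of a boolean that defaults to true and gates the hugepage hint. Everything in it is hypothetical: struct sketch_machine_opts, sketch_default_opts and sketch_hint_huge are illustration names, not existing QEMU code, and the real option parsing is left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical option block. mem_merge mirrors the existing
 * -machine mem-merge= option; mem_huge mirrors the proposed mem-huge=.
 */
struct sketch_machine_opts {
    bool mem_merge;
    bool mem_huge;
};

/* Both options default to true, as suggested in the message above. */
static struct sketch_machine_opts sketch_default_opts(void)
{
    struct sketch_machine_opts o = { .mem_merge = true, .mem_huge = true };
    return o;
}

/*
 * Returns 1 if the hint was applied, 0 if it was skipped because the
 * option is off, and -1 if the host refused or lacks MADV_HUGEPAGE.
 */
static int sketch_hint_huge(const struct sketch_machine_opts *o,
                            void *addr, size_t len)
{
    assert(o != NULL);
    if (!o->mem_huge) {
        return 0;
    }
#ifdef MADV_HUGEPAGE
    return madvise(addr, len, MADV_HUGEPAGE) == 0 ? 1 : -1;
#else
    (void)addr;
    (void)len;
    return -1;
#endif
}
```

With such a switch, mem-huge=false would leave the allocation untouched, so THP use would again depend only on the system-wide setting.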

Luiz Capitulino Oct. 22, 2012, 1:50 p.m. UTC | #4
On Sun, 21 Oct 2012 05:46:25 +0200
Aurelien Jarno <aurelien@aurel32.net> wrote:

> On Mon, Oct 15, 2012 at 03:57:54PM -0300, Luiz Capitulino wrote:
> > On Fri, 5 Oct 2012 16:47:57 -0300
> > Luiz Capitulino <lcapitulino@redhat.com> wrote:
> > 
> > > This makes it possible for QEMU to use transparent huge pages (THP)
> > > when transparent_hugepage/enabled=madvise. Otherwise THP is only
> > > used when it's enabled system wide.
> > > 
> > > Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> > 
> > ping?
> > 
> > > ---
> > >  exec.c  | 1 +
> > >  osdep.h | 5 +++++
> > >  2 files changed, 6 insertions(+)
> > > 
> > > diff --git a/exec.c b/exec.c
> > > index 1114a09..7504909 100644
> > > --- a/exec.c
> > > +++ b/exec.c
> > > @@ -2584,6 +2584,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
> > >      cpu_physical_memory_set_dirty_range(new_block->offset, size, 0xff);
> > >  
> > >      qemu_ram_setup_dump(new_block->host, size);
> > > +    qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE);
> > >  
> > >      if (kvm_enabled())
> > >          kvm_setup_guest_memory(new_block->host, size);
> > > diff --git a/osdep.h b/osdep.h
> > > index cb213e0..c5fd3d9 100644
> > > --- a/osdep.h
> > > +++ b/osdep.h
> > > @@ -108,6 +108,11 @@ void qemu_vfree(void *ptr);
> > >  #else
> > >  #define QEMU_MADV_DONTDUMP QEMU_MADV_INVALID
> > >  #endif
> > > +#ifdef MADV_HUGEPAGE
> > > +#define QEMU_MADV_HUGEPAGE MADV_HUGEPAGE
> > > +#else
> > > +#define QEMU_MADV_HUGEPAGE QEMU_MADV_INVALID
> > > +#endif
> > >  
> > >  #elif defined(CONFIG_POSIX_MADVISE)
> > >  
> > 
> 
> I don't know this part of QEMU very well, so I tried to compare with how
> it was done for KSM. I found two main differences:
> - In the case of -mem-path QEMU doesn't try to mark the pages as
>   mergeable.

As I wasn't completely sure that I could mark hugetlbfs areas as mergeable,
I skipped them. Also, _iirc_ I based my patch on a RHEL patch by Andrea that
did the same thing.

Needless to say, it's trivial to also mark hugetlbfs areas as mergeable
if we want to.

Now, marking hugetlbfs areas as HUGEPAGE definitely seems wrong. But it
would be nice if any of the CC'ed people could clarify these details.

> - An option (-machine mem-merge=false/true) is provided to enable KSM,
>   defaulting to true.
> 
> I am not sure if it makes sense for hugepages, but providing a
> mem-huge=false/true defaulting to true might be a good idea.

I thought about doing that, but went with a simpler version to get the
discussion started...

Patch

diff --git a/exec.c b/exec.c
index 1114a09..7504909 100644
--- a/exec.c
+++ b/exec.c
@@ -2584,6 +2584,7 @@  ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
     cpu_physical_memory_set_dirty_range(new_block->offset, size, 0xff);
 
     qemu_ram_setup_dump(new_block->host, size);
+    qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE);
 
     if (kvm_enabled())
         kvm_setup_guest_memory(new_block->host, size);
diff --git a/osdep.h b/osdep.h
index cb213e0..c5fd3d9 100644
--- a/osdep.h
+++ b/osdep.h
@@ -108,6 +108,11 @@  void qemu_vfree(void *ptr);
 #else
 #define QEMU_MADV_DONTDUMP QEMU_MADV_INVALID
 #endif
+#ifdef MADV_HUGEPAGE
+#define QEMU_MADV_HUGEPAGE MADV_HUGEPAGE
+#else
+#define QEMU_MADV_HUGEPAGE QEMU_MADV_INVALID
+#endif
 
 #elif defined(CONFIG_POSIX_MADVISE)
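
For trying the mechanism outside QEMU, the following standalone sketch follows the same pattern as the patch: map the advice value to an "invalid" sentinel when the host headers lack MADV_HUGEPAGE, and treat the hint as best effort. The SKETCH_* macros and sketch_* functions are hypothetical stand-ins, not the QEMU definitions, and the EINVAL behaviour of the wrapper is an assumption made for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical stand-ins for the QEMU_MADV_* pattern in the patch. */
#define SKETCH_MADV_INVALID (-1)
#ifdef MADV_HUGEPAGE
#define SKETCH_MADV_HUGEPAGE MADV_HUGEPAGE
#else
#define SKETCH_MADV_HUGEPAGE SKETCH_MADV_INVALID
#endif

/* Fails with EINVAL when the advice value is the invalid sentinel. */
static int sketch_madvise(void *addr, size_t len, int advice)
{
    if (advice == SKETCH_MADV_INVALID) {
        errno = EINVAL;
        return -1;
    }
    return madvise(addr, len, advice);
}

/*
 * Allocate anonymous memory and hint that it may be backed by THP.
 * A failed hint is reported through hint_result but is not fatal:
 * the memory stays usable with normal pages.
 */
static void *sketch_alloc_guest_ram(size_t len, int *hint_result)
{
    void *p;
    int rc;

    assert(len > 0);
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return NULL;
    }
    rc = sketch_madvise(p, len, SKETCH_MADV_HUGEPAGE);
    if (hint_result) {
        *hint_result = rc;
    }
    return p;
}
```

The hint can legitimately fail, for instance on a kernel built without THP support, which is why the sketch records the result instead of aborting.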