qemu and transparent huge pages

Message ID 5055B5C2.9040700@msgid.tls.msk.ru
State New

Commit Message

Michael Tokarev Sept. 16, 2012, 11:19 a.m. UTC
So, is the patch okay?

Thanks,

/mjt

On 15.08.2012 19:03, Michael Tokarev wrote:
> On 15.08.2012 18:26, Avi Kivity wrote:
>> On 08/15/2012 05:22 PM, Michael Tokarev wrote:
>>
>>>>
>>>> Please provide extra info, like the setting of
>>>> /sys/kernel/mm/transparent_hugepage/enabled.
>>>
>>> That was it - sort of.  The default value here is enabled=madvise.
>>> After setting it to always, the effect finally started appearing,
>>> so it is actually working.
>>>
>>> But can't qemu set the MADV_HUGEPAGE flag too, so it works automatically?
>>
>> It can and should.
> 
> Something like the attached patch?
> 
> Thanks,
> 
> /mjt
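
The exchange above turns on the value of /sys/kernel/mm/transparent_hugepage/enabled, where the kernel marks the active mode with square brackets (for example "always [madvise] never"). The following is a minimal sketch of reading that setting, not code from QEMU; the helper names thp_parse_mode and thp_read_mode are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Path quoted in the thread. */
#define THP_ENABLED_PATH "/sys/kernel/mm/transparent_hugepage/enabled"

/* Hypothetical helper: copy the bracketed (active) token of a line such
 * as "always [madvise] never" into out.
 * Returns 0 on success, -1 if there is no bracketed token or it does
 * not fit into out. */
int thp_parse_mode(const char *line, char *out, size_t outlen)
{
    assert(line != NULL && out != NULL);

    const char *open = strchr(line, '[');
    const char *close = open ? strchr(open, ']') : NULL;
    if (!open || !close) {
        return -1;
    }

    size_t len = (size_t)(close - open - 1);
    if (len + 1 > outlen) {
        return -1;
    }
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}

/* Hypothetical helper: read the active mode from sysfs.
 * Returns -1 if the file is missing or unreadable (for example on a
 * kernel built without transparent hugepage support). */
int thp_read_mode(char *out, size_t outlen)
{
    char line[128];
    FILE *f = fopen(THP_ENABLED_PATH, "r");
    if (!f) {
        return -1;
    }
    char *ok = fgets(line, sizeof(line), f);
    fclose(f);
    return ok ? thp_parse_mode(line, out, outlen) : -1;
}
```

With the mode reported as "madvise", only regions that were explicitly marked with madvise(MADV_HUGEPAGE) are candidates for huge pages, which matches the behaviour described in the quoted messages.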

Comments

Michael Tokarev Nov. 12, 2012, 3:18 p.m. UTC | #1
Ping^2 ?

Thanks,

/mjt

Aurelien Jarno Nov. 13, 2012, 2:30 p.m. UTC | #2
Isn't ad0b5321f1f797274603ebbe20108b0750baee94 enough?

Michael Tokarev Nov. 13, 2012, 4:38 p.m. UTC | #3
On 13.11.2012 18:30, Aurelien Jarno wrote:
> Isn't ad0b5321f1f797274603ebbe20108b0750baee94 enough?

Oh.  It has been applied.  I expected it would be ignored,
just like my patch has been.

No, it is not enough: that patch alone does nothing for
the alignment on at least x86, which is necessary for
hugepages to work.  My patch _also_ fixes the alignment issue.

Where to apply MADV_HUGEPAGE is a different question.
I don't know which layer is the best place to apply it.

Thanks,

/mjt
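
To illustrate why alignment matters here: on x86 a transparent huge page covers a naturally aligned 2 MiB range, so a buffer that is only page-aligned can lose a huge page at each end. The sketch below is plain address arithmetic under that assumption; the names align_up and huge_ranges_in are hypothetical and do not come from QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 2 MiB, written the same way as in the patch (512 * 4096). */
#define HUGE_PAGE_SIZE ((uintptr_t)512 * 4096)

/* Round addr up to the next multiple of align (must be a power of two). */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Count the whole, naturally aligned 2 MiB ranges that fit inside
 * [base, base + size).  Overflow of base + size is not handled; this is
 * an illustration only. */
size_t huge_ranges_in(uintptr_t base, size_t size)
{
    uintptr_t first = align_up(base, HUGE_PAGE_SIZE);
    uintptr_t end = base + size;

    if (first >= end) {
        return 0;
    }
    return (size_t)((end - first) / HUGE_PAGE_SIZE);
}
```

For a 4 MiB buffer, a 2 MiB-aligned base leaves room for two such ranges, while a base that is offset by a single 4 KiB page leaves room for only one.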

Patch

From 705b3efb8c0cf06cbf087204fc61863c2bbb9e27 Mon Sep 17 00:00:00 2001
From: Michael Tokarev <mjt@tls.msk.ru>
Date: Wed, 15 Aug 2012 18:55:16 +0400
Subject: [PATCH] mark large vmalloc areas as MADV_HUGEPAGE and allow
 hugepages on i386

A followup to commit 36b586284e678d.

On Linux only (which supports transparent hugepages), explicitly mark
large vmalloc'ed areas with madvise(MADV_HUGEPAGE).  The patch changes
the previous logic a bit to allow inserting the call to madvise(), but
otherwise keeps the code the same (and saves one call to getpagesize()
per allocation).

The code also adds #include <sys/mman.h> to the Linux-specific part,
to get the MADV_HUGEPAGE definition.

While at it, enable transparent hugepages (alignment and the new
explicit marking with madvise()) for 32bit x86 too - it makes good
sense for, say, 32bit userspace on 64bit kernel.

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
---
 oslib-posix.c |   35 +++++++++++++++++++++++++----------
 1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/oslib-posix.c b/oslib-posix.c
index dbeb627..ab32d6d 100644
--- a/oslib-posix.c
+++ b/oslib-posix.c
@@ -35,19 +35,23 @@ 
 extern int daemon(int, int);
 #endif
 
-#if defined(__linux__) && defined(__x86_64__)
+#ifdef __linux__
+# include <sys/mman.h>
+
+# if defined(__x86_64__) || defined(__i386__)
    /* Use 2 MiB alignment so transparent hugepages can be used by KVM.
       Valgrind does not support alignments larger than 1 MiB,
       therefore we need special code which handles running on Valgrind. */
-#  define QEMU_VMALLOC_ALIGN (512 * 4096)
+#  define QEMU_VMALLOC_ALIGN_HUGE (512 * 4096)
 #  define CONFIG_VALGRIND
-#elif defined(__linux__) && defined(__s390x__)
+# elif defined(__s390x__)
    /* Use 1 MiB (segment size) alignment so gmap can be used by KVM. */
-#  define QEMU_VMALLOC_ALIGN (256 * 4096)
-#else
-#  define QEMU_VMALLOC_ALIGN getpagesize()
+#  define QEMU_VMALLOC_ALIGN_HUGE (256 * 4096)
+# endif
 #endif
 
+#define QEMU_VMALLOC_ALIGN getpagesize()
+
 #include "config-host.h"
 #include "sysemu.h"
 #include "trace.h"
@@ -114,7 +118,6 @@  void *qemu_memalign(size_t alignment, size_t size)
 void *qemu_vmalloc(size_t size)
 {
     void *ptr;
-    size_t align = QEMU_VMALLOC_ALIGN;
 
 #if defined(CONFIG_VALGRIND)
     if (running_on_valgrind < 0) {
@@ -125,10 +128,22 @@  void *qemu_vmalloc(size_t size)
     }
 #endif
 
-    if (size < align || running_on_valgrind) {
-        align = getpagesize();
+#ifdef QEMU_VMALLOC_ALIGN_HUGE
+    /* try to allocate as huge pages if supported and large enough */
+    if (size >= QEMU_VMALLOC_ALIGN_HUGE && !running_on_valgrind) {
+        ptr = qemu_memalign(QEMU_VMALLOC_ALIGN_HUGE, size);
+#ifdef MADV_HUGEPAGE
+        /* advisory only: a failed madvise() is not treated as an error */
+        qemu_madvise(ptr, size, MADV_HUGEPAGE);
+#endif
     }
-    ptr = qemu_memalign(align, size);
+    else
+#endif
+    {
+        /* if unsupported or small, allocate pagesize-aligned */
+        ptr = qemu_memalign(QEMU_VMALLOC_ALIGN, size);
+    }
+
     trace_qemu_vmalloc(size, ptr);
     return ptr;
 }
-- 
1.7.10.4
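
The allocation logic of the patch can be tried outside the QEMU tree with a small standalone sketch. It is an approximation under stated assumptions, not the patch itself: posix_memalign() and madvise() stand in for qemu_memalign() and qemu_madvise(), the Valgrind check is left out, and the names vmalloc_sketch and HUGE_ALIGN are hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* expose madvise() and MADV_* on glibc */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

/* 2 MiB, the x86 value used by the patch (512 * 4096). */
#define HUGE_ALIGN ((size_t)512 * 4096)

/* Allocate size bytes.  Large requests are aligned to 2 MiB and, where
 * the platform defines MADV_HUGEPAGE, marked as huge page candidates.
 * Small requests are only page-aligned.  Returns NULL on failure;
 * release the memory with free(). */
void *vmalloc_sketch(size_t size)
{
    size_t align = (size >= HUGE_ALIGN) ? HUGE_ALIGN
                                        : (size_t)sysconf(_SC_PAGESIZE);
    void *ptr = NULL;

    if (posix_memalign(&ptr, align, size) != 0) {
        return NULL;
    }

#ifdef MADV_HUGEPAGE
    if (align == HUGE_ALIGN) {
        /* Advisory only: the result is ignored, since the kernel may
         * have transparent hugepages disabled. */
        (void)madvise(ptr, size, MADV_HUGEPAGE);
    }
#endif
    return ptr;
}
```

A caller would use it like malloc(), for example `void *ram = vmalloc_sketch(64u * 1024 * 1024);`. Whether the region is actually backed by huge pages still depends on the transparent_hugepage settings discussed earlier in the thread.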