Patchwork add MADV_DONTFORK to guest physical memory

login
register
mail settings
Submitter Michael Roth
Date Jan. 5, 2011, 9:27 p.m.
Message ID <4D24E249.2040501@linux.vnet.ibm.com>
Download mbox | patch
Permalink /patch/77639/
State New
Headers show

Comments

Michael Roth - Jan. 5, 2011, 9:27 p.m.
On 01/05/2011 02:35 PM, Andrea Arcangeli wrote:
> On Wed, Jan 05, 2011 at 02:26:19PM -0600, Michael Roth wrote:
>> Yah you're right, but I've seen several discussions about using mempath
>> for tmpfs/ram-backed files for things like numa/zram/etc so tend to
>> think of it as something potentially more than just a hook for
>> hugetlbfs, which is becoming less and less useful. But the MADV_DONTFORK
>> stuff should still be immediately applicable.
>
> Yes, MADV_DONTFORK should be set all on all guest physical memory
> without options so I hope the new patch I just posted is fine to stop
> the spurious -ENOMEM failures in fork.

The patch in this thread? A couple paths still aren't covered when using 
-mem-path. Something like this should get them all:
Andrea Arcangeli - Jan. 6, 2011, 5:49 p.m.
On Wed, Jan 05, 2011 at 03:27:37PM -0600, Michael Roth wrote:
> On 01/05/2011 02:35 PM, Andrea Arcangeli wrote:
> > On Wed, Jan 05, 2011 at 02:26:19PM -0600, Michael Roth wrote:
> >> Yah you're right, but I've seen several discussions about using mempath
> >> for tmpfs/ram-backed files for things like numa/zram/etc so tend to
> >> think of it as something potentially more than just a hook for
> >> hugetlbfs, which is becoming less and less useful. But the MADV_DONTFORK
> >> stuff should still be immediately applicable.
> >
> > Yes, MADV_DONTFORK should be set all on all guest physical memory
> > without options so I hope the new patch I just posted is fine to stop
> > the spurious -ENOMEM failures in fork.
> 
> The patch in this thread? A couple paths still aren't covered when using 
> -mem-path. Something like this should get them all:

Well the reason of MADV_DONTFORK is to avoid accounting issues with
anonymous memory, mem-path don't have that issue as hugetlbfs skips
the accounting (it has to because hugetlbfs are not always taken from
the regular page allocator). It could be however considered a minor
performance optimization.

Now you mention that you want to use -mem-path for other things too,
so maybe that's why you need it there too. BTW, if you ever need it
for more than hugetlbfs, I'm afraid this MAP_PRIVATE I see when
mem_prealloc isn't set, is going to screw any other potential useful
usage besides hugetlbfs, not exactly sure why it makes any sense to
use MAP_PRIVATE there and not only MAP_SHARED.
Michael Roth - Jan. 6, 2011, 8:49 p.m.
On 01/06/2011 11:49 AM, Andrea Arcangeli wrote:
> On Wed, Jan 05, 2011 at 03:27:37PM -0600, Michael Roth wrote:
>> On 01/05/2011 02:35 PM, Andrea Arcangeli wrote:
>>> On Wed, Jan 05, 2011 at 02:26:19PM -0600, Michael Roth wrote:
>>>> Yah you're right, but I've seen several discussions about using mempath
>>>> for tmpfs/ram-backed files for things like numa/zram/etc so tend to
>>>> think of it as something potentially more than just a hook for
>>>> hugetlbfs, which is becoming less and less useful. But the MADV_DONTFORK
>>>> stuff should still be immediately applicable.
>>>
>>> Yes, MADV_DONTFORK should be set all on all guest physical memory
>>> without options so I hope the new patch I just posted is fine to stop
>>> the spurious -ENOMEM failures in fork.
>>
>> The patch in this thread? A couple paths still aren't covered when using
>> -mem-path. Something like this should get them all:
>
> Well the reason of MADV_DONTFORK is to avoid accounting issues with
> anonymous memory, mem-path don't have that issue as hugetlbfs skips
> the accounting (it has to because hugetlbfs are not always taken from
> the regular page allocator). It could be however considered a minor
> performance optimization.

That's one case, but there's also a wonky fallback in that path that 
defaults to the normal qemu_vmalloc():

if (mem_path) {
#if defined (__linux__) && !defined(TARGET_S390X)
             new_block->host = file_ram_alloc(new_block, size, mem_path);
             if (!new_block->host) {
                 new_block->host = qemu_vmalloc(size);
                 qemu_madvise(new_block->host, size, QEMU_MADV_MERGEABLE);
             }

May make sense to only add coverage for that specific case though. If 
file_ram_alloc() is generalized we could deal with it then.

> Now you mention that you want to use -mem-path for other things too,
> so maybe that's why you need it there too. BTW, if you ever need it
> for more than hugetlbfs, I'm afraid this MAP_PRIVATE I see when
> mem_prealloc isn't set, is going to screw any other potential useful
> usage besides hugetlbfs, not exactly sure why it makes any sense to
> use MAP_PRIVATE there and not only MAP_SHARED.

Not sure either...but if it's another hugetlbfs-ism it shouldn't matter 
since using mem-path for something other than hugetlbfs would ideally be 
configurable with a -mem path=/dev/shm/vm0.mem,hugetlbfs=off or 
something along that line, which wouldn't necessarily set MAP_PRIVATE.

Patch

diff --git a/exec.c b/exec.c
index 49c28b1..cbdcb16 100644
--- a/exec.c
+++ b/exec.c
@@ -2853,6 +2853,9 @@  ram_addr_t qemu_ram_alloc_from_ptr(DeviceState 
*dev, const char *name,
  #endif
              qemu_madvise(new_block->host, size, QEMU_MADV_MERGEABLE);
          }
+
+        /* no allocation failures during fork/exec for migrate/hotplug */
+        qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK);
      }