diff mbox

add MADV_DONTFORK to guest physical memory

Message ID 20110105151012.GC15823@random.random
State New
Headers show

Commit Message

Andrea Arcangeli Jan. 5, 2011, 3:10 p.m. UTC
The bug is still there so I rediffed the old patch against current
code.

On a related topic: could somebody give me advice on how to implement
a command line (command line seems enough, the other option would be
monitor command) to make the MADV_MERGEABLE conditional? I got KSM on
THP working fine but KSM may decrease performance by increasing the
number of copy on write and by splitting hugepages, so we'd like to be
able to turn off KSM on a per-VM basis (not on the whole host, which
of course we already can by setting /sys/kernel/mm/ksm/run to 0) so
that high perf VMs will keep running at maximum speed with KSM off but
others may still benefit from KSM. For that I need to make the below
MADV_MERGEABLE madvise conditional to something and the code itself
will be trivial, we've just to converge on a command line option
(hopefully quickly ;).

Comments

Michael Roth Jan. 5, 2011, 6:02 p.m. UTC | #1
On 01/05/2011 09:10 AM, Andrea Arcangeli wrote:
> The bug is still there so I rediffed the old patch against current
> code.
>
> On a related topic: could somebody give me advice on how to implement
> a command line (command line seems enough, the other option would be
> monitor command) to make the MADV_MERGEABLE conditional? I got KSM on
> THP working fine but KSM may decrease performance by increasing the
> number of copy on write and by splitting hugepages, so we'd like to be
> able to turn off KSM on a per-VM basis (not on the whole host, which
> of course we already can by setting /sys/kernel/mm/ksm/run to 0) so
> that high perf VMs will keep running at maximum speed with KSM off but
> others may still benefit from KSM. For that I need to make the below
> MADV_MERGEABLE madvise conditional to something and the code itself
> will be trivial, we've just to converge on a command line option
> (hopefully quickly ;).

There was a -mem_prealloc option added a while back to set MAP_POPULATE 
on memory mapped in via the -mem-path option. So an analogous 
-mem_nomerge option or something along that line seems reasonable for 
conditionally unsetting QEMU_MADV_MERGEABLE.

And for consistency you should probably make both your proposed changes 
for -mem-path'd memory as well.
Alexander Graf Jan. 5, 2011, 7:44 p.m. UTC | #2
On 05.01.2011, at 19:02, Michael Roth wrote:

> On 01/05/2011 09:10 AM, Andrea Arcangeli wrote:
>> The bug is still there so I rediffed the old patch against current
>> code.
>> 
>> On a related topic: could somebody give me advice on how to implement
>> a command line (command line seems enough, the other option would be
>> monitor command) to make the MADV_MERGEABLE conditional? I got KSM on
>> THP working fine but KSM may decrease performance by increasing the
>> number of copy on write and by splitting hugepages, so we'd like to be
>> able to turn off KSM on a per-VM basis (not on the whole host, which
>> of course we already can by setting /sys/kernel/mm/ksm/run to 0) so
>> that high perf VMs will keep running at maximum speed with KSM off but
>> others may still benefit from KSM. For that I need to make the below
>> MADV_MERGEABLE madvise conditional to something and the code itself
>> will be trivial, we've just to converge on a command line option
>> (hopefully quickly ;).
> 
> There was a -mem_prealloc option added a while back to set MAP_POPULATE on memory mapped in via the -mem-path option. So an analogous -mem_nomerge option or something along that line seems reasonable for conditionally unsetting QEMU_MADV_MERGEABLE.
> 
> And for consistency you should probably make both your proposed changes for -mem-path'd memory as well.

Why not clean up all that mess and introduce a new -mem option that would just take all of the several options as parameters?

-mem size=512,populate=on,ksm=off

and default -m to something reasonable with the new syntax.


Alex
Andrea Arcangeli Jan. 5, 2011, 7:54 p.m. UTC | #3
Hello everyone,

On Wed, Jan 05, 2011 at 08:44:38PM +0100, Alexander Graf wrote:
> 
> On 05.01.2011, at 19:02, Michael Roth wrote:
> 
> > On 01/05/2011 09:10 AM, Andrea Arcangeli wrote:
> >> The bug is still there so I rediffed the old patch against current
> >> code.
> >> 
> >> On a related topic: could somebody give me advice on how to implement
> >> a command line (command line seems enough, the other option would be
> >> monitor command) to make the MADV_MERGEABLE conditional? I got KSM on
> >> THP working fine but KSM may decrease performance by increasing the
> >> number of copy on write and by splitting hugepages, so we'd like to be
> >> able to turn off KSM on a per-VM basis (not on the whole host, which
> >> of course we already can by setting /sys/kernel/mm/ksm/run to 0) so
> >> that high perf VMs will keep running at maximum speed with KSM off but
> >> others may still benefit from KSM. For that I need to make the below
> >> MADV_MERGEABLE madvise conditional to something and the code itself
> >> will be trivial, we've just to converge on a command line option
> >> (hopefully quickly ;).
> > 
> > There was a -mem_prealloc option added a while back to set MAP_POPULATE on memory mapped in via the -mem-path option. So an analogous -mem_nomerge option or something along that line seems reasonable for conditionally unsetting QEMU_MADV_MERGEABLE.
> > 
> > And for consistency you should probably make both your proposed changes for -mem-path'd memory as well.
> 
> Why not clean up all that mess and introduce a new -mem option that would just take all of the several options as parameters?
> 
> -mem size=512,populate=on,ksm=off
> 
> and default -m to something reasonable with the new syntax.

I'm neutral... so feel free to decide what I should implement ;).

One comment on combining -m ksm=off (or -mem_nomerge) with
-mem-path. It seems unnecessary because ksm can't be turned on on
VM_HUGETLB vmas (MADV_MERGEABLE will return -EINVAL) and mem-path only
makes sense if used in combination with hugetlbfs (which sets
VM_HUGETLB of course).
Alexander Graf Jan. 5, 2011, 8 p.m. UTC | #4
On 05.01.2011, at 20:54, Andrea Arcangeli wrote:

> Hello everyone,
> 
> On Wed, Jan 05, 2011 at 08:44:38PM +0100, Alexander Graf wrote:
>> 
>> On 05.01.2011, at 19:02, Michael Roth wrote:
>> 
>>> On 01/05/2011 09:10 AM, Andrea Arcangeli wrote:
>>>> The bug is still there so I rediffed the old patch against current
>>>> code.
>>>> 
>>>> On a related topic: could somebody give me advice on how to implement
>>>> a command line (command line seems enough, the other option would be
>>>> monitor command) to make the MADV_MERGEABLE conditional? I got KSM on
>>>> THP working fine but KSM may decrease performance by increasing the
>>>> number of copy on write and by splitting hugepages, so we'd like to be
>>>> able to turn off KSM on a per-VM basis (not on the whole host, which
>>>> of course we already can by setting /sys/kernel/mm/ksm/run to 0) so
>>>> that high perf VMs will keep running at maximum speed with KSM off but
>>>> others may still benefit from KSM. For that I need to make the below
>>>> MADV_MERGEABLE madvise conditional to something and the code itself
>>>> will be trivial, we've just to converge on a command line option
>>>> (hopefully quickly ;).
>>> 
>>> There was a -mem_prealloc option added a while back to set MAP_POPULATE on memory mapped in via the -mem-path option. So an analogous -mem_nomerge option or something along that line seems reasonable for conditionally unsetting QEMU_MADV_MERGEABLE.
>>> 
>>> And for consistency you should probably make both your proposed changes for -mem-path'd memory as well.
>> 
>> Why not clean up all that mess and introduce a new -mem option that would just take all of the several options as parameters?
>> 
>> -mem size=512,populate=on,ksm=off
>> 
>> and default -m to something reasonable with the new syntax.
> 
> I'm neutral... so feel free to decide what I should implement ;).
> 
> One comment on combining -m ksm=off (or -mem_nomerge) with
> -mem-path. It seems unnecessary because ksm can't be turned on on
> VM_HUGETLB vmas (MADV_MERGEABLE will return -EINVAL) and mem-path only
> makes sense if used in combination with hugetlbfs (which sets
> VM_HUGETLB of course).

Sure, not all combinations make sense. But "-mem size=1G,path=/dev/shm/vm1.ram,populate=on" would make sense, no? TPH should go along the same lines here too. It'd just be a flag "tph" that defaults to on if available.

That way we could also do all the sanity checks in a single place. I really like the idea of combining memory management command line parameters into a single option :). In the end I'd assume it's Anthony's call though.


Alex
Anthony Liguori Jan. 5, 2011, 8:10 p.m. UTC | #5
On 01/05/2011 01:44 PM, Alexander Graf wrote:
> On 05.01.2011, at 19:02, Michael Roth wrote:
>
>    
>> On 01/05/2011 09:10 AM, Andrea Arcangeli wrote:
>>      
>>> The bug is still there so I rediffed the old patch against current
>>> code.
>>>
>>> On a related topic: could somebody give me advice on how to implement
>>> a command line (command line seems enough, the other option would be
>>> monitor command) to make the MADV_MERGEABLE conditional? I got KSM on
>>> THP working fine but KSM may decrease performance by increasing the
>>> number of copy on write and by splitting hugepages, so we'd like to be
>>> able to turn off KSM on a per-VM basis (not on the whole host, which
>>> of course we already can by setting /sys/kernel/mm/ksm/run to 0) so
>>> that high perf VMs will keep running at maximum speed with KSM off but
>>> others may still benefit from KSM. For that I need to make the below
>>> MADV_MERGEABLE madvise conditional to something and the code itself
>>> will be trivial, we've just to converge on a command line option
>>> (hopefully quickly ;).
>>>        
>> There was a -mem_prealloc option added a while back to set MAP_POPULATE on memory mapped in via the -mem-path option. So an analogous -mem_nomerge option or something along that line seems reasonable for conditionally unsetting QEMU_MADV_MERGEABLE.
>>
>> And for consistency you should probably make both your proposed changes for -mem-path'd memory as well.
>>      
> Why not clean up all that mess and introduce a new -mem option that would just take all of the several options as parameters?
>
> -mem size=512,populate=on,ksm=off
>
> and default -m to something reasonable with the new syntax.
>    

Yeah, that does make sense.  Maybe we should consider folding it in with 
the -numa option too?

Regards,

Anthony Liguori

>
> Alex
>
>
Anthony Liguori Jan. 5, 2011, 8:12 p.m. UTC | #6
On 01/05/2011 02:00 PM, Alexander Graf wrote:
> On 05.01.2011, at 20:54, Andrea Arcangeli wrote:
>
>    
>> Hello everyone,
>>
>> On Wed, Jan 05, 2011 at 08:44:38PM +0100, Alexander Graf wrote:
>>      
>>> On 05.01.2011, at 19:02, Michael Roth wrote:
>>>
>>>        
>>>> On 01/05/2011 09:10 AM, Andrea Arcangeli wrote:
>>>>          
>>>>> The bug is still there so I rediffed the old patch against current
>>>>> code.
>>>>>
>>>>> On a related topic: could somebody give me advice on how to implement
>>>>> a command line (command line seems enough, the other option would be
>>>>> monitor command) to make the MADV_MERGEABLE conditional? I got KSM on
>>>>> THP working fine but KSM may decrease performance by increasing the
>>>>> number of copy on write and by splitting hugepages, so we'd like to be
>>>>> able to turn off KSM on a per-VM basis (not on the whole host, which
>>>>> of course we already can by setting /sys/kernel/mm/ksm/run to 0) so
>>>>> that high perf VMs will keep running at maximum speed with KSM off but
>>>>> others may still benefit from KSM. For that I need to make the below
>>>>> MADV_MERGEABLE madvise conditional to something and the code itself
>>>>> will be trivial, we've just to converge on a command line option
>>>>> (hopefully quickly ;).
>>>>>            
>>>> There was a -mem_prealloc option added a while back to set MAP_POPULATE on memory mapped in via the -mem-path option. So an analogous -mem_nomerge option or something along that line seems reasonable for conditionally unsetting QEMU_MADV_MERGEABLE.
>>>>
>>>> And for consistency you should probably make both your proposed changes for -mem-path'd memory as well.
>>>>          
>>> Why not clean up all that mess and introduce a new -mem option that would just take all of the several options as parameters?
>>>
>>> -mem size=512,populate=on,ksm=off
>>>
>>> and default -m to something reasonable with the new syntax.
>>>        
>> I'm neutral... so feel free to decide what I should implement ;).
>>
>> One comment on combining -m ksm=off (or -mem_nomerge) with
>> -mem-path. It seems unnecessary because ksm can't be turned on on
>> VM_HUGETLB vmas (MADV_MERGEABLE will return -EINVAL) and mem-path only
>> makes sense if used in combination with hugetlbfs (which sets
>> VM_HUGETLB of course).
>>      
> Sure, not all combinations make sense. But "-mem size=1G,path=/dev/shm/vm1.ram,populate=on" would make sense, no? TPH should go along the same lines here too. It'd just be a flag "tph" that defaults to on if available.
>
> That way we could also do all the sanity checks in a single place. I really like the idea of combining memory management command line parameters into a single option :). In the end I'd assume it's Anthony's call though.
>    

Where it's really helpful is in the ever elusive configuration file 
format.  -mem becomes:

[mem]
size=1G
path=/dev/shm/vm1.ram
populate=on

Which is nice from a grouping perspective.

Regards,

Anthony Liguori

> Alex
>
>
Andrea Arcangeli Jan. 5, 2011, 8:15 p.m. UTC | #7
On Wed, Jan 05, 2011 at 09:00:45PM +0100, Alexander Graf wrote:
> Sure, not all combinations make sense. But "-mem
> size=1G,path=/dev/shm/vm1.ram,populate=on" would make sense, no? TPH

I was referring to Michael's suggestion, sure many options make sense
for mem-path too, in fact populate makes more sense for mem-path
(considering if hugetlbfs allocation fails guest may die if
libhugetlbfs isn't trapping the -ENOMEM and doing munmap creating hole
and replacing hole with regular anoymous memory).

> should go along the same lines here too. It'd just be a flag "tph"
> that defaults to on if available.

THP is already default on (thp only requires us to 2m align the
virtual address of guest physical memory which we should do by default
even for qemu no-kvm). We can add a thp=off switch on the same lines
of ksm=off.

> That way we could also do all the sanity checks in a single place. I
> really like the idea of combining memory management command line
> parameters into a single option :). In the end I'd assume it's
> Anthony's call though.

Ok, let me know ;). Sounds like a lot of trouble compared to
-mem_nomerge, but I agree it may be worth it for the long term.

Thanks,
Andrea
Michael Roth Jan. 5, 2011, 8:26 p.m. UTC | #8
On 01/05/2011 01:54 PM, Andrea Arcangeli wrote:
> Hello everyone,
>
> On Wed, Jan 05, 2011 at 08:44:38PM +0100, Alexander Graf wrote:
>>
>> On 05.01.2011, at 19:02, Michael Roth wrote:
>>
>>> On 01/05/2011 09:10 AM, Andrea Arcangeli wrote:
>>>> The bug is still there so I rediffed the old patch against current
>>>> code.
>>>>
>>>> On a related topic: could somebody give me advice on how to implement
>>>> a command line (command line seems enough, the other option would be
>>>> monitor command) to make the MADV_MERGEABLE conditional? I got KSM on
>>>> THP working fine but KSM may decrease performance by increasing the
>>>> number of copy on write and by splitting hugepages, so we'd like to be
>>>> able to turn off KSM on a per-VM basis (not on the whole host, which
>>>> of course we already can by setting /sys/kernel/mm/ksm/run to 0) so
>>>> that high perf VMs will keep running at maximum speed with KSM off but
>>>> others may still benefit from KSM. For that I need to make the below
>>>> MADV_MERGEABLE madvise conditional to something and the code itself
>>>> will be trivial, we've just to converge on a command line option
>>>> (hopefully quickly ;).
>>>
>>> There was a -mem_prealloc option added a while back to set MAP_POPULATE on memory mapped in via the -mem-path option. So an analogous -mem_nomerge option or something along that line seems reasonable for conditionally unsetting QEMU_MADV_MERGEABLE.
>>>
>>> And for consistency you should probably make both your proposed changes for -mem-path'd memory as well.
>>
>> Why not clean up all that mess and introduce a new -mem option that would just take all of the several options as parameters?
>>
>> -mem size=512,populate=on,ksm=off
>>
>> and default -m to something reasonable with the new syntax.
>
> I'm neutral... so feel free to decide what I should implement ;).

Have to agree the consolidated approach definitely seems nicer.

>
> One comment on combining -m ksm=off (or -mem_nomerge) with
> -mem-path. It seems unnecessary because ksm can't be turned on on
> VM_HUGETLB vmas (MADV_MERGEABLE will return -EINVAL) and mem-path only
> makes sense if used in combination with hugetlbfs (which sets
> VM_HUGETLB of course).
>

Yah you're right, but I've seen several discussions about using mempath 
for tmpfs/ram-backed files for things like numa/zram/etc so tend to 
think of it as something potentially more than just a hook for 
hugetlbfs, which is becoming less and less useful. But the MADV_DONTFORK 
stuff should still be immediately applicable.
Andrea Arcangeli Jan. 5, 2011, 8:35 p.m. UTC | #9
On Wed, Jan 05, 2011 at 02:26:19PM -0600, Michael Roth wrote:
> Yah you're right, but I've seen several discussions about using mempath 
> for tmpfs/ram-backed files for things like numa/zram/etc so tend to 
> think of it as something potentially more than just a hook for 
> hugetlbfs, which is becoming less and less useful. But the MADV_DONTFORK 
> stuff should still be immediately applicable.

Yes, MADV_DONTFORK should be set all on all guest physical memory
without options so I hope the new patch I just posted is fine to stop
the spurious -ENOMEM failures in fork.

The ksm part should go incremental but 99% of it will be about
changing the command line, making the madvise conditional will be
trivial as well.
diff mbox

Patch

======
Subject: avoid allocation failures during fork/exec for migrate/hotplug

From: Andrea Arcangeli <aarcange@redhat.com>

Mark all guest physical memory as MADV_DONTFORK to avoid false positive
allocation failures during fork in migrate/netdev hotplug.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---

diff --git a/exec.c b/exec.c
index db9ff55..9e2aa12 100644
--- a/exec.c
+++ b/exec.c
@@ -2851,6 +2851,9 @@  ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
             new_block->host = qemu_vmalloc(size);
 #endif
             qemu_madvise(new_block->host, size, QEMU_MADV_MERGEABLE);
+
+            /* no allocation failures during fork/exec for migrate/hotplug */
+            qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK);
         }
     }