Message ID | 20110105151012.GC15823@random.random |
---|---|
State | New |
Headers | show |
On 01/05/2011 09:10 AM, Andrea Arcangeli wrote: > The bug is still there so I rediffed the old patch against current > code. > > On a related topic: could somebody give me advice on how to implement > a command line (command line seems enough, the other option would be > monitor command) to make the MADV_MERGEABLE conditional? I got KSM on > THP working fine but KSM may decrease performance by increasing the > number of copy on write and by splitting hugepages, so we'd like to be > able to turn off KSM on a per-VM basis (not on the whole host, which > of course we already can by setting /sys/kernel/mm/ksm/run to 0) so > that high perf VMs will keep running at maximum speed with KSM off but > others may still benefit from KSM. For that I need to make the below > MADV_MERGEABLE madvise conditional to something and the code itself > will be trivial, we've just to converge on a command line option > (hopefully quickly ;). There was a -mem_prealloc option added a while back to set MAP_POPULATE on memory mapped in via the -mem-path option. So an analogous -mem_nomerge option or something along that line seems reasonable for conditionally unsetting QEMU_MADV_MERGEABLE. And for consistency you should probably make both your proposed changes for -mem-path'd memory as well.
On 05.01.2011, at 19:02, Michael Roth wrote: > On 01/05/2011 09:10 AM, Andrea Arcangeli wrote: >> The bug is still there so I rediffed the old patch against current >> code. >> >> On a related topic: could somebody give me advice on how to implement >> a command line (command line seems enough, the other option would be >> monitor command) to make the MADV_MERGEABLE conditional? I got KSM on >> THP working fine but KSM may decrease performance by increasing the >> number of copy on write and by splitting hugepages, so we'd like to be >> able to turn off KSM on a per-VM basis (not on the whole host, which >> of course we already can by setting /sys/kernel/mm/ksm/run to 0) so >> that high perf VMs will keep running at maximum speed with KSM off but >> others may still benefit from KSM. For that I need to make the below >> MADV_MERGEABLE madvise conditional to something and the code itself >> will be trivial, we've just to converge on a command line option >> (hopefully quickly ;). > > There was a -mem_prealloc option added a while back to set MAP_POPULATE on memory mapped in via the -mem-path option. So an analogous -mem_nomerge option or something along that line seems reasonable for conditionally unsetting QEMU_MADV_MERGEABLE. > > And for consistency you should probably make both your proposed changes for -mem-path'd memory as well. Why not clean up all that mess and introduce a new -mem option that would just take all of the several options as parameters? -mem size=512,populate=on,ksm=off and default -m to something reasonable with the new syntax. Alex
Hello everyone, On Wed, Jan 05, 2011 at 08:44:38PM +0100, Alexander Graf wrote: > > On 05.01.2011, at 19:02, Michael Roth wrote: > > > On 01/05/2011 09:10 AM, Andrea Arcangeli wrote: > >> The bug is still there so I rediffed the old patch against current > >> code. > >> > >> On a related topic: could somebody give me advice on how to implement > >> a command line (command line seems enough, the other option would be > >> monitor command) to make the MADV_MERGEABLE conditional? I got KSM on > >> THP working fine but KSM may decrease performance by increasing the > >> number of copy on write and by splitting hugepages, so we'd like to be > >> able to turn off KSM on a per-VM basis (not on the whole host, which > >> of course we already can by setting /sys/kernel/mm/ksm/run to 0) so > >> that high perf VMs will keep running at maximum speed with KSM off but > >> others may still benefit from KSM. For that I need to make the below > >> MADV_MERGEABLE madvise conditional to something and the code itself > >> will be trivial, we've just to converge on a command line option > >> (hopefully quickly ;). > > > > There was a -mem_prealloc option added a while back to set MAP_POPULATE on memory mapped in via the -mem-path option. So an analogous -mem_nomerge option or something along that line seems reasonable for conditionally unsetting QEMU_MADV_MERGEABLE. > > > > And for consistency you should probably make both your proposed changes for -mem-path'd memory as well. > > Why not clean up all that mess and introduce a new -mem option that would just take all of the several options as parameters? > > -mem size=512,populate=on,ksm=off > > and default -m to something reasonable with the new syntax. I'm neutral... so feel free to decide what I should implement ;). One comment on combining -m ksm=off (or -mem_nomerge) with -mem-path. It seems unnecessary because ksm can't be turned on on VM_HUGETLB vmas (MADV_MERGEABLE will return -EINVAL) and mem-path only makes sense if used in combination with hugetlbfs (which sets VM_HUGETLB of course).
On 05.01.2011, at 20:54, Andrea Arcangeli wrote: > Hello everyone, > > On Wed, Jan 05, 2011 at 08:44:38PM +0100, Alexander Graf wrote: >> >> On 05.01.2011, at 19:02, Michael Roth wrote: >> >>> On 01/05/2011 09:10 AM, Andrea Arcangeli wrote: >>>> The bug is still there so I rediffed the old patch against current >>>> code. >>>> >>>> On a related topic: could somebody give me advice on how to implement >>>> a command line (command line seems enough, the other option would be >>>> monitor command) to make the MADV_MERGEABLE conditional? I got KSM on >>>> THP working fine but KSM may decrease performance by increasing the >>>> number of copy on write and by splitting hugepages, so we'd like to be >>>> able to turn off KSM on a per-VM basis (not on the whole host, which >>>> of course we already can by setting /sys/kernel/mm/ksm/run to 0) so >>>> that high perf VMs will keep running at maximum speed with KSM off but >>>> others may still benefit from KSM. For that I need to make the below >>>> MADV_MERGEABLE madvise conditional to something and the code itself >>>> will be trivial, we've just to converge on a command line option >>>> (hopefully quickly ;). >>> >>> There was a -mem_prealloc option added a while back to set MAP_POPULATE on memory mapped in via the -mem-path option. So an analogous -mem_nomerge option or something along that line seems reasonable for conditionally unsetting QEMU_MADV_MERGEABLE. >>> >>> And for consistency you should probably make both your proposed changes for -mem-path'd memory as well. >> >> Why not clean up all that mess and introduce a new -mem option that would just take all of the several options as parameters? >> >> -mem size=512,populate=on,ksm=off >> >> and default -m to something reasonable with the new syntax. > > I'm neutral... so feel free to decide what I should implement ;). > > One comment on combining -m ksm=off (or -mem_nomerge) with > -mem-path. It seems unnecessary because ksm can't be turned on on > VM_HUGETLB vmas (MADV_MERGEABLE will return -EINVAL) and mem-path only > makes sense if used in combination with hugetlbfs (which sets > VM_HUGETLB of course). Sure, not all combinations make sense. But "-mem size=1G,path=/dev/shm/vm1.ram,populate=on" would make sense, no? TPH should go along the same lines here too. It'd just be a flag "tph" that defaults to on if available. That way we could also do all the sanity checks in a single place. I really like the idea of combining memory management command line parameters into a single option :). In the end I'd assume it's Anthony's call though. Alex
On 01/05/2011 01:44 PM, Alexander Graf wrote: > On 05.01.2011, at 19:02, Michael Roth wrote: > > >> On 01/05/2011 09:10 AM, Andrea Arcangeli wrote: >> >>> The bug is still there so I rediffed the old patch against current >>> code. >>> >>> On a related topic: could somebody give me advice on how to implement >>> a command line (command line seems enough, the other option would be >>> monitor command) to make the MADV_MERGEABLE conditional? I got KSM on >>> THP working fine but KSM may decrease performance by increasing the >>> number of copy on write and by splitting hugepages, so we'd like to be >>> able to turn off KSM on a per-VM basis (not on the whole host, which >>> of course we already can by setting /sys/kernel/mm/ksm/run to 0) so >>> that high perf VMs will keep running at maximum speed with KSM off but >>> others may still benefit from KSM. For that I need to make the below >>> MADV_MERGEABLE madvise conditional to something and the code itself >>> will be trivial, we've just to converge on a command line option >>> (hopefully quickly ;). >>> >> There was a -mem_prealloc option added a while back to set MAP_POPULATE on memory mapped in via the -mem-path option. So an analogous -mem_nomerge option or something along that line seems reasonable for conditionally unsetting QEMU_MADV_MERGEABLE. >> >> And for consistency you should probably make both your proposed changes for -mem-path'd memory as well. >> > Why not clean up all that mess and introduce a new -mem option that would just take all of the several options as parameters? > > -mem size=512,populate=on,ksm=off > > and default -m to something reasonable with the new syntax. > Yeah, that does make sense. Maybe we should consider folding it in with the -numa option too? Regards, Anthony Liguori > > Alex > >
On 01/05/2011 02:00 PM, Alexander Graf wrote: > On 05.01.2011, at 20:54, Andrea Arcangeli wrote: > > >> Hello everyone, >> >> On Wed, Jan 05, 2011 at 08:44:38PM +0100, Alexander Graf wrote: >> >>> On 05.01.2011, at 19:02, Michael Roth wrote: >>> >>> >>>> On 01/05/2011 09:10 AM, Andrea Arcangeli wrote: >>>> >>>>> The bug is still there so I rediffed the old patch against current >>>>> code. >>>>> >>>>> On a related topic: could somebody give me advice on how to implement >>>>> a command line (command line seems enough, the other option would be >>>>> monitor command) to make the MADV_MERGEABLE conditional? I got KSM on >>>>> THP working fine but KSM may decrease performance by increasing the >>>>> number of copy on write and by splitting hugepages, so we'd like to be >>>>> able to turn off KSM on a per-VM basis (not on the whole host, which >>>>> of course we already can by setting /sys/kernel/mm/ksm/run to 0) so >>>>> that high perf VMs will keep running at maximum speed with KSM off but >>>>> others may still benefit from KSM. For that I need to make the below >>>>> MADV_MERGEABLE madvise conditional to something and the code itself >>>>> will be trivial, we've just to converge on a command line option >>>>> (hopefully quickly ;). >>>>> >>>> There was a -mem_prealloc option added a while back to set MAP_POPULATE on memory mapped in via the -mem-path option. So an analogous -mem_nomerge option or something along that line seems reasonable for conditionally unsetting QEMU_MADV_MERGEABLE. >>>> >>>> And for consistency you should probably make both your proposed changes for -mem-path'd memory as well. >>>> >>> Why not clean up all that mess and introduce a new -mem option that would just take all of the several options as parameters? >>> >>> -mem size=512,populate=on,ksm=off >>> >>> and default -m to something reasonable with the new syntax. >>> >> I'm neutral... so feel free to decide what I should implement ;). >> >> One comment on combining -m ksm=off (or -mem_nomerge) with >> -mem-path. It seems unnecessary because ksm can't be turned on on >> VM_HUGETLB vmas (MADV_MERGEABLE will return -EINVAL) and mem-path only >> makes sense if used in combination with hugetlbfs (which sets >> VM_HUGETLB of course). >> > Sure, not all combinations make sense. But "-mem size=1G,path=/dev/shm/vm1.ram,populate=on" would make sense, no? TPH should go along the same lines here too. It'd just be a flag "tph" that defaults to on if available. > > That way we could also do all the sanity checks in a single place. I really like the idea of combining memory management command line parameters into a single option :). In the end I'd assume it's Anthony's call though. > Where it's really helpful is in the ever elusive configuration file format. -mem becomes: [mem] size=1G path=/dev/shm/vm1.ram populate=on Which is nice from a grouping perspective. Regards, Anthony Liguori > Alex > >
On Wed, Jan 05, 2011 at 09:00:45PM +0100, Alexander Graf wrote: > Sure, not all combinations make sense. But "-mem > size=1G,path=/dev/shm/vm1.ram,populate=on" would make sense, no? TPH I was referring to Michael's suggestion, sure many options make sense for mem-path too, in fact populate makes more sense for mem-path (considering if hugetlbfs allocation fails guest may die if libhugetlbfs isn't trapping the -ENOMEM and doing munmap creating hole and replacing hole with regular anoymous memory). > should go along the same lines here too. It'd just be a flag "tph" > that defaults to on if available. THP is already default on (thp only requires us to 2m align the virtual address of guest physical memory which we should do by default even for qemu no-kvm). We can add a thp=off switch on the same lines of ksm=off. > That way we could also do all the sanity checks in a single place. I > really like the idea of combining memory management command line > parameters into a single option :). In the end I'd assume it's > Anthony's call though. Ok, let me know ;). Sounds like a lot of trouble compared to -mem_nomerge, but I agree it may be worth it for the long term. Thanks, Andrea
On 01/05/2011 01:54 PM, Andrea Arcangeli wrote: > Hello everyone, > > On Wed, Jan 05, 2011 at 08:44:38PM +0100, Alexander Graf wrote: >> >> On 05.01.2011, at 19:02, Michael Roth wrote: >> >>> On 01/05/2011 09:10 AM, Andrea Arcangeli wrote: >>>> The bug is still there so I rediffed the old patch against current >>>> code. >>>> >>>> On a related topic: could somebody give me advice on how to implement >>>> a command line (command line seems enough, the other option would be >>>> monitor command) to make the MADV_MERGEABLE conditional? I got KSM on >>>> THP working fine but KSM may decrease performance by increasing the >>>> number of copy on write and by splitting hugepages, so we'd like to be >>>> able to turn off KSM on a per-VM basis (not on the whole host, which >>>> of course we already can by setting /sys/kernel/mm/ksm/run to 0) so >>>> that high perf VMs will keep running at maximum speed with KSM off but >>>> others may still benefit from KSM. For that I need to make the below >>>> MADV_MERGEABLE madvise conditional to something and the code itself >>>> will be trivial, we've just to converge on a command line option >>>> (hopefully quickly ;). >>> >>> There was a -mem_prealloc option added a while back to set MAP_POPULATE on memory mapped in via the -mem-path option. So an analogous -mem_nomerge option or something along that line seems reasonable for conditionally unsetting QEMU_MADV_MERGEABLE. >>> >>> And for consistency you should probably make both your proposed changes for -mem-path'd memory as well. >> >> Why not clean up all that mess and introduce a new -mem option that would just take all of the several options as parameters? >> >> -mem size=512,populate=on,ksm=off >> >> and default -m to something reasonable with the new syntax. > > I'm neutral... so feel free to decide what I should implement ;). Have to agree the consolidated approach definitely seems nicer. > > One comment on combining -m ksm=off (or -mem_nomerge) with > -mem-path. It seems unnecessary because ksm can't be turned on on > VM_HUGETLB vmas (MADV_MERGEABLE will return -EINVAL) and mem-path only > makes sense if used in combination with hugetlbfs (which sets > VM_HUGETLB of course). > Yah you're right, but I've seen several discussions about using mempath for tmpfs/ram-backed files for things like numa/zram/etc so tend to think of it as something potentially more than just a hook for hugetlbfs, which is becoming less and less useful. But the MADV_DONTFORK stuff should still be immediately applicable.
On Wed, Jan 05, 2011 at 02:26:19PM -0600, Michael Roth wrote: > Yah you're right, but I've seen several discussions about using mempath > for tmpfs/ram-backed files for things like numa/zram/etc so tend to > think of it as something potentially more than just a hook for > hugetlbfs, which is becoming less and less useful. But the MADV_DONTFORK > stuff should still be immediately applicable. Yes, MADV_DONTFORK should be set all on all guest physical memory without options so I hope the new patch I just posted is fine to stop the spurious -ENOMEM failures in fork. The ksm part should go incremental but 99% of it will be about changing the command line, making the madvise conditional will be trivial as well.
====== Subject: avoid allocation failures during fork/exec for migrate/hotplug From: Andrea Arcangeli <aarcange@redhat.com> Mark all guest physical memory as MADV_DONTFORK to avoid false positive allocation failures during fork in migrate/netdev hotplug. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> --- diff --git a/exec.c b/exec.c index db9ff55..9e2aa12 100644 --- a/exec.c +++ b/exec.c @@ -2851,6 +2851,9 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name, new_block->host = qemu_vmalloc(size); #endif qemu_madvise(new_block->host, size, QEMU_MADV_MERGEABLE); + + /* no allocation failures during fork/exec for migrate/hotplug */ + qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK); } }