Message ID | 1374747075-7172-1-git-send-email-aarcange@redhat.com |
---|---|
State | New |
Headers | show |
On 25 July 2013 11:11, Andrea Arcangeli <aarcange@redhat.com> wrote: > diff --git a/exec.c b/exec.c > index c99a883..d3bb58d 100644 > --- a/exec.c > +++ b/exec.c > @@ -1162,6 +1162,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host, > > qemu_ram_setup_dump(new_block->host, size); > qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE); > + qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK); > > if (kvm_enabled()) > kvm_setup_guest_memory(new_block->host, size); > kvm_setup_guest_memory() already calls qemu_madvise(start, size, QEMU_MADV_DONTFORK) so why do we need to do it here as well? If we should be doing it in all cases presumably the right fix is to move the if (!kvm_has_sync_mmu()) check in kvm_setup_guest_memory() from "do we call madvise" to "do we fail with an error if it failed". thanks -- PMM
On Thu, Jul 25, 2013 at 11:16:44AM +0100, Peter Maydell wrote: > On 25 July 2013 11:11, Andrea Arcangeli <aarcange@redhat.com> wrote: > > diff --git a/exec.c b/exec.c > > index c99a883..d3bb58d 100644 > > --- a/exec.c > > +++ b/exec.c > > @@ -1162,6 +1162,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host, > > > > qemu_ram_setup_dump(new_block->host, size); > > qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE); > > + qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK); > > > > if (kvm_enabled()) > > kvm_setup_guest_memory(new_block->host, size); > > > > kvm_setup_guest_memory() already calls > qemu_madvise(start, size, QEMU_MADV_DONTFORK) > so why do we need to do it here as well? That only runs if kvm is enabled and mmu is not sync. But we need it in the common case too, to prevent -ENOMEM (if MADV_DONTFORK is available in the host OS, otherwise well we'll just do best effort and skip). See commit message for more details. > If we should be doing it in all cases presumably the right > fix is to move the if (!kvm_has_sync_mmu()) check in > kvm_setup_guest_memory() from "do we call madvise" to > "do we fail with an error if it failed". We could pass an error to kvm_setup_guest_memory but it's not worth it considering more likely we should abort if kvm is enabled and mmu is not sync (without bothering to call MADV_DONTFORK there).
On 07/25/2013 12:11 PM, Andrea Arcangeli wrote: > MADV_DONTFORK prevents fork to fail with -ENOMEM if the default > overcommit heuristics decides there's too much anonymous virtual > memory allocated. If the KVM secondary MMU is synchronized with MMU > notifiers or not, doesn't make a difference in that regard. > > Secondly it's always more efficient to avoid copying the guest > physical address space in the fork child (so we avoid to mark all the > guest memory readonly in the parent and so we skip the establishment > and teardown of lots of pagetables in the child). > > In the common case we can ignore the error if MADV_DONTFORK is not > available. Leave a second invocation that errors out in the KVM path > if MMU notifiers are missing and KVM is enabled, to abort in such > case. > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> > --- > exec.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/exec.c b/exec.c > index c99a883..d3bb58d 100644 > --- a/exec.c > +++ b/exec.c > @@ -1162,6 +1162,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host, > > qemu_ram_setup_dump(new_block->host, size); > qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE); > + qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK); > > if (kvm_enabled()) > kvm_setup_guest_memory(new_block->host, size); > > PING. Benoit reported this on IRC, too. Paolo
Am 25.07.2013 12:32, schrieb Andrea Arcangeli: > On Thu, Jul 25, 2013 at 11:16:44AM +0100, Peter Maydell wrote: >> On 25 July 2013 11:11, Andrea Arcangeli <aarcange@redhat.com> wrote: >>> diff --git a/exec.c b/exec.c >>> index c99a883..d3bb58d 100644 >>> --- a/exec.c >>> +++ b/exec.c >>> @@ -1162,6 +1162,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host, >>> >>> qemu_ram_setup_dump(new_block->host, size); >>> qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE); >>> + qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK); >>> >>> if (kvm_enabled()) >>> kvm_setup_guest_memory(new_block->host, size); >>> >> >> kvm_setup_guest_memory() already calls >> qemu_madvise(start, size, QEMU_MADV_DONTFORK) >> so why do we need to do it here as well? > > That only runs if kvm is enabled and mmu is not sync. But we need it > in the common case too, to prevent -ENOMEM (if MADV_DONTFORK is > available in the host OS, otherwise well we'll just do best effort and > skip). See commit message for more details. So if we add the DONTFORK unconditionally here, why not drop it in said kvm_setup_guest_memory()? That would make the patch more self-documenting while at it. Andreas > >> If we should be doing it in all cases presumably the right >> fix is to move the if (!kvm_has_sync_mmu()) check in >> kvm_setup_guest_memory() from "do we call madvise" to >> "do we fail with an error if it failed". > > We could pass an error to kvm_setup_guest_memory but it's not worth it > considering more likely we should abort if kvm is enabled and mmu is > not sync (without bothering to call MADV_DONTFORK there). >
Le Thursday 25 Jul 2013 à 12:11:15 (+0200), Andrea Arcangeli a écrit : > MADV_DONTFORK prevents fork to fail with -ENOMEM if the default > overcommit heuristics decides there's too much anonymous virtual > memory allocated. If the KVM secondary MMU is synchronized with MMU > notifiers or not, doesn't make a difference in that regard. > > Secondly it's always more efficient to avoid copying the guest > physical address space in the fork child (so we avoid to mark all the > guest memory readonly in the parent and so we skip the establishment > and teardown of lots of pagetables in the child). > > In the common case we can ignore the error if MADV_DONTFORK is not > available. Leave a second invocation that errors out in the KVM path > if MMU notifiers are missing and KVM is enabled, to abort in such > case. > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> > --- > exec.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/exec.c b/exec.c > index c99a883..d3bb58d 100644 > --- a/exec.c > +++ b/exec.c > @@ -1162,6 +1162,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host, > > qemu_ram_setup_dump(new_block->host, size); > qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE); > + qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK); > > if (kvm_enabled()) > kvm_setup_guest_memory(new_block->host, size); > This patch solve a bug where the network down script of a regular tap interface is not executed because fork fail when a pci-assigment is done on a large guest. Tested-By: Benoit Canet <benoit@irqsave.net>
Il 06/08/2013 18:47, Paolo Bonzini ha scritto: > On 07/25/2013 12:11 PM, Andrea Arcangeli wrote: >> MADV_DONTFORK prevents fork to fail with -ENOMEM if the default >> overcommit heuristics decides there's too much anonymous virtual >> memory allocated. If the KVM secondary MMU is synchronized with MMU >> notifiers or not, doesn't make a difference in that regard. >> >> Secondly it's always more efficient to avoid copying the guest >> physical address space in the fork child (so we avoid to mark all the >> guest memory readonly in the parent and so we skip the establishment >> and teardown of lots of pagetables in the child). >> >> In the common case we can ignore the error if MADV_DONTFORK is not >> available. Leave a second invocation that errors out in the KVM path >> if MMU notifiers are missing and KVM is enabled, to abort in such >> case. >> >> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> >> --- >> exec.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/exec.c b/exec.c >> index c99a883..d3bb58d 100644 >> --- a/exec.c >> +++ b/exec.c >> @@ -1162,6 +1162,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t >> size, void *host, >> >> qemu_ram_setup_dump(new_block->host, size); >> qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE); >> + qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK); >> >> if (kvm_enabled()) >> kvm_setup_guest_memory(new_block->host, size); >> >> > > PING. > > Benoit reported this on IRC, too. > > Paolo > > PING^2 The last paragraph of the commit message should answer Andreas's objection. Paolo
Il 06/08/2013 18:55, Andreas Färber ha scritto: > Am 25.07.2013 12:32, schrieb Andrea Arcangeli: >> On Thu, Jul 25, 2013 at 11:16:44AM +0100, Peter Maydell wrote: >>> On 25 July 2013 11:11, Andrea Arcangeli <aarcange@redhat.com> wrote: >>>> diff --git a/exec.c b/exec.c >>>> index c99a883..d3bb58d 100644 >>>> --- a/exec.c >>>> +++ b/exec.c >>>> @@ -1162,6 +1162,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host, >>>> >>>> qemu_ram_setup_dump(new_block->host, size); >>>> qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE); >>>> + qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK); >>>> >>>> if (kvm_enabled()) >>>> kvm_setup_guest_memory(new_block->host, size); >>>> >>> >>> kvm_setup_guest_memory() already calls >>> qemu_madvise(start, size, QEMU_MADV_DONTFORK) >>> so why do we need to do it here as well? >> >> That only runs if kvm is enabled and mmu is not sync. But we need it >> in the common case too, to prevent -ENOMEM (if MADV_DONTFORK is >> available in the host OS, otherwise well we'll just do best effort and >> skip). See commit message for more details. > > So if we add the DONTFORK unconditionally here, why not drop it in said > kvm_setup_guest_memory()? That would make the patch more > self-documenting while at it. This is mentioned in the commit message: "In the common case we can ignore the error if MADV_DONTFORK is not available. Leave a second invocation that errors out in the KVM path if MMU notifiers are missing and KVM is enabled, to abort in such case". Paolo
Am 30.08.2013 17:48, schrieb Paolo Bonzini: > Il 06/08/2013 18:47, Paolo Bonzini ha scritto: >> On 07/25/2013 12:11 PM, Andrea Arcangeli wrote: >>> MADV_DONTFORK prevents fork to fail with -ENOMEM if the default >>> overcommit heuristics decides there's too much anonymous virtual >>> memory allocated. If the KVM secondary MMU is synchronized with MMU >>> notifiers or not, doesn't make a difference in that regard. >>> >>> Secondly it's always more efficient to avoid copying the guest >>> physical address space in the fork child (so we avoid to mark all the >>> guest memory readonly in the parent and so we skip the establishment >>> and teardown of lots of pagetables in the child). >>> >>> In the common case we can ignore the error if MADV_DONTFORK is not >>> available. Leave a second invocation that errors out in the KVM path >>> if MMU notifiers are missing and KVM is enabled, to abort in such >>> case. >>> >>> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> >>> --- >>> exec.c | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/exec.c b/exec.c >>> index c99a883..d3bb58d 100644 >>> --- a/exec.c >>> +++ b/exec.c >>> @@ -1162,6 +1162,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t >>> size, void *host, >>> >>> qemu_ram_setup_dump(new_block->host, size); >>> qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE); >>> + qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK); >>> >>> if (kvm_enabled()) >>> kvm_setup_guest_memory(new_block->host, size); >>> >>> >> >> PING. >> >> Benoit reported this on IRC, too. >> >> Paolo >> >> > > PING^2 > > The last paragraph of the commit message should answer Andreas's objection. OK. The patch is mislabelled as "KVM:" though putting this in a common code path and touching exec.c only, so please put it on uq/master and fix that up before you set the history in stone. :) Andreas
Il 30/08/2013 17:52, Andreas Färber ha scritto: > OK. The patch is mislabelled as "KVM:" though putting this in a common > code path and touching exec.c only, so please put it on uq/master and > fix that up before you set the history in stone. :) Well, touching exec.c is why I wasn't picking it up through uq/master, but that's indeed a good observation! If Anthony doesn't pick it up, I'll do so as soon as it's my turn to manage the qemu-kvm.git repository. Paolo
On Fri, Aug 30, 2013 at 06:03:42PM +0200, Paolo Bonzini wrote: > Il 30/08/2013 17:52, Andreas Färber ha scritto: > > OK. The patch is mislabelled as "KVM:" though putting this in a common > > code path and touching exec.c only, so please put it on uq/master and > > fix that up before you set the history in stone. :) > > Well, touching exec.c is why I wasn't picking it up through uq/master, > but that's indeed a good observation! > > If Anthony doesn't pick it up, I'll do so as soon as it's my turn to > manage the qemu-kvm.git repository. > Send your acks and I will merge it. -- Gleb.
Il 30/08/2013 18:21, Gleb Natapov ha scritto: > On Fri, Aug 30, 2013 at 06:03:42PM +0200, Paolo Bonzini wrote: >> Il 30/08/2013 17:52, Andreas Färber ha scritto: >>> OK. The patch is mislabelled as "KVM:" though putting this in a common >>> code path and touching exec.c only, so please put it on uq/master and >>> fix that up before you set the history in stone. :) >> >> Well, touching exec.c is why I wasn't picking it up through uq/master, >> but that's indeed a good observation! >> >> If Anthony doesn't pick it up, I'll do so as soon as it's my turn to >> manage the qemu-kvm.git repository. >> > Send your acks and I will merge it. Acked-by: Paolo Bonzini <pbonzini@redhat.com> with s/KVM/exec/ in the subject line
On Thu, Jul 25, 2013 at 12:11:15PM +0200, Andrea Arcangeli wrote: > MADV_DONTFORK prevents fork to fail with -ENOMEM if the default > overcommit heuristics decides there's too much anonymous virtual > memory allocated. If the KVM secondary MMU is synchronized with MMU > notifiers or not, doesn't make a difference in that regard. > > Secondly it's always more efficient to avoid copying the guest > physical address space in the fork child (so we avoid to mark all the > guest memory readonly in the parent and so we skip the establishment > and teardown of lots of pagetables in the child). > > In the common case we can ignore the error if MADV_DONTFORK is not > available. Leave a second invocation that errors out in the KVM path > if MMU notifiers are missing and KVM is enabled, to abort in such > case. > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Applied to uq/master, thanks. > --- > exec.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/exec.c b/exec.c > index c99a883..d3bb58d 100644 > --- a/exec.c > +++ b/exec.c > @@ -1162,6 +1162,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host, > > qemu_ram_setup_dump(new_block->host, size); > qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE); > + qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK); > > if (kvm_enabled()) > kvm_setup_guest_memory(new_block->host, size); -- Gleb.
diff --git a/exec.c b/exec.c index c99a883..d3bb58d 100644 --- a/exec.c +++ b/exec.c @@ -1162,6 +1162,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host, qemu_ram_setup_dump(new_block->host, size); qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE); + qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK); if (kvm_enabled()) kvm_setup_guest_memory(new_block->host, size);
MADV_DONTFORK prevents fork to fail with -ENOMEM if the default overcommit heuristics decides there's too much anonymous virtual memory allocated. If the KVM secondary MMU is synchronized with MMU notifiers or not, doesn't make a difference in that regard. Secondly it's always more efficient to avoid copying the guest physical address space in the fork child (so we avoid to mark all the guest memory readonly in the parent and so we skip the establishment and teardown of lots of pagetables in the child). In the common case we can ignore the error if MADV_DONTFORK is not available. Leave a second invocation that errors out in the KVM path if MMU notifiers are missing and KVM is enabled, to abort in such case. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> --- exec.c | 1 + 1 file changed, 1 insertion(+)