Patchwork KVM: always use MADV_DONTFORK

login
register
mail settings
Submitter Andrea Arcangeli
Date July 25, 2013, 10:11 a.m.
Message ID <1374747075-7172-1-git-send-email-aarcange@redhat.com>
Download mbox | patch
Permalink /patch/261663/
State New
Headers show

Comments

Andrea Arcangeli - July 25, 2013, 10:11 a.m.
MADV_DONTFORK prevents fork to fail with -ENOMEM if the default
overcommit heuristics decides there's too much anonymous virtual
memory allocated. If the KVM secondary MMU is synchronized with MMU
notifiers or not, doesn't make a difference in that regard.

Secondly it's always more efficient to avoid copying the guest
physical address space in the fork child (so we avoid to mark all the
guest memory readonly in the parent and so we skip the establishment
and teardown of lots of pagetables in the child).

In the common case we can ignore the error if MADV_DONTFORK is not
available. Leave a second invocation that errors out in the KVM path
if MMU notifiers are missing and KVM is enabled, to abort in such
case.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 exec.c | 1 +
 1 file changed, 1 insertion(+)
Peter Maydell - July 25, 2013, 10:16 a.m.
On 25 July 2013 11:11, Andrea Arcangeli <aarcange@redhat.com> wrote:
> diff --git a/exec.c b/exec.c
> index c99a883..d3bb58d 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1162,6 +1162,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
>
>      qemu_ram_setup_dump(new_block->host, size);
>      qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE);
> +    qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK);
>
>      if (kvm_enabled())
>          kvm_setup_guest_memory(new_block->host, size);
>

kvm_setup_guest_memory() already calls
  qemu_madvise(start, size, QEMU_MADV_DONTFORK)
so why do we need to do it here as well?
If we should be doing it in all cases presumably the right
fix is to move the if (!kvm_has_sync_mmu()) check in
kvm_setup_guest_memory() from "do we call madvise" to
"do we fail with an error if it failed".

thanks
-- PMM
Andrea Arcangeli - July 25, 2013, 10:32 a.m.
On Thu, Jul 25, 2013 at 11:16:44AM +0100, Peter Maydell wrote:
> On 25 July 2013 11:11, Andrea Arcangeli <aarcange@redhat.com> wrote:
> > diff --git a/exec.c b/exec.c
> > index c99a883..d3bb58d 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -1162,6 +1162,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
> >
> >      qemu_ram_setup_dump(new_block->host, size);
> >      qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE);
> > +    qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK);
> >
> >      if (kvm_enabled())
> >          kvm_setup_guest_memory(new_block->host, size);
> >
> 
> kvm_setup_guest_memory() already calls
>   qemu_madvise(start, size, QEMU_MADV_DONTFORK)
> so why do we need to do it here as well?

That only runs if kvm is enabled and mmu is not sync. But we need it
in the common case too, to prevent -ENOMEM (if MADV_DONTFORK is
available in the host OS, otherwise well we'll just do best effort and
skip). See commit message for more details.

> If we should be doing it in all cases presumably the right
> fix is to move the if (!kvm_has_sync_mmu()) check in
> kvm_setup_guest_memory() from "do we call madvise" to
> "do we fail with an error if it failed".

We could pass an error to kvm_setup_guest_memory but it's not worth it
considering more likely we should abort if kvm is enabled and mmu is
not sync (without bothering to call MADV_DONTFORK there).
Paolo Bonzini - Aug. 6, 2013, 4:47 p.m.
On 07/25/2013 12:11 PM, Andrea Arcangeli wrote:
> MADV_DONTFORK prevents fork to fail with -ENOMEM if the default
> overcommit heuristics decides there's too much anonymous virtual
> memory allocated. If the KVM secondary MMU is synchronized with MMU
> notifiers or not, doesn't make a difference in that regard.
>
> Secondly it's always more efficient to avoid copying the guest
> physical address space in the fork child (so we avoid to mark all the
> guest memory readonly in the parent and so we skip the establishment
> and teardown of lots of pagetables in the child).
>
> In the common case we can ignore the error if MADV_DONTFORK is not
> available. Leave a second invocation that errors out in the KVM path
> if MMU notifiers are missing and KVM is enabled, to abort in such
> case.
>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>   exec.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/exec.c b/exec.c
> index c99a883..d3bb58d 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1162,6 +1162,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
>
>       qemu_ram_setup_dump(new_block->host, size);
>       qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE);
> +    qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK);
>
>       if (kvm_enabled())
>           kvm_setup_guest_memory(new_block->host, size);
>
>

PING.

Benoit reported this on IRC, too.

Paolo
Andreas Färber - Aug. 6, 2013, 4:55 p.m.
Am 25.07.2013 12:32, schrieb Andrea Arcangeli:
> On Thu, Jul 25, 2013 at 11:16:44AM +0100, Peter Maydell wrote:
>> On 25 July 2013 11:11, Andrea Arcangeli <aarcange@redhat.com> wrote:
>>> diff --git a/exec.c b/exec.c
>>> index c99a883..d3bb58d 100644
>>> --- a/exec.c
>>> +++ b/exec.c
>>> @@ -1162,6 +1162,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
>>>
>>>      qemu_ram_setup_dump(new_block->host, size);
>>>      qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE);
>>> +    qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK);
>>>
>>>      if (kvm_enabled())
>>>          kvm_setup_guest_memory(new_block->host, size);
>>>
>>
>> kvm_setup_guest_memory() already calls
>>   qemu_madvise(start, size, QEMU_MADV_DONTFORK)
>> so why do we need to do it here as well?
> 
> That only runs if kvm is enabled and mmu is not sync. But we need it
> in the common case too, to prevent -ENOMEM (if MADV_DONTFORK is
> available in the host OS, otherwise well we'll just do best effort and
> skip). See commit message for more details.

So if we add the DONTFORK unconditionally here, why not drop it in said
kvm_setup_guest_memory()? That would make the patch more
self-documenting while at it.

Andreas

> 
>> If we should be doing it in all cases presumably the right
>> fix is to move the if (!kvm_has_sync_mmu()) check in
>> kvm_setup_guest_memory() from "do we call madvise" to
>> "do we fail with an error if it failed".
> 
> We could pass an error to kvm_setup_guest_memory but it's not worth it
> considering more likely we should abort if kvm is enabled and mmu is
> not sync (without bothering to call MADV_DONTFORK there).
>
Benoît Canet - Aug. 6, 2013, 5:06 p.m.
Le Thursday 25 Jul 2013 à 12:11:15 (+0200), Andrea Arcangeli a écrit :
> MADV_DONTFORK prevents fork to fail with -ENOMEM if the default
> overcommit heuristics decides there's too much anonymous virtual
> memory allocated. If the KVM secondary MMU is synchronized with MMU
> notifiers or not, doesn't make a difference in that regard.
> 
> Secondly it's always more efficient to avoid copying the guest
> physical address space in the fork child (so we avoid to mark all the
> guest memory readonly in the parent and so we skip the establishment
> and teardown of lots of pagetables in the child).
> 
> In the common case we can ignore the error if MADV_DONTFORK is not
> available. Leave a second invocation that errors out in the KVM path
> if MMU notifiers are missing and KVM is enabled, to abort in such
> case.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>  exec.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/exec.c b/exec.c
> index c99a883..d3bb58d 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1162,6 +1162,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
>  
>      qemu_ram_setup_dump(new_block->host, size);
>      qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE);
> +    qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK);
>  
>      if (kvm_enabled())
>          kvm_setup_guest_memory(new_block->host, size);
> 

This patch solve a bug where the network down script of a regular tap interface
is not executed because fork fail when a pci-assigment is done on a large guest.

Tested-By: Benoit Canet <benoit@irqsave.net>
Paolo Bonzini - Aug. 30, 2013, 3:48 p.m.
Il 06/08/2013 18:47, Paolo Bonzini ha scritto:
> On 07/25/2013 12:11 PM, Andrea Arcangeli wrote:
>> MADV_DONTFORK prevents fork to fail with -ENOMEM if the default
>> overcommit heuristics decides there's too much anonymous virtual
>> memory allocated. If the KVM secondary MMU is synchronized with MMU
>> notifiers or not, doesn't make a difference in that regard.
>>
>> Secondly it's always more efficient to avoid copying the guest
>> physical address space in the fork child (so we avoid to mark all the
>> guest memory readonly in the parent and so we skip the establishment
>> and teardown of lots of pagetables in the child).
>>
>> In the common case we can ignore the error if MADV_DONTFORK is not
>> available. Leave a second invocation that errors out in the KVM path
>> if MMU notifiers are missing and KVM is enabled, to abort in such
>> case.
>>
>> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
>> ---
>>   exec.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/exec.c b/exec.c
>> index c99a883..d3bb58d 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -1162,6 +1162,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t
>> size, void *host,
>>
>>       qemu_ram_setup_dump(new_block->host, size);
>>       qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE);
>> +    qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK);
>>
>>       if (kvm_enabled())
>>           kvm_setup_guest_memory(new_block->host, size);
>>
>>
> 
> PING.
> 
> Benoit reported this on IRC, too.
> 
> Paolo
> 
> 

PING^2

The last paragraph of the commit message should answer Andreas's objection.

Paolo
Paolo Bonzini - Aug. 30, 2013, 3:48 p.m.
Il 06/08/2013 18:55, Andreas Färber ha scritto:
> Am 25.07.2013 12:32, schrieb Andrea Arcangeli:
>> On Thu, Jul 25, 2013 at 11:16:44AM +0100, Peter Maydell wrote:
>>> On 25 July 2013 11:11, Andrea Arcangeli <aarcange@redhat.com> wrote:
>>>> diff --git a/exec.c b/exec.c
>>>> index c99a883..d3bb58d 100644
>>>> --- a/exec.c
>>>> +++ b/exec.c
>>>> @@ -1162,6 +1162,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
>>>>
>>>>      qemu_ram_setup_dump(new_block->host, size);
>>>>      qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE);
>>>> +    qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK);
>>>>
>>>>      if (kvm_enabled())
>>>>          kvm_setup_guest_memory(new_block->host, size);
>>>>
>>>
>>> kvm_setup_guest_memory() already calls
>>>   qemu_madvise(start, size, QEMU_MADV_DONTFORK)
>>> so why do we need to do it here as well?
>>
>> That only runs if kvm is enabled and mmu is not sync. But we need it
>> in the common case too, to prevent -ENOMEM (if MADV_DONTFORK is
>> available in the host OS, otherwise well we'll just do best effort and
>> skip). See commit message for more details.
> 
> So if we add the DONTFORK unconditionally here, why not drop it in said
> kvm_setup_guest_memory()? That would make the patch more
> self-documenting while at it.

This is mentioned in the commit message:

"In the common case we can ignore the error if MADV_DONTFORK is not
available. Leave a second invocation that errors out in the KVM path if
MMU notifiers are missing and KVM is enabled, to abort in such case".

Paolo
Andreas Färber - Aug. 30, 2013, 3:52 p.m.
Am 30.08.2013 17:48, schrieb Paolo Bonzini:
> Il 06/08/2013 18:47, Paolo Bonzini ha scritto:
>> On 07/25/2013 12:11 PM, Andrea Arcangeli wrote:
>>> MADV_DONTFORK prevents fork to fail with -ENOMEM if the default
>>> overcommit heuristics decides there's too much anonymous virtual
>>> memory allocated. If the KVM secondary MMU is synchronized with MMU
>>> notifiers or not, doesn't make a difference in that regard.
>>>
>>> Secondly it's always more efficient to avoid copying the guest
>>> physical address space in the fork child (so we avoid to mark all the
>>> guest memory readonly in the parent and so we skip the establishment
>>> and teardown of lots of pagetables in the child).
>>>
>>> In the common case we can ignore the error if MADV_DONTFORK is not
>>> available. Leave a second invocation that errors out in the KVM path
>>> if MMU notifiers are missing and KVM is enabled, to abort in such
>>> case.
>>>
>>> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
>>> ---
>>>   exec.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/exec.c b/exec.c
>>> index c99a883..d3bb58d 100644
>>> --- a/exec.c
>>> +++ b/exec.c
>>> @@ -1162,6 +1162,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t
>>> size, void *host,
>>>
>>>       qemu_ram_setup_dump(new_block->host, size);
>>>       qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE);
>>> +    qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK);
>>>
>>>       if (kvm_enabled())
>>>           kvm_setup_guest_memory(new_block->host, size);
>>>
>>>
>>
>> PING.
>>
>> Benoit reported this on IRC, too.
>>
>> Paolo
>>
>>
> 
> PING^2
> 
> The last paragraph of the commit message should answer Andreas's objection.

OK. The patch is mislabelled as "KVM:" though putting this in a common
code path and touching exec.c only, so please put it on uq/master and
fix that up before you set the history in stone. :)

Andreas
Paolo Bonzini - Aug. 30, 2013, 4:03 p.m.
Il 30/08/2013 17:52, Andreas Färber ha scritto:
> OK. The patch is mislabelled as "KVM:" though putting this in a common
> code path and touching exec.c only, so please put it on uq/master and
> fix that up before you set the history in stone. :)

Well, touching exec.c is why I wasn't picking it up through uq/master,
but that's indeed a good observation!

If Anthony doesn't pick it up, I'll do so as soon as it's my turn to
manage the qemu-kvm.git repository.

Paolo
Gleb Natapov - Aug. 30, 2013, 4:21 p.m.
On Fri, Aug 30, 2013 at 06:03:42PM +0200, Paolo Bonzini wrote:
> Il 30/08/2013 17:52, Andreas Färber ha scritto:
> > OK. The patch is mislabelled as "KVM:" though putting this in a common
> > code path and touching exec.c only, so please put it on uq/master and
> > fix that up before you set the history in stone. :)
> 
> Well, touching exec.c is why I wasn't picking it up through uq/master,
> but that's indeed a good observation!
> 
> If Anthony doesn't pick it up, I'll do so as soon as it's my turn to
> manage the qemu-kvm.git repository.
> 
Send your acks and I will merge it.

--
			Gleb.
Paolo Bonzini - Aug. 30, 2013, 4:40 p.m.
Il 30/08/2013 18:21, Gleb Natapov ha scritto:
> On Fri, Aug 30, 2013 at 06:03:42PM +0200, Paolo Bonzini wrote:
>> Il 30/08/2013 17:52, Andreas Färber ha scritto:
>>> OK. The patch is mislabelled as "KVM:" though putting this in a common
>>> code path and touching exec.c only, so please put it on uq/master and
>>> fix that up before you set the history in stone. :)
>>
>> Well, touching exec.c is why I wasn't picking it up through uq/master,
>> but that's indeed a good observation!
>>
>> If Anthony doesn't pick it up, I'll do so as soon as it's my turn to
>> manage the qemu-kvm.git repository.
>>
> Send your acks and I will merge it.

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

with s/KVM/exec/ in the subject line
Gleb Natapov - Sept. 1, 2013, 9:39 a.m.
On Thu, Jul 25, 2013 at 12:11:15PM +0200, Andrea Arcangeli wrote:
> MADV_DONTFORK prevents fork to fail with -ENOMEM if the default
> overcommit heuristics decides there's too much anonymous virtual
> memory allocated. If the KVM secondary MMU is synchronized with MMU
> notifiers or not, doesn't make a difference in that regard.
> 
> Secondly it's always more efficient to avoid copying the guest
> physical address space in the fork child (so we avoid to mark all the
> guest memory readonly in the parent and so we skip the establishment
> and teardown of lots of pagetables in the child).
> 
> In the common case we can ignore the error if MADV_DONTFORK is not
> available. Leave a second invocation that errors out in the KVM path
> if MMU notifiers are missing and KVM is enabled, to abort in such
> case.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Applied to uq/master, thanks.

> ---
>  exec.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/exec.c b/exec.c
> index c99a883..d3bb58d 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1162,6 +1162,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
>  
>      qemu_ram_setup_dump(new_block->host, size);
>      qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE);
> +    qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK);
>  
>      if (kvm_enabled())
>          kvm_setup_guest_memory(new_block->host, size);

--
			Gleb.

Patch

diff --git a/exec.c b/exec.c
index c99a883..d3bb58d 100644
--- a/exec.c
+++ b/exec.c
@@ -1162,6 +1162,7 @@  ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
 
     qemu_ram_setup_dump(new_block->host, size);
     qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE);
+    qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK);
 
     if (kvm_enabled())
         kvm_setup_guest_memory(new_block->host, size);