diff mbox

stop cpus before forking.

Message ID 1276543644-32689-1-git-send-email-glommer@redhat.com
State New
Headers show

Commit Message

Glauber Costa June 14, 2010, 7:27 p.m. UTC
This patch fixes a bug that happens with kvm, irqchip-in-kernel,
while adding a netdev. Despite the situations of reproduction being
specific to kvm, I believe this fix is pretty generic, and fits here.
Specially if we ever want to have our own irqchip in kernel too.

The problem happens after the fork system call, and although it is not
100 % reproduceable, happens pretty often. After fork, the memory where
the apic is mapped is present in both processes. It ends up confusing
the vcpus somewhere in the irq <-> ack path, and qemu hangs, with no
irqs being delivered at all from that point on.

Making sure the vcpus are stopped before forking makes the problem go
away. Besides, this is a pretty unfrequent operation, which already hangs
the io-thread for a while. So it should not hurt performance.

Signed-off-by: Glauber Costa <glommer@redhat.com>
---
 net/tap.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

Comments

Anthony Liguori June 14, 2010, 7:33 p.m. UTC | #1
On 06/14/2010 02:27 PM, Glauber Costa wrote:
> This patch fixes a bug that happens with kvm, irqchip-in-kernel,
> while adding a netdev. Despite the situations of reproduction being
> specific to kvm, I believe this fix is pretty generic, and fits here.
> Specially if we ever want to have our own irqchip in kernel too.
>
> The problem happens after the fork system call, and although it is not
> 100 % reproduceable, happens pretty often. After fork, the memory where
> the apic is mapped is present in both processes. It ends up confusing
> the vcpus somewhere in the irq<->  ack path, and qemu hangs, with no
> irqs being delivered at all from that point on.
>
> Making sure the vcpus are stopped before forking makes the problem go
> away. Besides, this is a pretty unfrequent operation, which already hangs
> the io-thread for a while. So it should not hurt performance.
>
> Signed-off-by: Glauber Costa<glommer@redhat.com>
>    

This doesn't make very much sense to me but smells like a kernel bug to me.

Even if it isn't, I can't rationalize why stopping the vm like this is 
enough to fix such a problem.  Is the problem that the KVM VCPU threads 
get duplicated while potentially running or something like that?

Regards,

Anthony Liguori

> ---
>   net/tap.c |    4 ++++
>   1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/net/tap.c b/net/tap.c
> index 0147dab..f34dd9c 100644
> --- a/net/tap.c
> +++ b/net/tap.c
> @@ -330,6 +330,9 @@ static int launch_script(const char *setup_script, const char *ifname, int fd)
>       sigaddset(&mask, SIGCHLD);
>       sigprocmask(SIG_BLOCK,&mask,&oldmask);
>
> +    /* make sure no cpus are running, so the apic does not
> +     * get confused */
> +    vm_stop(0);
>       /* try to launch network script */
>       pid = fork();
>       if (pid == 0) {
> @@ -350,6 +353,7 @@ static int launch_script(const char *setup_script, const char *ifname, int fd)
>           execv(setup_script, args);
>           _exit(1);
>       } else if (pid>  0) {
> +        vm_start();
>           while (waitpid(pid,&status, 0) != pid) {
>               /* loop */
>           }
>
Glauber Costa June 14, 2010, 7:42 p.m. UTC | #2
On Mon, Jun 14, 2010 at 02:33:00PM -0500, Anthony Liguori wrote:
> On 06/14/2010 02:27 PM, Glauber Costa wrote:
> >This patch fixes a bug that happens with kvm, irqchip-in-kernel,
> >while adding a netdev. Despite the situations of reproduction being
> >specific to kvm, I believe this fix is pretty generic, and fits here.
> >Specially if we ever want to have our own irqchip in kernel too.
> >
> >The problem happens after the fork system call, and although it is not
> >100 % reproduceable, happens pretty often. After fork, the memory where
> >the apic is mapped is present in both processes. It ends up confusing
> >the vcpus somewhere in the irq<->  ack path, and qemu hangs, with no
> >irqs being delivered at all from that point on.
> >
> >Making sure the vcpus are stopped before forking makes the problem go
> >away. Besides, this is a pretty unfrequent operation, which already hangs
> >the io-thread for a while. So it should not hurt performance.
> >
> >Signed-off-by: Glauber Costa<glommer@redhat.com>
> 
> This doesn't make very much sense to me but smells like a kernel bug to me.
My interpretation is that by doing that, we make sure no in-flight
requests are happening. Actually, a sleep(x), with x sufficiently big
is enough to make this problem go away, but that is too hacky.

I do agree that this is most likely a kernel bug. But as with any other
kernel bugs, I believe this is a easy workaround to have things working
even in older kernels until we fix it.

> 
> Even if it isn't, I can't rationalize why stopping the vm like this
> is enough to fix such a problem.  Is the problem that the KVM VCPU
> threads get duplicated while potentially running or something like
> that?

I doubt fork is duplicating the vcpu threads. More than that, this
bug does not happen with userspace irqchip.
So I believe that either irq request or the ack itself is reaching the
wrong process, forever stalling the apic.
Anthony Liguori June 14, 2010, 7:58 p.m. UTC | #3
On 06/14/2010 02:42 PM, Glauber Costa wrote:
> On Mon, Jun 14, 2010 at 02:33:00PM -0500, Anthony Liguori wrote:
>    
>> On 06/14/2010 02:27 PM, Glauber Costa wrote:
>>      
>>> This patch fixes a bug that happens with kvm, irqchip-in-kernel,
>>> while adding a netdev. Despite the situations of reproduction being
>>> specific to kvm, I believe this fix is pretty generic, and fits here.
>>> Specially if we ever want to have our own irqchip in kernel too.
>>>
>>> The problem happens after the fork system call, and although it is not
>>> 100 % reproduceable, happens pretty often. After fork, the memory where
>>> the apic is mapped is present in both processes. It ends up confusing
>>> the vcpus somewhere in the irq<->   ack path, and qemu hangs, with no
>>> irqs being delivered at all from that point on.
>>>
>>> Making sure the vcpus are stopped before forking makes the problem go
>>> away. Besides, this is a pretty unfrequent operation, which already hangs
>>> the io-thread for a while. So it should not hurt performance.
>>>
>>> Signed-off-by: Glauber Costa<glommer@redhat.com>
>>>        
>> This doesn't make very much sense to me but smells like a kernel bug to me.
>>      
> My interpretation is that by doing that, we make sure no in-flight
> requests are happening. Actually, a sleep(x), with x sufficiently big
> is enough to make this problem go away, but that is too hacky.
>    

vm_stop() is probably just acting a glorified sleep() since it has to 
wait for each thread to stop.

> I do agree that this is most likely a kernel bug. But as with any other
> kernel bugs, I believe this is a easy workaround to have things working
> even in older kernels until we fix it.
>    

If we don't know what the bug is, then we do not know whether this is a 
work around.  Rather, this change happens to make the bug more difficult 
to reproduce with your test case.

>> Even if it isn't, I can't rationalize why stopping the vm like this
>> is enough to fix such a problem.  Is the problem that the KVM VCPU
>> threads get duplicated while potentially running or something like
>> that?
>>      
> I doubt fork is duplicating the vcpu threads. More than that, this
> bug does not happen with userspace irqchip.
> So I believe that either irq request or the ack itself is reaching the
> wrong process, forever stalling the apic.
>    

That sounds more like a signal delivery issue.  It's not obvious to me 
that we're doing the wrong thing with signal mask though.

If it's a signal mask related issue, then vm_stop isn't a proper fix as 
there would be still be a race.

Regards,

Anthony Liguori
Glauber Costa June 14, 2010, 8:05 p.m. UTC | #4
On Mon, Jun 14, 2010 at 02:58:47PM -0500, Anthony Liguori wrote:
> On 06/14/2010 02:42 PM, Glauber Costa wrote:
> >On Mon, Jun 14, 2010 at 02:33:00PM -0500, Anthony Liguori wrote:
> >>On 06/14/2010 02:27 PM, Glauber Costa wrote:
> >>>This patch fixes a bug that happens with kvm, irqchip-in-kernel,
> >>>while adding a netdev. Despite the situations of reproduction being
> >>>specific to kvm, I believe this fix is pretty generic, and fits here.
> >>>Specially if we ever want to have our own irqchip in kernel too.
> >>>
> >>>The problem happens after the fork system call, and although it is not
> >>>100 % reproduceable, happens pretty often. After fork, the memory where
> >>>the apic is mapped is present in both processes. It ends up confusing
> >>>the vcpus somewhere in the irq<->   ack path, and qemu hangs, with no
> >>>irqs being delivered at all from that point on.
> >>>
> >>>Making sure the vcpus are stopped before forking makes the problem go
> >>>away. Besides, this is a pretty unfrequent operation, which already hangs
> >>>the io-thread for a while. So it should not hurt performance.
> >>>
> >>>Signed-off-by: Glauber Costa<glommer@redhat.com>
> >>This doesn't make very much sense to me but smells like a kernel bug to me.
> >My interpretation is that by doing that, we make sure no in-flight
> >requests are happening. Actually, a sleep(x), with x sufficiently big
> >is enough to make this problem go away, but that is too hacky.
> 
> vm_stop() is probably just acting a glorified sleep() since it has
> to wait for each thread to stop.
> 
> >I do agree that this is most likely a kernel bug. But as with any other
> >kernel bugs, I believe this is a easy workaround to have things working
> >even in older kernels until we fix it.
> 
> If we don't know what the bug is, then we do not know whether this
> is a work around.  Rather, this change happens to make the bug more
> difficult to reproduce with your test case.
> 
> >>Even if it isn't, I can't rationalize why stopping the vm like this
> >>is enough to fix such a problem.  Is the problem that the KVM VCPU
> >>threads get duplicated while potentially running or something like
> >>that?
> >I doubt fork is duplicating the vcpu threads. More than that, this
> >bug does not happen with userspace irqchip.
> >So I believe that either irq request or the ack itself is reaching the
> >wrong process, forever stalling the apic.
> 
> That sounds more like a signal delivery issue.  It's not obvious to
> me that we're doing the wrong thing with signal mask though.
> 
> If it's a signal mask related issue, then vm_stop isn't a proper fix
> as there would be still be a race.
I do can investigate it further, but I doubt it is signal-delivery related.
I spent the first days believing it was, but now, I believe it is much
more likely to be apic-related. We don't need to wait for the child to exit for
this bug to happen, so SIGCHLD is never raised. And with in-kernel irqchip,
we don't deliver signals during normal vcpu execution.
Daniel P. Berrangé June 15, 2010, 6:14 a.m. UTC | #5
On Mon, Jun 14, 2010 at 04:42:37PM -0300, Glauber Costa wrote:
> On Mon, Jun 14, 2010 at 02:33:00PM -0500, Anthony Liguori wrote:
> > On 06/14/2010 02:27 PM, Glauber Costa wrote:
> > >This patch fixes a bug that happens with kvm, irqchip-in-kernel,
> > >while adding a netdev. Despite the situations of reproduction being
> > >specific to kvm, I believe this fix is pretty generic, and fits here.
> > >Specially if we ever want to have our own irqchip in kernel too.
> > >
> > >The problem happens after the fork system call, and although it is not
> > >100 % reproduceable, happens pretty often. After fork, the memory where
> > >the apic is mapped is present in both processes. It ends up confusing
> > >the vcpus somewhere in the irq<->  ack path, and qemu hangs, with no
> > >irqs being delivered at all from that point on.
> > >
> > >Making sure the vcpus are stopped before forking makes the problem go
> > >away. Besides, this is a pretty unfrequent operation, which already hangs
> > >the io-thread for a while. So it should not hurt performance.
> > >
> > >Signed-off-by: Glauber Costa<glommer@redhat.com>
> > 
> > This doesn't make very much sense to me but smells like a kernel bug to me.
> My interpretation is that by doing that, we make sure no in-flight
> requests are happening. Actually, a sleep(x), with x sufficiently big
> is enough to make this problem go away, but that is too hacky.
> 
> I do agree that this is most likely a kernel bug. But as with any other
> kernel bugs, I believe this is a easy workaround to have things working
> even in older kernels until we fix it.
> 
> > 
> > Even if it isn't, I can't rationalize why stopping the vm like this
> > is enough to fix such a problem.  Is the problem that the KVM VCPU
> > threads get duplicated while potentially running or something like
> > that?
> 
> I doubt fork is duplicating the vcpu threads. More than that, this
> bug does not happen with userspace irqchip.

Hmm, could this problem be hitting other places where we fork() besides
the netdev helper script ? I have seen a intermittent bug[1] with migration, 
where using exec: migration will freeze the entire guest and setting
-no-kvm-irqchip fixes the problem

Regards,
Daniel

[1] https://bugzilla.redhat.com/show_bug.cgi?id=585195
Avi Kivity June 15, 2010, 7:33 a.m. UTC | #6
On 06/14/2010 10:33 PM, Anthony Liguori wrote:
> On 06/14/2010 02:27 PM, Glauber Costa wrote:
>> This patch fixes a bug that happens with kvm, irqchip-in-kernel,
>> while adding a netdev. Despite the situations of reproduction being
>> specific to kvm, I believe this fix is pretty generic, and fits here.
>> Specially if we ever want to have our own irqchip in kernel too.
>>
>> The problem happens after the fork system call, and although it is not
>> 100 % reproduceable, happens pretty often. After fork, the memory where
>> the apic is mapped is present in both processes. It ends up confusing
>> the vcpus somewhere in the irq<->  ack path, and qemu hangs, with no
>> irqs being delivered at all from that point on.
>>
>> Making sure the vcpus are stopped before forking makes the problem go
>> away. Besides, this is a pretty unfrequent operation, which already 
>> hangs
>> the io-thread for a while. So it should not hurt performance.
>
> This doesn't make very much sense to me but smells like a kernel bug 
> to me.

It is, and the fix would be to create the APIC memory slot as sharable 
across forks (should be easy to fix in the kernel).

> Even if it isn't, I can't rationalize why stopping the vm like this is 
> enough to fix such a problem.  Is the problem that the KVM VCPU 
> threads get duplicated while potentially running or something like that?

I think it's COW triggering a copy on one vcpu while the other reads 
from the old page.
Avi Kivity June 15, 2010, 7:35 a.m. UTC | #7
On 06/14/2010 10:42 PM, Glauber Costa wrote:
> I do agree that this is most likely a kernel bug. But as with any other
> kernel bugs, I believe this is a easy workaround to have things working
> even in older kernels until we fix it.
>    

That's what -stable is for.  We fix it upstream and backport it.  No 
need to complicate userspace for something which isn't its fault.
Avi Kivity June 15, 2010, 7:36 a.m. UTC | #8
On 06/15/2010 09:14 AM, Daniel P. Berrange wrote:
>
> Hmm, could this problem be hitting other places where we fork() besides
> the netdev helper script ? I have seen a intermittent bug[1] with migration,
> where using exec: migration will freeze the entire guest and setting
> -no-kvm-irqchip fixes the problem
>
> Regards,
> Daniel
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=585195
>    

Looks very similar, I never thought of the APIC pages in that 
connection.  Kudos to Glauber for tracking it down.
Glauber Costa June 16, 2010, 4:57 p.m. UTC | #9
On Mon, Jun 14, 2010 at 02:58:47PM -0500, Anthony Liguori wrote:
> On 06/14/2010 02:42 PM, Glauber Costa wrote:
> >On Mon, Jun 14, 2010 at 02:33:00PM -0500, Anthony Liguori wrote:
> >>On 06/14/2010 02:27 PM, Glauber Costa wrote:
> >>>This patch fixes a bug that happens with kvm, irqchip-in-kernel,
> >>>while adding a netdev. Despite the situations of reproduction being
> >>>specific to kvm, I believe this fix is pretty generic, and fits here.
> >>>Specially if we ever want to have our own irqchip in kernel too.
> >>>
> >>>The problem happens after the fork system call, and although it is not
> >>>100 % reproduceable, happens pretty often. After fork, the memory where
> >>>the apic is mapped is present in both processes. It ends up confusing
> >>>the vcpus somewhere in the irq<->   ack path, and qemu hangs, with no
> >>>irqs being delivered at all from that point on.
> >>>
> >>>Making sure the vcpus are stopped before forking makes the problem go
> >>>away. Besides, this is a pretty unfrequent operation, which already hangs
> >>>the io-thread for a while. So it should not hurt performance.
> >>>
> >>>Signed-off-by: Glauber Costa<glommer@redhat.com>
> >>This doesn't make very much sense to me but smells like a kernel bug to me.
> >My interpretation is that by doing that, we make sure no in-flight
> >requests are happening. Actually, a sleep(x), with x sufficiently big
> >is enough to make this problem go away, but that is too hacky.
> 
> vm_stop() is probably just acting a glorified sleep() since it has
> to wait for each thread to stop.
No, it is not. It also makes sure no vcpus are running, and thus, not generating
new requests (or waiting for any replies, for that matter).

(Note: I am not advocating the inclusion of this, just trying to build my own
awareness)
Glauber Costa June 16, 2010, 4:58 p.m. UTC | #10
On Tue, Jun 15, 2010 at 10:33:21AM +0300, Avi Kivity wrote:
> On 06/14/2010 10:33 PM, Anthony Liguori wrote:
> >On 06/14/2010 02:27 PM, Glauber Costa wrote:
> >>This patch fixes a bug that happens with kvm, irqchip-in-kernel,
> >>while adding a netdev. Despite the situations of reproduction being
> >>specific to kvm, I believe this fix is pretty generic, and fits here.
> >>Specially if we ever want to have our own irqchip in kernel too.
> >>
> >>The problem happens after the fork system call, and although it is not
> >>100 % reproduceable, happens pretty often. After fork, the memory where
> >>the apic is mapped is present in both processes. It ends up confusing
> >>the vcpus somewhere in the irq<->  ack path, and qemu hangs, with no
> >>irqs being delivered at all from that point on.
> >>
> >>Making sure the vcpus are stopped before forking makes the problem go
> >>away. Besides, this is a pretty unfrequent operation, which
> >>already hangs
> >>the io-thread for a while. So it should not hurt performance.
> >
> >This doesn't make very much sense to me but smells like a kernel
> >bug to me.
> 
> It is, and the fix would be to create the APIC memory slot as
> sharable across forks (should be easy to fix in the kernel).
Kernel pages are already shared across fork, no?
Avi Kivity June 22, 2010, 12:18 p.m. UTC | #11
On 06/16/2010 07:58 PM, Glauber Costa wrote:
>
>> It is, and the fix would be to create the APIC memory slot as
>> sharable across forks (should be easy to fix in the kernel).
>>      
> Kernel pages are already shared across fork, no?
>    

What are kernel pages in this context?

The APIC access page is an ordinary user page.
diff mbox

Patch

diff --git a/net/tap.c b/net/tap.c
index 0147dab..f34dd9c 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -330,6 +330,9 @@  static int launch_script(const char *setup_script, const char *ifname, int fd)
     sigaddset(&mask, SIGCHLD);
     sigprocmask(SIG_BLOCK, &mask, &oldmask);
 
+    /* make sure no cpus are running, so the apic does not
+     * get confused */
+    vm_stop(0);
     /* try to launch network script */
     pid = fork();
     if (pid == 0) {
@@ -350,6 +353,7 @@  static int launch_script(const char *setup_script, const char *ifname, int fd)
         execv(setup_script, args);
         _exit(1);
     } else if (pid > 0) {
+        vm_start();
         while (waitpid(pid, &status, 0) != pid) {
             /* loop */
         }