From patchwork Fri Aug 17 14:36:03 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jan Kiszka
X-Patchwork-Id: 178225
Message-ID: <502E56D3.6060607@siemens.com>
Date: Fri, 17 Aug 2012 16:36:03 +0200
From: Jan Kiszka
To: Stefan Hajnoczi
References: <4FEC56B2.6050502@dlhnet.de> <502E42E9.2020402@siemens.com>
In-Reply-To: <502E42E9.2020402@siemens.com>
Cc: Paolo Bonzini, Peter Lieven, "qemu-devel@nongnu.org", "kvm@vger.kernel.org", Avi Kivity
Subject: Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

On 2012-08-17 15:11, Jan Kiszka wrote:
> On 2012-08-06 17:11, Stefan Hajnoczi wrote:
>> On Thu, Jun 28, 2012 at 2:05 PM, Peter Lieven wrote:
>>> I debugged my initial problem further and found out that the problem
>>> happens to be that the main thread is stuck in pause_all_vcpus() on
>>> reset or quit commands in the monitor if one cpu is stuck in the
>>> do-while loop kvm_cpu_exec. If I modify the condition from
>>> while (ret == 0) to while ((ret == 0) && !env->stop); it works, but
>>> is this the right fix? "Quit" command seems to work, but on "Reset"
>>> the VM enters pause state.
>>
>> I think I'm hitting something similar. I installed a F17 amd64 guest
>> (3.5 kernel) but before booting entered the GRUB boot menu edit mode.
>> The guest seemed unresponsive so I switched to the monitor, which also
>> froze shortly afterwards. The VNC screen ended up being all black.
>>
>> qemu-kvm.git/master 3e4305694fd891b69e4450e59ec4c65420907ede
>> Linux 3.2.0-3-amd64 from Debian testing
>>
>> $ qemu-system-x86_64 -enable-kvm -m 1024 -smp 2 -drive
>> if=virtio,cache=none,file=f17.img,aio=native -serial stdio
>>
>> (gdb) thread apply all bt
>>
>> Thread 3 (Thread 0x7f8008e23700 (LWP 367)):
>> #0 0x00007f800f891727 in ioctl () at ../sysdeps/unix/syscall-template.S:82
>> #1 0x00007f80137b92c9 in kvm_vcpu_ioctl
>> (env=env@entry=0x7f8015b49640, type=type@entry=44672)
>> at /home/stefanha/qemu-kvm/kvm-all.c:1619
>> #2 0x00007f80137b93fe in kvm_cpu_exec (env=env@entry=0x7f8015b49640)
>> at /home/stefanha/qemu-kvm/kvm-all.c:1506
>> #3 0x00007f8013766f31 in qemu_kvm_cpu_thread_fn (arg=0x7f8015b49640)
>> at /home/stefanha/qemu-kvm/cpus.c:756
>> #4 0x00007f800fb4db50 in start_thread (arg=) at
>> pthread_create.c:304
>> #5 0x00007f800f8986dd in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>> #6 0x0000000000000000 in ?? ()
>>
>> This vcpu is still executing guest code and I've seen it successfully
>> dispatching I/O. The problem is it's missing the exit_request...
>>
>> Thread 2 (Thread 0x7f8008622700 (LWP 368)):
>> #0 pthread_cond_wait@@GLIBC_2.3.2 ()
>> at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
>> #1 0x00007f801372b229 in qemu_cond_wait (cond=,
>> mutex=mutex@entry=0x7f80144367c0) at qemu-thread-posix.c:113
>> #2 0x00007f8013766eff in qemu_kvm_wait_io_event (env=)
>> at /home/stefanha/qemu-kvm/cpus.c:724
>> #3 qemu_kvm_cpu_thread_fn (arg=0x7f8015b67450) at
>> /home/stefanha/qemu-kvm/cpus.c:761
>> #4 0x00007f800fb4db50 in start_thread (arg=) at
>> pthread_create.c:304
>> #5 0x00007f800f8986dd in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>> #6 0x0000000000000000 in ?? ()
>>
>> No problems here.
>>
>> Thread 1 (Thread 0x7f801347b8c0 (LWP 365)):
>> #0 pthread_cond_wait@@GLIBC_2.3.2 ()
>> at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
>> #1 0x00007f801372b229 in qemu_cond_wait (cond=cond@entry=0x7f801402fd80,
>> mutex=mutex@entry=0x7f80144367c0) at qemu-thread-posix.c:113
>> #2 0x00007f8013768949 in pause_all_vcpus () at
>> /home/stefanha/qemu-kvm/cpus.c:962
>> #3 0x00007f80136028c8 in main (argc=, argv=,
>> envp=) at /home/stefanha/qemu-kvm/vl.c:3695
>>
>> We're deadlocked in pause_all_vcpus(), waiting for vcpu #0 to pause.
>> Unfortunately vcpu #0 has ->exit_request=0 although ->stop=1.
>>
>> Here are the vcpus:
>>
>> (gdb) p first_cpu
>> $6 = (struct CPUX86State *) 0x7f8015b49640
>> (gdb) p first_cpu->next_cpu
>> $7 = (struct CPUX86State *) 0x7f8015b67450
>> (gdb) p first_cpu->next_cpu->next_cpu
>> $8 = (struct CPUX86State *) 0x0
>>
>> (gdb) p first_cpu->stop
>> $9 = 1
>> (gdb) p first_cpu->stopped
>> $10 = 0
>> (gdb) p first_cpu->exit_request
>> $11 = 0
> CPUState::exit_request is only set on specific synchronous events, see
> target-i386/kvm.c.
>
> More interesting is CPUState::thread_kicked. If it's set, qemu_cpu_kick
> will skip the kicking via a signal. Maybe there is some race. Let me
> think about such possibilities again...

Can anyone imagine that such a barrier may actually be required? If it
is currently possible that env->stop is evaluated before we call into
sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
signal without properly processing its reason (stop).

Jan

diff --git a/cpus.c b/cpus.c
index e476a3c..30f3228 100644
--- a/cpus.c
+++ b/cpus.c
@@ -726,6 +726,9 @@ static void qemu_kvm_wait_io_event(CPUArchState *env)
     }
 
     qemu_kvm_eat_signals(env);
+    /* Ensure that checking env->stop cannot overtake signal processing so
+     * that we lose the latter without stopping. */
+    smp_rmb();
     qemu_wait_io_event_common(env);
 }
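
To make the ordering in question concrete, here is a minimal standalone sketch. It is not QEMU code and it does not show that the barrier is actually needed. All names are hypothetical stand-ins: stop_requested plays the role of env->stop, eat_signals() the role of qemu_kvm_eat_signals(), and a C11 acquire fence the role of smp_rmb(). The kick uses raise() so the demo stays single-threaded; a real kick would target the vcpu thread, for example with pthread_kill().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for env->stop. */
static atomic_bool stop_requested;

/* Block SIGUSR1 so that it stays pending until it is explicitly eaten. */
static void block_kick_signal(void)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    int rc = sigprocmask(SIG_BLOCK, &set, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Drain pending SIGUSR1 without blocking; returns how many were consumed. */
static int eat_signals(void)
{
    sigset_t set;
    struct timespec zero = { 0, 0 };
    int eaten = 0;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    while (sigtimedwait(&set, NULL, &zero) == SIGUSR1) {
        eaten++;
    }
    return eaten;
}

/* Requester side: publish the reason first, then kick. */
static void request_stop(void)
{
    atomic_store_explicit(&stop_requested, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    raise(SIGUSR1); /* assumption: stands in for a cross-thread kick */
}

/* Waiter side: eat the kick, then fence, and only then read the flag. */
static bool wait_io_event_step(void)
{
    eat_signals();
    /* Analogue of the proposed smp_rmb(): the flag read below must not
     * be satisfied before the signal has been consumed. */
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&stop_requested, memory_order_relaxed);
}
```

Calling block_kick_signal(), then request_stop(), then wait_io_event_step() should report the stop request and leave no SIGUSR1 pending. The failure mode being asked about is the opposite outcome: the signal is consumed, but a stale value of the flag is observed.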