From: Corentin Chary
To: Jan Kiszka
Cc: Peter Lieven, qemu-devel, kvm@vger.kernel.org
Subject: Re: [Qemu-devel] Re: segmentation fault in qemu-kvm-0.14.0
Date: Wed, 9 Mar 2011 08:50:25 +0000
In-Reply-To: <4D772E4C.6020604@web.de>
References: <2640D58E-2101-47FA-99B6-28815666651E@dlh.net> <4D772E4C.6020604@web.de>

On Wed, Mar 9, 2011 at 7:37 AM, Jan Kiszka wrote:
> On 2011-03-08 23:53, Peter Lieven wrote:
>> Hi,
>>
>> during testing of qemu-kvm-0.14.0 i can reproduce the following segfault. i have seen similar crash already in 0.13.0, but had no time to debug.
>> my guess is that this segfault is related to the threaded vnc server which was introduced in qemu 0.13.0. the bug is only triggerable if a vnc
>> client is attached. it might also be connected to a resolution change in the guest. i have a backtrace attached. the debugger is still running if someone
>> needs more output
>>
>
> ...
>
>> Thread 1 (Thread 0x7ffff7ff0700 (LWP 29038)):
>> #0  0x0000000000000000 in ?? ()
>> No symbol table info available.
>> #1  0x000000000041d669 in main_loop_wait (nonblocking=0)
>>     at /usr/src/qemu-kvm-0.14.0/vl.c:1388
>
> So we are calling a IOHandlerRecord::fd_write handler that is NULL.
> Looking at qemu_set_fd_handler2, this may happen if that function is
> called for an existing io-handler entry with non-NULL write handler,
> passing a NULL write and a non-NULL read handler. And all this without
> the global mutex held.
>
> And there are actually calls in vnc_client_write_plain and
> vnc_client_write_locked (in contrast to vnc_write) that may generate
> this pattern. It's probably worth validating that the iothread lock is
> always held when qemu_set_fd_handler2 is invoked to confirm this race
> theory, adding something like
>
> assert(pthread_mutex_trylock(&qemu_mutex) != 0);
> (that's for qemu-kvm only)
>
> BTW, qemu with just --enable-vnc-thread, ie. without io-thread support,
> should always run into this race as it then definitely lacks a global mutex.

I'm not sure which mutex should be locked here (qemu_global_mutex,
qemu_fair_mutex, lock_iothread). But here is where it should be locked
(other vnc_write calls in this thread should never trigger
qemu_set_fd_handler):

diff --git a/ui/vnc-jobs-async.c b/ui/vnc-jobs-async.c
index 1d4c5e7..e02d891 100644
--- a/ui/vnc-jobs-async.c
+++ b/ui/vnc-jobs-async.c
@@ -258,7 +258,9 @@ static int vnc_worker_thread_loop(VncJobQueue *queue)
         goto disconnected;
     }
 
+    /* lock */
     vnc_write(job->vs, vs.output.buffer, vs.output.offset);
+    /* unlock */
 
 disconnected:
     /* Copy persistent encoding data */
@@ -267,7 +269,9 @@ disconnected:
     vnc_unlock_output(job->vs);
 
     if (flush) {
+        /* lock */
         vnc_flush(job->vs);
+        /* unlock */
     }
 
     vnc_lock_queue(queue)
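
To make the lock/unlock placeholders and the suggested trylock assert a bit more concrete, here is a minimal standalone sketch in plain pthreads. It is not QEMU code: demo_global_mutex, DemoIOHandlerRecord, demo_set_fd_handler and the other demo_* names are hypothetical stand-ins, and which real mutex belongs there is still the open question above. The check assumes a non-recursive mutex and only shows that some thread holds the lock, not that the caller does.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the global lock. Which QEMU mutex belongs
 * here is the open question in this thread. */
static pthread_mutex_t demo_global_mutex = PTHREAD_MUTEX_INITIALIZER;

typedef void DemoIOHandler(void *opaque);

/* Simplified stand-in for an io-handler entry (not QEMU's struct). */
typedef struct DemoIOHandlerRecord {
    DemoIOHandler *fd_read;
    DemoIOHandler *fd_write;
    void *opaque;
} DemoIOHandlerRecord;

/* Returns 1 if the mutex is currently locked by some thread, 0 otherwise.
 * Only meaningful for non-recursive mutexes; it cannot identify the owner. */
static int demo_mutex_is_locked(pthread_mutex_t *m)
{
    if (pthread_mutex_trylock(m) == 0) {
        /* We got the lock, so nobody held it: release and report that. */
        pthread_mutex_unlock(m);
        return 0;
    }
    return 1;
}

/* Stand-in for the handler update: asserts that the lock is held, in the
 * spirit of the trylock assert suggested above. */
static void demo_set_fd_handler(DemoIOHandlerRecord *rec,
                                DemoIOHandler *fd_read,
                                DemoIOHandler *fd_write,
                                void *opaque)
{
    assert(demo_mutex_is_locked(&demo_global_mutex));
    rec->fd_read = fd_read;
    rec->fd_write = fd_write;
    rec->opaque = opaque;
}

/* Worker-thread side: the lock and unlock placeholders from the diff. */
static void demo_worker_update(DemoIOHandlerRecord *rec,
                               DemoIOHandler *fd_write)
{
    pthread_mutex_lock(&demo_global_mutex);     /* lock */
    demo_set_fd_handler(rec, rec->fd_read, fd_write, rec->opaque);
    pthread_mutex_unlock(&demo_global_mutex);   /* unlock */
}

/* Main-loop side: check and call fd_write under the same lock, so the
 * handler cannot become NULL between the check and the call. */
static int demo_dispatch_write(DemoIOHandlerRecord *rec)
{
    int called = 0;

    pthread_mutex_lock(&demo_global_mutex);
    if (rec->fd_write) {
        rec->fd_write(rec->opaque);
        called = 1;
    }
    pthread_mutex_unlock(&demo_global_mutex);
    return called;
}

/* Example write handler: counts how often it was invoked. */
static void demo_count_write(void *opaque)
{
    int *counter = opaque;
    (*counter)++;
}
```

The point of the sketch is only that the update and the dispatch take the same lock; with the worker thread skipping it, the dispatcher can see a non-NULL fd_write at the check and a NULL one at the call, which matches the backtrace.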