From patchwork Fri Jan 6 11:16:16 2012
X-Patchwork-Submitter: MORITA Kazutaka
X-Patchwork-Id: 134625
Date: Fri, 06 Jan 2012 20:16:16 +0900
From: MORITA Kazutaka
To: Stefan Hajnoczi
Cc: kwolf@redhat.com, sheepdog@lists.wpkg.org, Christoph Hellwig,
 MORITA Kazutaka, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] coroutine bug?, was Re: [PATCH] sheepdog: use coroutines
In-Reply-To: <20120103081351.GA28636@stefanha-thinkpad.localdomain>
References: <1313152395-25248-1-git-send-email-morita.kazutaka@lab.ntt.co.jp>
 <20111223133850.GA12770@lst.de>
 <20111229120626.GA32331@lst.de>
 <20111230103500.GA1740@stefanha-thinkpad.localdomain>
 <20120102153959.GA22823@lst.de>
 <20120103081351.GA28636@stefanha-thinkpad.localdomain>

At Tue, 3 Jan 2012 08:13:51 +0000,
Stefan Hajnoczi wrote:
>
> On Mon, Jan 02, 2012 at 10:38:11PM +0000, Stefan Hajnoczi wrote:
> > On Mon, Jan 2, 2012 at 3:39 PM, Christoph Hellwig wrote:
> > > On Fri, Dec 30, 2011 at 10:35:01AM +0000, Stefan Hajnoczi wrote:
> > >> If you can reproduce this bug and suspect coroutines are involved then I
> > >
> > > It's entirely reproducible.
> > >
> > > I've played around a bit and switched from the ucontext to the gthreads
> > > coroutine implementation.  The result seems odd, but starts to make sense.
> > >
> > > Running the workload I now get the following message from qemu:
> > >
> > > Co-routine re-entered recursively
> > >
> > > and the gdb backtrace looks like:
> > >
> > > (gdb) bt
> > > #0  0x00007f2fff36f405 in *__GI_raise (sig=<optimized out>)
> > >     at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> > > #1  0x00007f2fff372680 in *__GI_abort () at abort.c:92
> > > #2  0x00007f30019a6616 in qemu_coroutine_enter (co=0x7f3004d4d7b0, opaque=0x0)
> > >     at qemu-coroutine.c:53
> > > #3  0x00007f30019a5e82 in qemu_co_queue_next_bh (opaque=<optimized out>)
> > >     at qemu-coroutine-lock.c:43
> > > #4  0x00007f30018d5a72 in qemu_bh_poll () at async.c:71
> > > #5  0x00007f3001982990 in main_loop_wait (nonblocking=<optimized out>)
> > >     at main-loop.c:472
> > > #6  0x00007f30018cf714 in main_loop () at /home/hch/work/qemu/vl.c:1481
> > > #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
> > >     at /home/hch/work/qemu/vl.c:3479
> > >
> > > Adding some printks suggests this happens when calling add_aio_request from
> > > aio_read_response when either delaying creates or updating metadata,
> > > although not every time one of these cases happens.
> > >
> > > I've tried to understand how the recursive calling happens, but unfortunately
> > > the whole coroutine code lacks any sort of documentation on how it should
> > > behave or what it assumes about its callers.
> > >
> > >> I don't have a sheepdog setup here but if there's an easy way to
> > >> reproduce please let me know and I'll take a look.
> > >
> > > With the small patch below applied to the sheepdog source I can reproduce
> > > the issue on my laptop using the following setup:
> > >
> > > for port in 7000 7001 7002; do
> > >     mkdir -p /mnt/sheepdog/$port
> > >     /usr/sbin/sheep -p $port -c local /mnt/sheepdog/$port
> > >     sleep 2
> > > done
> > >
> > > collie cluster format
> > > collie vdi create test 20G
> > >
> > > then start a qemu instance that uses the sheepdog volume with the
> > > following drive and device options:
> > >
> > >         -drive if=none,file=sheepdog:test,cache=none,id=test \
> > >         -device virtio-blk-pci,drive=test,id=testdev \
> > >
> > > finally, in the guest run:
> > >
> > >         dd if=/dev/zero of=/dev/vdX bs=67108864 count=128 oflag=direct
> >
> > Thanks for these instructions.  I can reproduce the issue here.
> >
> > The way BDRVSheepdogState->co_recv and ->co_send are used seems
> > suspicious.  The code adds select(2) read/write callback functions on
> > the sheepdog socket file descriptor.  When the socket becomes writable
> > or readable, the co_send or co_recv coroutine is entered.  So far, so
> > good: this is how a coroutine is integrated into the main loop of QEMU.
> >
> > The problem is that this patch mixes things up.  The co_recv coroutine
> > runs aio_read_response(), which invokes send_pending_req().
> > send_pending_req() invokes add_aio_request().  That function isn't
> > suitable for co_recv's context because it actually sends data and hits
> > a few blocking (yield) points.  It takes a coroutine mutex - but the
> > select(2) read callback is still in place.  We're still in the
> > aio_read_response() call chain, except we're not actually reading at
> > all, we're trying to write!  And we'll get spurious wakeups whenever
> > there is data readable on the socket.
> >
> > So the co_recv coroutine has two things in the system that will try to
> > enter it:
> >
> > 1. The select(2) read callback on the sheepdog socket.
> > 2. The add_aio_request() blocking operations, including a coroutine
> >    mutex.
> >
> > This is bad: a yielded coroutine should only have one thing that will
> > enter it.  It's rare that it makes sense to have multiple things
> > entering a coroutine.
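To restate the rule above with a minimal sketch (illustrative only, not
actual sheepdog.c code; recv_co_entry(), restart_co_recv() and
process_response() are made-up names): while a coroutine is yielded,
exactly one place may re-enter it.

static void coroutine_fn recv_co_entry(void *opaque)
{
    BDRVSheepdogState *s = opaque;

    for (;;) {
        /* Yield until the read callback below re-enters us. */
        qemu_coroutine_yield();

        /* Whatever runs here must not create a second wake-up source
         * (no CoMutex, no blocking send); otherwise two things can
         * enter this coroutine and it aborts with "Co-routine
         * re-entered recursively". */
        process_response(s);
    }
}

/* select(2) read callback: the single legal wake-up source. */
static void restart_co_recv(void *opaque)
{
    BDRVSheepdogState *s = opaque;

    qemu_coroutine_enter(s->co_recv, NULL);
}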
> >
> > It's late here but I hope this gives Kazutaka some thoughts on what is
> > causing the issue with this patch.
>
> Poked around some more this morning.  Here is a patch that isolates the
> bug:
>
> ---8<---
> diff --git a/qemu-coroutine-lock.c b/qemu-coroutine-lock.c
> index 26ad76b..0d7a03f 100644
> --- a/qemu-coroutine-lock.c
> +++ b/qemu-coroutine-lock.c
> @@ -59,6 +59,16 @@ void coroutine_fn qemu_co_queue_wait(CoQueue *queue)
>      QTAILQ_INSERT_TAIL(&queue->entries, self, co_queue_next);
>      qemu_coroutine_yield();
>      assert(qemu_in_coroutine());
> +    {
> +        Coroutine *co;
> +
> +        QTAILQ_FOREACH(co, &queue->entries, co_queue_next) {
> +            assert(co != self);
> +        }
> +        QTAILQ_FOREACH(co, &unlock_bh_queue, co_queue_next) {
> +            assert(co != self);
> +        }
> +    }
>  }
>
>  void coroutine_fn qemu_co_queue_wait_insert_head(CoQueue *queue)
> ---8<---
>
> When you run this with ucontext or GThread coroutines you hit this
> assertion failure:
>
> qemu-system-x86_64: qemu-coroutine-lock.c:66: qemu_co_queue_wait: Assertion `co != self' failed.
>
> Here is an explanation of what the asserts are checking and how we
> violate the constraint:
>
> qemu_co_queue_next() and qemu_co_queue_next_bh() take a waiting
> Coroutine off the wait queue and put it onto the unlock BH queue.  When
> the BH executes, the coroutines on the unlock BH queue are popped off
> that queue and entered.  This is how coroutine wait queues work, and
> coroutine mutexes are built on coroutine wait queues.
>
> The problem is that, due to the spurious wakeups introduced by the
> sheepdog patch, we acquire the mutex earlier than normal, with a BH
> unlock still pending.  Our coroutine can actually terminate by returning
> from sheepdog.c:aio_read_response() while the BH is still pending.  When
> we get around to executing the BH we operate on a freed coroutine -
> BOOM!

Thanks for your detailed explanation.  I confirmed that the following
hack fixes the problem:

---8<---
diff --git a/block/sheepdog.c b/block/sheepdog.c
index 17a79be..b27c770 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -626,6 +626,9 @@ static void coroutine_fn aio_read_response(void *opaque)
 
     switch (acb->aiocb_type) {
     case AIOCB_WRITE_UDATA:
+        /* this coroutine context is no longer suitable for co_recv
+         * because we may send data to update vdi objects */
+        s->co_recv = NULL;
         if (!is_data_obj(aio_req->oid)) {
             break;
         }
---8<---

>
> The reason why we get different failure behavior with ucontext and
> GThread is that they have different implementations and reuse
> coroutines differently.  We've still done something illegal, but the
> undefined behavior just happens to differ: both ucontext and GThread
> are working correctly; the problem is in the sheepdog patch.
>
> I suggest implementing coroutine socket I/O functions for both send
> *and* receive.  Then sheepdog request processing becomes simple: we'll
> have fewer callbacks and less coroutine trickery, because it becomes
> possible to write synchronous coroutine code.
>
> Right now the code is a mix of callbacks and some coroutine code (which
> has the flaw explained above).  Things will be much easier once the
> code is converted fully to coroutines.

Yes, the original patch aimed to make Sheepdog I/O asynchronous with a
small change, which is why the current sheepdog code is not clean.  This
looks like a good time to make the sheepdog driver fully coroutine-based;
a rough sketch of the direction follows.
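Untested sketch, not a patch: it assumes coroutine socket helpers
(spelled qemu_co_send()/qemu_co_recv() here) that transfer the whole
buffer, yielding instead of blocking, and that return the byte count or
a negative errno.  do_co_request() is a made-up name, the request and
response types stand in for the driver's existing wire structures, and
error handling is abbreviated.

static int coroutine_fn do_co_request(BDRVSheepdogState *s,
                                      SheepdogReq *req,
                                      void *data, unsigned int datalen)
{
    SheepdogRsp rsp;

    /* Send the request header and payload; the helper yields while
     * the socket buffer is full instead of registering a write
     * callback. */
    if (qemu_co_send(s->fd, req, sizeof(*req)) != sizeof(*req)) {
        return -EIO;
    }
    if (datalen && qemu_co_send(s->fd, data, datalen) != datalen) {
        return -EIO;
    }

    /* Receive the response the same way: no read callback is armed
     * while we sit here, so this coroutine keeps exactly one wake-up
     * source and the spurious re-entry problem cannot occur. */
    if (qemu_co_recv(s->fd, &rsp, sizeof(rsp)) != sizeof(rsp)) {
        return -EIO;
    }

    return rsp.result == SD_RES_SUCCESS ? 0 : -EIO;
}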
Thanks,

Kazutaka