From patchwork Mon Jul 6 19:21:12 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Snow X-Patchwork-Id: 491785 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 97A9D140323 for ; Tue, 7 Jul 2015 05:21:47 +1000 (AEST) Received: from localhost ([::1]:52639 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZCBxZ-0001qk-I4 for incoming@patchwork.ozlabs.org; Mon, 06 Jul 2015 15:21:45 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:49525) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZCBx8-0001KQ-0v for qemu-devel@nongnu.org; Mon, 06 Jul 2015 15:21:22 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZCBx5-0002tr-8X for qemu-devel@nongnu.org; Mon, 06 Jul 2015 15:21:17 -0400 Received: from mx1.redhat.com ([209.132.183.28]:41966) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZCBx5-0002ti-0U; Mon, 06 Jul 2015 15:21:15 -0400 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) by mx1.redhat.com (Postfix) with ESMTPS id 8204691C0A; Mon, 6 Jul 2015 19:21:14 +0000 (UTC) Received: from scv.usersys.redhat.com (dhcp-17-29.bos.redhat.com [10.18.17.29]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id t66JLCBc021233; Mon, 6 Jul 2015 15:21:12 -0400 Message-ID: <559AD528.5030903@redhat.com> Date: Mon, 06 Jul 2015 15:21:12 -0400 From: John Snow User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Paolo Bonzini , Peter Maydell , QEMU Developers , Stefan Hajnoczi , "qemu-ppc@nongnu.org" References: <559AA1BB.8020709@redhat.com> <559AABAF.9020208@redhat.com> In-Reply-To: <559AABAF.9020208@redhat.com> X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 209.132.183.28 Subject: Re: [Qemu-devel] qtest hang in /x86_64/ahci/io/ncq/simple (ppc64 host) X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org On 07/06/2015 12:24 PM, Paolo Bonzini wrote: > > > On 06/07/2015 17:41, John Snow wrote: >> I'll take a look. More than possible there's a race in the wait >> conditional that I have just never seen before. >> >> I tweaked it a little recently to wait on both NCQ and traditional >> completion, but it doesn't have a timeout or anything. I'll go ahead and >> fix that while I'm here. >> >> Should be something simple, hopefully. The one Peter Maydell reported is indeed something simple. Sent out a patch. > > My hang in test_halted_ncq is different: ... Great :) > > (gdb) up > #1 0x00007f3aaadb4476 in read (__nbytes=1, __buf=0x7ffc0652abbf, > __fd=) at /usr/include/bits/unistd.h:44 > 44 return __read_alias (__fd, __buf, __nbytes); > (gdb) > #2 qtest_qmp_receive (s=s@entry=0x7f3aac017360) at > /home/work/upstream/qemu/tests/libqtest.c:371 > 371 len = read(s->qmp_fd, &c, 1); > (gdb) > #3 0x00007f3aaadb4c48 in qtest_qmp_eventwait (s=0x7f3aac017360, > event=event@entry=0x7f3aaadbdc68 "STOP") at > /home/work/upstream/qemu/tests/libqtest.c:477 > 477 response = qtest_qmp_receive(s); > (gdb) > #4 0x00007f3aaadb3514 in qmp_eventwait (event=0x7f3aaadbdc68 "STOP") at > /home/work/upstream/qemu/tests/libqtest.h:498 > 498 return qtest_qmp_eventwait(global_qtest, event); > (gdb) > #5 ahci_guest_io_halt (ahci=ahci@entry=0x7f3aac014820, > port=port@entry=5 '\005', ide_cmd=ide_cmd@entry=97 'a', > buffer=buffer@entry=1097728, bufsize=bufsize@entry=4096, > sector=sector@entry=0) > at /home/work/upstream/qemu/tests/libqos/ahci.c:604 > 604 qmp_eventwait("STOP"); > (gdb) > #6 0x00007f3aaada9e09 in ahci_halted_io_test (cmd_read=, > cmd_write=) at > /home/work/upstream/qemu/tests/ahci-test.c:1244 > 1244 cmd = ahci_guest_io_halt(ahci, port, cmd_write, > > > In QEMU, current_run_state is RUN_STATE_RUNNING, so the halt didn't > happen. It's 100% reproducible with "make check-qtest-{i386,x86_64} V=1 > -j12". This is the final part of the qtest log: > > [R +0.066539] readl 0xe0000390 > [S +0.066553] OK 0x0000000000000000 > [R +0.066582] memset 0x10b000 0x100 0x00 > [S +0.066597] OK > [R +0.067627] write 0x10c000 0x1000 [snip] > [S +0.067748] OK > [R +0.067782] readl 0xe00003b8 > [S +0.067796] OK 0x0000000000000000 > [R +0.067824] read 0x10a020 0x20 > [S +0.067843] OK > 0x0000000000000000000000000000000000000000000000000000000000000000 > [R +0.067874] write 0x10a020 0x20 > 0x0000000000000000000000000000000000000000000000000000000000000000 > [S +0.067897] OK > [R +0.067926] write 0x10a020 0x20 > 0x450401000000000000e010000000000000000000000000000000000000000000 > [S +0.067940] OK > [R +0.067964] write 0x10e000 0x14 0x2780610800000040000000000800000000000000 > [S +0.067980] OK > [R +0.068004] write 0x10e080 0x10 0x00c010000000000000000000ff0f0080 > [S +0.068018] OK > [R +0.068042] writel 0xe00003b4 0x2 > [S +0.068056] OK > [R +0.068079] writel 0xe00003b8 0x2 > [S +0.069500] OK > > This is the /tmp/qtest-blkdebug.INHp7s file: > > [inject-error] > event = "write_aio" > errno = "5" > state = "1" > immediately = "off" > once = "on" > [set-state] > event = "write_aio" > new_state = "2" > > There is no trace of the write payload in the .qcow2 file. Smells like the injected error prevented that successfully, but... > > Paolo > Can't reproduce: Fedora 21, 4.0.5, Today's QEMU HEAD (7edd8e56) ... So it's a race condition of some sort. Obvious questions: You've tried a distclean? What was your configure invocation? My attempt: make distclean ../../configure --target-list="x86_64-softmmu i386-softmmu" --enable-debug make -j9 make check-qtest-{i386,x86_64} V=1 -j12 Completes just fine for me. That aside, I can't figure out why the machine HALT *and* the write wouldn't occur. Can you stick a bunch of printfs inside the NCQ error handling function and see if there's something goofing up in there, or if it's even making it there? My guess is I have goofed something up above there in the error handling section of ncq_cb, but "worksforme," so I am not sure what's going on. If no prints show up during testing, maybe turn on AHCI_DEBUG as well and see if any prints are showing up in the success case. --js diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c index bb6a92f..f1af6f9 100644 --- a/hw/ide/ahci.c +++ b/hw/ide/ahci.c @@ -921,6 +921,7 @@ static void ncq_cb(void *opaque, int ret) IDEState *ide_state = &ncq_tfs->drive->port.ifs[0]; if (ret == -ECANCELED) { + fprintf(stderr, "ncq_cb: -ECANCELED\n"); return; } @@ -928,12 +929,17 @@ static void ncq_cb(void *opaque, int ret) bool is_read = ncq_tfs->cmd == READ_FPDMA_QUEUED; BlockErrorAction action = blk_get_error_action(ide_state->blk, is_read, -ret); + fprintf(stderr, "ncq_cb: ret (%d) is < 0; action is %s\n", + ret, BlockErrorAction_lookup[action]); if (action == BLOCK_ERROR_ACTION_STOP) { ncq_tfs->halt = true; ide_state->bus->error_status = IDE_RETRY_HBA; + fprintf(stderr, "ncq_cb: BLOCK_ERROR_ACTION_STOP\n"); } else if (action == BLOCK_ERROR_ACTION_REPORT) { ncq_err(ncq_tfs); + fprintf(stderr, "ncq_cb: BLOCK_ERROR_ACTION_REPORT\n"); } + fprintf(stderr, "ncq_cb: -> blk_error_action\n"); blk_error_action(ide_state->blk, action, is_read, -ret); } else { ide_state->status = READY_STAT | SEEK_STAT;