From patchwork Tue Aug 15 14:54:24 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Blake X-Patchwork-Id: 801617 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3xWwXq2RBHz9t31 for ; Wed, 16 Aug 2017 00:56:27 +1000 (AEST) Received: from localhost ([::1]:39072 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dhdGa-0004Fs-TY for incoming@patchwork.ozlabs.org; Tue, 15 Aug 2017 10:56:24 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:33220) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dhdEl-0003BE-MB for qemu-devel@nongnu.org; Tue, 15 Aug 2017 10:54:32 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dhdEh-0001z4-RI for qemu-devel@nongnu.org; Tue, 15 Aug 2017 10:54:31 -0400 Received: from mx1.redhat.com ([209.132.183.28]:40000) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1dhdEh-0001yf-Hd for qemu-devel@nongnu.org; Tue, 15 Aug 2017 10:54:27 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 4552F5C9DE; Tue, 15 Aug 2017 14:54:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 4552F5C9DE Authentication-Results: ext-mx06.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx06.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=eblake@redhat.com Received: from [10.10.121.22] (ovpn-121-22.rdu2.redhat.com [10.10.121.22]) by smtp.corp.redhat.com (Postfix) with ESMTP id B27C06C505; Tue, 15 Aug 2017 14:54:25 +0000 (UTC) To: Vladimir Sementsov-Ogievskiy , qemu-devel@nongnu.org References: <20170814213426.24681-1-eblake@redhat.com> <7c221a24-bc5c-95a0-ea5e-141d66daafd2@virtuozzo.com> From: Eric Blake Openpgp: url=http://people.redhat.com/eblake/eblake.gpg Organization: Red Hat, Inc. Message-ID: <6c97fa2e-a04c-57e7-430d-8dcecb1875e8@redhat.com> Date: Tue, 15 Aug 2017 09:54:24 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 MIME-Version: 1.0 In-Reply-To: <7c221a24-bc5c-95a0-ea5e-141d66daafd2@virtuozzo.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Tue, 15 Aug 2017 14:54:26 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 209.132.183.28 X-Content-Filtered-By: Mailman/MimeDel 2.1.21 Subject: Re: [Qemu-devel] [PATCH for-2.10 v2] nbd-client: Fix regression when server sends garbage X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" On 08/15/2017 09:45 AM, Vladimir Sementsov-Ogievskiy wrote: > 15.08.2017 00:34, Eric Blake wrote: >> When we switched NBD to use coroutines for qemu 2.9 (in particular, >> commit a12a712a), we introduced a regression: if a server sends us >> garbage (such as a corrupted magic number), we quit the read loop >> but do not stop sending further queued commands, resulting in the >> client hanging when it never reads the response to those additional >> commands. In qemu 2.8, we properly detected that the server is no >> longer reliable, and cancelled all existing pending commands with >> EIO, then tore down the socket so that all further command attempts >> get EPIPE. >> >> Restore the proper behavior of quitting (almost) all communication >> with a broken server: Once we know we are out of sync or otherwise >> can't trust the server, we must assume that any further incoming >> data is unreliable and therefore end all pending commands with EIO, >> and quit trying to send any further commands. As an exception, we >> still (try to) send NBD_CMD_DISC to let the server know we are going >> away (in part, because it is easier to do that than to further >> refactor nbd_teardown_connection, and in part because it is the >> only command where we do not have to wait for a reply). >> >> Based on a patch by Vladimir Sementsov-Ogievskiy. >> >> +++ b/block/nbd-client.c >> @@ -73,7 +73,7 @@ static coroutine_fn void nbd_read_reply_entry(void >> *opaque) >> int ret; >> Error *local_err = NULL; >> >> - for (;;) { >> + while (!s->quit) { > > looks like quit will never be true here > >> assert(s->reply.handle == 0); >> ret = nbd_receive_reply(s->ioc, &s->reply, &local_err); >> if (ret < 0) { >> @@ -107,6 +107,9 @@ static coroutine_fn void nbd_read_reply_entry(void >> *opaque) >> qemu_coroutine_yield(); >> } >> >> + if (ret < 0) { >> + s->quit = true; > > so, you set quit only here.. if we fail on some write, reading coroutine > will not > know about it and will continue reading.. Looks like you are correct - your version set the flag in more places than me, but it looks like you're right that we DO want to set the flag when writing hits a known failure. Here's what I plan to squash in: diff --git i/block/nbd-client.c w/block/nbd-client.c index bb17e3da45..422ecb4307 100644 --- i/block/nbd-client.c +++ w/block/nbd-client.c @@ -161,6 +161,9 @@ static int nbd_co_send_request(BlockDriverState *bs, } else { rc = nbd_send_request(s->ioc, request); } + if (rc < 0) { + s->quit = true; + } qemu_co_mutex_unlock(&s->send_mutex); return rc; }