From patchwork Tue Oct 16 15:38:19 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mauricio Faria de Oliveira
X-Patchwork-Id: 984819
From: Mauricio Faria de Oliveira
To: kernel-team@lists.ubuntu.com
Subject: [SRU Xenial][PATCH 1/2] UBUNTU: SAUCE: (no-up) virtio-scsi: Decrement reqs counter before SCSI command requeue
Date: Tue, 16 Oct 2018 12:38:19 -0300
Message-Id: <20181016153820.29487-2-mfo@canonical.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20181016153820.29487-1-mfo@canonical.com>
References: <20181016153820.29487-1-mfo@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/1798110

The 'reqs' counter is incremented in the SCSI .queuecommand() path,
right before virtscsi_queuecommand() is called, in either
 - virtscsi_queuecommand_single(), or
 - virtscsi_queuecommand_multi(), via virtscsi_pick_vq{_mq}().

And it's decremented only in the SCSI command completion callback
(after the command is successfully queued and completed by the adapter):
 - virtscsi_complete_cmd().
(Both sides are sketched after the diffstat below.)

This allows the counter to be incremented but _not_ decremented if
virtscsi_queuecommand() gets an error when adding/kicking the command
to the virtio ring (i.e., virtscsi_kick_cmd() returns neither 0 nor -EIO):

    static int virtscsi_queuecommand(...)
    {
        ...
        ret = virtscsi_kick_cmd(...);
        if (ret == -EIO) {
            ...
            virtscsi_complete_cmd(vscsi, cmd);
            ...
        } else if (ret != 0) {
            return SCSI_MLQUEUE_HOST_BUSY;
        }
        return 0;
    }

In that case, the return code SCSI_MLQUEUE_HOST_BUSY causes the SCSI
command to be requeued by the SCSI layer, which sends it again later
in the .queuecommand() path -- incrementing the reqs counter _again_.

This may happen many times for the same SCSI command, depending on the
virtio ring condition/implementation, so the reqs counter will never
return to zero (it's decremented only once in the completion callback).
And it may happen for (m)any SCSI commands in this path.

Unfortunately, that causes a problem with a downstream/SAUCE patch for
Xenial which uses the reqs counter to sync with the completion callback:
commit f1f609d8015e ("UBUNTU: SAUCE: (no-up) virtio-scsi: Fix race in
target free"), which waits for the value to become zero (see the sketch
after the diffstat below).

This problem plus that patch prevent the SCSI target removal from
finishing, eventually causing a CPU soft lockup on another CPU that is
waiting for some kernel resource that is/remains locked in the stack
chain of this CPU.

This has been verified 1) with a synthetic test case with QEMU+GDB that
fakes the number of available elements in the virtio ring one time
(harmless), so as to force the SCSI command to be requeued, then uses
the QEMU monitor to remove the virtio-scsi target, _AND_ 2) with the
test case reported by the customer (a for-loop on a cloud instance that
repeatedly mounts the virtio-scsi drive, copies data out of it, unmounts
it, then detaches the virtio-scsi drive). (Here, the problem usually
happens in the 1st or 2nd iteration, but with the patch it has run for
35 iterations without any problems.)

Upstream has done away with the reqs counter (originally used only to
check whether any requests were still active, for steering; not for our
sync purposes). Instead of trying to find an alternative sync mechanism
for now, let's just fix the behavior which we know is incorrect.

Signed-off-by: Mauricio Faria de Oliveira
Tested-by: Mauricio Faria de Oliveira
Tested-by: David Coronel
Acked-by: Stefan Bader
---
 drivers/scsi/virtio_scsi.c | 4 ++++
 1 file changed, 4 insertions(+)
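
For reviewers, a rough sketch of the increment/decrement sites described
above. This is paraphrased from the Xenial-era drivers/scsi/virtio_scsi.c
rather than quoted verbatim, so exact signatures and surrounding code
should be checked against the tree:

    /* .queuecommand entry point (single-queue case): increments 'reqs'
     * once per submission; the multiqueue variant does the same before
     * calling virtscsi_pick_vq{_mq}(). */
    static int virtscsi_queuecommand_single(struct Scsi_Host *sh,
                                            struct scsi_cmnd *sc)
    {
            struct virtio_scsi *vscsi = shost_priv(sh);
            struct virtio_scsi_target_state *tgt =
                                    scsi_target(sc->device)->hostdata;

            atomic_inc(&tgt->reqs);
            return virtscsi_queuecommand(vscsi, &vscsi->req_vqs[0], sc);
    }

    /* completion callback: the only place 'reqs' is decremented */
    static void virtscsi_complete_cmd(struct virtio_scsi *vscsi, void *buf)
    {
            struct virtio_scsi_cmd *cmd = buf;
            struct virtio_scsi_target_state *tgt =
                                    scsi_target(cmd->sc->device)->hostdata;
            ...
            atomic_dec(&tgt->reqs);
    }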
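
And a rough sketch of the synchronization that trips over the leaked
increment. This assumes commit f1f609d8015e follows the usual pattern of
draining 'reqs' in the .target_destroy callback before freeing the target
state; the actual SAUCE code may differ in detail:

    static void virtscsi_target_destroy(struct scsi_target *starget)
    {
            struct virtio_scsi_target_state *tgt = starget->hostdata;

            /* Wait for outstanding requests to complete. If an increment
             * leaked on a SCSI_MLQUEUE_HOST_BUSY requeue, 'reqs' never
             * reaches zero and target removal never finishes. */
            while (atomic_read(&tgt->reqs))
                    cpu_relax();

            kfree(tgt);
    }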

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index b4a41d581021..572722e86bef 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -571,6 +571,10 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
 		virtscsi_complete_cmd(vscsi, cmd);
 		spin_unlock_irqrestore(&req_vq->vq_lock, flags);
 	} else if (ret != 0) {
+		/* The SCSI command requeue will increment 'tgt->reqs' again. */
+		struct virtio_scsi_target_state *tgt =
+			scsi_target(sc->device)->hostdata;
+		atomic_dec(&tgt->reqs);
 		return SCSI_MLQUEUE_HOST_BUSY;
 	}
 	return 0;