From patchwork Mon Apr 20 18:39:33 2020
X-Patchwork-Submitter: Kelsey Skunberg
X-Patchwork-Id: 1273649
From: Kelsey Skunberg
To: kernel-team@lists.ubuntu.com
Cc: mcoleman@datto.com
Subject: [SRU][B][F][PATCH 0/3] Fix LIO from hanging in iscsit_free_session and iscsit_stop_session
Date: Mon, 20 Apr 2020 12:39:33 -0600
Message-Id: <20200420183936.40908-1-kelsey.skunberg@canonical.com>
X-Mailer: git-send-email 2.20.1
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

BugLink: https://bugs.launchpad.net/bugs/1871688

SRU Justification

[Impact]

(Following details are from the bug report)

The target subsystem (LIO) can hang if multiple threads try to destroy iSCSI sessions simultaneously.
This is reproducible on systems that have multiple targets with initiators regularly connecting/disconnecting. This may happen when a "targetcli iscsi/iqn.../tpg1 disable" command is executed while a logout operation is underway.

The iSCSI target does not handle such events correctly: two or more threads may end up sleeping while waiting for the driver to close the remaining connections on the session. When the connections are closed, the driver wakes up only the first thread, which then proceeds to destroy the session structure. The remaining threads are blocked forever, waiting on a completion synchronization mechanism that no longer exists in memory because it has been freed by the first thread.

Note that if the blocked threads are somehow forced to wake up, they will try to free the same iSCSI session structure already destroyed by the first thread, causing double frees, memory corruption, etc.

The driver has been reorganized so that the concurrent threads set a flag in the session structure to notify the driver that the session should be destroyed; they then wait for the driver to close the remaining connections. When the connections are all closed, the driver wakes up all the threads and waits for the refcount of the iSCSI session structure to reach zero. When the last thread wakes up, the refcount is decreased to zero and the driver can proceed to destroy the session structure because no one is referencing it anymore.

The bug reporter witnessed this on hundreds of Ubuntu 16.04.5 systems and states that it is a regression, because it did not occur several years ago. There are no detailed records from that far back to determine exactly which unaffected kernel the reporter was running (the reporter believes it was either 4.8.x or 4.10.x).

The requested uname, version_signature, dmesg, and lspci output from the reporter's system are attached to the bug report.
However, the reporter has seen this happen on a wide array of hardware: 2 to 24 cores, 8GB to 256GB RAM, both AMD and Intel CPUs, onboard storage and PCIe SAS cards, etc.

This has been fixed in the upstream master branch, but it hasn't yet been backported to "-stable".

[Fixes]

These three commits should be backported:
* https://github.com/torvalds/linux/commit/e49a7d994379278d3353d7ffc7994672752fb0ad
* https://github.com/torvalds/linux/commit/57c46e9f33da530a2485fa01aa27b6d18c28c796
* https://github.com/torvalds/linux/commit/626bac73371eed79e2afa2966de393da96cf925e

[Test]

This is reproducible on systems that have multiple targets with initiators regularly connecting/disconnecting, by having multiple threads try to destroy iSCSI sessions simultaneously. This may happen when a "targetcli iscsi/iqn.../tpg1 disable" command is executed while a logout operation is underway.

[Regression Risk]

Low, cherry picked from upstream with no changes. Verified that the patches apply cleanly to Bionic/master-next and Focal/master-next. Build tests pass.

Maurizio Lombardi (3):
  scsi: target: remove boilerplate code
  scsi: target: fix hang when multiple threads try to destroy the same iscsi session
  scsi: target: iscsi: calling iscsit_stop_session() inside iscsit_close_session() has no effect

 drivers/target/iscsi/iscsi_target.c          | 82 ++++++--------------------
 drivers/target/iscsi/iscsi_target.h          |  1 -
 drivers/target/iscsi/iscsi_target_configfs.c |  5 +-
 drivers/target/iscsi/iscsi_target_login.c    |  5 +-
 include/target/iscsi/iscsi_target_core.h     |  2 +-
 5 files changed, 32 insertions(+), 63 deletions(-)

Acked-by: Kleber Sacilotto de Souza
Acked-by: Kamal Mostafa
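
Illustrative note: the scheme described under [Impact] (set a flag, wait for the connections to close, wake all waiters, destroy only once the refcount reaches zero) can be sketched in userspace C with pthreads. This is a minimal sketch under stated assumptions, not the kernel code from the commits above: struct session, closer(), run_demo(), NUM_CLOSERS and every field name are hypothetical, and a mutex plus condition variable stands in for the kernel's own primitives.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define NUM_CLOSERS 4   /* hypothetical number of concurrent closing threads */

/* Hypothetical stand-in for the session; not the kernel's struct layout. */
struct session {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int  open_conns;       /* connections the "driver" still has to close */
    int  refcount;         /* one reference per thread using the session */
    bool close_requested;  /* set by any thread that wants the session gone */
    int  woken;            /* closers that resumed after the connections closed */
};

/* A thread asking for the session to be destroyed; it already holds a reference. */
static void *closer(void *arg)
{
    struct session *s = arg;

    pthread_mutex_lock(&s->lock);
    s->close_requested = true;           /* notify the driver */
    pthread_cond_broadcast(&s->cond);
    while (s->open_conns > 0)            /* wait for the connections to close */
        pthread_cond_wait(&s->cond, &s->lock);
    s->woken++;
    s->refcount--;                       /* drop our reference */
    pthread_cond_broadcast(&s->cond);    /* let the driver re-check the refcount */
    pthread_mutex_unlock(&s->lock);
    return NULL;                         /* the session is not touched after this */
}

/* Plays the driver. Returns the number of closers woken, or -1 on setup failure. */
int run_demo(void)
{
    struct session *s = calloc(1, sizeof(*s));
    pthread_t tid[NUM_CLOSERS];
    int started = 0, woken, i;

    if (!s)
        return -1;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->open_conns = 2;

    for (i = 0; i < NUM_CLOSERS; i++) {
        /* Take the reference on the closer's behalf before it starts. */
        pthread_mutex_lock(&s->lock);
        s->refcount++;
        pthread_mutex_unlock(&s->lock);
        if (pthread_create(&tid[started], NULL, closer, s) != 0) {
            pthread_mutex_lock(&s->lock);
            s->refcount--;
            pthread_mutex_unlock(&s->lock);
            break;
        }
        started++;
    }

    woken = -1;
    if (started > 0) {
        pthread_mutex_lock(&s->lock);
        while (!s->close_requested)      /* wait until someone asks for a close */
            pthread_cond_wait(&s->cond, &s->lock);
        s->open_conns = 0;               /* "close" the remaining connections */
        pthread_cond_broadcast(&s->cond);/* wake every waiter, not just the first */
        while (s->refcount > 0)          /* wait for the last reference to drop */
            pthread_cond_wait(&s->cond, &s->lock);
        woken = s->woken;
        pthread_mutex_unlock(&s->lock);
    }

    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    assert(s->refcount == 0);            /* nobody references the session now */
    pthread_cond_destroy(&s->cond);
    pthread_mutex_destroy(&s->lock);
    free(s);                             /* destroyed exactly once, by the driver */
    return woken;
}
```

Calling run_demo() from a small main() and linking with -pthread should report that every closer was woken. The point of the design is that the session is freed in exactly one place and only after the last reference is dropped, so no waiter can be left sleeping on freed memory or attempt a second free.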