From patchwork Wed Jun 28 10:04:06 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frank Heimes X-Patchwork-Id: 1800948 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=canonical.com header.i=@canonical.com header.a=rsa-sha256 header.s=20210705 header.b=BHGLdIi0; dkim-atps=neutral Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Qrccx3dXdz20bH for ; Wed, 28 Jun 2023 20:04:32 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1qES2D-0002eR-Vx; Wed, 28 Jun 2023 10:04:25 +0000 Received: from smtp-relay-canonical-0.internal ([10.131.114.83] helo=smtp-relay-canonical-0.canonical.com) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1qES2C-0002dw-8g for kernel-team@lists.ubuntu.com; Wed, 28 Jun 2023 10:04:24 +0000 Received: from localhost.localdomain (2.general.fheimes.us.vpn [10.172.66.67]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-canonical-0.canonical.com (Postfix) with ESMTPSA id 72B553F038 for ; Wed, 28 Jun 2023 10:04:23 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=canonical.com; s=20210705; t=1687946664; bh=YR1DuqF3HxEY4ARaqA9YnNNPR+zYCAFiiTV6AJYPE7Q=; h=From:To:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=BHGLdIi0VNql6L4DgWRhQ/yMrxN8R9AneNu6iSqth9RdvMvQcK4NxbV3cLLfBEe10 6Vl2k4mA3RohG9aA3Tf8PaGClNVBgQh0YFMAvkoCZv/G0UDMVwl0iQ56kBm7DFLWgv yTji7TrGINAl9Lj0P8GZyb4ZAjhVJT35urhcKjQ22Q4jDxX89q0DyVxHpHnYVCNloE XzMLKxqnBkz3PKcHKaoTL99dNbn2PSK4Myrgpt49A7dKG442Ef4mLc/J+NBXXMpYDM qzXmTYlHctfv5mqSE4dlePNA0hhu/0oGKZCm3P+wOns6a4pAORahdTFmsPZOrjqVpr XjEMBTd54r5oQ== From: frank.heimes@canonical.com To: kernel-team@lists.ubuntu.com Subject: [SRU][F][PATCH 1/2] net/mlx5: cmdif, Avoid skipping reclaim pages if FW is not accessible Date: Wed, 28 Jun 2023 12:04:06 +0200 Message-Id: <20230628100407.153108-2-frank.heimes@canonical.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230628100407.153108-1-frank.heimes@canonical.com> References: <20230628100407.153108-1-frank.heimes@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Saeed Mahameed BugLink: https://bugs.launchpad.net/bugs/2019011 In case of pci is offline reclaim_pages_cmd() will still try to call the FW to release FW pages, cmd_exec() in this case will return a silent success without actually calling the FW. This is wrong and will cause page leaks, what we should do is to detect pci offline or command interface un-available before tying to access the FW and manually release the FW pages in the driver. In this patch we share the code to check for FW command interface availability and we call it in sensitive places e.g. reclaim_pages_cmd(). Alternative fix: 1. Remove MLX5_CMD_OP_MANAGE_PAGES form mlx5_internal_err_ret_value, command success simulation list. 2. Always Release FW pages even if cmd_exec fails in reclaim_pages_cmd(). Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed (backported from commit b898ce7bccf13087719c021d829dab607c175246) [marcelo.cerri: adjusted context in drivers/net/ethernet/mellanox/mlx5/core/cmd.c] Signed-off-by: Marcelo Henrique Cerri Signed-off-by: Frank Heimes --- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 17 +++++++++-------- .../net/ethernet/mellanox/mlx5/core/pagealloc.c | 2 +- include/linux/mlx5/driver.h | 1 + 3 files changed, 11 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c index dbf73c42ece3..b9d45d6442b9 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c @@ -888,6 +888,13 @@ static bool opcode_allowed(struct mlx5_cmd *cmd, u16 opcode) return cmd->allowed_opcode == opcode; } +bool mlx5_cmd_is_down(struct mlx5_core_dev *dev) +{ + return pci_channel_offline(dev->pdev) || + dev->cmd.state != MLX5_CMDIF_STATE_UP || + dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR; +} + static void cmd_work_handler(struct work_struct *work) { struct mlx5_cmd_work_ent *ent = container_of(work, struct mlx5_cmd_work_ent, work); @@ -953,10 +960,7 @@ static void cmd_work_handler(struct work_struct *work) set_bit(MLX5_CMD_ENT_STATE_PENDING_COMP, &ent->state); /* Skip sending command to fw if internal error */ - if (pci_channel_offline(dev->pdev) || - dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR || - cmd->state != MLX5_CMDIF_STATE_UP || - !opcode_allowed(&dev->cmd, ent->op)) { + if (mlx5_cmd_is_down(dev) || !opcode_allowed(&dev->cmd, ent->op)) { u8 status = 0; u32 drv_synd; @@ -1784,10 +1788,7 @@ static int cmd_exec(struct mlx5_core_dev *dev, void *in, int in_size, void *out, u8 token; opcode = MLX5_GET(mbox_in, in, opcode); - if (pci_channel_offline(dev->pdev) || - dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR || - dev->cmd.state != MLX5_CMDIF_STATE_UP || - !opcode_allowed(&dev->cmd, opcode)) { + if (mlx5_cmd_is_down(dev) || !opcode_allowed(&dev->cmd, opcode)) { err = mlx5_internal_err_ret_value(dev, opcode, &drv_synd, &status); MLX5_SET(mbox_out, out, status, status); MLX5_SET(mbox_out, out, syndrome, drv_synd); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c index 7f7693b709d7..4604cb6e5f65 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c @@ -367,7 +367,7 @@ static int reclaim_pages_cmd(struct mlx5_core_dev *dev, u32 npages; u32 i = 0; - if (dev->state != MLX5_DEVICE_STATE_INTERNAL_ERROR) + if (!mlx5_cmd_is_down(dev)) return mlx5_cmd_exec(dev, in, in_size, out, out_size); /* No hard feelings, we want our pages back! */ diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 34861b619ab0..be20d81c8d99 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -937,6 +937,7 @@ int mlx5_cmd_exec(struct mlx5_core_dev *dev, void *in, int in_size, void *out, int mlx5_cmd_exec_polling(struct mlx5_core_dev *dev, void *in, int in_size, void *out, int out_size); void mlx5_cmd_mbox_status(void *out, u8 *status, u32 *syndrome); +bool mlx5_cmd_is_down(struct mlx5_core_dev *dev); int mlx5_core_get_caps(struct mlx5_core_dev *dev, enum mlx5_cap_type cap_type); int mlx5_cmd_alloc_uar(struct mlx5_core_dev *dev, u32 *uarn); From patchwork Wed Jun 28 10:04:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frank Heimes X-Patchwork-Id: 1800950 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=canonical.com header.i=@canonical.com header.a=rsa-sha256 header.s=20210705 header.b=EngaDHt2; dkim-atps=neutral Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Qrccz2x7Cz20bH for ; Wed, 28 Jun 2023 20:04:35 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1qES2I-0002gb-6y; Wed, 28 Jun 2023 10:04:30 +0000 Received: from smtp-relay-canonical-0.internal ([10.131.114.83] helo=smtp-relay-canonical-0.canonical.com) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1qES2D-0002eI-EC for kernel-team@lists.ubuntu.com; Wed, 28 Jun 2023 10:04:25 +0000 Received: from localhost.localdomain (2.general.fheimes.us.vpn [10.172.66.67]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-canonical-0.canonical.com (Postfix) with ESMTPSA id 8B5553F038 for ; Wed, 28 Jun 2023 10:04:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=canonical.com; s=20210705; t=1687946665; bh=QOwylcjSzEV+Yyk3okqOZ8E2mVoCqGdxSFg9nDywH1A=; h=From:To:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=EngaDHt2kBKzxNgKGgkZwDgRfFaaiauMxhkztgxQTOvv9vO8mneUX5nOLZGRU9j6R nGwRbxF9NEu41f9EUR+oakWa+rIoE7tKDODqPUu5+tUDktNe6SQKchmA85g0/98tSa LKV6f3mcC9ZDEQ5bGcEfvxjTy7MSTRpRquNe70xOAF1Dz7ggCmhVthNnrczc3ZouH1 EBzzKMvT6+81EJA8kxqiX3GGM1eDSzt9Ljr/pm+AcNmIxmvy8jiPgO5h09FN6BoYJH C4q9ZFcgkKaiWlDxpvwn1IEGjtYH49S5EE7JEzxOw3fDi8bXTzv/s6ZVseJBdXlbrX fJx/M7vNJjVUw== From: frank.heimes@canonical.com To: kernel-team@lists.ubuntu.com Subject: [SRU][F][PATCH 2/2] net/mlx5: Fix handling of entry refcount when command is not issued to FW Date: Wed, 28 Jun 2023 12:04:07 +0200 Message-Id: <20230628100407.153108-3-frank.heimes@canonical.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230628100407.153108-1-frank.heimes@canonical.com> References: <20230628100407.153108-1-frank.heimes@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Moshe Shemesh BugLink: https://bugs.launchpad.net/bugs/2019011 In case command interface is down, or the command is not allowed, driver did not increment the entry refcount, but might have decrement as part of forced completion handling. Fix that by always increment and decrement the refcount to make it symmetric for all flows. Fixes: 50b2412b7e78 ("net/mlx5: Avoid possible free of command entry while timeout comp handler") Signed-off-by: Eran Ben Elisha Signed-off-by: Moshe Shemesh Reported-by: Jack Wang Tested-by: Jack Wang Signed-off-by: Saeed Mahameed (cherry picked from commit aaf2e65cac7f2e1ae729c2fbc849091df9699f96) Signed-off-by: Frank Heimes --- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c index b9d45d6442b9..f6cc1c008eee 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c @@ -959,6 +959,7 @@ static void cmd_work_handler(struct work_struct *work) cmd_ent_get(ent); set_bit(MLX5_CMD_ENT_STATE_PENDING_COMP, &ent->state); + cmd_ent_get(ent); /* for the _real_ FW event on completion */ /* Skip sending command to fw if internal error */ if (mlx5_cmd_is_down(dev) || !opcode_allowed(&dev->cmd, ent->op)) { u8 status = 0; @@ -972,7 +973,6 @@ static void cmd_work_handler(struct work_struct *work) return; } - cmd_ent_get(ent); /* for the _real_ FW event on completion */ /* ring doorbell after the descriptor is valid */ mlx5_core_dbg(dev, "writing 0x%x to command doorbell\n", 1 << ent->idx); wmb(); @@ -1586,8 +1586,8 @@ static void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec, bool force cmd_ent_put(ent); /* timeout work was canceled */ if (!forced || /* Real FW completion */ - pci_channel_offline(dev->pdev) || /* FW is inaccessible */ - dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR) + mlx5_cmd_is_down(dev) || /* No real FW completion is expected */ + !opcode_allowed(cmd, ent->op)) cmd_ent_put(ent); ent->ts2 = ktime_get_ns();