From patchwork Thu Oct 19 16:39:15 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: William Tu
X-Patchwork-Id: 1851911
From: William Tu
Subject: [SRU][J:linux-bluefield][PATCH v2 0/1] Devlink backport: fix race and lock issue
Date: Thu, 19 Oct 2023 09:39:15 -0700
Message-ID: <20231019163916.4338-1-witu@nvidia.com>
X-Mailer: git-send-email 2.37.1 (Apple Git-137.1)
List-Id: Kernel team discussions
Cc: bodong@nvidia.com, majd@nvidia.com, vlad@nvidia.com

BugLink: https://bugs.launchpad.net/bugs/2039869

This patch is a follow-up to the previous devlink backport series.
We found that devlink reload hangs the system when testing against
OFED 2307.

[ 1089.747409] INFO: task devlink:8753 blocked for more than 120 seconds.
[ 1089.760560]  Tainted: G OE 5.15.0-1027-bluefield #29-Ubuntu
[ 1089.775086] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1089.790829] task:devlink state:D stack: 0 pid: 8753 ppid: 5090 flags:0x00000004
[ 1089.790838] Call trace:
[ 1089.790840]  __switch_to+0xf8/0x150
[ 1089.790857]  __schedule+0x2b8/0x790
[ 1089.790865]  schedule+0x64/0x140
[ 1089.790870]  schedule_preempt_disabled+0x18/0x24
[ 1089.790874]  __mutex_lock.constprop.0+0x1a0/0x680
[ 1089.790878]  __mutex_lock_slowpath+0x40/0x90
[ 1089.790883]  mutex_lock+0x64/0x70
[ 1089.790887]  devl_lock+0x1c/0x30
[ 1089.790893]  mlx5_detach_device+0x58/0x190 [mlx5_core]
[ 1089.791055]  mlx5_unload_one+0x40/0xe4 [mlx5_core]
[ 1089.791177]  mlx5_devlink_reload_down+0x184/0x270 [mlx5_core]
[ 1089.791318]  devlink_reload+0x214/0x290

Checking the OFED source code, we found that the missing devl trap
group support also needs to be backported to avoid the deadlock:
without it, HAVE_DEVL_TRAP_GROUPS_REGISTER is not defined and
mlx5_detach_device() calls devl_lock() itself, which is where the
trace above shows the task blocked.

void mlx5_detach_device(struct mlx5_core_dev *dev, bool suspend)
{
...
#ifdef HAVE_DEVL_PORT_REGISTER
#ifdef HAVE_DEVL_TRAP_GROUPS_REGISTER
	devl_assert_locked(priv_to_devlink(dev));
#else
	devl_lock(devlink);
#endif /* HAVE_DEVL_TRAP_GROUPS_REGISTER */
#endif /* HAVE_DEVL_PORT_REGISTER */
	mutex_lock(&mlx5_intf_mutex);
#ifdef HAVE_DEVL_PORT_REGISTER

v2: Create new BugLink

Jiri Pirko (1):
  net: devlink: add unlocked variants of devling_trap*() functions

 include/net/devlink.h |  20 +++++
 net/core/devlink.c    | 180 ++++++++++++++++++++++++++++++++++--------
 2 files changed, 168 insertions(+), 32 deletions(-)

Acked-by: Bartlomiej Zolnierkiewicz
Acked-by: Thibault Ferrante
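
For readers unfamiliar with the locking pattern behind the hang, the
following is a minimal userspace sketch, not kernel or OFED code. It
assumes, as the call trace suggests, that the reload path invokes the
driver with the devlink instance lock already held. Every toy_* name is
hypothetical and exists only for this illustration. A real mutex would
block forever on the second acquisition; the toy lock reports that case
as -EDEADLK so the two branches of the #ifdef can be compared safely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a devlink instance; only the lock is modelled. */
struct toy_devlink {
	bool locked;       /* true while the instance lock is held */
	int detach_count;  /* number of completed detach operations */
};

/* Models devl_lock(). A real non-recursive mutex would hang on a second
 * acquisition by the same task; the toy returns -EDEADLK instead. */
static int toy_devl_lock(struct toy_devlink *dl)
{
	if (dl->locked)
		return -EDEADLK;
	dl->locked = true;
	return 0;
}

/* Models devl_unlock(). */
static void toy_devl_unlock(struct toy_devlink *dl)
{
	assert(dl->locked);
	dl->locked = false;
}

/* Models the two branches of the #ifdef in mlx5_detach_device():
 * either assert that the caller holds the lock, or take it here. */
static int toy_detach_device(struct toy_devlink *dl, bool caller_holds_lock)
{
	if (caller_holds_lock) {
		assert(dl->locked);          /* like devl_assert_locked() */
	} else {
		int err = toy_devl_lock(dl); /* like devl_lock() */

		if (err)
			return err;
	}

	dl->detach_count++;

	if (!caller_holds_lock)
		toy_devl_unlock(dl);
	return 0;
}

/* Models the reload path under the stated assumption: the core takes the
 * instance lock and then calls into the driver. */
static int toy_reload_down(struct toy_devlink *dl, bool driver_expects_lock_held)
{
	int err = toy_devl_lock(dl);

	if (err)
		return err;
	err = toy_detach_device(dl, driver_expects_lock_held);
	toy_devl_unlock(dl);
	return err;
}
```

With a zero-initialised struct toy_devlink, toy_reload_down(&dl, false)
returns -EDEADLK because the driver tries to take a lock its caller
already holds, which corresponds to the #else branch in the excerpt.
toy_reload_down(&dl, true) returns 0, which corresponds to the branch
that only asserts the lock is held.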