From patchwork Mon Apr 1 14:01:54 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Thompson X-Patchwork-Id: 1918483 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com (client-ip=185.125.189.65; helo=lists.ubuntu.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=patchwork.ozlabs.org) Received: from lists.ubuntu.com (lists.ubuntu.com [185.125.189.65]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4V7Xll5c8cz23sj for ; Tue, 2 Apr 2024 01:02:58 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=lists.ubuntu.com) by lists.ubuntu.com with esmtp (Exim 4.86_2) (envelope-from ) id 1rrIFI-0003dL-Qt; Mon, 01 Apr 2024 14:02:44 +0000 Received: from mail-mw2nam10on2058.outbound.protection.outlook.com ([40.107.94.58] helo=NAM10-MW2-obe.outbound.protection.outlook.com) by lists.ubuntu.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1rrIFE-0003cZ-BG for kernel-team@lists.ubuntu.com; Mon, 01 Apr 2024 14:02:40 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=kv1i7WPC2V1q2g+9ZpqtjKWik/8elJ6TjaxNRSxjKYv3cjUUex7SUKeW9AZu0Ssgtk3lYm4kyKeWoa/qMoTEKJN+kXGGmL2WWzN0wyyA+nbY47KjVRWqsOi55NYH7vMwZ5esyZyZg95KKPxFcVMEIa+aadOZ2OmJrIGE/6Qy1DoTnGvx9AXZXbZqmR70OB82wi6wK/YqBLRE/9aVTc8WLIC9cpwsLuiKJzRZzublglbOlghrnodml/XQSoGVaCQJDbD/pnaMANpDs+n30sA3o9X80P3sOk+T0PAveU0Dyu4IseUZnCWkHc7J04x5rZnsgS2FhT3+4nIs0KZZUGhyHQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=NFIwV/TYpfTUE8y0JpyPANElyuK3f62Obvl6aWxtW1Y=; b=nVLP7I7P/lqpu9oQ3szx7vx4xi/yp2HlbBQ5Db0RbpnqPPgJGmO10ZluXwsm8Um8GqoHUWbrUBH4VS9unM5TNsk34y/36HzjzZsn9k5faAX5XfYjYhnl1uGU0n9Y0XUYlX/tdu+VHiSBWnqrhSGYCWO/3Ys9iZhwVPadIbZMHaDpz3rFY+Vs7/ikcrisN3RpOFQ8jKww7QCxaeH24ztYMQt7Do/PGK4ssn+JW5Hdie4+rq/P/gUqtIU/YKxA9Z7dotS/LHfR5mpTsD2GLNTnv69O2Bh5C/AktmU/v9o+tQBmbaVgSPI9SkP3Xa6SaYz0I8DD+pwSGyo23T3mV8d+zg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.117.161) smtp.rcpttodomain=lists.ubuntu.com smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none (0) Received: from MN2PR20CA0041.namprd20.prod.outlook.com (2603:10b6:208:235::10) by BY5PR12MB4035.namprd12.prod.outlook.com (2603:10b6:a03:206::15) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7409.46; Mon, 1 Apr 2024 14:02:35 +0000 Received: from BL6PEPF0001AB4A.namprd04.prod.outlook.com (2603:10b6:208:235:cafe::25) by MN2PR20CA0041.outlook.office365.com (2603:10b6:208:235::10) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7409.46 via Frontend Transport; Mon, 1 Apr 2024 14:02:34 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.117.161) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.117.161 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.117.161; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (216.228.117.161) by BL6PEPF0001AB4A.mail.protection.outlook.com (10.167.242.68) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7452.22 via Frontend Transport; Mon, 1 Apr 2024 14:02:34 +0000 Received: from rnnvmail204.nvidia.com (10.129.68.6) by mail.nvidia.com (10.129.200.67) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.41; Mon, 1 Apr 2024 07:01:56 -0700 Received: from rnnvmail201.nvidia.com (10.129.68.8) by rnnvmail204.nvidia.com (10.129.68.6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1258.12; Mon, 1 Apr 2024 07:01:56 -0700 Received: from vdi.nvidia.com (10.127.8.9) by mail.nvidia.com (10.129.68.8) with Microsoft SMTP Server id 15.2.1258.12 via Frontend Transport; Mon, 1 Apr 2024 07:01:56 -0700 From: David Thompson To: Subject: [SRU][J:linux-bluefield][PATCH v1 2/2] mlxbf_gige: call request_irq() after NAPI initialized Date: Mon, 1 Apr 2024 10:01:54 -0400 Message-ID: <47e93e659724e772de850a8b1a54cd0877c0cbdc.1711979679.git.davthompson@nvidia.com> X-Mailer: git-send-email 2.30.1 In-Reply-To: References: MIME-Version: 1.0 X-NV-OnPremToCloud: ExternallySecured X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BL6PEPF0001AB4A:EE_|BY5PR12MB4035:EE_ X-MS-Office365-Filtering-Correlation-Id: 389e5a78-10a9-4d87-8869-08dc5254618f X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: +cpS69cW3uhwV6aLvDBCmxrv+bXSQgJg/PohwRT64wM6xqjLOLCMprG9sTctY9FK8gPvjuUo1Jn3/glbHGPtxszcXyp+4fYxMv8L8RTyIcYVi3L7RTS7VrFQNBvf+1yzQTnp1X5efIVEwHIjB2HMxKMvZ37R3jQV1UpU9gRshm2/t0/mtCSRjhhpxcJX8pDwGvciXSTzJjKX7uvoE+bvv2P71Cm+skO59nZUo5qf2Miv1SUzwPvlOZ6bn0QYCA7h6lFjWF6VkvLUr4xg80dBVadpsYn63/By4QmiRkVmWfVZc7YFisR/ybAScsFbg7fDd3aWQdgoum3FHGYv89KSOlSM3OeiUO4T9nlljRVHKpPeJMQ1Wz7lTSEk2HAASxnmBR3AJLzaU1Cljuw6WOC36Cgy8Qy0Bfr/xzEiF/memVyH3xqCATwHUavYcI1S/OkdDP2mrvLSBaHk1uiHko5SYLrXr82uDmKzuTi4MGOn/bijPpjC8+ZtQCxyHtflkA72finCyr33kN2OHC1sZEN4PCR18X7553FmRXsHMwFPu++BrKl9tBDXCmB4Zrp82snK0xneZaSuk4O+fjrsY+5TP6Yhn7KtvRVohdNu3kQbmyebGCvxIdrPmtZzU+iCzKTlUBji+Jm7mlkspwIgo4uPTFZwCV4egb5VK7dbFeTaSh4FtXQJVDdUEeRTS6meEOfLCoe6eYFj3ouaPWYbuWpum6+iOnv+7e2S7L91QD6rMTDwtPTZSJb2HIh8RP7o8bvz X-Forefront-Antispam-Report: CIP:216.228.117.161; CTRY:US; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:mail.nvidia.com; PTR:dc6edge2.nvidia.com; CAT:NONE; SFS:(13230031)(82310400014)(376005)(36860700004)(1800799015); DIR:OUT; SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 01 Apr 2024 14:02:34.4327 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 389e5a78-10a9-4d87-8869-08dc5254618f X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a; Ip=[216.228.117.161]; Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: BL6PEPF0001AB4A.namprd04.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: BY5PR12MB4035 Received-SPF: softfail client-ip=40.107.94.58; envelope-from=davthompson@nvidia.com; helo=NAM10-MW2-obe.outbound.protection.outlook.com X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: asmaa@nvidia.com Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" BugLink: https://bugs.launchpad.net/bugs/2059310 The mlxbf_gige driver encounters a NULL pointer exception in mlxbf_gige_open() when kdump is enabled. The sequence to reproduce the exception is as follows: a) enable kdump b) trigger kdump via "echo c > /proc/sysrq-trigger" c) kdump kernel executes d) kdump kernel loads mlxbf_gige module e) the mlxbf_gige module runs its open() as the the "oob_net0" interface is brought up f) mlxbf_gige module will experience an exception during its open(), something like: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Mem abort info: ESR = 0x0000000086000004 EC = 0x21: IABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e29a4000 [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 0000000086000004 [#1] SMP CPU: 0 PID: 812 Comm: NetworkManager Tainted: G OE 5.15.0-1035-bluefield #37-Ubuntu Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.6.0.13024 Jan 19 2024 pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : 0x0 lr : __napi_poll+0x40/0x230 sp : ffff800008003e00 x29: ffff800008003e00 x28: 0000000000000000 x27: 00000000ffffffff x26: ffff000066027238 x25: ffff00007cedec00 x24: ffff800008003ec8 x23: 000000000000012c x22: ffff800008003eb7 x21: 0000000000000000 x20: 0000000000000001 x19: ffff000066027238 x18: 0000000000000000 x17: ffff578fcb450000 x16: ffffa870b083c7c0 x15: 0000aaab010441d0 x14: 0000000000000001 x13: 00726f7272655f65 x12: 6769675f6662786c x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa870b0842398 x8 : 0000000000000004 x7 : fe5a48b9069706ea x6 : 17fdb11fc84ae0d2 x5 : d94a82549d594f35 x4 : 0000000000000000 x3 : 0000000000400100 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000066027238 Call trace: 0x0 net_rx_action+0x178/0x360 __do_softirq+0x15c/0x428 __irq_exit_rcu+0xac/0xec irq_exit+0x18/0x2c handle_domain_irq+0x6c/0xa0 gic_handle_irq+0xec/0x1b0 call_on_irq_stack+0x20/0x2c do_interrupt_handler+0x5c/0x70 el1_interrupt+0x30/0x50 el1h_64_irq_handler+0x18/0x2c el1h_64_irq+0x7c/0x80 __setup_irq+0x4c0/0x950 request_threaded_irq+0xf4/0x1bc mlxbf_gige_request_irqs+0x68/0x110 [mlxbf_gige] mlxbf_gige_open+0x5c/0x170 [mlxbf_gige] __dev_open+0x100/0x220 __dev_change_flags+0x16c/0x1f0 dev_change_flags+0x2c/0x70 do_setlink+0x220/0xa40 __rtnl_newlink+0x56c/0x8a0 rtnl_newlink+0x58/0x84 rtnetlink_rcv_msg+0x138/0x3c4 netlink_rcv_skb+0x64/0x130 rtnetlink_rcv+0x20/0x30 netlink_unicast+0x2ec/0x360 netlink_sendmsg+0x278/0x490 __sock_sendmsg+0x5c/0x6c ____sys_sendmsg+0x290/0x2d4 ___sys_sendmsg+0x84/0xd0 __sys_sendmsg+0x70/0xd0 __arm64_sys_sendmsg+0x2c/0x40 invoke_syscall+0x78/0x100 el0_svc_common.constprop.0+0x54/0x184 do_el0_svc+0x30/0xac el0_svc+0x48/0x160 el0t_64_sync_handler+0xa4/0x12c el0t_64_sync+0x1a4/0x1a8 Code: bad PC value ---[ end trace 7d1c3f3bf9d81885 ]--- Kernel panic - not syncing: Oops: Fatal exception in interrupt Kernel Offset: 0x2870a7a00000 from 0xffff800008000000 PHYS_OFFSET: 0x80000000 CPU features: 0x0,000005c1,a3332a5a Memory Limit: none ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]--- The exception happens because there is a pending RX interrupt before the call to request_irq(RX IRQ) executes. Then, the RX IRQ handler fires immediately after this request_irq() completes. The RX IRQ handler runs "napi_schedule()" before NAPI is fully initialized via "netif_napi_add()" and "napi_enable()", both which happen later in the open() logic. The logic in mlxbf_gige_open() must fully initialize NAPI before any calls to request_irq() execute. Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver") Signed-off-by: David Thompson Reviewed-by: Asmaa Mnebhi Link: https://lore.kernel.org/r/20240325183627.7641-1-davthompson@nvidia.com Signed-off-by: Jakub Kicinski (cherry picked from commit f7442a634ac06b953fc1f7418f307b25acd4cfbc) Signed-off-by: David Thompson --- .../mellanox/mlxbf_gige/mlxbf_gige_main.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c index 7a6e7a035f73..d606137643cc 100644 --- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c +++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c @@ -140,13 +140,10 @@ static int mlxbf_gige_open(struct net_device *netdev) control |= MLXBF_GIGE_CONTROL_PORT_EN; writeq(control, priv->base + MLXBF_GIGE_CONTROL); - err = mlxbf_gige_request_irqs(priv); - if (err) - return err; mlxbf_gige_cache_stats(priv); err = mlxbf_gige_clean_port(priv); if (err) - goto free_irqs; + return err; /* Clear driver's valid_polarity to match hardware, * since the above call to clean_port() resets the @@ -167,6 +164,10 @@ static int mlxbf_gige_open(struct net_device *netdev) napi_enable(&priv->napi); netif_start_queue(netdev); + err = mlxbf_gige_request_irqs(priv); + if (err) + goto napi_deinit; + /* Set bits in INT_EN that we care about */ int_en = MLXBF_GIGE_INT_EN_HW_ACCESS_ERROR | MLXBF_GIGE_INT_EN_TX_CHECKSUM_INPUTS | @@ -183,14 +184,17 @@ static int mlxbf_gige_open(struct net_device *netdev) return 0; +napi_deinit: + netif_stop_queue(netdev); + napi_disable(&priv->napi); + netif_napi_del(&priv->napi); + mlxbf_gige_rx_deinit(priv); + tx_deinit: mlxbf_gige_tx_deinit(priv); phy_deinit: phy_stop(phydev); - -free_irqs: - mlxbf_gige_free_irqs(priv); return err; }