From patchwork Sun May 5 00:32:55 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Saeed Mahameed
X-Patchwork-Id: 1095322
X-Patchwork-Delegate: davem@davemloft.net
From: Saeed Mahameed
To: "David S. Miller"
CC: "netdev@vger.kernel.org", Jiri Pirko, Eran Ben Elisha, Moshe Shemesh, Saeed Mahameed
Subject: [net-next 01/15] net/mlx5: Move all devlink related functions calls to devlink.c
Date: Sun, 5 May 2019 00:32:55 +0000
Message-ID: <20190505003207.1353-2-saeedm@mellanox.com>
References: <20190505003207.1353-1-saeedm@mellanox.com>
In-Reply-To: <20190505003207.1353-1-saeedm@mellanox.com>
x-mailer: git-send-email 2.20.1
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: Eran Ben Elisha

Centralize all devlink related callbacks in one file.
A downstream patch will add more functionality; this patch prepares the
driver infrastructure for it. For now, move only the devlink
register/unregister function calls into this file.

Signed-off-by: Eran Ben Elisha
Reviewed-by: Moshe Shemesh
Signed-off-by: Saeed Mahameed
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 14 ++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/devlink.h | 12 ++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/main.c    |  5 +++--
 4 files changed, 30 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 243368dc23db..03831a1c02fd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -15,7 +15,7 @@ mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 		health.o mcg.o cq.o alloc.o qp.o port.o mr.o pd.o \
 		transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
 		fs_counters.o rl.o lag.o dev.o events.o wq.o lib/gid.o \
-		lib/devcom.o diag/fs_tracepoint.o diag/fw_tracer.o
+		lib/devcom.o diag/fs_tracepoint.o diag/fw_tracer.o devlink.o
 
 #
 # Netdev basic
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
new file mode 100644
index 000000000000..72ff27f57817
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2019 Mellanox Technologies */
+
+#include
+
+int mlx5_devlink_register(struct devlink *devlink, struct device *dev)
+{
+	return devlink_register(devlink, dev);
+}
+
+void mlx5_devlink_unregister(struct devlink *devlink)
+{
+	devlink_unregister(devlink);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
new file mode 100644
index 000000000000..2242d73e8420
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2019, Mellanox Technologies */
+
+#ifndef __MLX5_DEVLINK_H__
+#define __MLX5_DEVLINK_H__
+
+#include
+
+int mlx5_devlink_register(struct devlink *devlink, struct device *dev);
+void mlx5_devlink_unregister(struct devlink *devlink);
+
+#endif /* __MLX5_DEVLINK_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 61fa1d162d28..96917f444bef 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -56,6 +56,7 @@
 #include "fs_core.h"
 #include "lib/mpfs.h"
 #include "eswitch.h"
+#include "devlink.h"
 #include "lib/mlx5.h"
 #include "fpga/core.h"
 #include "fpga/ipsec.h"
@@ -1312,7 +1313,7 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 
 	request_module_nowait(MLX5_IB_MOD);
 
-	err = devlink_register(devlink, &pdev->dev);
+	err = mlx5_devlink_register(devlink, &pdev->dev);
 	if (err)
 		goto clean_load;
 
@@ -1337,7 +1338,7 @@ static void remove_one(struct pci_dev *pdev)
 	struct mlx5_core_dev *dev = pci_get_drvdata(pdev);
 	struct devlink *devlink = priv_to_devlink(dev);
 
-	devlink_unregister(devlink);
+	mlx5_devlink_unregister(devlink);
 	mlx5_unregister_device(dev);
 
 	if (mlx5_unload_one(dev, true)) {

From patchwork Sun May 5 00:32:57 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Saeed Mahameed
X-Patchwork-Id: 1095324
X-Patchwork-Delegate: davem@davemloft.net
From: Saeed Mahameed
To: "David S. Miller"
CC: "netdev@vger.kernel.org", Jiri Pirko, Alex Vesker, Feras Daoud, Saeed Mahameed
Subject: [net-next 02/15] net/mlx5: Add Vendor Specific Capability access gateway
Date: Sun, 5 May 2019 00:32:57 +0000
Message-ID: <20190505003207.1353-3-saeedm@mellanox.com>
References: <20190505003207.1353-1-saeedm@mellanox.com>
In-Reply-To: <20190505003207.1353-1-saeedm@mellanox.com>
x-mailer: git-send-email 2.20.1
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: Alex Vesker

The Vendor Specific Capability (VSC) is used to activate a gateway
interfacing with the device.
The gateway is used to read or write device configurations, which are
organized in different domains (spaces). A single configuration access may
involve multiple actions (reads and writes). Example usages are accessing
the Crspace domain to read the crspace, or locking a device semaphore using
the Semaphore domain.

Configuration access uses pci_cfg_access_lock()/pci_cfg_access_unlock() to
prevent parallel access to the VSC space by the driver and by userspace
calls.

Signed-off-by: Alex Vesker
Signed-off-by: Feras Daoud
Signed-off-by: Saeed Mahameed
---
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   3 +-
 .../ethernet/mellanox/mlx5/core/lib/pci_vsc.c | 283 ++++++++++++++++++
 .../ethernet/mellanox/mlx5/core/lib/pci_vsc.h |  25 ++
 .../net/ethernet/mellanox/mlx5/core/main.c    |   3 +
 include/linux/mlx5/driver.h                   |   1 +
 5 files changed, 314 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 03831a1c02fd..34d9a079b608 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -15,7 +15,8 @@ mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 		health.o mcg.o cq.o alloc.o qp.o port.o mr.o pd.o \
 		transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
 		fs_counters.o rl.o lag.o dev.o events.o wq.o lib/gid.o \
-		lib/devcom.o diag/fs_tracepoint.o diag/fw_tracer.o devlink.o
+		lib/devcom.o lib/pci_vsc.o diag/fs_tracepoint.o \
+		diag/fw_tracer.o devlink.o
 
 #
 # Netdev basic
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
new file mode 100644
index 000000000000..f42890bdd6b1
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
@@ -0,0 +1,283 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2019 Mellanox Technologies */
+
+#include
+#include "mlx5_core.h"
+#include "pci_vsc.h"
+
+#define MLX5_EXTRACT_C(source, offset, size) \
+	((((u32)(source)) >> (offset)) & MLX5_ONES32(size))
+#define MLX5_EXTRACT(src, start, len) \
+	(((len) == 32) ? (src) : MLX5_EXTRACT_C(src, start, len))
+#define MLX5_ONES32(size) \
+	((size) ? (0xffffffff >> (32 - (size))) : 0)
+#define MLX5_MASK32(offset, size) \
+	(MLX5_ONES32(size) << (offset))
+#define MLX5_MERGE_C(rsrc1, rsrc2, start, len) \
+	((((rsrc2) << (start)) & (MLX5_MASK32((start), (len)))) | \
+	((rsrc1) & (~MLX5_MASK32((start), (len)))))
+#define MLX5_MERGE(rsrc1, rsrc2, start, len) \
+	(((len) == 32) ? (rsrc2) : MLX5_MERGE_C(rsrc1, rsrc2, start, len))
+#define vsc_read(dev, offset, val) \
+	pci_read_config_dword((dev)->pdev, (dev)->vsc_addr + (offset), (val))
+#define vsc_write(dev, offset, val) \
+	pci_write_config_dword((dev)->pdev, (dev)->vsc_addr + (offset), (val))
+#define VSC_MAX_RETRIES 2048
+
+enum mlx5_vsc_state {
+	MLX5_VSC_UNLOCK,
+	MLX5_VSC_LOCK,
+};
+
+enum {
+	VSC_CTRL_OFFSET = 0x4,
+	VSC_COUNTER_OFFSET = 0x8,
+	VSC_SEMAPHORE_OFFSET = 0xc,
+	VSC_ADDR_OFFSET = 0x10,
+	VSC_DATA_OFFSET = 0x14,
+
+	VSC_FLAG_BIT_OFFS = 31,
+	VSC_FLAG_BIT_LEN = 1,
+
+	VSC_SYND_BIT_OFFS = 30,
+	VSC_SYND_BIT_LEN = 1,
+
+	VSC_ADDR_BIT_OFFS = 0,
+	VSC_ADDR_BIT_LEN = 30,
+
+	VSC_SPACE_BIT_OFFS = 0,
+	VSC_SPACE_BIT_LEN = 16,
+
+	VSC_SIZE_VLD_BIT_OFFS = 28,
+	VSC_SIZE_VLD_BIT_LEN = 1,
+
+	VSC_STATUS_BIT_OFFS = 29,
+	VSC_STATUS_BIT_LEN = 3,
+};
+
+void mlx5_vsc_init(struct mlx5_core_dev *dev)
+{
+	dev->vsc_addr = pci_find_capability(dev->pdev,
+					    PCI_CAP_ID_VNDR);
+	if (!dev->vsc_addr)
+		mlx5_core_warn(dev, "Failed to get valid vendor specific ID\n");
+}
+
+int mlx5_vsc_gw_lock(struct mlx5_core_dev *dev)
+{
+	u32 counter = 0;
+	int retries = 0;
+	u32 lock_val;
+	int ret;
+
+	pci_cfg_access_lock(dev->pdev);
+	do {
+		if (retries > VSC_MAX_RETRIES) {
+			ret = -EBUSY;
+			goto pci_unlock;
+		}
+
+		/* Check if semaphore is already locked */
+		ret = vsc_read(dev, VSC_SEMAPHORE_OFFSET, &lock_val);
+		if (ret)
+			goto pci_unlock;
+
+		if (lock_val) {
+			retries++;
+			usleep_range(1000, 2000);
+			continue;
+		}
+
+		/* Read and write counter value, if written value is
+		 * the same, semaphore was acquired successfully.
+		 */
+		ret = vsc_read(dev, VSC_COUNTER_OFFSET, &counter);
+		if (ret)
+			goto pci_unlock;
+
+		ret = vsc_write(dev, VSC_SEMAPHORE_OFFSET, counter);
+		if (ret)
+			goto pci_unlock;
+
+		ret = vsc_read(dev, VSC_SEMAPHORE_OFFSET, &lock_val);
+		if (ret)
+			goto pci_unlock;
+
+		retries++;
+	} while (counter != lock_val);
+
+	return 0;
+
+pci_unlock:
+	pci_cfg_access_unlock(dev->pdev);
+	return ret;
+}
+
+int mlx5_vsc_gw_unlock(struct mlx5_core_dev *dev)
+{
+	int ret;
+
+	ret = vsc_write(dev, VSC_SEMAPHORE_OFFSET, MLX5_VSC_UNLOCK);
+	pci_cfg_access_unlock(dev->pdev);
+	return ret;
+}
+
+int mlx5_vsc_gw_set_space(struct mlx5_core_dev *dev, u16 space,
+			  u32 *ret_space_size)
+{
+	int ret;
+	u32 val = 0;
+
+	if (!mlx5_vsc_accessible(dev))
+		return -EINVAL;
+
+	if (ret_space_size)
+		*ret_space_size = 0;
+
+	/* Get a unique val */
+	ret = vsc_read(dev, VSC_CTRL_OFFSET, &val);
+	if (ret)
+		goto out;
+
+	/* Try to modify the lock */
+	val = MLX5_MERGE(val, space, VSC_SPACE_BIT_OFFS, VSC_SPACE_BIT_LEN);
+	ret = vsc_write(dev, VSC_CTRL_OFFSET, val);
+	if (ret)
+		goto out;
+
+	/* Verify lock was modified */
+	ret = vsc_read(dev, VSC_CTRL_OFFSET, &val);
+	if (ret)
+		goto out;
+
+	if (MLX5_EXTRACT(val, VSC_STATUS_BIT_OFFS, VSC_STATUS_BIT_LEN) == 0)
+		return -EINVAL;
+
+	/* Get space max address if indicated by size valid bit */
+	if (ret_space_size &&
+	    MLX5_EXTRACT(val, VSC_SIZE_VLD_BIT_OFFS, VSC_SIZE_VLD_BIT_LEN)) {
+		ret = vsc_read(dev, VSC_ADDR_OFFSET, &val);
+		if (ret) {
+			mlx5_core_warn(dev, "Failed to get max space size\n");
+			goto out;
+		}
+		*ret_space_size = MLX5_EXTRACT(val, VSC_ADDR_BIT_OFFS,
+					       VSC_ADDR_BIT_LEN);
+	}
+	return 0;
+
+out:
+	return ret;
+}
+
+static int mlx5_vsc_wait_on_flag(struct mlx5_core_dev *dev, u8 expected_val)
+{
+	int retries = 0;
+	u32 flag;
+	int ret;
+
+	do {
+		if (retries > VSC_MAX_RETRIES)
+			return -EBUSY;
+
+		ret = vsc_read(dev, VSC_ADDR_OFFSET, &flag);
+		if (ret)
+			return ret;
+		flag = MLX5_EXTRACT(flag, VSC_FLAG_BIT_OFFS, VSC_FLAG_BIT_LEN);
+		retries++;
+
+		if ((retries & 0xf) == 0)
+			usleep_range(1000, 2000);
+
+	} while (flag != expected_val);
+
+	return 0;
+}
+
+static int mlx5_vsc_gw_write(struct mlx5_core_dev *dev, unsigned int address,
+			     u32 data)
+{
+	int ret;
+
+	if (MLX5_EXTRACT(address, VSC_SYND_BIT_OFFS,
+			 VSC_FLAG_BIT_LEN + VSC_SYND_BIT_LEN))
+		return -EINVAL;
+
+	/* Set flag to 0x1 */
+	address = MLX5_MERGE(address, 1, VSC_FLAG_BIT_OFFS, 1);
+	ret = vsc_write(dev, VSC_DATA_OFFSET, data);
+	if (ret)
+		goto out;
+
+	ret = vsc_write(dev, VSC_ADDR_OFFSET, address);
+	if (ret)
+		goto out;
+
+	/* Wait for the flag to be cleared */
+	ret = mlx5_vsc_wait_on_flag(dev, 0);
+
+out:
+	return ret;
+}
+
+static int mlx5_vsc_gw_read(struct mlx5_core_dev *dev, unsigned int address,
+			    u32 *data)
+{
+	int ret;
+
+	if (MLX5_EXTRACT(address, VSC_SYND_BIT_OFFS,
+			 VSC_FLAG_BIT_LEN + VSC_SYND_BIT_LEN))
+		return -EINVAL;
+
+	ret = vsc_write(dev, VSC_ADDR_OFFSET, address);
+	if (ret)
+		goto out;
+
+	ret = mlx5_vsc_wait_on_flag(dev, 1);
+	if (ret)
+		goto out;
+
+	ret = vsc_read(dev, VSC_DATA_OFFSET, data);
+out:
+	return ret;
+}
+
+static int mlx5_vsc_gw_read_fast(struct mlx5_core_dev *dev,
+				 unsigned int read_addr,
+				 unsigned int *next_read_addr,
+				 u32 *data)
+{
+	int ret;
+
+	ret = mlx5_vsc_gw_read(dev, read_addr, data);
+	if (ret)
+		goto out;
+
+	ret = vsc_read(dev, VSC_ADDR_OFFSET, next_read_addr);
+	if (ret)
+		goto out;
+
+	*next_read_addr = MLX5_EXTRACT(*next_read_addr, VSC_ADDR_BIT_OFFS,
+				       VSC_ADDR_BIT_LEN);
+
+	if (*next_read_addr <= read_addr)
+		ret = -EINVAL;
+out:
+	return ret;
+}
+
+int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data,
+				int length)
+{
+	unsigned int next_read_addr = 0;
+	unsigned int read_addr = 0;
+
+	while (read_addr < length) {
+		if (mlx5_vsc_gw_read_fast(dev, read_addr, &next_read_addr,
+					  &data[(read_addr >> 2)]))
+			return read_addr;
+
+		read_addr = next_read_addr;
+	}
+	return length;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h
new file mode 100644
index 000000000000..c6ebf59006c5
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2019 Mellanox Technologies */
+
+#ifndef __MLX5_PCI_VSC_H__
+#define __MLX5_PCI_VSC_H__
+
+enum {
+	MLX5_VSC_SPACE_SCAN_CRSPACE = 0x7,
+};
+
+void mlx5_vsc_init(struct mlx5_core_dev *dev);
+void mlx5_vsc_cleanup(struct mlx5_core_dev *dev);
+int mlx5_vsc_gw_lock(struct mlx5_core_dev *dev);
+int mlx5_vsc_gw_unlock(struct mlx5_core_dev *dev);
+int mlx5_vsc_gw_set_space(struct mlx5_core_dev *dev, u16 space,
+			  u32 *ret_space_size);
+int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data,
+				int length);
+
+static inline bool mlx5_vsc_accessible(struct mlx5_core_dev *dev)
+{
+	return !!dev->vsc_addr;
+}
+
+#endif /* __MLX5_PCI_VSC_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 96917f444bef..64eb2a558b30 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -65,6 +65,7 @@
 #include "lib/clock.h"
 #include "lib/vxlan.h"
 #include "lib/devcom.h"
+#include "lib/pci_vsc.h"
 #include "diag/fw_tracer.h"
 #include "ecpf.h"
 
@@ -1313,6 +1314,8 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 
 	request_module_nowait(MLX5_IB_MOD);
 
+	mlx5_vsc_init(dev);
+
 	err = mlx5_devlink_register(devlink, &pdev->dev);
 	if (err)
 		goto clean_load;
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 5a39b323c52e..56d0a116f575 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -691,6 +691,7 @@ struct mlx5_core_dev {
 	struct mlx5_ib_clock_info *clock_info;
 	struct page *clock_info_page;
 	struct mlx5_fw_tracer *tracer;
+	u32 vsc_addr;
 };
 
 struct mlx5_db {

From patchwork Sun May 5 00:33:00 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Saeed Mahameed
X-Patchwork-Id: 1095325
X-Patchwork-Delegate: davem@davemloft.net
From: Saeed Mahameed
To: "David S. Miller"
CC: "netdev@vger.kernel.org", Jiri Pirko, Alex Vesker, Moshe Shemesh, Feras Daoud, Saeed Mahameed
Subject: [net-next 03/15] net/mlx5: Add Crdump FW snapshot support
Date: Sun, 5 May 2019 00:33:00 +0000
Message-ID: <20190505003207.1353-4-saeedm@mellanox.com>
References: <20190505003207.1353-1-saeedm@mellanox.com>
In-Reply-To: <20190505003207.1353-1-saeedm@mellanox.com>
x-mailer: git-send-email 2.20.1
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: Alex Vesker

Crdump allows the driver to create a snapshot of the FW PCI crspace.
This is useful in case of catastrophic issues that may require a FW reset;
the snapshot can be used for later debugging.

The snapshot is exposed using devlink region_snapshot in a downstream
patch: cr-space address regions are registered on init, and snapshots are
attached once the driver collects a new one.

Signed-off-by: Alex Vesker
Signed-off-by: Moshe Shemesh
Reviewed-by: Feras Daoud
Signed-off-by: Saeed Mahameed
---
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 .../ethernet/mellanox/mlx5/core/diag/crdump.c | 179 ++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/health.c  |   1 +
 .../ethernet/mellanox/mlx5/core/lib/mlx5.h    |   4 +
 .../net/ethernet/mellanox/mlx5/core/main.c    |   5 +
 include/linux/mlx5/driver.h                   |   4 +
 6 files changed, 194 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 34d9a079b608..5feed9e1bec7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -16,7 +16,7 @@ mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 		transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
 		fs_counters.o rl.o lag.o dev.o events.o wq.o lib/gid.o \
 		lib/devcom.o lib/pci_vsc.o diag/fs_tracepoint.o \
-		diag/fw_tracer.o devlink.o
+		diag/fw_tracer.o diag/crdump.o devlink.o
 
 #
 # Netdev basic
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
new file mode 100644
index 000000000000..6430ceeefb53
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
@@ -0,0 +1,179 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2019 Mellanox Technologies */
+
+#include
+#include
+#include
+#include "mlx5_core.h"
+#include "lib/pci_vsc.h"
+#include "lib/mlx5.h"
+
+#define BAD_ACCESS 0xBADACCE5
+#define MLX5_PROTECTED_CR_SCAN_CRSPACE 0x7
+#define MAX_NUM_OF_DUMPS_TO_STORE (8)
+
+static const char *region_cr_space_str = "cr-space";
+
+struct mlx5_fw_crdump {
+	u32 size;
+	struct devlink_region *region_crspace;
+};
+
+static bool mlx5_crdump_enbaled(struct mlx5_core_dev *dev)
+{
+	struct mlx5_priv *priv = &dev->priv;
+
+	return (!!priv->health.crdump);
+}
+
+static int mlx5_crdump_fill(struct mlx5_core_dev *dev,
+			    char *crdump_region, u32 *snapshot_id)
+{
+	struct devlink *devlink = priv_to_devlink(dev);
+	struct mlx5_priv *priv = &dev->priv;
+	struct mlx5_fw_crdump *crdump = priv->health.crdump;
+	int i, ret = 0;
+	u32 *cr_data;
+	u32 id;
+
+	cr_data = kvmalloc(crdump->size, GFP_KERNEL);
+	if (!cr_data)
+		return -ENOMEM;
+
+	for (i = 0; i < (crdump->size / 4); i++)
+		cr_data[i] = BAD_ACCESS;
+
+	ret = mlx5_vsc_gw_read_block_fast(dev, cr_data, crdump->size);
+	if (ret <= 0) {
+		if (ret == 0)
+			ret = -EIO;
+		goto free_data;
+	}
+
+	if (crdump->size != ret) {
+		mlx5_core_warn(dev, "failed to read full dump, read %d out of %u\n",
+			       ret, crdump->size);
+		ret = -EINVAL;
+		goto free_data;
+	}
+
+	/* Get the available snapshot ID for the dumps */
+	id = devlink_region_shapshot_id_get(devlink);
+	ret = devlink_region_snapshot_create(crdump->region_crspace,
+					     crdump->size, (u8 *)cr_data,
+					     id, &kvfree);
+	if (ret) {
+		mlx5_core_warn(dev, "crdump: devlink create %s snapshot id %d err %d\n",
+			       region_cr_space_str, id, ret);
+		goto free_data;
+	} else {
+		*snapshot_id = id;
+		strcpy(crdump_region, region_cr_space_str);
+	}
+	return 0;
+
+free_data:
+	kvfree(cr_data);
+	return ret;
+}
+
+int mlx5_crdump_collect(struct mlx5_core_dev *dev,
+			char *crdump_region, u32 *snapshot_id)
+{
+	int ret = 0;
+
+	if (!mlx5_crdump_enbaled(dev))
+		return -ENODEV;
+
+	ret = mlx5_vsc_gw_lock(dev);
+	if (ret) {
+		mlx5_core_warn(dev, "crdump: failed to lock vsc gw err %d\n",
+			       ret);
+		return ret;
+	}
+
+	ret = mlx5_vsc_gw_set_space(dev, MLX5_VSC_SPACE_SCAN_CRSPACE, NULL);
+	if (ret)
+		goto unlock;
+
+	ret = mlx5_crdump_fill(dev, 
crdump_region, snapshot_id); + +unlock: + mlx5_vsc_gw_unlock(dev); + return ret; +} + +int mlx5_crdump_init(struct mlx5_core_dev *dev) +{ + struct devlink *devlink = priv_to_devlink(dev); + struct mlx5_priv *priv = &dev->priv; + struct mlx5_fw_crdump *crdump; + u32 space_size; + int ret; + + if (!mlx5_core_is_pf(dev) || !mlx5_vsc_accessible(dev) || + mlx5_crdump_enbaled(dev)) + return 0; + + ret = mlx5_vsc_gw_lock(dev); + if (ret) + return ret; + + /* Check if space is supported and get space size */ + ret = mlx5_vsc_gw_set_space(dev, MLX5_VSC_SPACE_SCAN_CRSPACE, + &space_size); + if (ret) { + /* Unlock and mask error since space is not supported */ + mlx5_vsc_gw_unlock(dev); + return 0; + } + + if (!space_size) { + mlx5_core_warn(dev, "Invalid Crspace size, zero\n"); + mlx5_vsc_gw_unlock(dev); + return -EINVAL; + } + + ret = mlx5_vsc_gw_unlock(dev); + if (ret) + return ret; + + crdump = kzalloc(sizeof(*crdump), GFP_KERNEL); + if (!crdump) + return -ENOMEM; + + /* Create cr-space region */ + crdump->size = space_size; + crdump->region_crspace = + devlink_region_create(devlink, + region_cr_space_str, + MAX_NUM_OF_DUMPS_TO_STORE, + space_size); + if (IS_ERR(crdump->region_crspace)) { + mlx5_core_warn(dev, + "crdump: create devlink region %s err %ld\n", + region_cr_space_str, + PTR_ERR(crdump->region_crspace)); + ret = PTR_ERR(crdump->region_crspace); + goto free_crdump; + } + priv->health.crdump = crdump; + return 0; + +free_crdump: + kfree(crdump); + return ret; +} + +void mlx5_crdump_cleanup(struct mlx5_core_dev *dev) +{ + struct mlx5_priv *priv = &dev->priv; + struct mlx5_fw_crdump *crdump = priv->health.crdump; + + if (!crdump) + return; + + devlink_region_destroy(crdump->region_crspace); + kfree(crdump); + priv->health.crdump = NULL; +} diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c index a2656f4008d9..90f3da6da7f9 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c +++ 
b/drivers/net/ethernet/mellanox/mlx5/core/health.c @@ -388,6 +388,7 @@ int mlx5_health_init(struct mlx5_core_dev *dev) spin_lock_init(&health->wq_lock); INIT_WORK(&health->work, health_care); INIT_DELAYED_WORK(&health->recover_work, health_recover); + health->crdump = NULL; return 0; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h index 397a2847867a..3c9a6dedccaa 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h @@ -41,6 +41,10 @@ int mlx5_core_reserve_gids(struct mlx5_core_dev *dev, unsigned int count); void mlx5_core_unreserve_gids(struct mlx5_core_dev *dev, unsigned int count); int mlx5_core_reserved_gid_alloc(struct mlx5_core_dev *dev, int *gid_index); void mlx5_core_reserved_gid_free(struct mlx5_core_dev *dev, int gid_index); +int mlx5_crdump_init(struct mlx5_core_dev *dev); +void mlx5_crdump_cleanup(struct mlx5_core_dev *dev); +int mlx5_crdump_collect(struct mlx5_core_dev *dev, + char *crdump_region, u32 *snapshot_id); /* TODO move to lib/events.h */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index 64eb2a558b30..43f5487de4c3 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -1320,6 +1320,10 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *id) if (err) goto clean_load; + err = mlx5_crdump_init(dev); + if (err) + dev_err(&pdev->dev, "mlx5_crdump_init failed with error code %d\n", err); + pci_save_state(pdev); return 0; @@ -1341,6 +1345,7 @@ static void remove_one(struct pci_dev *pdev) struct mlx5_core_dev *dev = pci_get_drvdata(pdev); struct devlink *devlink = priv_to_devlink(dev); + mlx5_crdump_cleanup(dev); mlx5_devlink_unregister(devlink); mlx5_unregister_device(dev); diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 
56d0a116f575..ddf6f41a75d3 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -53,6 +53,7 @@ #include #include #include +#include enum { MLX5_BOARD_ID_LEN = 64, @@ -427,6 +428,8 @@ struct mlx5_sq_bfreg { unsigned int offset; }; +struct mlx5_fw_crdump; + struct mlx5_core_health { struct health_buffer __iomem *health; __be32 __iomem *health_counter; @@ -440,6 +443,7 @@ struct mlx5_core_health { unsigned long flags; struct work_struct work; struct delayed_work recover_work; + struct mlx5_fw_crdump *crdump; }; struct mlx5_qp_table { From patchwork Sun May 5 00:33:02 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saeed Mahameed X-Patchwork-Id: 1095326 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=mellanox.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=Mellanox.com header.i=@Mellanox.com header.b="i5Mxq57f"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 44xRgH6yc9z9s6w for ; Sun, 5 May 2019 10:33:31 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727355AbfEEAdY (ORCPT ); Sat, 4 May 2019 20:33:24 -0400 Received: from mail-eopbgr70082.outbound.protection.outlook.com ([40.107.7.82]:14048 "EHLO EUR04-HE1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727295AbfEEAdX (ORCPT ); Sat, 4 May 2019 20:33:23 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com; s=selector2; 
h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=p4MtrIs1l37NQrq08rLQGFiI8dJ6s/a5kUH2Yiekg5M=; b=i5Mxq57fE1yp0XXbDsFLSw3CwBPOpAZ/9R3jrVD+wqrjLqfoVv9mXLmjHAHAh2tHPGbdR7O66ElspHsidj7/4xQ5NzZTKmPl+GJIwQOL+yWl8pcQdmlzX3HQsFQ1rbp7CmOubcYjRYQ/Mshv9mRZ49XVYVNex0+SXa8fqf9Imc0= Received: from DB8PR05MB5898.eurprd05.prod.outlook.com (20.179.9.32) by DB8PR05MB5881.eurprd05.prod.outlook.com (20.179.10.21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1856.11; Sun, 5 May 2019 00:33:02 +0000 Received: from DB8PR05MB5898.eurprd05.prod.outlook.com ([fe80::ed24:8317:76e4:1a07]) by DB8PR05MB5898.eurprd05.prod.outlook.com ([fe80::ed24:8317:76e4:1a07%5]) with mapi id 15.20.1856.012; Sun, 5 May 2019 00:33:02 +0000 From: Saeed Mahameed To: "David S. Miller" CC: "netdev@vger.kernel.org" , Jiri Pirko , Alex Vesker , Feras Daoud , Moshe Shemesh , Saeed Mahameed Subject: [net-next 04/15] net/mlx5: Add support for devlink region_snapshot parameter Thread-Topic: [net-next 04/15] net/mlx5: Add support for devlink region_snapshot parameter Thread-Index: AQHVAtoZyTpGXRfQtE2y9nDMsjm/2Q== Date: Sun, 5 May 2019 00:33:02 +0000 Message-ID: <20190505003207.1353-5-saeedm@mellanox.com> References: <20190505003207.1353-1-saeedm@mellanox.com> In-Reply-To: <20190505003207.1353-1-saeedm@mellanox.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-mailer: git-send-email 2.20.1 x-originating-ip: [73.15.39.150] x-clientproxiedby: BY5PR13CA0008.namprd13.prod.outlook.com (2603:10b6:a03:180::21) To DB8PR05MB5898.eurprd05.prod.outlook.com (2603:10a6:10:a4::32) authentication-results: spf=none (sender IP is ) smtp.mailfrom=saeedm@mellanox.com; x-ms-exchange-messagesentrepresentingtype: 1 x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: 8d1eb745-3028-4747-1066-08d6d0f13b54 x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: BCL:0; PCL:0; 
RULEID:(2390118)(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600141)(711020)(4605104)(4618075)(2017052603328)(7193020); SRVR:DB8PR05MB5881; x-ms-traffictypediagnostic: DB8PR05MB5881: x-microsoft-antispam-prvs: x-ms-oob-tlc-oobclassifiers: OLM:4502; x-forefront-prvs: 00286C0CA6 x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(346002)(376002)(366004)(39850400004)(136003)(396003)(199004)(189003)(305945005)(52116002)(76176011)(36756003)(316002)(25786009)(6486002)(478600001)(14454004)(446003)(50226002)(476003)(11346002)(2616005)(26005)(7736002)(4326008)(99286004)(86362001)(6916009)(53936002)(66476007)(186003)(68736007)(66446008)(64756008)(66556008)(6436002)(66946007)(73956011)(6512007)(14444005)(1076003)(66066001)(71190400001)(71200400001)(54906003)(256004)(102836004)(81156014)(81166006)(8936002)(3846002)(6506007)(386003)(107886003)(2906002)(8676002)(5660300002)(6116002)(486006); DIR:OUT; SFP:1101; SCL:1; SRVR:DB8PR05MB5881; H:DB8PR05MB5898.eurprd05.prod.outlook.com; FPR:; SPF:None; LANG:en; PTR:InfoNoRecords; A:1; MX:1; received-spf: None (protection.outlook.com: mellanox.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 x-microsoft-antispam-message-info: pILYnvsMI9lNZcSrSLu15EodzD5O6yxIR3voDC9FajQXUsvaTDDnDXgfSWdkLbsbEmCVzZqX5/GUtfQUQHkDPoJLwsE0P+hJ7+u8mOGaqfF77Nnq3TijnU1DSg2KH6TxOlCxxKdJPQjSQ+d+hZa8DOcCyZF5pbGvs6jLYwQlQG8uqMcM5GExLz3kn3rrGK1FmZVmTIQfOXZ2nH7pnV16Q9h7iBGO3qmMy+Zqqba4cwFjmsk+urR3yWuXT0o2ZAEqjCG3slqyXF+5L2yLbJxOmtd6uVmCIUcmJlTDb31bWb9H/9DbnM9+Tm7KJaC8eDXwrN41s/wBev1X347nfRgF26NGaJJRQt5Bp6IMGvh7cav1xUeZiYPDXYz4aN6a5lizHd1TDLKeZuLz6vZEhprnyeHlteHBOIvp2UT5sXlArXk= MIME-Version: 1.0 X-OriginatorOrg: Mellanox.com X-MS-Exchange-CrossTenant-Network-Message-Id: 8d1eb745-3028-4747-1066-08d6d0f13b54 X-MS-Exchange-CrossTenant-originalarrivaltime: 05 May 2019 00:33:02.3466 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 
a652971c-7d2e-4d9b-a6a4-d149256f461b X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB8PR05MB5881 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Alex Vesker This parameter enables capturing a region snapshot of the crspace during critical errors. The parameter is disabled by default; it can be enabled using devlink param commands. It can be configured both at runtime and at driver init. Command line examples: Delete snapshot ID 1 of the cr-space address region from device pci/0000:00:05.0: $ devlink region del pci/0000:00:05.0/cr-space snapshot 1 Dump the snapshot with ID 1 taken from the cr-space address region: $ devlink region dump pci/0000:00:05.0/cr-space snapshot 1 Read 16 bytes from address 0x10 of snapshot ID 1 taken from the cr-space address region: $ devlink region read pci/0000:00:05.0/cr-space snapshot 1 address 0x10 length 16 Signed-off-by: Alex Vesker Reviewed-by: Feras Daoud Signed-off-by: Moshe Shemesh Signed-off-by: Saeed Mahameed Reviewed-by: Alex Vesker Signed-off-by: Saeed Mahameed --- .../net/ethernet/mellanox/mlx5/core/devlink.c | 60 ++++++++++++++++++- .../ethernet/mellanox/mlx5/core/diag/crdump.c | 22 +++++++ .../ethernet/mellanox/mlx5/core/lib/mlx5.h | 2 + 3 files changed, 83 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c index 72ff27f57817..308fe64e7bcd 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c @@ -2,13 +2,71 @@ /* Copyright (c) 2019 Mellanox Technologies */ #include +#include +#include "lib/mlx5.h" + +static int mlx5_devlink_get_crdump_snapshot(struct devlink *devlink, u32 id, + struct devlink_param_gset_ctx *ctx) +{ + struct mlx5_core_dev *dev = devlink_priv(devlink); + + ctx->val.vbool = mlx5_crdump_is_snapshot_enabled(dev); + return 0; +} + +static
int mlx5_devlink_set_crdump_snapshot(struct devlink *devlink, u32 id, + struct devlink_param_gset_ctx *ctx) +{ + struct mlx5_core_dev *dev = devlink_priv(devlink); + + return mlx5_crdump_set_snapshot_enabled(dev, ctx->val.vbool); +} + +static const struct devlink_param mlx5_devlink_params[] = { + DEVLINK_PARAM_GENERIC(REGION_SNAPSHOT, + BIT(DEVLINK_PARAM_CMODE_RUNTIME) | + BIT(DEVLINK_PARAM_CMODE_DRIVERINIT), + mlx5_devlink_get_crdump_snapshot, + mlx5_devlink_set_crdump_snapshot, NULL), +}; int mlx5_devlink_register(struct devlink *devlink, struct device *dev) { - return devlink_register(devlink, dev); + union devlink_param_value init_val; + int err; + + err = devlink_register(devlink, dev); + if (err) { + dev_warn(dev, + "devlink register failed (err = %d)\n", err); + return err; + } + + err = devlink_params_register(devlink, mlx5_devlink_params, + ARRAY_SIZE(mlx5_devlink_params)); + if (err) { + dev_err(dev, "devlink_params_register failed, err = %d\n", err); + goto unregister; + } + + init_val.vbool = false; + err = devlink_param_driverinit_value_set(devlink, + DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT, + init_val); + if (err) + dev_warn(dev, + "devlink param init failed (err = %d)\n", err); + + return 0; + +unregister: + devlink_unregister(devlink); + return err; } void mlx5_devlink_unregister(struct devlink *devlink) { + devlink_params_unregister(devlink, mlx5_devlink_params, + ARRAY_SIZE(mlx5_devlink_params)); devlink_unregister(devlink); } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c index 6430ceeefb53..7337a49f2733 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c @@ -16,6 +16,7 @@ static const char *region_cr_space_str = "cr-space"; struct mlx5_fw_crdump { u32 size; + bool snapshot_enable; struct devlink_region *region_crspace; }; @@ -103,6 +104,27 @@ int mlx5_crdump_collect(struct mlx5_core_dev *dev, return
ret; } +bool mlx5_crdump_is_snapshot_enabled(struct mlx5_core_dev *dev) +{ + struct mlx5_priv *priv = &dev->priv; + + if (mlx5_crdump_enbaled(dev)) + return priv->health.crdump->snapshot_enable; + + return false; +} + +int mlx5_crdump_set_snapshot_enabled(struct mlx5_core_dev *dev, bool value) +{ + struct mlx5_priv *priv = &dev->priv; + + if (!mlx5_crdump_enbaled(dev)) + return -ENODEV; + + priv->health.crdump->snapshot_enable = value; + return 0; +} + int mlx5_crdump_init(struct mlx5_core_dev *dev) { struct devlink *devlink = priv_to_devlink(dev); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h index 3c9a6dedccaa..c639f0af29ed 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h @@ -45,6 +45,8 @@ int mlx5_crdump_init(struct mlx5_core_dev *dev); void mlx5_crdump_cleanup(struct mlx5_core_dev *dev); int mlx5_crdump_collect(struct mlx5_core_dev *dev, char *crdump_region, u32 *snapshot_id); +bool mlx5_crdump_is_snapshot_enabled(struct mlx5_core_dev *dev); +int mlx5_crdump_set_snapshot_enabled(struct mlx5_core_dev *dev, bool value); /* TODO move to lib/events.h */ From patchwork Sun May 5 00:33:04 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saeed Mahameed X-Patchwork-Id: 1095327 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=mellanox.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=Mellanox.com header.i=@Mellanox.com header.b="lrcJehqA"; 
dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 44xRgJ2kRrz9sB8 for ; Sun, 5 May 2019 10:33:32 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727378AbfEEAd1 (ORCPT ); Sat, 4 May 2019 20:33:27 -0400 Received: from mail-eopbgr70082.outbound.protection.outlook.com ([40.107.7.82]:14048 "EHLO EUR04-HE1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727331AbfEEAd0 (ORCPT ); Sat, 4 May 2019 20:33:26 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=DoBcTBDuDHLJ9Fr4spIbK7TineiaeoBmt7+aFEmWr/Y=; b=lrcJehqAqcpBJMlXy468CgJ8QsZ2V00iOZeBAmAsKNqeEsR2fyF/79QEr/I5B97MzxObr40Mp6JBl9LvP7tbgAPRbuV5ftZUt8Lt76ehtjCNcPDdIguu/dfiQY8LzDmTSKwdkv37IHEDq/A/a3mSwQ7CbkAzPM4t1HrHGEZ/jAs= Received: from DB8PR05MB5898.eurprd05.prod.outlook.com (20.179.9.32) by DB8PR05MB5881.eurprd05.prod.outlook.com (20.179.10.21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1856.11; Sun, 5 May 2019 00:33:04 +0000 Received: from DB8PR05MB5898.eurprd05.prod.outlook.com ([fe80::ed24:8317:76e4:1a07]) by DB8PR05MB5898.eurprd05.prod.outlook.com ([fe80::ed24:8317:76e4:1a07%5]) with mapi id 15.20.1856.012; Sun, 5 May 2019 00:33:04 +0000 From: Saeed Mahameed To: "David S. 
Miller" CC: "netdev@vger.kernel.org" , Jiri Pirko , Feras Daoud , Moshe Shemesh , Daniel Jurgens , Alex Vesker , Saeed Mahameed Subject: [net-next 05/15] net/mlx5: Handle SW reset of FW in error flow Thread-Topic: [net-next 05/15] net/mlx5: Handle SW reset of FW in error flow Thread-Index: AQHVAtoaopT79Pn7jUeLSf7j5Ci6Jg== Date: Sun, 5 May 2019 00:33:04 +0000 Message-ID: <20190505003207.1353-6-saeedm@mellanox.com> References: <20190505003207.1353-1-saeedm@mellanox.com> In-Reply-To: <20190505003207.1353-1-saeedm@mellanox.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-mailer: git-send-email 2.20.1 x-originating-ip: [73.15.39.150] x-clientproxiedby: BY5PR13CA0008.namprd13.prod.outlook.com (2603:10b6:a03:180::21) To DB8PR05MB5898.eurprd05.prod.outlook.com (2603:10a6:10:a4::32) authentication-results: spf=none (sender IP is ) smtp.mailfrom=saeedm@mellanox.com; x-ms-exchange-messagesentrepresentingtype: 1 x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: 8e5de6f7-0321-4e9c-36e9-08d6d0f13ca5 x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: BCL:0; PCL:0; RULEID:(2390118)(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600141)(711020)(4605104)(4618075)(2017052603328)(7193020); SRVR:DB8PR05MB5881; x-ms-traffictypediagnostic: DB8PR05MB5881: x-microsoft-antispam-prvs: x-ms-oob-tlc-oobclassifiers: OLM:7691; x-forefront-prvs: 00286C0CA6 x-forefront-antispam-report: SFV:NSPM; 
SFS:(10009020)(346002)(376002)(366004)(39850400004)(136003)(396003)(199004)(189003)(305945005)(52116002)(76176011)(36756003)(316002)(25786009)(6486002)(478600001)(14454004)(446003)(50226002)(476003)(11346002)(2616005)(26005)(7736002)(4326008)(99286004)(86362001)(6916009)(53936002)(66476007)(186003)(68736007)(66446008)(64756008)(66556008)(6436002)(66946007)(73956011)(6512007)(14444005)(30864003)(1076003)(66066001)(71190400001)(71200400001)(54906003)(256004)(102836004)(81156014)(81166006)(8936002)(3846002)(6506007)(386003)(107886003)(2906002)(8676002)(5660300002)(6116002)(486006); DIR:OUT; SFP:1101; SCL:1; SRVR:DB8PR05MB5881; H:DB8PR05MB5898.eurprd05.prod.outlook.com; FPR:; SPF:None; LANG:en; PTR:InfoNoRecords; A:1; MX:1; received-spf: None (protection.outlook.com: mellanox.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 x-microsoft-antispam-message-info: ZEvjCi8QLnTg6zellZ1TgX/wo2VL5j+4b06InMntIfE0KQBYKsVhi46lb7BccL9RpxsBo8F7by9GF0QECQWkYTYR1Eb/jXWQ4MIdZd2eEYEuo8UhRA2DKxGM5rIKw60OloK8jkh4NzIRWZlGlJjlAehVjUDytTim/ft6IR5MCCALs2pLUeDr9zu3G2dpVmhpOX5dQCSGKHNPRbVm3q3xiw1yfRzPYN5HKB82rnrd7mJKjxNjFw8UhPNEGdEq6xMCwjqRvot8q2Wlhkdj1UPEbrdG5fWqCpuo8lWr9hkqNGjQq7UfcHnSe11Si7uApfNsd4kyaEo8d2ruG4YjUrH5Nolyv9hbMcnpaJdUZnf6pXo8hGgqrQ4NdiVJcEVRF0qbskrDf4lISUhbagSJ1ddtVTrsUix2laVRwUmtdBknn1g= MIME-Version: 1.0 X-OriginatorOrg: Mellanox.com X-MS-Exchange-CrossTenant-Network-Message-Id: 8e5de6f7-0321-4e9c-36e9-08d6d0f13ca5 X-MS-Exchange-CrossTenant-originalarrivaltime: 05 May 2019 00:33:04.5852 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: a652971c-7d2e-4d9b-a6a4-d149256f461b X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB8PR05MB5881 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Feras Daoud New mlx5 adapters allow the driver to reset the FW in the event of an error; this action is called "SW Reset".
When a SW reset is issued on any PF, all PFs enter a reset state, which is a recoverable condition. The existing recovery flow was designed to allow the recovery of a VF after a PF driver reload. This patch adds the SW reset to the NIC states in preparation for SW reset handling. When a SW reset is issued, the following occurs: 1. The NIC interface mode is set to 7 while the reset is in progress. 2. Once the reset completes, the NIC interface mode is set to 1. Signed-off-by: Feras Daoud Signed-off-by: Moshe Shemesh Signed-off-by: Daniel Jurgens Reviewed-by: Alex Vesker Signed-off-by: Saeed Mahameed --- .../ethernet/mellanox/mlx5/core/en_selftest.c | 2 +- .../net/ethernet/mellanox/mlx5/core/health.c | 105 ++++++++---------- .../net/ethernet/mellanox/mlx5/core/main.c | 2 +- .../ethernet/mellanox/mlx5/core/mlx5_core.h | 2 +- include/linux/mlx5/driver.h | 3 +- 5 files changed, 48 insertions(+), 66 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c index 4382ef85488c..840ec945ccba 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c @@ -64,7 +64,7 @@ static int mlx5e_test_health_info(struct mlx5e_priv *priv) { struct mlx5_core_health *health = &priv->mdev->priv.health; - return health->sick ? 1 : 0; + return health->fatal_error ?
1 : 0; } static int mlx5e_test_link_state(struct mlx5e_priv *priv) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c index 90f3da6da7f9..adb40fe0f6ec 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c @@ -62,12 +62,18 @@ enum { enum { MLX5_DROP_NEW_HEALTH_WORK, - MLX5_DROP_NEW_RECOVERY_WORK, +}; + +enum { + MLX5_SENSOR_NO_ERR = 0, + MLX5_SENSOR_PCI_COMM_ERR = 1, + MLX5_SENSOR_NIC_DISABLED = 2, + MLX5_SENSOR_NIC_SW_RESET = 3, }; u8 mlx5_get_nic_state(struct mlx5_core_dev *dev) { - return (ioread32be(&dev->iseg->cmdq_addr_l_sz) >> 8) & 3; + return (ioread32be(&dev->iseg->cmdq_addr_l_sz) >> 8) & 7; } void mlx5_set_nic_state(struct mlx5_core_dev *dev, u8 state) @@ -80,18 +86,25 @@ void mlx5_set_nic_state(struct mlx5_core_dev *dev, u8 state) &dev->iseg->cmdq_addr_l_sz); } -static int in_fatal(struct mlx5_core_dev *dev) +static bool sensor_pci_not_working(struct mlx5_core_dev *dev) { struct mlx5_core_health *health = &dev->priv.health; struct health_buffer __iomem *h = health->health; - if (mlx5_get_nic_state(dev) == MLX5_NIC_IFC_DISABLED) - return 1; + /* Offline PCI reads return 0xffffffff */ + return (ioread32be(&h->fw_ver) == 0xffffffff); +} - if (ioread32be(&h->fw_ver) == 0xffffffff) - return 1; +static u32 check_fatal_sensors(struct mlx5_core_dev *dev) +{ + if (sensor_pci_not_working(dev)) + return MLX5_SENSOR_PCI_COMM_ERR; + if (mlx5_get_nic_state(dev) == MLX5_NIC_IFC_DISABLED) + return MLX5_SENSOR_NIC_DISABLED; + if (mlx5_get_nic_state(dev) == MLX5_NIC_IFC_SW_RESET) + return MLX5_SENSOR_NIC_SW_RESET; - return 0; + return MLX5_SENSOR_NO_ERR; } void mlx5_enter_error_state(struct mlx5_core_dev *dev, bool force) @@ -101,7 +114,8 @@ void mlx5_enter_error_state(struct mlx5_core_dev *dev, bool force) goto unlock; mlx5_core_err(dev, "start\n"); - if (pci_channel_offline(dev->pdev) || in_fatal(dev) || force) { + if 
(pci_channel_offline(dev->pdev) || + dev->priv.health.fatal_error != MLX5_SENSOR_NO_ERR || force) { dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR; mlx5_cmd_flush(dev); } @@ -137,38 +151,14 @@ static void mlx5_handle_bad_state(struct mlx5_core_dev *dev) mlx5_disable_device(dev); } -static void health_recover(struct work_struct *work) -{ - struct mlx5_core_health *health; - struct delayed_work *dwork; - struct mlx5_core_dev *dev; - struct mlx5_priv *priv; - u8 nic_state; - - dwork = container_of(work, struct delayed_work, work); - health = container_of(dwork, struct mlx5_core_health, recover_work); - priv = container_of(health, struct mlx5_priv, health); - dev = container_of(priv, struct mlx5_core_dev, priv); - - nic_state = mlx5_get_nic_state(dev); - if (nic_state == MLX5_NIC_IFC_INVALID) { - mlx5_core_err(dev, "health recovery flow aborted since the nic state is invalid\n"); - return; - } - - mlx5_core_err(dev, "starting health recovery flow\n"); - mlx5_recover_device(dev); -} - /* How much time to wait until health resetting the driver (in msecs) */ -#define MLX5_RECOVERY_DELAY_MSECS 60000 +#define MLX5_RECOVERY_WAIT_MSECS 60000 static void health_care(struct work_struct *work) { - unsigned long recover_delay = msecs_to_jiffies(MLX5_RECOVERY_DELAY_MSECS); struct mlx5_core_health *health; struct mlx5_core_dev *dev; struct mlx5_priv *priv; - unsigned long flags; + unsigned long end; health = container_of(work, struct mlx5_core_health, work); priv = container_of(health, struct mlx5_priv, health); @@ -176,13 +166,18 @@ static void health_care(struct work_struct *work) mlx5_core_warn(dev, "handling bad device here\n"); mlx5_handle_bad_state(dev); - spin_lock_irqsave(&health->wq_lock, flags); - if (!test_bit(MLX5_DROP_NEW_RECOVERY_WORK, &health->flags)) - schedule_delayed_work(&health->recover_work, recover_delay); - else - mlx5_core_err(dev, - "new health works are not permitted at this stage\n"); - spin_unlock_irqrestore(&health->wq_lock, flags); + end = jiffies + 
msecs_to_jiffies(MLX5_RECOVERY_WAIT_MSECS); + while (sensor_pci_not_working(dev)) { + if (time_after(jiffies, end)) { + mlx5_core_err(dev, + "health recovery flow aborted, PCI reads still not working\n"); + return; + } + msleep(100); + } + + mlx5_core_err(dev, "starting health recovery flow\n"); + mlx5_recover_device(dev); } static const char *hsynd_str(u8 synd) @@ -274,6 +269,7 @@ static void poll_health(struct timer_list *t) { struct mlx5_core_dev *dev = from_timer(dev, t, priv.health.timer); struct mlx5_core_health *health = &dev->priv.health; + u32 fatal_error; u32 count; if (dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR) @@ -291,8 +287,11 @@ static void poll_health(struct timer_list *t) print_health_info(dev); } - if (in_fatal(dev) && !health->sick) { - health->sick = true; + fatal_error = check_fatal_sensors(dev); + + if (fatal_error && !health->fatal_error) { + mlx5_core_err(dev, "Fatal error %u detected\n", fatal_error); + dev->priv.health.fatal_error = fatal_error; print_health_info(dev); mlx5_trigger_health_work(dev); } @@ -306,9 +305,8 @@ void mlx5_start_health_poll(struct mlx5_core_dev *dev) struct mlx5_core_health *health = &dev->priv.health; timer_setup(&health->timer, poll_health, 0); - health->sick = 0; + health->fatal_error = MLX5_SENSOR_NO_ERR; clear_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags); - clear_bit(MLX5_DROP_NEW_RECOVERY_WORK, &health->flags); health->health = &dev->iseg->health; health->health_counter = &dev->iseg->health_counter; @@ -324,7 +322,6 @@ void mlx5_stop_health_poll(struct mlx5_core_dev *dev, bool disable_health) if (disable_health) { spin_lock_irqsave(&health->wq_lock, flags); set_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags); - set_bit(MLX5_DROP_NEW_RECOVERY_WORK, &health->flags); spin_unlock_irqrestore(&health->wq_lock, flags); } @@ -338,23 +335,10 @@ void mlx5_drain_health_wq(struct mlx5_core_dev *dev) spin_lock_irqsave(&health->wq_lock, flags); set_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags); - 
set_bit(MLX5_DROP_NEW_RECOVERY_WORK, &health->flags); spin_unlock_irqrestore(&health->wq_lock, flags); - cancel_delayed_work_sync(&health->recover_work); cancel_work_sync(&health->work); } -void mlx5_drain_health_recovery(struct mlx5_core_dev *dev) -{ - struct mlx5_core_health *health = &dev->priv.health; - unsigned long flags; - - spin_lock_irqsave(&health->wq_lock, flags); - set_bit(MLX5_DROP_NEW_RECOVERY_WORK, &health->flags); - spin_unlock_irqrestore(&health->wq_lock, flags); - cancel_delayed_work_sync(&dev->priv.health.recover_work); -} - void mlx5_health_flush(struct mlx5_core_dev *dev) { struct mlx5_core_health *health = &dev->priv.health; @@ -387,7 +371,6 @@ int mlx5_health_init(struct mlx5_core_dev *dev) return -ENOMEM; spin_lock_init(&health->wq_lock); INIT_WORK(&health->work, health_care); - INIT_DELAYED_WORK(&health->recover_work, health_recover); health->crdump = NULL; return 0; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index 43f5487de4c3..c94eaa49d1f6 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -1185,7 +1185,7 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, bool cleanup) int err = 0; if (cleanup) - mlx5_drain_health_recovery(dev); + mlx5_drain_health_wq(dev); mutex_lock(&dev->intf_state_mutex); if (!test_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state)) { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h index 22e69d4813e4..d31b77ad533d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h @@ -213,7 +213,7 @@ enum { MLX5_NIC_IFC_FULL = 0, MLX5_NIC_IFC_DISABLED = 1, MLX5_NIC_IFC_NO_DRAM_NIC = 2, - MLX5_NIC_IFC_INVALID = 3 + MLX5_NIC_IFC_SW_RESET = 7 }; u8 mlx5_get_nic_state(struct mlx5_core_dev *dev); diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h 
index ddf6f41a75d3..086faa4d22bf 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -436,7 +436,7 @@ struct mlx5_core_health { struct timer_list timer; u32 prev; int miss_counter; - bool sick; + u32 fatal_error; /* wq spinlock to synchronize draining */ spinlock_t wq_lock; struct workqueue_struct *wq; @@ -907,7 +907,6 @@ void mlx5_start_health_poll(struct mlx5_core_dev *dev); void mlx5_stop_health_poll(struct mlx5_core_dev *dev, bool disable_health); void mlx5_drain_health_wq(struct mlx5_core_dev *dev); void mlx5_trigger_health_work(struct mlx5_core_dev *dev); -void mlx5_drain_health_recovery(struct mlx5_core_dev *dev); int mlx5_buf_alloc_node(struct mlx5_core_dev *dev, int size, struct mlx5_frag_buf *buf, int node); int mlx5_buf_alloc(struct mlx5_core_dev *dev, From patchwork Sun May 5 00:33:06 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saeed Mahameed X-Patchwork-Id: 1095328 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=mellanox.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=Mellanox.com header.i=@Mellanox.com header.b="R5HHkFV1"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 44xRgK0WVRz9s4V for ; Sun, 5 May 2019 10:33:33 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727395AbfEEAd3 (ORCPT ); Sat, 4 May 2019 20:33:29 -0400 Received: from mail-eopbgr70082.outbound.protection.outlook.com ([40.107.7.82]:14048 "EHLO 
EUR04-HE1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727295AbfEEAd2 (ORCPT ); Sat, 4 May 2019 20:33:28 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=pvU/fKsfZVIM9Wp/fnRfbvpqXsYS1Vx9QanOfiUWI4Q=; b=R5HHkFV1EMS5LJRXe+LfKYk6Rf5VfbRGbjsbjw60Ylk5vAwSs86IOOstQFBagP4H6mwciwFd++dsdExpT/QyomFv6p5Em8MfQR3yYFBMHt7VwjKNHyRgurXUyq8NeIthOTHnX/As5GMi6YyF1m8MR3+/+AyZiJEfsj0Ia8g9ZW8= Received: from DB8PR05MB5898.eurprd05.prod.outlook.com (20.179.9.32) by DB8PR05MB5881.eurprd05.prod.outlook.com (20.179.10.21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1856.11; Sun, 5 May 2019 00:33:06 +0000 Received: from DB8PR05MB5898.eurprd05.prod.outlook.com ([fe80::ed24:8317:76e4:1a07]) by DB8PR05MB5898.eurprd05.prod.outlook.com ([fe80::ed24:8317:76e4:1a07%5]) with mapi id 15.20.1856.012; Sun, 5 May 2019 00:33:06 +0000 From: Saeed Mahameed To: "David S. 
Miller" CC: "netdev@vger.kernel.org" , Jiri Pirko , Feras Daoud , Saeed Mahameed , Alex Vesker Subject: [net-next 06/15] net/mlx5: Control CR-space access by different PFs Thread-Topic: [net-next 06/15] net/mlx5: Control CR-space access by different PFs Thread-Index: AQHVAtobMVY9oVvvp0+pLXGREAGN7w== Date: Sun, 5 May 2019 00:33:06 +0000 Message-ID: <20190505003207.1353-7-saeedm@mellanox.com> References: <20190505003207.1353-1-saeedm@mellanox.com> In-Reply-To: <20190505003207.1353-1-saeedm@mellanox.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-mailer: git-send-email 2.20.1 x-originating-ip: [73.15.39.150] x-clientproxiedby: BY5PR13CA0008.namprd13.prod.outlook.com (2603:10b6:a03:180::21) To DB8PR05MB5898.eurprd05.prod.outlook.com (2603:10a6:10:a4::32) authentication-results: spf=none (sender IP is ) smtp.mailfrom=saeedm@mellanox.com; x-ms-exchange-messagesentrepresentingtype: 1 x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: b63bde6a-f5e3-44cd-5758-08d6d0f13dbc x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: BCL:0; PCL:0; RULEID:(2390118)(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600141)(711020)(4605104)(4618075)(2017052603328)(7193020); SRVR:DB8PR05MB5881; x-ms-traffictypediagnostic: DB8PR05MB5881: x-microsoft-antispam-prvs: x-ms-oob-tlc-oobclassifiers: OLM:608; x-forefront-prvs: 00286C0CA6 x-forefront-antispam-report: SFV:NSPM; 
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

From: Feras Daoud

Since the FW can be shared between different PFs/VFs, it is common for more than one health poll to detect a
failure, which can lead to multiple unneeded resets. The solution is a FW
locking mechanism based on the semaphore space, which allows only one device
to collect the cr-dump and to issue a sw-reset.

Signed-off-by: Feras Daoud
Signed-off-by: Saeed Mahameed
Reviewed-by: Alex Vesker
Signed-off-by: Saeed Mahameed
---
 .../ethernet/mellanox/mlx5/core/lib/pci_vsc.c | 40 ++++++++++++++++---
 .../ethernet/mellanox/mlx5/core/lib/pci_vsc.h |  8 ++++
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |  4 ++
 3 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
index f42890bdd6b1..b6b8fb13f621 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
@@ -24,11 +24,6 @@
 	pci_write_config_dword((dev)->pdev, (dev)->vsc_addr + (offset), (val))
 #define VSC_MAX_RETRIES 2048
 
-enum mlx5_vsc_state {
-	MLX5_VSC_UNLOCK,
-	MLX5_VSC_LOCK,
-};
-
 enum {
 	VSC_CTRL_OFFSET = 0x4,
 	VSC_COUNTER_OFFSET = 0x8,
@@ -281,3 +276,38 @@ int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data,
 	}
 	return length;
 }
+
+int mlx5_vsc_sem_set_space(struct mlx5_core_dev *dev, u16 space,
+			   enum mlx5_vsc_state state)
+{
+	u32 data, id = 0;
+	int ret;
+
+	ret = mlx5_vsc_gw_set_space(dev, MLX5_SEMAPHORE_SPACE_DOMAIN, NULL);
+	if (ret) {
+		mlx5_core_warn(dev, "Failed to set gw space %d\n", ret);
+		return ret;
+	}
+
+	if (state == MLX5_VSC_LOCK) {
+		/* Get a unique ID based on the counter */
+		ret = vsc_read(dev, VSC_COUNTER_OFFSET, &id);
+		if (ret)
+			return ret;
+	}
+
+	/* Try to modify lock */
+	ret = mlx5_vsc_gw_write(dev, space, id);
+	if (ret)
+		return ret;
+
+	/* Verify lock was modified */
+	ret = mlx5_vsc_gw_read(dev, space, &data);
+	if (ret)
+		return -EINVAL;
+
+	if (data != id)
+		return -EBUSY;
+
+	return 0;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h
b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h index c6ebf59006c5..4264b65f7437 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h @@ -4,6 +4,11 @@ #ifndef __MLX5_PCI_VSC_H__ #define __MLX5_PCI_VSC_H__ +enum mlx5_vsc_state { + MLX5_VSC_UNLOCK, + MLX5_VSC_LOCK, +}; + enum { MLX5_VSC_SPACE_SCAN_CRSPACE = 0x7, }; @@ -22,4 +27,7 @@ static inline bool mlx5_vsc_accessible(struct mlx5_core_dev *dev) return !!dev->vsc_addr; } +int mlx5_vsc_sem_set_space(struct mlx5_core_dev *dev, u16 space, + enum mlx5_vsc_state state); + #endif /* __MLX5_PCI_VSC_H__ */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h index d31b77ad533d..439cf23945a4 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h @@ -111,6 +111,10 @@ enum { MLX5_DRIVER_SYND = 0xbadd00de, }; +enum mlx5_semaphore_space_address { + MLX5_SEMAPHORE_SPACE_DOMAIN = 0xA, +}; + int mlx5_query_hca_caps(struct mlx5_core_dev *dev); int mlx5_query_board_id(struct mlx5_core_dev *dev); int mlx5_cmd_init_hca(struct mlx5_core_dev *dev, uint32_t *sw_owner_id); From patchwork Sun May 5 00:33:18 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saeed Mahameed X-Patchwork-Id: 1095337 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=mellanox.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=Mellanox.com header.i=@Mellanox.com 
header.b="lfDxqMt+"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 44xRm23yzLz9s4V for ; Sun, 5 May 2019 10:37:38 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727127AbfEEAhg (ORCPT ); Sat, 4 May 2019 20:37:36 -0400 Received: from mail-eopbgr70082.outbound.protection.outlook.com ([40.107.7.82]:14048 "EHLO EUR04-HE1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727331AbfEEAhg (ORCPT ); Sat, 4 May 2019 20:37:36 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=c9/nuIV/XxXtQbARYNPTz41Hdp+/X1K1RjUcwVX7PwU=; b=lfDxqMt+txy7OBf7+VT55HvAMAkEYMjHVkW0GHUFGwJjiXLYU5KthQS7J1p0NJcb0GlnbzyMqmtKnWi/Ll+2U1iO0/Sf5+07cpJDfgJjkhhO09TFhpoZoz54T16XQ+EGNbWasxZzZ/PNovUvxECN6tZ2XTdGyKTZ9GRpOzRTVCE= Received: from DB8PR05MB5898.eurprd05.prod.outlook.com (20.179.9.32) by DB8PR05MB5881.eurprd05.prod.outlook.com (20.179.10.21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1856.11; Sun, 5 May 2019 00:33:19 +0000 Received: from DB8PR05MB5898.eurprd05.prod.outlook.com ([fe80::ed24:8317:76e4:1a07]) by DB8PR05MB5898.eurprd05.prod.outlook.com ([fe80::ed24:8317:76e4:1a07%5]) with mapi id 15.20.1856.012; Sun, 5 May 2019 00:33:19 +0000 From: Saeed Mahameed To: "David S. 
Miller" CC: "netdev@vger.kernel.org" , Jiri Pirko , Feras Daoud , Alex Vesker , Moshe Shemesh , Daniel Jurgens , Saeed Mahameed Subject: [net-next 07/15] net/mlx5: Issue SW reset on FW assert Thread-Topic: [net-next 07/15] net/mlx5: Issue SW reset on FW assert Thread-Index: AQHVAtoiMYHIQJMNSE2ncyR9JF0TDA== Date: Sun, 5 May 2019 00:33:18 +0000 Message-ID: <20190505003207.1353-8-saeedm@mellanox.com> References: <20190505003207.1353-1-saeedm@mellanox.com> In-Reply-To: <20190505003207.1353-1-saeedm@mellanox.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-mailer: git-send-email 2.20.1 x-originating-ip: [73.15.39.150] x-clientproxiedby: BY5PR13CA0008.namprd13.prod.outlook.com (2603:10b6:a03:180::21) To DB8PR05MB5898.eurprd05.prod.outlook.com (2603:10a6:10:a4::32) authentication-results: spf=none (sender IP is ) smtp.mailfrom=saeedm@mellanox.com; x-ms-exchange-messagesentrepresentingtype: 1 x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: f7a49af2-5a25-46d4-28ce-08d6d0f13f25 x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: BCL:0; PCL:0; RULEID:(2390118)(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600141)(711020)(4605104)(4618075)(2017052603328)(7193020); SRVR:DB8PR05MB5881; x-ms-traffictypediagnostic: DB8PR05MB5881: x-microsoft-antispam-prvs: x-ms-oob-tlc-oobclassifiers: OLM:3968; x-forefront-prvs: 00286C0CA6 x-forefront-antispam-report: SFV:NSPM; 
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

From: Feras Daoud

If a FW assert is considered fatal, indicated by a new bit in the health buffer, reset the FW.
After the reset, go through the normal recovery flow. Only one PF needs to
issue the reset, so an attempt is made to prevent a second function from also
issuing it. It is not an error if that happens; it just slows recovery.

Signed-off-by: Feras Daoud
Signed-off-by: Alex Vesker
Signed-off-by: Moshe Shemesh
Signed-off-by: Daniel Jurgens
Signed-off-by: Saeed Mahameed
---
 .../ethernet/mellanox/mlx5/core/diag/crdump.c |  13 +-
 .../net/ethernet/mellanox/mlx5/core/health.c  | 157 +++++++++++++++++-
 .../net/ethernet/mellanox/mlx5/core/main.c    |   1 +
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |   2 +
 include/linux/mlx5/device.h                   |  10 +-
 include/linux/mlx5/driver.h                   |   1 +
 6 files changed, 176 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
index 7337a49f2733..8cd4dd1d11d2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
@@ -92,14 +92,23 @@ int mlx5_crdump_collect(struct mlx5_core_dev *dev,
 			       ret);
 		return ret;
 	}
+	/* Verify no other PF is running cr-dump or sw reset */
+	ret = mlx5_vsc_sem_set_space(dev, MLX5_SEMAPHORE_SW_RESET,
+				     MLX5_VSC_LOCK);
+	if (ret) {
+		mlx5_core_warn(dev, "Failed to lock SW reset semaphore\n");
+		goto unlock_gw;
+	}
 
 	ret = mlx5_vsc_gw_set_space(dev, MLX5_VSC_SPACE_SCAN_CRSPACE, NULL);
 	if (ret)
-		goto unlock;
+		goto unlock_sem;
 
 	ret = mlx5_crdump_fill(dev, crdump_region, snapshot_id);
 
-unlock:
+unlock_sem:
+	mlx5_vsc_sem_set_space(dev, MLX5_SEMAPHORE_SW_RESET, MLX5_VSC_UNLOCK);
+unlock_gw:
 	mlx5_vsc_gw_unlock(dev);
 	return ret;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index adb40fe0f6ec..19d9297682d7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -40,6 +40,7 @@
 #include "mlx5_core.h"
 #include "lib/eq.h"
 #include "lib/mlx5.h"
+#include
"lib/pci_vsc.h" enum { MLX5_HEALTH_POLL_INTERVAL = 2 * HZ, @@ -67,8 +68,10 @@ enum { enum { MLX5_SENSOR_NO_ERR = 0, MLX5_SENSOR_PCI_COMM_ERR = 1, - MLX5_SENSOR_NIC_DISABLED = 2, - MLX5_SENSOR_NIC_SW_RESET = 3, + MLX5_SENSOR_PCI_ERR = 2, + MLX5_SENSOR_NIC_DISABLED = 3, + MLX5_SENSOR_NIC_SW_RESET = 4, + MLX5_SENSOR_FW_SYND_RFR = 5, }; u8 mlx5_get_nic_state(struct mlx5_core_dev *dev) @@ -95,32 +98,162 @@ static bool sensor_pci_not_working(struct mlx5_core_dev *dev) return (ioread32be(&h->fw_ver) == 0xffffffff); } +static bool sensor_fw_synd_rfr(struct mlx5_core_dev *dev) +{ + struct mlx5_core_health *health = &dev->priv.health; + struct health_buffer __iomem *h = health->health; + u32 rfr = ioread32be(&h->rfr) >> MLX5_RFR_OFFSET; + u8 synd = ioread8(&h->synd); + + if (rfr && synd) + mlx5_core_dbg(dev, "FW requests reset, synd: %d\n", synd); + return rfr && synd; +} + static u32 check_fatal_sensors(struct mlx5_core_dev *dev) { if (sensor_pci_not_working(dev)) return MLX5_SENSOR_PCI_COMM_ERR; + if (pci_channel_offline(dev->pdev)) + return MLX5_SENSOR_PCI_ERR; if (mlx5_get_nic_state(dev) == MLX5_NIC_IFC_DISABLED) return MLX5_SENSOR_NIC_DISABLED; if (mlx5_get_nic_state(dev) == MLX5_NIC_IFC_SW_RESET) return MLX5_SENSOR_NIC_SW_RESET; + if (sensor_fw_synd_rfr(dev)) + return MLX5_SENSOR_FW_SYND_RFR; return MLX5_SENSOR_NO_ERR; } +static int lock_sem_sw_reset(struct mlx5_core_dev *dev, bool lock) +{ + enum mlx5_vsc_state state; + int ret; + + if (!mlx5_core_is_pf(dev)) + return -EBUSY; + + /* Try to lock GW access, this stage doesn't return + * EBUSY because locked GW does not mean that other PF + * already started the reset. + */ + ret = mlx5_vsc_gw_lock(dev); + if (ret == -EBUSY) + return -EINVAL; + if (ret) + return ret; + + state = lock ? MLX5_VSC_LOCK : MLX5_VSC_UNLOCK; + /* At this stage, if the return status == EBUSY, then we know + * for sure that another PF started the reset, so don't allow + * another reset. 
+	 */
+	ret = mlx5_vsc_sem_set_space(dev, MLX5_SEMAPHORE_SW_RESET, state);
+	if (ret)
+		mlx5_core_warn(dev, "Failed to lock SW reset semaphore\n");
+
+	/* Unlock GW access */
+	mlx5_vsc_gw_unlock(dev);
+
+	return ret;
+}
+
+static bool reset_fw_if_needed(struct mlx5_core_dev *dev)
+{
+	bool supported = (ioread32be(&dev->iseg->initializing) >>
+			  MLX5_FW_RESET_SUPPORTED_OFFSET) & 1;
+	u32 fatal_error;
+
+	if (!supported)
+		return false;
+
+	/* The reset only needs to be issued by one PF. The health buffer is
+	 * shared between all functions, and will be cleared during a reset.
+	 * Check again to avoid a redundant 2nd reset. If the fatal error was
+	 * PCI related, a reset won't help.
+	 */
+	fatal_error = check_fatal_sensors(dev);
+	if (fatal_error == MLX5_SENSOR_PCI_COMM_ERR ||
+	    fatal_error == MLX5_SENSOR_NIC_DISABLED ||
+	    fatal_error == MLX5_SENSOR_NIC_SW_RESET) {
+		mlx5_core_warn(dev, "Not issuing FW reset. Either it's already done or won't help.\n");
+		return false;
+	}
+
+	mlx5_core_warn(dev, "Issuing FW Reset\n");
+	/* Write the NIC interface field to initiate the reset, the command
+	 * interface address also resides here, don't overwrite it.
+ */ + mlx5_set_nic_state(dev, MLX5_NIC_IFC_SW_RESET); + + return true; +} + void mlx5_enter_error_state(struct mlx5_core_dev *dev, bool force) { mutex_lock(&dev->intf_state_mutex); if (dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR) goto unlock; + if (dev->state == MLX5_DEVICE_STATE_UNINITIALIZED) { + dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR; + goto unlock; + } - mlx5_core_err(dev, "start\n"); - if (pci_channel_offline(dev->pdev) || - dev->priv.health.fatal_error != MLX5_SENSOR_NO_ERR || force) { + if (check_fatal_sensors(dev) || force) { dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR; mlx5_cmd_flush(dev); } mlx5_notifier_call_chain(dev->priv.events, MLX5_DEV_EVENT_SYS_ERROR, (void *)1); +unlock: + mutex_unlock(&dev->intf_state_mutex); +} + +#define MLX5_CRDUMP_WAIT_MS 60000 +#define MLX5_FW_RESET_WAIT_MS 1000 +void mlx5_error_sw_reset(struct mlx5_core_dev *dev) +{ + unsigned long end, delay_ms = MLX5_FW_RESET_WAIT_MS; + int lock = -EBUSY; + + mutex_lock(&dev->intf_state_mutex); + if (dev->state != MLX5_DEVICE_STATE_INTERNAL_ERROR) + goto unlock; + + mlx5_core_err(dev, "start\n"); + + if (check_fatal_sensors(dev) == MLX5_SENSOR_FW_SYND_RFR) { + /* Get cr-dump and reset FW semaphore */ + lock = lock_sem_sw_reset(dev, true); + + if (lock == -EBUSY) { + delay_ms = MLX5_CRDUMP_WAIT_MS; + goto recover_from_sw_reset; + } + /* Execute SW reset */ + reset_fw_if_needed(dev); + } + +recover_from_sw_reset: + /* Recover from SW reset */ + end = jiffies + msecs_to_jiffies(delay_ms); + do { + if (mlx5_get_nic_state(dev) == MLX5_NIC_IFC_DISABLED) + break; + + cond_resched(); + } while (!time_after(jiffies, end)); + + if (mlx5_get_nic_state(dev) != MLX5_NIC_IFC_DISABLED) { + dev_err(&dev->pdev->dev, "NIC IFC still %d after %lums.\n", + mlx5_get_nic_state(dev), delay_ms); + } + + /* Release FW semaphore if you are the lock owner */ + if (!lock) + lock_sem_sw_reset(dev, false); + mlx5_core_err(dev, "end\n"); unlock: @@ -143,6 +276,20 @@ static void mlx5_handle_bad_state(struct 
mlx5_core_dev *dev)
 	case MLX5_NIC_IFC_NO_DRAM_NIC:
 		mlx5_core_warn(dev, "Expected to see disabled NIC but it is no dram nic\n");
 		break;
+
+	case MLX5_NIC_IFC_SW_RESET:
+		/* The IFC mode field is 3 bits, so it will read 0x7 in 2 cases:
+		 * 1. PCI has been disabled (i.e. PCI-AER, PF driver unloaded
+		 *    and this is a VF), this is not recoverable by SW reset.
+		 *    Logging of this is handled elsewhere.
+		 * 2. FW reset has been issued by another function, driver can
+		 *    be reloaded to recover after the mode switches to
+		 *    MLX5_NIC_IFC_DISABLED.
+		 */
+		if (dev->priv.health.fatal_error != MLX5_SENSOR_PCI_COMM_ERR)
+			mlx5_core_warn(dev, "NIC SW reset in progress\n");
+		break;
+
 	default:
 		mlx5_core_warn(dev, "Expected to see disabled NIC but it is has invalid value %d\n",
 			       nic_interface);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index c94eaa49d1f6..c22ff9a58ec5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1368,6 +1368,7 @@ static pci_ers_result_t mlx5_pci_err_detected(struct pci_dev *pdev,
 	mlx5_core_info(dev, "%s was called\n", __func__);
 
 	mlx5_enter_error_state(dev, false);
+	mlx5_error_sw_reset(dev);
 	mlx5_unload_one(dev, false);
 	/* In case of kernel call drain the health wq */
 	if (state) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 439cf23945a4..9726af137be3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -113,6 +113,7 @@ enum {
 
 enum mlx5_semaphore_space_address {
 	MLX5_SEMAPHORE_SPACE_DOMAIN = 0xA,
+	MLX5_SEMAPHORE_SW_RESET = 0x20,
 };
 
 int mlx5_query_hca_caps(struct mlx5_core_dev *dev);
@@ -122,6 +123,7 @@ int mlx5_cmd_teardown_hca(struct mlx5_core_dev *dev);
 int mlx5_cmd_force_teardown_hca(struct mlx5_core_dev *dev);
 int mlx5_cmd_fast_teardown_hca(struct mlx5_core_dev *dev);
 void
mlx5_enter_error_state(struct mlx5_core_dev *dev, bool force); +void mlx5_error_sw_reset(struct mlx5_core_dev *dev); void mlx5_disable_device(struct mlx5_core_dev *dev); void mlx5_recover_device(struct mlx5_core_dev *dev); int mlx5_sriov_init(struct mlx5_core_dev *dev); diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index fc2b6e807f06..2cfa2ec8b5d3 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -510,6 +510,10 @@ struct mlx5_cmd_layout { u8 status_own; }; +enum mlx5_fatal_assert_bit_offsets { + MLX5_RFR_OFFSET = 31, +}; + struct health_buffer { __be32 assert_var[5]; __be32 rsvd0[3]; @@ -518,12 +522,16 @@ struct health_buffer { __be32 rsvd1[2]; __be32 fw_ver; __be32 hw_id; - __be32 rsvd2; + __be32 rfr; u8 irisc_index; u8 synd; __be16 ext_synd; }; +enum mlx5_initializing_bit_offsets { + MLX5_FW_RESET_SUPPORTED_OFFSET = 30, +}; + enum mlx5_cmd_addr_l_sz_offset { MLX5_NIC_IFC_OFFSET = 8, }; diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 086faa4d22bf..33c977db6ceb 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -584,6 +584,7 @@ struct mlx5_priv { }; enum mlx5_device_state { + MLX5_DEVICE_STATE_UNINITIALIZED, MLX5_DEVICE_STATE_UP, MLX5_DEVICE_STATE_INTERNAL_ERROR, }; From patchwork Sun May 5 00:33:21 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saeed Mahameed X-Patchwork-Id: 1095329 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=mellanox.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) 
header.d=Mellanox.com header.i=@Mellanox.com header.b="MknapPj+"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 44xRgg4pGpz9s4V for ; Sun, 5 May 2019 10:33:51 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727415AbfEEAdt (ORCPT ); Sat, 4 May 2019 20:33:49 -0400 Received: from mail-eopbgr70077.outbound.protection.outlook.com ([40.107.7.77]:8128 "EHLO EUR04-HE1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727295AbfEEAds (ORCPT ); Sat, 4 May 2019 20:33:48 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=/qLh2QHmOBc/zTqTYCAUwJTFNosvS0xcZ0qMqizcRWE=; b=MknapPj+KBJWLlDDasouUvji5XsVbX+8XmDEDDmzZ3tSlXg9XG55brjqHB4XkrUkrx8Z4gdmyBmCf+TNEMRK5/aPE58dZMAqpeD8ybcvW98j4bE0tOF0kr+hLVoT5Cwevqrz6g/fhzCa+1P4LoEAqSb+x5RFiXmN51ULRbni2C8= Received: from DB8PR05MB5898.eurprd05.prod.outlook.com (20.179.9.32) by DB8PR05MB5881.eurprd05.prod.outlook.com (20.179.10.21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1856.11; Sun, 5 May 2019 00:33:21 +0000 Received: from DB8PR05MB5898.eurprd05.prod.outlook.com ([fe80::ed24:8317:76e4:1a07]) by DB8PR05MB5898.eurprd05.prod.outlook.com ([fe80::ed24:8317:76e4:1a07%5]) with mapi id 15.20.1856.012; Sun, 5 May 2019 00:33:21 +0000 From: Saeed Mahameed To: "David S. 
Miller" CC: "netdev@vger.kernel.org" , Jiri Pirko , Moshe Shemesh , Eran Ben Elisha , Saeed Mahameed Subject: [net-next 08/15] net/mlx5: Refactor print health info Thread-Topic: [net-next 08/15] net/mlx5: Refactor print health info Thread-Index: AQHVAtok0mhE3VJAqUq1bUc6j94qfQ== Date: Sun, 5 May 2019 00:33:21 +0000 Message-ID: <20190505003207.1353-9-saeedm@mellanox.com> References: <20190505003207.1353-1-saeedm@mellanox.com> In-Reply-To: <20190505003207.1353-1-saeedm@mellanox.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-mailer: git-send-email 2.20.1 x-originating-ip: [73.15.39.150] x-clientproxiedby: BY5PR13CA0008.namprd13.prod.outlook.com (2603:10b6:a03:180::21) To DB8PR05MB5898.eurprd05.prod.outlook.com (2603:10a6:10:a4::32) authentication-results: spf=none (sender IP is ) smtp.mailfrom=saeedm@mellanox.com; x-ms-exchange-messagesentrepresentingtype: 1 x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: b64ab17a-5787-49b0-2241-08d6d0f14688 x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: BCL:0; PCL:0; RULEID:(2390118)(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600141)(711020)(4605104)(4618075)(2017052603328)(7193020); SRVR:DB8PR05MB5881; x-ms-traffictypediagnostic: DB8PR05MB5881: x-microsoft-antispam-prvs: x-ms-oob-tlc-oobclassifiers: OLM:247; x-forefront-prvs: 00286C0CA6 x-forefront-antispam-report: SFV:NSPM; 
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

From: Moshe Shemesh

Refactor the print health info code by splitting it into two functions:
1. mlx5_get_health_info() - writes the health info into a buffer.
2. mlx5_print_health_info() - prints the health info to the kernel log.

This refactoring lets the devlink health reporter diagnose() callback use the
health info data in the downstream patch.

Signed-off-by: Moshe Shemesh
Signed-off-by: Eran Ben Elisha
Reviewed-by: Saeed Mahameed
Signed-off-by: Saeed Mahameed
---
 .../net/ethernet/mellanox/mlx5/core/health.c | 83 +++++++++++++++----
 include/linux/mlx5/driver.h                  |  4 +
 2 files changed, 70 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index 19d9297682d7..a3c7e46aafd9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -357,7 +357,28 @@ static const char *hsynd_str(u8 synd)
 	}
 }
 
-static void print_health_info(struct mlx5_core_dev *dev)
+#define HEALTH_INFO_MAX_BUFF 1024
+static void mlx5_health_info_buf_reset(struct mlx5_core_dev *dev)
+{
+	dev->priv.health.info_buf_len = 0;
+}
+
+static void
+mlx5_health_info_buf_write(struct mlx5_core_dev *dev, const char *fmt, ...)
+{ + struct mlx5_core_health *health = &dev->priv.health; + va_list args; + int len; + + va_start(args, fmt); + len = vsnprintf(health->info_buf + health->info_buf_len, + HEALTH_INFO_MAX_BUFF - health->info_buf_len, fmt, args); + va_end(args); + health->info_buf_len = min_t(int, health->info_buf_len + len, + HEALTH_INFO_MAX_BUFF); +} + +static void mlx5_get_health_info(struct mlx5_core_dev *dev, u8 *synd) { struct mlx5_core_health *health = &dev->priv.health; struct health_buffer __iomem *h = health->health; @@ -365,27 +386,46 @@ static void print_health_info(struct mlx5_core_dev *dev) u32 fw; int i; + *synd = ioread8(&h->synd); /* If the syndrome is 0, the device is OK and no need to print buffer */ - if (!ioread8(&h->synd)) + if (!synd) return; + mlx5_health_info_buf_reset(dev); + mlx5_health_info_buf_write(dev, "\n"); for (i = 0; i < ARRAY_SIZE(h->assert_var); i++) - mlx5_core_err(dev, "assert_var[%d] 0x%08x\n", i, - ioread32be(h->assert_var + i)); + mlx5_health_info_buf_write(dev, "assert_var[%d] 0x%08x\n", i, + ioread32be(h->assert_var + i)); - mlx5_core_err(dev, "assert_exit_ptr 0x%08x\n", - ioread32be(&h->assert_exit_ptr)); - mlx5_core_err(dev, "assert_callra 0x%08x\n", - ioread32be(&h->assert_callra)); + mlx5_health_info_buf_write(dev, "assert_exit_ptr 0x%08x\n", + ioread32be(&h->assert_exit_ptr)); + mlx5_health_info_buf_write(dev, "assert_callra 0x%08x\n", + ioread32be(&h->assert_callra)); sprintf(fw_str, "%d.%d.%d", fw_rev_maj(dev), fw_rev_min(dev), fw_rev_sub(dev)); - mlx5_core_err(dev, "fw_ver %s\n", fw_str); - mlx5_core_err(dev, "hw_id 0x%08x\n", ioread32be(&h->hw_id)); - mlx5_core_err(dev, "irisc_index %d\n", ioread8(&h->irisc_index)); - mlx5_core_err(dev, "synd 0x%x: %s\n", ioread8(&h->synd), - hsynd_str(ioread8(&h->synd))); - mlx5_core_err(dev, "ext_synd 0x%04x\n", ioread16be(&h->ext_synd)); + mlx5_health_info_buf_write(dev, "fw_ver %s\n", fw_str); + mlx5_health_info_buf_write(dev, "hw_id 0x%08x\n", ioread32be(&h->hw_id)); + 
mlx5_health_info_buf_write(dev, "irisc_index %d\n", ioread8(&h->irisc_index)); + mlx5_health_info_buf_write(dev, "synd 0x%x: %s\n", ioread8(&h->synd), + hsynd_str(ioread8(&h->synd))); + mlx5_health_info_buf_write(dev, "ext_synd 0x%04x\n", ioread16be(&h->ext_synd)); fw = ioread32be(&h->fw_ver); - mlx5_core_err(dev, "raw fw_ver 0x%08x\n", fw); + mlx5_health_info_buf_write(dev, "raw fw_ver 0x%08x\n", fw); +} + +static void mlx5_print_health_info(struct mlx5_core_dev *dev) +{ + struct mlx5_core_health *health = &dev->priv.health; + u8 synd; + + mutex_lock(&health->info_buf_lock); + mlx5_get_health_info(dev, &synd); + + if (!synd) + goto unlock; + + mlx5_core_err(dev, "%s", health->info_buf); +unlock: + mutex_unlock(&health->info_buf_lock); } static unsigned long get_next_poll_jiffies(void) @@ -431,7 +471,7 @@ static void poll_health(struct timer_list *t) health->prev = count; if (health->miss_counter == MAX_MISSES) { mlx5_core_err(dev, "device's health compromised - reached miss count\n"); - print_health_info(dev); + mlx5_print_health_info(dev); } fatal_error = check_fatal_sensors(dev); @@ -439,7 +479,7 @@ static void poll_health(struct timer_list *t) if (fatal_error && !health->fatal_error) { mlx5_core_err(dev, "Fatal error %u detected\n", fatal_error); dev->priv.health.fatal_error = fatal_error; - print_health_info(dev); + mlx5_print_health_info(dev); mlx5_trigger_health_work(dev); } @@ -497,6 +537,7 @@ void mlx5_health_cleanup(struct mlx5_core_dev *dev) { struct mlx5_core_health *health = &dev->priv.health; + kfree(health->info_buf); destroy_workqueue(health->wq); } @@ -519,6 +560,14 @@ int mlx5_health_init(struct mlx5_core_dev *dev) spin_lock_init(&health->wq_lock); INIT_WORK(&health->work, health_care); health->crdump = NULL; + health->info_buf = kmalloc(HEALTH_INFO_MAX_BUFF, GFP_KERNEL); + if (!health->info_buf) + goto err_alloc_buff; + mutex_init(&health->info_buf_lock); return 0; + +err_alloc_buff: + destroy_workqueue(health->wq); + return -ENOMEM; } diff --git 
a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 33c977db6ceb..df8f4c4e21c6 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -444,6 +444,10 @@ struct mlx5_core_health { struct work_struct work; struct delayed_work recover_work; struct mlx5_fw_crdump *crdump; + char *info_buf; + int info_buf_len; + /* protect info buf access */ + struct mutex info_buf_lock; }; struct mlx5_qp_table { From patchwork Sun May 5 00:33:23 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saeed Mahameed X-Patchwork-Id: 1095332 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=mellanox.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=Mellanox.com header.i=@Mellanox.com header.b="aiEmr0Ga"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 44xRhT2Q67z9s6w for ; Sun, 5 May 2019 10:34:33 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727459AbfEEAeb (ORCPT ); Sat, 4 May 2019 20:34:31 -0400 Received: from mail-eopbgr70077.outbound.protection.outlook.com ([40.107.7.77]:8128 "EHLO EUR04-HE1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727404AbfEEAea (ORCPT ); Sat, 4 May 2019 20:34:30 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=U/OlQYWUJtd/cztl889oc8qsYvaJBo3i0bSZa1DKiWw=; 
b=aiEmr0Ga/GpadWEVpsmhHv5iQB+zzMt8yvjQ93iu3eWmcBF/hlONZFJschAapX5fuD7ow1Po7IVEBNRhJxeJq1RCSrwuV3Vlm3HVOPBaGnyif3LWXDkP4OyY61IPR4HoTzxI5NuJypujbbzFocY8azdPlaWrgbJTB010GnZVZ5o= Received: from DB8PR05MB5898.eurprd05.prod.outlook.com (20.179.9.32) by DB8PR05MB5881.eurprd05.prod.outlook.com (20.179.10.21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1856.11; Sun, 5 May 2019 00:33:23 +0000 Received: from DB8PR05MB5898.eurprd05.prod.outlook.com ([fe80::ed24:8317:76e4:1a07]) by DB8PR05MB5898.eurprd05.prod.outlook.com ([fe80::ed24:8317:76e4:1a07%5]) with mapi id 15.20.1856.012; Sun, 5 May 2019 00:33:23 +0000 From: Saeed Mahameed To: "David S. Miller" CC: "netdev@vger.kernel.org" , Jiri Pirko , Moshe Shemesh , Eran Ben Elisha , Saeed Mahameed Subject: [net-next 09/15] net/mlx5: Create FW devlink health reporter Thread-Topic: [net-next 09/15] net/mlx5: Create FW devlink health reporter Thread-Index: AQHVAtolLdJJy+/cik2B95xXeBcmGA== Date: Sun, 5 May 2019 00:33:23 +0000 Message-ID: <20190505003207.1353-10-saeedm@mellanox.com> References: <20190505003207.1353-1-saeedm@mellanox.com> In-Reply-To: <20190505003207.1353-1-saeedm@mellanox.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-mailer: git-send-email 2.20.1 x-originating-ip: [73.15.39.150] x-clientproxiedby: BY5PR13CA0008.namprd13.prod.outlook.com (2603:10b6:a03:180::21) To DB8PR05MB5898.eurprd05.prod.outlook.com (2603:10a6:10:a4::32) authentication-results: spf=none (sender IP is ) smtp.mailfrom=saeedm@mellanox.com; x-ms-exchange-messagesentrepresentingtype: 1 x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: b367b468-ebd6-40a4-c41c-08d6d0f147e0 x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: BCL:0; PCL:0; RULEID:(2390118)(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600141)(711020)(4605104)(4618075)(2017052603328)(7193020); SRVR:DB8PR05MB5881; 
X-Mailing-List: netdev@vger.kernel.org

From: Moshe Shemesh

Create mlx5_devlink_health_reporter for the FW reporter. The FW reporter
implements the devlink_health_reporter diagnose callback.

The user can trigger the FW reporter diagnose command at any time to
check the current FW status. When the device is healthy, it returns a
zero syndrome. Otherwise, it dumps the health info buffer.

Command example and output on healthy status:
$ devlink health diagnose pci/0000:82:00.0 reporter fw
Syndrome: 0

Command example and output on non healthy status:
$ devlink health diagnose pci/0000:82:00.0 reporter fw
diagnose data:
assert_var[0] 0xfc3fc043
assert_var[1] 0x0001b41c
assert_var[2] 0x00000000
assert_var[3] 0x00000000
assert_var[4] 0x00000000
assert_exit_ptr 0x008033b4
assert_callra 0x0080365c
fw_ver 16.24.1000
hw_id 0x0000020d
irisc_index 0
synd 0x8: unrecoverable hardware error
ext_synd 0x003d
raw fw_ver 0x101803e8

Signed-off-by: Moshe Shemesh
Signed-off-by: Eran Ben Elisha
Signed-off-by: Saeed Mahameed
---
 .../net/ethernet/mellanox/mlx5/core/health.c  | 55 +++++++++++++++++++
 include/linux/mlx5/driver.h                   |  1 +
 2 files changed, 56 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index a3c7e46aafd9..9ffa9c7f81a0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -428,6 +428,58 @@ static void mlx5_print_health_info(struct mlx5_core_dev *dev)
 	mutex_unlock(&health->info_buf_lock);
 }
 
+static int
+mlx5_fw_reporter_diagnose(struct devlink_health_reporter *reporter,
+			  struct devlink_fmsg *fmsg)
+{
+	struct mlx5_core_dev *dev = devlink_health_reporter_priv(reporter);
+	struct mlx5_core_health *health = &dev->priv.health;
+	u8 synd;
+	int err;
+
+	mutex_lock(&health->info_buf_lock);
+	mlx5_get_health_info(dev, &synd);
+
+	if (!synd) {
+		mutex_unlock(&health->info_buf_lock);
+		return devlink_fmsg_u8_pair_put(fmsg, "Syndrome", synd);
+	}
+
+	err = devlink_fmsg_string_pair_put(fmsg, "diagnose data",
+					   health->info_buf);
+
+	mutex_unlock(&health->info_buf_lock);
+	return err;
+}
+
+static const struct devlink_health_reporter_ops mlx5_fw_reporter_ops = {
+	.name = "fw",
+	.diagnose = mlx5_fw_reporter_diagnose,
+};
+
+static void mlx5_fw_reporter_create(struct mlx5_core_dev *dev)
+{
+	struct mlx5_core_health *health = &dev->priv.health;
+	struct devlink *devlink = priv_to_devlink(dev);
+
+	health->fw_reporter =
+		devlink_health_reporter_create(devlink, &mlx5_fw_reporter_ops,
+					       0, false, dev);
+	if (IS_ERR(health->fw_reporter))
+		mlx5_core_warn(dev, "Failed to create fw reporter, err = %ld\n",
+			       PTR_ERR(health->fw_reporter));
+}
+
+static void mlx5_fw_reporter_destroy(struct mlx5_core_dev *dev)
+{
+	struct mlx5_core_health *health = &dev->priv.health;
+
+	if (IS_ERR_OR_NULL(health->fw_reporter))
+		return;
+
+	devlink_health_reporter_destroy(health->fw_reporter);
+}
+
 static unsigned long get_next_poll_jiffies(void)
 {
 	unsigned long next;
@@ -539,6 +591,7 @@ void mlx5_health_cleanup(struct mlx5_core_dev *dev)
 
 	kfree(health->info_buf);
 	destroy_workqueue(health->wq);
+	mlx5_fw_reporter_destroy(dev);
 }
 
 int mlx5_health_init(struct mlx5_core_dev *dev)
@@ -565,6 +618,8 @@ int mlx5_health_init(struct mlx5_core_dev *dev)
 		goto err_alloc_buff;
 	mutex_init(&health->info_buf_lock);
 
+	mlx5_fw_reporter_create(dev);
+
 	return 0;
 
 err_alloc_buff:
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index df8f4c4e21c6..a362aa6c799c 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -448,6 +448,7 @@ struct mlx5_core_health {
 	int				info_buf_len;
 	/* protect info buf access */
 	struct mutex			info_buf_lock;
+	struct devlink_health_reporter	*fw_reporter;
 };
 
 struct mlx5_qp_table {

From patchwork Sun May 5 00:33:25 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Saeed Mahameed
X-Patchwork-Id: 1095330
X-Patchwork-Delegate: davem@davemloft.net
From: Saeed Mahameed
To: "David S. Miller"
CC: "netdev@vger.kernel.org", Jiri Pirko, Moshe Shemesh, Eran Ben Elisha, Saeed Mahameed
Subject: [net-next 10/15] net/mlx5: Add core dump register access functions
Date: Sun, 5 May 2019 00:33:25 +0000
Message-ID: <20190505003207.1353-11-saeedm@mellanox.com>
References: <20190505003207.1353-1-saeedm@mellanox.com>
In-Reply-To: <20190505003207.1353-1-saeedm@mellanox.com>
X-Mailing-List: netdev@vger.kernel.org

From: Moshe Shemesh

Add access functions for the core dump register to enable triggering a
FW core dump.

Signed-off-by: Moshe Shemesh
Signed-off-by: Eran Ben Elisha
Signed-off-by: Saeed Mahameed
---
 .../mellanox/mlx5/core/diag/fw_tracer.c       | 34 +++++++++++++++++++
 include/linux/mlx5/driver.h                   |  1 +
 include/linux/mlx5/mlx5_ifc.h                 | 17 +++++++++-
 3 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
index 6999f4486e9e..56025797cd1e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
@@ -786,6 +786,40 @@ static void mlx5_fw_tracer_ownership_change(struct work_struct *work)
 	mlx5_fw_tracer_start(tracer);
 }
 
+static int mlx5_fw_tracer_set_core_dump_reg(struct mlx5_core_dev *dev,
+					    u32 *in, int size_in)
+{
+	u32 out[MLX5_ST_SZ_DW(core_dump_reg)] = {};
+
+	if (!MLX5_CAP_DEBUG(dev, core_dump_general) &&
+	    !MLX5_CAP_DEBUG(dev, core_dump_qp))
+		return -EOPNOTSUPP;
+
+	return mlx5_core_access_reg(dev, in, size_in, out, sizeof(out),
+				    MLX5_REG_CORE_DUMP, 0, 1);
+}
+
+int mlx5_fw_tracer_trigger_core_dump_general(struct mlx5_core_dev *dev)
+{
+	struct mlx5_fw_tracer *tracer = dev->tracer;
+	u32 in[MLX5_ST_SZ_DW(core_dump_reg)] = {};
+	int err;
+
+	if (!MLX5_CAP_DEBUG(dev, core_dump_general) || !tracer)
+		return -EOPNOTSUPP;
+	if (!tracer->owner)
+		return -EPERM;
+
+	MLX5_SET(core_dump_reg, in, core_dump_type, 0x0);
+
+	err = mlx5_fw_tracer_set_core_dump_reg(dev, in, sizeof(in));
+	if (err)
+		return err;
+	queue_work(tracer->work_queue, &tracer->handle_traces_work);
+	flush_workqueue(tracer->work_queue);
+	return 0;
+}
+
 /* Create software resources (Buffers, etc ..) */
 struct mlx5_fw_tracer *mlx5_fw_tracer_create(struct mlx5_core_dev *dev)
 {
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index a362aa6c799c..ebda70984601 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -108,6 +108,7 @@ enum {
 	MLX5_REG_FPGA_CAP	 = 0x4022,
 	MLX5_REG_FPGA_CTRL	 = 0x4023,
 	MLX5_REG_FPGA_ACCESS_REG = 0x4024,
+	MLX5_REG_CORE_DUMP	 = 0x402e,
 	MLX5_REG_PCAP		 = 0x5001,
 	MLX5_REG_PMTU		 = 0x5003,
 	MLX5_REG_PTYS		 = 0x5004,
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 82612741b29e..9baee29b7124 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -715,7 +715,9 @@ struct mlx5_ifc_qos_cap_bits {
 };
 
 struct mlx5_ifc_debug_cap_bits {
-	u8         reserved_at_0[0x20];
+	u8         core_dump_general[0x1];
+	u8         core_dump_qp[0x1];
+	u8         reserved_at_2[0x1e];
 
 	u8         reserved_at_20[0x2];
 	u8         stall_detect[0x1];
@@ -2531,6 +2533,7 @@ union mlx5_ifc_hca_cap_union_bits {
 	struct mlx5_ifc_e_switch_cap_bits e_switch_cap;
 	struct mlx5_ifc_vector_calc_cap_bits vector_calc_cap;
 	struct mlx5_ifc_qos_cap_bits qos_cap;
+	struct mlx5_ifc_debug_cap_bits debug_cap;
 	struct mlx5_ifc_fpga_cap_bits fpga_cap;
 	u8         reserved_at_0[0x8000];
 };
@@ -8546,6 +8549,18 @@ struct mlx5_ifc_qcam_reg_bits {
 	u8         reserved_at_1c0[0x80];
 };
 
+struct mlx5_ifc_core_dump_reg_bits {
+	u8         reserved_at_0[0x18];
+	u8         core_dump_type[0x8];
+
+	u8         reserved_at_20[0x30];
+	u8         vhca_id[0x10];
+
+	u8         reserved_at_60[0x8];
+	u8         qpn[0x18];
+	u8         reserved_at_80[0x180];
+};
+
 struct mlx5_ifc_pcap_reg_bits {
 	u8         reserved_at_0[0x8];
 	u8         local_port[0x8];

From patchwork Sun May 5 00:33:27 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Saeed Mahameed
X-Patchwork-Id: 1095331
X-Patchwork-Delegate: davem@davemloft.net
From: Saeed Mahameed
To: "David S. Miller"
CC: "netdev@vger.kernel.org", Jiri Pirko, Moshe Shemesh, Eran Ben Elisha, Saeed Mahameed
Subject: [net-next 11/15] net/mlx5: Add support for FW reporter dump
Date: Sun, 5 May 2019 00:33:27 +0000
Message-ID: <20190505003207.1353-12-saeedm@mellanox.com>
References: <20190505003207.1353-1-saeedm@mellanox.com>
In-Reply-To: <20190505003207.1353-1-saeedm@mellanox.com>
X-Mailing-List: netdev@vger.kernel.org

From: Moshe Shemesh

Add support for the dump callback of the mlx5 FW reporter.
Once a FW dump is triggered, the FW writes the core dump to its raw data
buffer. The tracer translates the raw data into traces and saves them to
a buffer. Once the dump is done, the saved traces are filled as objects
into the dump buffer.

FW dump example:
$ devlink health dump show pci/0000:82:00.1 reporter fw
dump traces:
  trace: 0000:82:00.1 [0x69cd6c5283e] 0 [0xb8] dump general info GVMI=0x0001
  trace: 0000:82:00.1 [0x69cd6c53bec] 0 [0xb8] GVMI management info, gvmi_management context:
  trace: 0000:82:00.1 [0x69cd6c55eff] 0 [0xb8] [000]: 00000000 00000000 00000000 00000000
  trace: 0000:82:00.1 [0x69cd6c5657f] 0 [0xb8] [010]: 00000000 00000000 00000000 00000000
  trace: 0000:82:00.1 [0x69cd6c56608] 0 [0xb8] [020]: 00000000 00000000 00000000 00000000
  trace: 0000:82:00.1 [0x69cd6c566ff] 0 [0xb8] [030]: 00000000 00000000 00000000 00000000
  trace: 0000:82:00.1 [0x69cd6c5677f] 0 [0xb8] [040]: 00000000 00000000 00000000 00000000
  trace: 0000:82:00.1 [0x69cd6c5687f] 0 [0xb8] [050]: 00000000 00000000 00000000 00000000
  trace: 0000:82:00.1 [0x69cd6c568ff] 0 [0xb8] [060]: 00000000 00000000 00000000 00000000
  trace: 0000:82:00.1 [0x69cd6c569a5] 0 [0xb8] [070]: 00000000 00000000 00000000 00000000
  trace: 0000:82:00.1 [0x69cd6c57021] 0 [0xb8] CMDIF dbase from IRON: active_dbase_slots = 0x00000000
  trace: 0000:82:00.1 [0x69cd6c58dae] 0 [0xb8] GVMI=0x0001 hw_toc context:
  trace: 0000:82:00.1 [0x69cd6c58e7f] 0 [0xb8] [000]: 00400100 00000000 00000000 fffff000
  trace: 0000:82:00.1 [0x69cd6c58f7f] 0 [0xb8] [010]: 00000000 00000000 00000000 00000000
  ...
  ...
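
The saved-traces buffer described above behaves like a fixed-size ring
of string slots. A minimal userspace sketch of that idea follows. It is
not the driver code: the names trace_ring, ring_save and ring_collect
and the small sizes are hypothetical, chosen only for illustration (the
patch itself uses 1024 slots of 256 bytes). The sketch assumes the ring
starts zeroed and that saved messages are never empty, so a non-empty
slot at the write position means the ring has wrapped.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sizes for illustration only. */
#define SLOT_LEN  32u
#define NUM_SLOTS 4u	/* must be a power of two for the mask below */

struct trace_ring {
	char     buf[NUM_SLOTS * SLOT_LEN];	/* must start zeroed */
	uint32_t next;				/* slot the next save writes to */
};

/* Store one non-empty message, overwriting the oldest slot once full. */
static void ring_save(struct trace_ring *r, const char *msg)
{
	snprintf(r->buf + r->next * SLOT_LEN, SLOT_LEN, "%s", msg);
	/* Power-of-two mask wraps the index without a division. */
	r->next = (r->next + 1) & (NUM_SLOTS - 1);
}

/*
 * Fill out[] with pointers to the saved messages, oldest first, and
 * return how many there are. out[] needs room for NUM_SLOTS entries.
 */
static uint32_t ring_collect(const struct trace_ring *r, const char **out)
{
	/* A used slot at the write position means the ring has wrapped. */
	int wrapped = r->buf[r->next * SLOT_LEN] != '\0';
	uint32_t start = wrapped ? r->next : 0;
	uint32_t count = wrapped ? NUM_SLOTS : r->next;
	uint32_t i;

	for (i = 0; i < count; i++)
		out[i] = r->buf + ((start + i) & (NUM_SLOTS - 1)) * SLOT_LEN;
	return count;
}
```

Keeping the slot count a power of two is what allows the single AND to
replace a modulo when advancing the index.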

Signed-off-by: Moshe Shemesh
Signed-off-by: Eran Ben Elisha
Signed-off-by: Saeed Mahameed
---
 .../mellanox/mlx5/core/diag/fw_tracer.c       | 109 ++++++++++++++++++
 .../mellanox/mlx5/core/diag/fw_tracer.h       |  14 +++
 .../net/ethernet/mellanox/mlx5/core/health.c  |  46 ++++++++
 3 files changed, 169 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
index 56025797cd1e..8c3e6727a984 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
@@ -243,6 +243,27 @@ static int mlx5_fw_tracer_allocate_strings_db(struct mlx5_fw_tracer *tracer)
 	return -ENOMEM;
 }
 
+static int
+mlx5_fw_tracer_allocate_saved_traces_buff(struct mlx5_fw_tracer *tracer)
+{
+	int traces_buff_size = SAVED_TRACES_BUFFER_SIZE_BYTE;
+
+	tracer->sbuff.traces_buff = kzalloc(traces_buff_size, GFP_KERNEL);
+	if (!tracer->sbuff.traces_buff)
+		return -ENOMEM;
+	tracer->sbuff.saved_traces_index = 0;
+	mutex_init(&tracer->sbuff.lock);
+
+	return 0;
+}
+
+static void
+mlx5_fw_tracer_free_saved_traces_buff(struct mlx5_fw_tracer *tracer)
+{
+	kfree(tracer->sbuff.traces_buff);
+	tracer->sbuff.traces_buff = NULL;
+}
+
 static void mlx5_tracer_read_strings_db(struct work_struct *work)
 {
 	struct mlx5_fw_tracer *tracer = container_of(work, struct mlx5_fw_tracer,
@@ -522,6 +543,24 @@ static void mlx5_fw_tracer_clean_ready_list(struct mlx5_fw_tracer *tracer)
 		list_del(&str_frmt->list);
 }
 
+static void mlx5_fw_tracer_save_trace(struct mlx5_fw_tracer *tracer,
+				      u64 timestamp, bool lost,
+				      u8 event_id, char *msg)
+{
+	char *saved_traces = tracer->sbuff.traces_buff;
+	u32 offset;
+
+	mutex_lock(&tracer->sbuff.lock);
+	offset = tracer->sbuff.saved_traces_index * TRACE_STR_LINE;
+	snprintf(saved_traces + offset, TRACE_STR_LINE,
+		 "%s [0x%llx] %d [0x%x] %s", dev_name(&tracer->dev->pdev->dev),
+		 timestamp, lost, event_id, msg);
+
+	tracer->sbuff.saved_traces_index =
+		(tracer->sbuff.saved_traces_index + 1) & (SAVED_TRACES_NUM - 1);
+	mutex_unlock(&tracer->sbuff.lock);
+}
+
 static void mlx5_tracer_print_trace(struct tracer_string_format *str_frmt,
 				    struct mlx5_core_dev *dev,
 				    u64 trace_timestamp)
@@ -540,6 +579,9 @@ static void mlx5_tracer_print_trace(struct tracer_string_format *str_frmt,
 	trace_mlx5_fw(dev->tracer, trace_timestamp, str_frmt->lost,
 		      str_frmt->event_id, tmp);
 
+	mlx5_fw_tracer_save_trace(dev->tracer, trace_timestamp,
+				  str_frmt->lost, str_frmt->event_id, tmp);
+
 	/* remove it from hash */
 	mlx5_tracer_clean_message(str_frmt);
 }
@@ -820,6 +862,64 @@ int mlx5_fw_tracer_trigger_core_dump_general(struct mlx5_core_dev *dev)
 	return 0;
 }
 
+static int
+mlx5_devlink_fmsg_fill_trace(struct devlink_fmsg *fmsg,
+			     char *trace)
+{
+	int err;
+
+	err = devlink_fmsg_obj_nest_start(fmsg);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_string_pair_put(fmsg, "trace", trace);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_obj_nest_end(fmsg);
+	if (err)
+		return err;
+	return 0;
+}
+
+int mlx5_fw_tracer_get_saved_traces_objects(struct mlx5_fw_tracer *tracer,
+					    struct devlink_fmsg *fmsg)
+{
+	char *saved_traces = tracer->sbuff.traces_buff;
+	u32 index, start_index, end_index;
+	u32 saved_traces_index;
+	int err;
+
+	if (!saved_traces[0])
+		return -ENOMSG;
+
+	mutex_lock(&tracer->sbuff.lock);
+	saved_traces_index = tracer->sbuff.saved_traces_index;
+	if (saved_traces[saved_traces_index * TRACE_STR_LINE])
+		start_index = saved_traces_index;
+	else
+		start_index = 0;
+	end_index = (saved_traces_index - 1) & (SAVED_TRACES_NUM - 1);
+
+	err = devlink_fmsg_arr_pair_nest_start(fmsg, "dump traces");
+	if (err)
+		goto unlock;
+	index = start_index;
+	while (index != end_index) {
+		err = mlx5_devlink_fmsg_fill_trace(fmsg,
+						   saved_traces + index * TRACE_STR_LINE);
+		if (err)
+			goto unlock;
+
+		index = (index + 1) & (SAVED_TRACES_NUM - 1);
+	}
+
+	err = devlink_fmsg_arr_pair_nest_end(fmsg);
+unlock:
+	mutex_unlock(&tracer->sbuff.lock);
+	return err;
+}
+ /* Create software resources (Buffers, etc ..) */ struct mlx5_fw_tracer *mlx5_fw_tracer_create(struct mlx5_core_dev *dev) { @@ -867,10 +967,18 @@ struct mlx5_fw_tracer *mlx5_fw_tracer_create(struct mlx5_core_dev *dev) goto free_log_buf; } + err = mlx5_fw_tracer_allocate_saved_traces_buff(tracer); + if (err) { + mlx5_core_warn(dev, "FWTracer: Create saved traces buffer failed %d\n", err); + goto free_strings_db; + } + mlx5_core_dbg(dev, "FWTracer: Tracer created\n"); return tracer; +free_strings_db: + mlx5_fw_tracer_free_strings_db(tracer); free_log_buf: mlx5_fw_tracer_destroy_log_buf(tracer); destroy_workqueue: @@ -951,6 +1059,7 @@ void mlx5_fw_tracer_destroy(struct mlx5_fw_tracer *tracer) cancel_work_sync(&tracer->read_fw_strings_work); mlx5_fw_tracer_clean_ready_list(tracer); mlx5_fw_tracer_clean_print_hash(tracer); + mlx5_fw_tracer_free_saved_traces_buff(tracer); mlx5_fw_tracer_free_strings_db(tracer); mlx5_fw_tracer_destroy_log_buf(tracer); flush_workqueue(tracer->work_queue); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.h b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.h index a8b8747f2b61..9dcf40a43399 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.h @@ -46,6 +46,10 @@ #define TRACER_BLOCK_SIZE_BYTE 256 #define TRACES_PER_BLOCK 32 +#define TRACE_STR_LINE 256 +#define SAVED_TRACES_NUM 1024 +#define SAVED_TRACES_BUFFER_SIZE_BYTE (SAVED_TRACES_NUM * TRACE_STR_LINE) + #define TRACER_MAX_PARAMS 7 #define MESSAGE_HASH_BITS 6 #define MESSAGE_HASH_SIZE BIT(MESSAGE_HASH_BITS) @@ -83,6 +87,13 @@ struct mlx5_fw_tracer { u32 consumer_index; } buff; + /* Saved Traces Buffer */ + struct { + void *traces_buff; + u32 saved_traces_index; + struct mutex lock; /* Protect sbuff access */ + } sbuff; + u64 last_timestamp; struct work_struct handle_traces_work; struct hlist_head hash[MESSAGE_HASH_SIZE]; @@ -171,5 +182,8 @@ struct mlx5_fw_tracer 
*mlx5_fw_tracer_create(struct mlx5_core_dev *dev); int mlx5_fw_tracer_init(struct mlx5_fw_tracer *tracer); void mlx5_fw_tracer_cleanup(struct mlx5_fw_tracer *tracer); void mlx5_fw_tracer_destroy(struct mlx5_fw_tracer *tracer); +int mlx5_fw_tracer_trigger_core_dump_general(struct mlx5_core_dev *dev); +int mlx5_fw_tracer_get_saved_traces_objects(struct mlx5_fw_tracer *tracer, + struct devlink_fmsg *fmsg); #endif diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c index 9ffa9c7f81a0..34b8252afad5 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c @@ -41,6 +41,7 @@ #include "lib/eq.h" #include "lib/mlx5.h" #include "lib/pci_vsc.h" +#include "diag/fw_tracer.h" enum { MLX5_HEALTH_POLL_INTERVAL = 2 * HZ, @@ -452,9 +453,54 @@ mlx5_fw_reporter_diagnose(struct devlink_health_reporter *reporter, return err; } +struct mlx5_fw_reporter_ctx { + u8 err_synd; + int miss_counter; +}; + +static int +mlx5_fw_reporter_ctx_pairs_put(struct devlink_fmsg *fmsg, + struct mlx5_fw_reporter_ctx *fw_reporter_ctx) +{ + int err; + + err = devlink_fmsg_u8_pair_put(fmsg, "Syndrome", + fw_reporter_ctx->err_synd); + if (err) + return err; + err = devlink_fmsg_u32_pair_put(fmsg, "fw_miss_counter", + fw_reporter_ctx->miss_counter); + if (err) + return err; + return 0; +} + +static int +mlx5_fw_reporter_dump(struct devlink_health_reporter *reporter, + struct devlink_fmsg *fmsg, void *priv_ctx) +{ + struct mlx5_core_dev *dev = devlink_health_reporter_priv(reporter); + int err; + + err = mlx5_fw_tracer_trigger_core_dump_general(dev); + if (err) + return err; + + if (priv_ctx) { + struct mlx5_fw_reporter_ctx *fw_reporter_ctx = priv_ctx; + + err = mlx5_fw_reporter_ctx_pairs_put(fmsg, fw_reporter_ctx); + if (err) + return err; + } + + return mlx5_fw_tracer_get_saved_traces_objects(dev->tracer, fmsg); +} + static const struct devlink_health_reporter_ops mlx5_fw_reporter_ops = { 
.name = "fw", .diagnose = mlx5_fw_reporter_diagnose, + .dump = mlx5_fw_reporter_dump, }; static void mlx5_fw_reporter_create(struct mlx5_core_dev *dev) From patchwork Sun May 5 00:33:29 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saeed Mahameed X-Patchwork-Id: 1095333 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=mellanox.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=Mellanox.com header.i=@Mellanox.com header.b="Dl8BrpkR"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 44xRhW4Rz5z9s4V for ; Sun, 5 May 2019 10:34:35 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727479AbfEEAee (ORCPT ); Sat, 4 May 2019 20:34:34 -0400 Received: from mail-eopbgr70052.outbound.protection.outlook.com ([40.107.7.52]:1287 "EHLO EUR04-HE1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727434AbfEEAec (ORCPT ); Sat, 4 May 2019 20:34:32 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=MzhWk4Mla3bIZq2XqF+SQzeJcWIUqpotm3sObNQkmBc=; b=Dl8BrpkRPEuzIViANBa77jWKW2V/0dLoKOMq7EVI8B9gaNOhQBBaRsiVIGSof8dA4yshttvFEPMZl9yLbStm+4UqbThL6ovDP5vKMNAXlLApMwavoT1l2ymdDBorC17heFGiixi8mSA4PJRiibDoidwGgw2USoGFVLGeMOnZ2AQ= Received: from DB8PR05MB5898.eurprd05.prod.outlook.com (20.179.9.32) by DB8PR05MB5881.eurprd05.prod.outlook.com 
(20.179.10.21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1856.11; Sun, 5 May 2019 00:33:29 +0000 Received: from DB8PR05MB5898.eurprd05.prod.outlook.com ([fe80::ed24:8317:76e4:1a07]) by DB8PR05MB5898.eurprd05.prod.outlook.com ([fe80::ed24:8317:76e4:1a07%5]) with mapi id 15.20.1856.012; Sun, 5 May 2019 00:33:29 +0000 From: Saeed Mahameed To: "David S. Miller" CC: "netdev@vger.kernel.org" , Jiri Pirko , Moshe Shemesh , Eran Ben Elisha , Saeed Mahameed Subject: [net-next 12/15] net/mlx5: Report devlink health on FW issues Thread-Topic: [net-next 12/15] net/mlx5: Report devlink health on FW issues Thread-Index: AQHVAtopIO6TRZFjIUqNCGMmW93RRA== Date: Sun, 5 May 2019 00:33:29 +0000 Message-ID: <20190505003207.1353-13-saeedm@mellanox.com> References: <20190505003207.1353-1-saeedm@mellanox.com> In-Reply-To: <20190505003207.1353-1-saeedm@mellanox.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-mailer: git-send-email 2.20.1 x-originating-ip: [73.15.39.150] x-clientproxiedby: BY5PR13CA0008.namprd13.prod.outlook.com (2603:10b6:a03:180::21) To DB8PR05MB5898.eurprd05.prod.outlook.com (2603:10a6:10:a4::32) authentication-results: spf=none (sender IP is ) smtp.mailfrom=saeedm@mellanox.com; x-ms-exchange-messagesentrepresentingtype: 1 x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: 64aff3ec-328a-44d7-b06a-08d6d0f14b6c x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: BCL:0; PCL:0; RULEID:(2390118)(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600141)(711020)(4605104)(4618075)(2017052603328)(7193020); SRVR:DB8PR05MB5881; x-ms-traffictypediagnostic: DB8PR05MB5881: x-microsoft-antispam-prvs: x-ms-oob-tlc-oobclassifiers: OLM:8882; x-forefront-prvs: 00286C0CA6 x-forefront-antispam-report: SFV:NSPM; 
SFS:(10009020)(346002)(376002)(366004)(39850400004)(136003)(396003)(199004)(189003)(305945005)(52116002)(76176011)(36756003)(316002)(25786009)(6486002)(478600001)(14454004)(446003)(50226002)(476003)(11346002)(2616005)(26005)(7736002)(4326008)(99286004)(86362001)(6916009)(53936002)(66476007)(186003)(68736007)(66446008)(64756008)(66556008)(6436002)(66946007)(73956011)(6512007)(14444005)(1076003)(66066001)(71190400001)(71200400001)(54906003)(256004)(102836004)(81156014)(81166006)(8936002)(3846002)(6506007)(386003)(107886003)(2906002)(8676002)(5660300002)(6116002)(486006); DIR:OUT; SFP:1101; SCL:1; SRVR:DB8PR05MB5881; H:DB8PR05MB5898.eurprd05.prod.outlook.com; FPR:; SPF:None; LANG:en; PTR:InfoNoRecords; A:1; MX:1; received-spf: None (protection.outlook.com: mellanox.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 x-microsoft-antispam-message-info: 2sIUn/Ql7M+r1FHduN+i6f69OOnFyEugr/X+9EL//BuNhTKQyMFXKx7UPN8eEbYqwqzKdCsNLnLDR5W+9scRg7Yrf0SBE0YzVqx06TMfVxbX/feNFsv1dKoaW+Iqoq+/sl/npUtD41OYqzfjINQJ9z5ISqYFxsH5II6FvupSNi+PhhdTqWlOhm70fz/dXdqUaCsxYZtL4z9VeE4cqXGYnyTbl0YCAehRdXWaXR9Npxt0gV/P2twl9Y4WkvfsCv6feN7W0UmFrouoC7UaT/4+9LHWI1mjksC2QTsPPCNbZsUpJljHZp0qHzz681kOOCdLFzyJI3fd36AWZoRIUDmPrvKuPKbxdIcslwOcdsUmiIxwzAnKUoKhLxVqffgQNZxJmOIWx2NbRY6zNQsM6fu3LgP4oC3lQPAlAkdD0mQAzoM= MIME-Version: 1.0 X-OriginatorOrg: Mellanox.com X-MS-Exchange-CrossTenant-Network-Message-Id: 64aff3ec-328a-44d7-b06a-08d6d0f14b6c X-MS-Exchange-CrossTenant-originalarrivaltime: 05 May 2019 00:33:29.3285 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: a652971c-7d2e-4d9b-a6a4-d149256f461b X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB8PR05MB5881 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Moshe Shemesh Use devlink_health_report() to report any symptom of a FW issue, such as a FW counter miss or a new health syndrome.
FW issues are detected in mlx5 during poll_health(), which is called in timer (atomic) context, so the health work queue is used to schedule the reports. Signed-off-by: Moshe Shemesh Signed-off-by: Eran Ben Elisha Signed-off-by: Saeed Mahameed --- .../net/ethernet/mellanox/mlx5/core/health.c | 33 +++++++++++++++++++ include/linux/mlx5/driver.h | 2 ++ 2 files changed, 35 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c index 34b8252afad5..03b9fc9ebd6e 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c @@ -497,6 +497,29 @@ mlx5_fw_reporter_dump(struct devlink_health_reporter *reporter, return mlx5_fw_tracer_get_saved_traces_objects(dev->tracer, fmsg); } +static void mlx5_fw_reporter_err_work(struct work_struct *work) +{ + struct mlx5_fw_reporter_ctx fw_reporter_ctx; + struct mlx5_core_health *health; + + health = container_of(work, struct mlx5_core_health, report_work); + + if (IS_ERR_OR_NULL(health->fw_reporter)) + return; + + fw_reporter_ctx.err_synd = health->synd; + fw_reporter_ctx.miss_counter = health->miss_counter; + if (fw_reporter_ctx.err_synd) { + devlink_health_report(health->fw_reporter, + "FW syndrome reported", &fw_reporter_ctx); + return; + } + if (fw_reporter_ctx.miss_counter) + devlink_health_report(health->fw_reporter, + "FW miss counter reported", + &fw_reporter_ctx); +} + static const struct devlink_health_reporter_ops mlx5_fw_reporter_ops = { .name = "fw", .diagnose = mlx5_fw_reporter_diagnose, @@ -554,8 +577,10 @@ static void poll_health(struct timer_list *t) { struct mlx5_core_dev *dev = from_timer(dev, t, priv.health.timer); struct mlx5_core_health *health = &dev->priv.health; + struct health_buffer __iomem *h = health->health; u32 fatal_error; u32 count; + u8 prev_synd; if (dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR) goto out; @@ -570,8 +595,14 @@ static void poll_health(struct timer_list *t) if
(health->miss_counter == MAX_MISSES) { mlx5_core_err(dev, "device's health compromised - reached miss count\n"); mlx5_print_health_info(dev); + queue_work(health->wq, &health->report_work); } + prev_synd = health->synd; + health->synd = ioread8(&h->synd); + if (health->synd && health->synd != prev_synd) + queue_work(health->wq, &health->report_work); + fatal_error = check_fatal_sensors(dev); if (fatal_error && !health->fatal_error) { @@ -621,6 +652,7 @@ void mlx5_drain_health_wq(struct mlx5_core_dev *dev) spin_lock_irqsave(&health->wq_lock, flags); set_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags); spin_unlock_irqrestore(&health->wq_lock, flags); + cancel_work_sync(&health->report_work); cancel_work_sync(&health->work); } @@ -658,6 +690,7 @@ int mlx5_health_init(struct mlx5_core_dev *dev) return -ENOMEM; spin_lock_init(&health->wq_lock); INIT_WORK(&health->work, health_care); + INIT_WORK(&health->report_work, mlx5_fw_reporter_err_work); health->crdump = NULL; health->info_buf = kmalloc(HEALTH_INFO_MAX_BUFF, GFP_KERNEL); if (!health->info_buf) diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index ebda70984601..604079b4706c 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -437,12 +437,14 @@ struct mlx5_core_health { struct timer_list timer; u32 prev; int miss_counter; + u8 synd; u32 fatal_error; /* wq spinlock to synchronize draining */ spinlock_t wq_lock; struct workqueue_struct *wq; unsigned long flags; struct work_struct work; + struct work_struct report_work; struct delayed_work recover_work; struct mlx5_fw_crdump *crdump; char *info_buf; From patchwork Sun May 5 00:33:31 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saeed Mahameed X-Patchwork-Id: 1095334 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: 
ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=mellanox.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=Mellanox.com header.i=@Mellanox.com header.b="PllwYxVY"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 44xRhZ1pc4z9s4V for ; Sun, 5 May 2019 10:34:38 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727489AbfEEAeg (ORCPT ); Sat, 4 May 2019 20:34:36 -0400 Received: from mail-eopbgr70077.outbound.protection.outlook.com ([40.107.7.77]:8128 "EHLO EUR04-HE1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727425AbfEEAed (ORCPT ); Sat, 4 May 2019 20:34:33 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=llhbcD0EJXV3RvB+Ok803XFKDYyybhhAmpItl4hGf2I=; b=PllwYxVYOEK8vMcJSjvlYtKJhlDMQVf/pMvd8OkeRK8Hg9CQvPDwi+9TRhVw6q6ai6QQMFjyNHFzIWPJ8po2KgdrvxEi6vYrMW6wV0CWNZYkmDAiUJ22sRJelm9K8GvznzUqHaMTJhsJbbhGnjHMFx0/wsNbcShoZfPDzk7Q+hc= Received: from DB8PR05MB5898.eurprd05.prod.outlook.com (20.179.9.32) by DB8PR05MB5881.eurprd05.prod.outlook.com (20.179.10.21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1856.11; Sun, 5 May 2019 00:33:31 +0000 Received: from DB8PR05MB5898.eurprd05.prod.outlook.com ([fe80::ed24:8317:76e4:1a07]) by DB8PR05MB5898.eurprd05.prod.outlook.com ([fe80::ed24:8317:76e4:1a07%5]) with mapi id 15.20.1856.012; Sun, 5 May 2019 00:33:31 +0000 From: Saeed Mahameed To: "David S. 
Miller" CC: "netdev@vger.kernel.org" , Jiri Pirko , Moshe Shemesh , Eran Ben Elisha , Saeed Mahameed Subject: [net-next 13/15] net/mlx5: Add fw fatal devlink health reporter Thread-Topic: [net-next 13/15] net/mlx5: Add fw fatal devlink health reporter Thread-Index: AQHVAtoqMTB3Gfv750CXhciXrkQZEg== Date: Sun, 5 May 2019 00:33:31 +0000 Message-ID: <20190505003207.1353-14-saeedm@mellanox.com> References: <20190505003207.1353-1-saeedm@mellanox.com> In-Reply-To: <20190505003207.1353-1-saeedm@mellanox.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-mailer: git-send-email 2.20.1 x-originating-ip: [73.15.39.150] x-clientproxiedby: BY5PR13CA0008.namprd13.prod.outlook.com (2603:10b6:a03:180::21) To DB8PR05MB5898.eurprd05.prod.outlook.com (2603:10a6:10:a4::32) authentication-results: spf=none (sender IP is ) smtp.mailfrom=saeedm@mellanox.com; x-ms-exchange-messagesentrepresentingtype: 1 x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: e80d605a-0134-4ae1-3356-08d6d0f14ca2 x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: BCL:0; PCL:0; RULEID:(2390118)(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600141)(711020)(4605104)(4618075)(2017052603328)(7193020); SRVR:DB8PR05MB5881; x-ms-traffictypediagnostic: DB8PR05MB5881: x-microsoft-antispam-prvs: x-ms-oob-tlc-oobclassifiers: OLM:6108; x-forefront-prvs: 00286C0CA6 x-forefront-antispam-report: SFV:NSPM; 
SFS:(10009020)(979002)(346002)(376002)(366004)(39850400004)(136003)(396003)(199004)(189003)(305945005)(52116002)(76176011)(36756003)(316002)(25786009)(6486002)(478600001)(14454004)(446003)(50226002)(476003)(11346002)(2616005)(26005)(7736002)(4326008)(99286004)(86362001)(6916009)(53936002)(66476007)(186003)(68736007)(66446008)(64756008)(66556008)(6436002)(66946007)(73956011)(6512007)(14444005)(1076003)(66066001)(71190400001)(71200400001)(54906003)(256004)(102836004)(81156014)(81166006)(8936002)(3846002)(6506007)(386003)(107886003)(2906002)(8676002)(5660300002)(6116002)(486006)(969003)(989001)(999001)(1009001)(1019001); DIR:OUT; SFP:1101; SCL:1; SRVR:DB8PR05MB5881; H:DB8PR05MB5898.eurprd05.prod.outlook.com; FPR:; SPF:None; LANG:en; PTR:InfoNoRecords; A:1; MX:1; received-spf: None (protection.outlook.com: mellanox.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 x-microsoft-antispam-message-info: FGaQcO65njGk1r+KzPq9zH+DquPIT9ZxAIYN7nlqfV2ZAvO5FUjF9Vn3MYmQptdGxlWlcwMGL7SXEwWCTgLgxpv2Zp9r0Hx7RBr9eSC3SEg4wbU5JLht0dfzCAruMTZIEOIOAdVjdZh5adVpmmdrRqdO4mo5gf8aJCmXqdwRPzGWAb5kI8E44Xb6BwdYPBaIqk0jjGoY/n1btoCkSpqyEsasFszZR9NRSeaH9xZDh82C9wRb97IUY01EsvNXw5YSOUpPMeFkjNb7KaeYzClT2Fdv7OPRVj5+4+L5eXTJ4jpnfihqT3rrJA63AI5SaiSeDr+tCcObKnt888Zk8YALqYJZFrelWbPUbRWiQKtjzgrs1bFpBZgYGJbgb6wf4IjNLkT+dioIDzOGpe3MIkqKuRYpWfM05mpcCtNW6qD6qTE= MIME-Version: 1.0 X-OriginatorOrg: Mellanox.com X-MS-Exchange-CrossTenant-Network-Message-Id: e80d605a-0134-4ae1-3356-08d6d0f14ca2 X-MS-Exchange-CrossTenant-originalarrivaltime: 05 May 2019 00:33:31.3649 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: a652971c-7d2e-4d9b-a6a4-d149256f461b X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB8PR05MB5881 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Moshe Shemesh Create an mlx5 devlink health reporter for FW fatal errors.
The fw fatal reporter is added alongside the fw reporter and implements the recover callback. The point of having two reporters for FW issues is that we don't want to run FW recovery on every issue, only on fatal ones. Signed-off-by: Moshe Shemesh Signed-off-by: Eran Ben Elisha Signed-off-by: Saeed Mahameed --- .../net/ethernet/mellanox/mlx5/core/health.c | 70 ++++++++++++++----- include/linux/mlx5/driver.h | 1 + 2 files changed, 54 insertions(+), 17 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c index 03b9fc9ebd6e..e64f0e32cd67 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c @@ -301,31 +301,43 @@ static void mlx5_handle_bad_state(struct mlx5_core_dev *dev) /* How much time to wait until health resetting the driver (in msecs) */ #define MLX5_RECOVERY_WAIT_MSECS 60000 -static void health_care(struct work_struct *work) +static int mlx5_health_care(struct mlx5_core_dev *dev) { - struct mlx5_core_health *health; - struct mlx5_core_dev *dev; - struct mlx5_priv *priv; unsigned long end; - health = container_of(work, struct mlx5_core_health, work); - priv = container_of(health, struct mlx5_priv, health); - dev = container_of(priv, struct mlx5_core_dev, priv); mlx5_core_warn(dev, "handling bad device here\n"); mlx5_handle_bad_state(dev); - end = jiffies + msecs_to_jiffies(MLX5_RECOVERY_WAIT_MSECS); while (sensor_pci_not_working(dev)) { if (time_after(jiffies, end)) { mlx5_core_err(dev, "health recovery flow aborted, PCI reads still not working\n"); - return; + return -EIO; } msleep(100); } mlx5_core_err(dev, "starting health recovery flow\n"); mlx5_recover_device(dev); + if (!test_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state) || + check_fatal_sensors(dev)) { + mlx5_core_err(dev, "health recovery failed\n"); + return -EIO; + } + return 0; +} + +static void health_care_work(struct work_struct *work) +{ + struct
mlx5_core_health *health; + struct mlx5_core_dev *dev; + struct mlx5_priv *priv; + + health = container_of(work, struct mlx5_core_health, work); + priv = container_of(health, struct mlx5_priv, health); + dev = container_of(priv, struct mlx5_core_dev, priv); + + mlx5_health_care(dev); } static const char *hsynd_str(u8 synd) @@ -526,7 +538,22 @@ static const struct devlink_health_reporter_ops mlx5_fw_reporter_ops = { .dump = mlx5_fw_reporter_dump, }; -static void mlx5_fw_reporter_create(struct mlx5_core_dev *dev) +static int +mlx5_fw_fatal_reporter_recover(struct devlink_health_reporter *reporter, + void *priv_ctx) +{ + struct mlx5_core_dev *dev = devlink_health_reporter_priv(reporter); + + return mlx5_health_care(dev); +} + +static const struct devlink_health_reporter_ops mlx5_fw_fatal_reporter_ops = { + .name = "fw_fatal", + .recover = mlx5_fw_fatal_reporter_recover, +}; + +#define MLX5_REPORTER_FW_GRACEFUL_PERIOD 1200000 +static void mlx5_fw_reporters_create(struct mlx5_core_dev *dev) { struct mlx5_core_health *health = &dev->priv.health; struct devlink *devlink = priv_to_devlink(dev); @@ -537,16 +564,25 @@ static void mlx5_fw_reporter_create(struct mlx5_core_dev *dev) if (IS_ERR(health->fw_reporter)) mlx5_core_warn(dev, "Failed to create fw reporter, err = %ld\n", PTR_ERR(health->fw_reporter)); + + health->fw_fatal_reporter = + devlink_health_reporter_create(devlink, &mlx5_fw_fatal_reporter_ops, + MLX5_REPORTER_FW_GRACEFUL_PERIOD, + true, dev); + if (IS_ERR(health->fw_fatal_reporter)) + mlx5_core_warn(dev, "Failed to create fw fatal reporter, err = %ld\n", + PTR_ERR(health->fw_fatal_reporter)); } -static void mlx5_fw_reporter_destroy(struct mlx5_core_dev *dev) +static void mlx5_fw_reporters_destroy(struct mlx5_core_dev *dev) { struct mlx5_core_health *health = &dev->priv.health; - if (IS_ERR_OR_NULL(health->fw_reporter)) - return; + if (!IS_ERR_OR_NULL(health->fw_reporter)) + devlink_health_reporter_destroy(health->fw_reporter); - 
devlink_health_reporter_destroy(health->fw_reporter); + if (!IS_ERR_OR_NULL(health->fw_fatal_reporter)) + devlink_health_reporter_destroy(health->fw_fatal_reporter); } static unsigned long get_next_poll_jiffies(void) @@ -669,7 +705,7 @@ void mlx5_health_cleanup(struct mlx5_core_dev *dev) kfree(health->info_buf); destroy_workqueue(health->wq); - mlx5_fw_reporter_destroy(dev); + mlx5_fw_reporters_destroy(dev); } int mlx5_health_init(struct mlx5_core_dev *dev) @@ -689,7 +725,7 @@ int mlx5_health_init(struct mlx5_core_dev *dev) if (!health->wq) return -ENOMEM; spin_lock_init(&health->wq_lock); - INIT_WORK(&health->work, health_care); + INIT_WORK(&health->work, health_care_work); INIT_WORK(&health->report_work, mlx5_fw_reporter_err_work); health->crdump = NULL; health->info_buf = kmalloc(HEALTH_INFO_MAX_BUFF, GFP_KERNEL); @@ -697,7 +733,7 @@ int mlx5_health_init(struct mlx5_core_dev *dev) goto err_alloc_buff; mutex_init(&health->info_buf_lock); - mlx5_fw_reporter_create(dev); + mlx5_fw_reporters_create(dev); return 0; diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 604079b4706c..6f65787bf91b 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -452,6 +452,7 @@ struct mlx5_core_health { /* protect info buf access */ struct mutex info_buf_lock; struct devlink_health_reporter *fw_reporter; + struct devlink_health_reporter *fw_fatal_reporter; }; struct mlx5_qp_table { From patchwork Sun May 5 00:33:33 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saeed Mahameed X-Patchwork-Id: 1095335 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) 
Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=mellanox.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=Mellanox.com header.i=@Mellanox.com header.b="gRhsuZJv"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 44xRhc0bLSz9s4V for ; Sun, 5 May 2019 10:34:40 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727232AbfEEAei (ORCPT ); Sat, 4 May 2019 20:34:38 -0400 Received: from mail-eopbgr70052.outbound.protection.outlook.com ([40.107.7.52]:1287 "EHLO EUR04-HE1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727404AbfEEAef (ORCPT ); Sat, 4 May 2019 20:34:35 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=i/B/09CNMTKAJYF8rs7p6+Ynm4v6ODbE3XAgx+ewOAc=; b=gRhsuZJvlnBYgs5e2+PyKUlNAL9h19H1DQZSxfklnlrx0f5Nl56iT8Iaz7wB7AfGC8JPAThp+a3JM3GcFwo0MjEvlg31HREQ/sIuJcOljWHxoZc5Ycr2Rp1qzeGv7SiQNUzQQTVQyybqHHu7f2NvQQStP4celRpJhC+2DIsRv8M= Received: from DB8PR05MB5898.eurprd05.prod.outlook.com (20.179.9.32) by DB8PR05MB5881.eurprd05.prod.outlook.com (20.179.10.21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1856.11; Sun, 5 May 2019 00:33:33 +0000 Received: from DB8PR05MB5898.eurprd05.prod.outlook.com ([fe80::ed24:8317:76e4:1a07]) by DB8PR05MB5898.eurprd05.prod.outlook.com ([fe80::ed24:8317:76e4:1a07%5]) with mapi id 15.20.1856.012; Sun, 5 May 2019 00:33:33 +0000 From: Saeed Mahameed To: "David S. 
Miller" CC: "netdev@vger.kernel.org" , Jiri Pirko , Moshe Shemesh , Saeed Mahameed Subject: [net-next 14/15] net/mlx5: Add support for FW fatal reporter dump Thread-Topic: [net-next 14/15] net/mlx5: Add support for FW fatal reporter dump Thread-Index: AQHVAtorycA5HCVlPkqR2luU2dyXFA== Date: Sun, 5 May 2019 00:33:33 +0000 Message-ID: <20190505003207.1353-15-saeedm@mellanox.com> References: <20190505003207.1353-1-saeedm@mellanox.com> In-Reply-To: <20190505003207.1353-1-saeedm@mellanox.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-mailer: git-send-email 2.20.1 x-originating-ip: [73.15.39.150] x-clientproxiedby: BY5PR13CA0008.namprd13.prod.outlook.com (2603:10b6:a03:180::21) To DB8PR05MB5898.eurprd05.prod.outlook.com (2603:10a6:10:a4::32) authentication-results: spf=none (sender IP is ) smtp.mailfrom=saeedm@mellanox.com; x-ms-exchange-messagesentrepresentingtype: 1 x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: 195f859a-dbb3-4191-76a5-08d6d0f14dbb x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: BCL:0; PCL:0; RULEID:(2390118)(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600141)(711020)(4605104)(4618075)(2017052603328)(7193020); SRVR:DB8PR05MB5881; x-ms-traffictypediagnostic: DB8PR05MB5881: x-microsoft-antispam-prvs: x-ms-oob-tlc-oobclassifiers: OLM:7219; x-forefront-prvs: 00286C0CA6 x-forefront-antispam-report: SFV:NSPM; 
SFS:(10009020)(346002)(376002)(366004)(39850400004)(136003)(396003)(199004)(189003)(305945005)(52116002)(76176011)(36756003)(316002)(25786009)(6486002)(478600001)(14454004)(446003)(50226002)(476003)(11346002)(2616005)(26005)(7736002)(4326008)(99286004)(86362001)(6916009)(53936002)(66476007)(186003)(68736007)(66446008)(64756008)(66556008)(6436002)(66946007)(73956011)(6512007)(1076003)(66066001)(71190400001)(71200400001)(54906003)(256004)(102836004)(81156014)(81166006)(8936002)(3846002)(6506007)(386003)(107886003)(2906002)(8676002)(5660300002)(6116002)(486006); DIR:OUT; SFP:1101; SCL:1; SRVR:DB8PR05MB5881; H:DB8PR05MB5898.eurprd05.prod.outlook.com; FPR:; SPF:None; LANG:en; PTR:InfoNoRecords; A:1; MX:1; received-spf: None (protection.outlook.com: mellanox.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 x-microsoft-antispam-message-info: wjxc/gEiIBR+w0D0kkiiaBo7jmCQD83TqMkxyzvSjXmQxN+8B5509PsKfMAOCzl5NZ+Jy+GotxAYr6rp1bzy/w87Q+gQp8VCHpfd0FZGN1hcAEc2zCWs4xv2b+ED16pLhVcKI/jBnS/8L11F3VNRkMxHW4gfmqt9l3XnSTcwYZ/rQ2HQtJOtpigBG+2vBzmOI/fOQOLWKq4Ls7KWapAAp0czJlXTzDRWGPmyL1cf3tVILgAQEZWK7Ylo5SSoG/26FU/d/qjyzZfANUunCc5GUrDll4Z1ASXGpAkasNnBdpdeRINaGrnMQLqocFso5ZOGDf9TTD4VaNuWq3JMjrc9gvs05+ztl+vD2ndlJNzhtf83OISndl1KCyNGDXPCMXO7LgRAlKE7Q+iCOi3RNp4A/u+SRPSOgy4iiROBOAKkYS8= MIME-Version: 1.0 X-OriginatorOrg: Mellanox.com X-MS-Exchange-CrossTenant-Network-Message-Id: 195f859a-dbb3-4191-76a5-08d6d0f14dbb X-MS-Exchange-CrossTenant-originalarrivaltime: 05 May 2019 00:33:33.2713 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: a652971c-7d2e-4d9b-a6a4-d149256f461b X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB8PR05MB5881 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Moshe Shemesh Add support for the dump callback of the mlx5 FW fatal reporter.
The FW fatal dump uses the cr-dump functionality to gather cr-space data for debugging. The cr-dump uses the VSC interface, which remains valid even when the FW command interface is not functional, as is the case in most FW fatal errors. The cr-dump is stored as a memory region snapshot so that it can easily be read by address. Command example and output: $ devlink health dump show pci/0000:82:00.0 reporter fw_fatal devlink_region_name: cr-space snapshot_id: 1 $ devlink region read pci/0000:82:00.0/cr-space snapshot 1 address 983064 length 8 00000000000f0018 e1 03 00 00 fb ae a9 3f Signed-off-by: Moshe Shemesh Signed-off-by: Saeed Mahameed --- .../net/ethernet/mellanox/mlx5/core/health.c | 39 +++++++++++++++++++ 1 file changed, 39 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c index e64f0e32cd67..5271c88ef64c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c @@ -547,9 +547,48 @@ mlx5_fw_fatal_reporter_recover(struct devlink_health_reporter *reporter, return mlx5_health_care(dev); } +static int +mlx5_fw_fatal_reporter_dump(struct devlink_health_reporter *reporter, + struct devlink_fmsg *fmsg, void *priv_ctx) +{ + struct mlx5_core_dev *dev = devlink_health_reporter_priv(reporter); + char crdump_region[20]; + u32 snapshot_id; + int err; + + if (!mlx5_core_is_pf(dev)) { + mlx5_core_err(dev, "Only PF is permitted run FW fatal dump\n"); + return -EPERM; + } + + err = mlx5_crdump_collect(dev, crdump_region, &snapshot_id); + if (err) + return err; + + if (priv_ctx) { + struct mlx5_fw_reporter_ctx *fw_reporter_ctx = priv_ctx; + + err = mlx5_fw_reporter_ctx_pairs_put(fmsg, fw_reporter_ctx); + if (err) + return err; + } + + err = devlink_fmsg_string_pair_put(fmsg, "devlink_region_name", + crdump_region); + if (err) + return err; + + err = devlink_fmsg_u32_pair_put(fmsg, "snapshot_id", snapshot_id); + if (err) + return err; + + return 0; +} + static const struct
devlink_health_reporter_ops mlx5_fw_fatal_reporter_ops = { .name = "fw_fatal", .recover = mlx5_fw_fatal_reporter_recover, + .dump = mlx5_fw_fatal_reporter_dump, }; #define MLX5_REPORTER_FW_GRACEFUL_PERIOD 1200000 From patchwork Sun May 5 00:33:34 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saeed Mahameed X-Patchwork-Id: 1095336 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=mellanox.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=Mellanox.com header.i=@Mellanox.com header.b="YsCj0UVe"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 44xRhf4fywz9s4V for ; Sun, 5 May 2019 10:34:42 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727414AbfEEAel (ORCPT ); Sat, 4 May 2019 20:34:41 -0400 Received: from mail-eopbgr70077.outbound.protection.outlook.com ([40.107.7.77]:8128 "EHLO EUR04-HE1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727469AbfEEAei (ORCPT ); Sat, 4 May 2019 20:34:38 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=IOzUJOG42MtFzxUguEMmGLywtmyi5IKiEdHVD8ftZ9U=; b=YsCj0UVeRj7/iqeHWAJ4fSUiq5b3fnKmSEvioUvLKNBuMHTPO/z0bkYxMJ6zQFze4hM+s8J84kk/+Jna/MOO4HhliHaSKgK1adp1sChl95Pg+GZDO5HlRZxA7FwfdOd64mHMmdXo+tJH0kB8GxCEIpYMVGy/ck9QcQFUcAKMXjs= Received: from 
DB8PR05MB5898.eurprd05.prod.outlook.com (20.179.9.32) by DB8PR05MB5881.eurprd05.prod.outlook.com (20.179.10.21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1856.11; Sun, 5 May 2019 00:33:35 +0000 Received: from DB8PR05MB5898.eurprd05.prod.outlook.com ([fe80::ed24:8317:76e4:1a07]) by DB8PR05MB5898.eurprd05.prod.outlook.com ([fe80::ed24:8317:76e4:1a07%5]) with mapi id 15.20.1856.012; Sun, 5 May 2019 00:33:35 +0000 From: Saeed Mahameed To: "David S. Miller" CC: "netdev@vger.kernel.org" , Jiri Pirko , Moshe Shemesh , Saeed Mahameed Subject: [net-next 15/15] net/mlx5: Report devlink health on FW fatal issues Thread-Topic: [net-next 15/15] net/mlx5: Report devlink health on FW fatal issues Thread-Index: AQHVAtosBlFUcVCI2EKKz1oRQe2JyQ== Date: Sun, 5 May 2019 00:33:34 +0000 Message-ID: <20190505003207.1353-16-saeedm@mellanox.com> References: <20190505003207.1353-1-saeedm@mellanox.com> In-Reply-To: <20190505003207.1353-1-saeedm@mellanox.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-mailer: git-send-email 2.20.1 x-originating-ip: [73.15.39.150] x-clientproxiedby: BY5PR13CA0008.namprd13.prod.outlook.com (2603:10b6:a03:180::21) To DB8PR05MB5898.eurprd05.prod.outlook.com (2603:10a6:10:a4::32) authentication-results: spf=none (sender IP is ) smtp.mailfrom=saeedm@mellanox.com; x-ms-exchange-messagesentrepresentingtype: 1 x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: bee62673-58ee-45e7-2cee-08d6d0f14ecc x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: BCL:0; PCL:0; RULEID:(2390118)(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600141)(711020)(4605104)(4618075)(2017052603328)(7193020); SRVR:DB8PR05MB5881; x-ms-traffictypediagnostic: DB8PR05MB5881: x-microsoft-antispam-prvs: x-ms-oob-tlc-oobclassifiers: OLM:8882; x-forefront-prvs: 00286C0CA6 x-forefront-antispam-report: SFV:NSPM; 
SFS:(10009020)(346002)(376002)(366004)(39850400004)(136003)(396003)(199004)(189003)(305945005)(52116002)(76176011)(36756003)(316002)(25786009)(6486002)(478600001)(14454004)(446003)(50226002)(476003)(11346002)(2616005)(26005)(7736002)(4326008)(99286004)(86362001)(6916009)(53936002)(66476007)(186003)(68736007)(66446008)(64756008)(66556008)(6436002)(66946007)(73956011)(6512007)(14444005)(1076003)(66066001)(71190400001)(71200400001)(54906003)(256004)(102836004)(81156014)(81166006)(8936002)(3846002)(6506007)(386003)(107886003)(2906002)(8676002)(5660300002)(6116002)(486006); DIR:OUT; SFP:1101; SCL:1; SRVR:DB8PR05MB5881; H:DB8PR05MB5898.eurprd05.prod.outlook.com; FPR:; SPF:None; LANG:en; PTR:InfoNoRecords; A:1; MX:1; received-spf: None (protection.outlook.com: mellanox.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 x-microsoft-antispam-message-info: vhRbW0IJoIsoNaWxiyMvQwLqSnNWTg+OCyVOfjLhmH60Jos1VnpZEiSkVqkmKFzC8TScTNaPNayFJht4BK6FoeIQ21E4K7v2LtwBJa+FFSebU1Bu82F/dfVQ0agBzXCnTAqlcHPGnpRZ3E0IINdz1SeDBUF0hoMjyRSHgnvbLnM3bA8q8LsvkBUzuo2o8IDZ9/ITf+mjvP1QfXeHUrq2k26+BaM0vBlB1QIxez4WSGuf2BsMOljI0ABeVJdQvHX5Mm6QGLvvL3Sy9bEDLNQAvBSkkAl6ufkMnj6a7JBe7vhuqaLsOOuHv+helnA4SfccnUufx5oQzThPZz6vP3r8mTvq3YuEY/l97E22NVgVWIvMKMqz8nHB3krGvY7GlL4isfTIf7o2G5AwIXYjONZl9/5toK7rfwutxItZDUGwev4= MIME-Version: 1.0 X-OriginatorOrg: Mellanox.com X-MS-Exchange-CrossTenant-Network-Message-Id: bee62673-58ee-45e7-2cee-08d6d0f14ecc X-MS-Exchange-CrossTenant-originalarrivaltime: 05 May 2019 00:33:34.9875 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: a652971c-7d2e-4d9b-a6a4-d149256f461b X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB8PR05MB5881 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Moshe Shemesh Report devlink health on FW fatal issues via fw_fatal_reporter. 
The driver recovery flow for FW fatal errors is now handled by devlink
health. With recovery controlled by devlink health, the user can disable
auto-recovery for a debug session and run the recovery manually.

Signed-off-by: Moshe Shemesh
Signed-off-by: Saeed Mahameed
---
 .../net/ethernet/mellanox/mlx5/core/health.c  | 42 ++++++++++++-------
 .../net/ethernet/mellanox/mlx5/core/main.c    |  3 +-
 include/linux/mlx5/driver.h                   |  2 +-
 3 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index 5271c88ef64c..e3c4e927945d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -327,19 +327,6 @@ static int mlx5_health_care(struct mlx5_core_dev *dev)
 	return 0;
 }
 
-static void health_care_work(struct work_struct *work)
-{
-	struct mlx5_core_health *health;
-	struct mlx5_core_dev *dev;
-	struct mlx5_priv *priv;
-
-	health = container_of(work, struct mlx5_core_health, work);
-	priv = container_of(health, struct mlx5_priv, health);
-	dev = container_of(priv, struct mlx5_core_dev, priv);
-
-	mlx5_health_care(dev);
-}
-
 static const char *hsynd_str(u8 synd)
 {
 	switch (synd) {
@@ -585,6 +572,29 @@ mlx5_fw_fatal_reporter_dump(struct devlink_health_reporter *reporter,
 	return 0;
 }
 
+static void mlx5_fw_fatal_reporter_err_work(struct work_struct *work)
+{
+	struct mlx5_fw_reporter_ctx fw_reporter_ctx;
+	struct mlx5_core_health *health;
+	struct mlx5_core_dev *dev;
+	struct mlx5_priv *priv;
+
+	health = container_of(work, struct mlx5_core_health, fatal_report_work);
+	priv = container_of(health, struct mlx5_priv, health);
+	dev = container_of(priv, struct mlx5_core_dev, priv);
+
+	mlx5_enter_error_state(dev, false);
+	if (IS_ERR_OR_NULL(health->fw_fatal_reporter)) {
+		if (mlx5_health_care(dev))
+			mlx5_core_err(dev, "health recovery failed\n");
+		return;
+	}
+	fw_reporter_ctx.err_synd = health->synd;
+	fw_reporter_ctx.miss_counter = health->miss_counter;
+	devlink_health_report(health->fw_fatal_reporter,
+			      "FW fatal error reported", &fw_reporter_ctx);
+}
+
 static const struct devlink_health_reporter_ops mlx5_fw_fatal_reporter_ops = {
 		.name = "fw_fatal",
 		.recover = mlx5_fw_fatal_reporter_recover,
@@ -642,7 +652,7 @@ void mlx5_trigger_health_work(struct mlx5_core_dev *dev)
 
 	spin_lock_irqsave(&health->wq_lock, flags);
 	if (!test_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags))
-		queue_work(health->wq, &health->work);
+		queue_work(health->wq, &health->fatal_report_work);
 	else
 		mlx5_core_err(dev, "new health works are not permitted at this stage\n");
 	spin_unlock_irqrestore(&health->wq_lock, flags);
@@ -728,7 +738,7 @@ void mlx5_drain_health_wq(struct mlx5_core_dev *dev)
 	set_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags);
 	spin_unlock_irqrestore(&health->wq_lock, flags);
 	cancel_work_sync(&health->report_work);
-	cancel_work_sync(&health->work);
+	cancel_work_sync(&health->fatal_report_work);
 }
 
 void mlx5_health_flush(struct mlx5_core_dev *dev)
@@ -764,7 +774,7 @@ int mlx5_health_init(struct mlx5_core_dev *dev)
 	if (!health->wq)
 		return -ENOMEM;
 	spin_lock_init(&health->wq_lock);
-	INIT_WORK(&health->work, health_care_work);
+	INIT_WORK(&health->fatal_report_work, mlx5_fw_fatal_reporter_err_work);
 	INIT_WORK(&health->report_work, mlx5_fw_reporter_err_work);
 	health->crdump = NULL;
 	health->info_buf = kmalloc(HEALTH_INFO_MAX_BUFF, GFP_KERNEL);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index c22ff9a58ec5..b1ad7369e014 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1367,7 +1367,8 @@ static pci_ers_result_t mlx5_pci_err_detected(struct pci_dev *pdev,
 
 	mlx5_core_info(dev, "%s was called\n", __func__);
 
-	mlx5_enter_error_state(dev, false);
+	if (state)
+		mlx5_enter_error_state(dev, false);
 	mlx5_error_sw_reset(dev);
 	mlx5_unload_one(dev, false);
 	/* In case of kernel call drain the health wq */
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 6f65787bf91b..b5b5baca5aee 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -443,7 +443,7 @@ struct mlx5_core_health {
 	spinlock_t			wq_lock;
 	struct workqueue_struct	       *wq;
 	unsigned long			flags;
-	struct work_struct		work;
+	struct work_struct		fatal_report_work;
 	struct work_struct		report_work;
 	struct delayed_work		recover_work;
 	struct mlx5_fw_crdump	       *crdump;
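
Note on the work handler in this patch: mlx5_fw_fatal_reporter_err_work()
gets only a pointer to the embedded work item and recovers the device
through three container_of() steps (work -> health -> priv -> dev). That
is why the rename of the field in struct mlx5_core_health has to match
the member name passed to container_of(). The following is a minimal
userspace sketch of that pointer walk, not driver code. The structs
core_health, priv and core_dev, the helpers dev_from_fatal_work() and
check_roundtrip(), and the simplified container_of() macro are
hypothetical stand-ins introduced for illustration; the kernel macro
additionally performs type checking.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily reduced stand-ins for the mlx5 structures. */
struct work_struct {
	int pending;
};

struct core_health {
	unsigned long flags;
	struct work_struct fatal_report_work;
	struct work_struct report_work;
};

struct priv {
	int id;
	struct core_health health;
};

struct core_dev {
	int index;
	struct priv priv;
};

/* Walk from the embedded work item back to the owning device. */
static struct core_dev *dev_from_fatal_work(struct work_struct *work)
{
	struct core_health *health;
	struct priv *priv;

	health = container_of(work, struct core_health, fatal_report_work);
	priv = container_of(health, struct priv, health);
	return container_of(priv, struct core_dev, priv);
}

/* Self-check: the recovered pointer must be the original device. */
static void check_roundtrip(void)
{
	struct core_dev d = { .index = 7 };

	assert(dev_from_fatal_work(&d.priv.health.fatal_report_work) == &d);
}
```

Passing the address of report_work instead of fatal_report_work to
dev_from_fatal_work() would yield a wrong device pointer without any
compiler diagnostic in this simplified macro, which is one reason each
work item gets its own handler.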