From patchwork Sun May 5 00:33:34 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saeed Mahameed X-Patchwork-Id: 1095336 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=mellanox.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=Mellanox.com header.i=@Mellanox.com header.b="YsCj0UVe"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 44xRhf4fywz9s4V for ; Sun, 5 May 2019 10:34:42 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727414AbfEEAel (ORCPT ); Sat, 4 May 2019 20:34:41 -0400 Received: from mail-eopbgr70077.outbound.protection.outlook.com ([40.107.7.77]:8128 "EHLO EUR04-HE1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727469AbfEEAei (ORCPT ); Sat, 4 May 2019 20:34:38 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=IOzUJOG42MtFzxUguEMmGLywtmyi5IKiEdHVD8ftZ9U=; b=YsCj0UVeRj7/iqeHWAJ4fSUiq5b3fnKmSEvioUvLKNBuMHTPO/z0bkYxMJ6zQFze4hM+s8J84kk/+Jna/MOO4HhliHaSKgK1adp1sChl95Pg+GZDO5HlRZxA7FwfdOd64mHMmdXo+tJH0kB8GxCEIpYMVGy/ck9QcQFUcAKMXjs= Received: from DB8PR05MB5898.eurprd05.prod.outlook.com (20.179.9.32) by DB8PR05MB5881.eurprd05.prod.outlook.com (20.179.10.21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1856.11; Sun, 5 May 2019 00:33:35 +0000 Received: from DB8PR05MB5898.eurprd05.prod.outlook.com ([fe80::ed24:8317:76e4:1a07]) by DB8PR05MB5898.eurprd05.prod.outlook.com ([fe80::ed24:8317:76e4:1a07%5]) with mapi id 15.20.1856.012; Sun, 5 May 2019 00:33:35 +0000 From: Saeed Mahameed To: "David S. Miller" CC: "netdev@vger.kernel.org" , Jiri Pirko , Moshe Shemesh , Saeed Mahameed Subject: [net-next 15/15] net/mlx5: Report devlink health on FW fatal issues Thread-Topic: [net-next 15/15] net/mlx5: Report devlink health on FW fatal issues Thread-Index: AQHVAtosBlFUcVCI2EKKz1oRQe2JyQ== Date: Sun, 5 May 2019 00:33:34 +0000 Message-ID: <20190505003207.1353-16-saeedm@mellanox.com> References: <20190505003207.1353-1-saeedm@mellanox.com> In-Reply-To: <20190505003207.1353-1-saeedm@mellanox.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-mailer: git-send-email 2.20.1 x-originating-ip: [73.15.39.150] x-clientproxiedby: BY5PR13CA0008.namprd13.prod.outlook.com (2603:10b6:a03:180::21) To DB8PR05MB5898.eurprd05.prod.outlook.com (2603:10a6:10:a4::32) authentication-results: spf=none (sender IP is ) smtp.mailfrom=saeedm@mellanox.com; x-ms-exchange-messagesentrepresentingtype: 1 x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: bee62673-58ee-45e7-2cee-08d6d0f14ecc x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: BCL:0; PCL:0; RULEID:(2390118)(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600141)(711020)(4605104)(4618075)(2017052603328)(7193020); SRVR:DB8PR05MB5881; x-ms-traffictypediagnostic: DB8PR05MB5881: x-microsoft-antispam-prvs: x-ms-oob-tlc-oobclassifiers: OLM:8882; x-forefront-prvs: 00286C0CA6 x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(346002)(376002)(366004)(39850400004)(136003)(396003)(199004)(189003)(305945005)(52116002)(76176011)(36756003)(316002)(25786009)(6486002)(478600001)(14454004)(446003)(50226002)(476003)(11346002)(2616005)(26005)(7736002)(4326008)(99286004)(86362001)(6916009)(53936002)(66476007)(186003)(68736007)(66446008)(64756008)(66556008)(6436002)(66946007)(73956011)(6512007)(14444005)(1076003)(66066001)(71190400001)(71200400001)(54906003)(256004)(102836004)(81156014)(81166006)(8936002)(3846002)(6506007)(386003)(107886003)(2906002)(8676002)(5660300002)(6116002)(486006); DIR:OUT; SFP:1101; SCL:1; SRVR:DB8PR05MB5881; H:DB8PR05MB5898.eurprd05.prod.outlook.com; FPR:; SPF:None; LANG:en; PTR:InfoNoRecords; A:1; MX:1; received-spf: None (protection.outlook.com: mellanox.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 x-microsoft-antispam-message-info: vhRbW0IJoIsoNaWxiyMvQwLqSnNWTg+OCyVOfjLhmH60Jos1VnpZEiSkVqkmKFzC8TScTNaPNayFJht4BK6FoeIQ21E4K7v2LtwBJa+FFSebU1Bu82F/dfVQ0agBzXCnTAqlcHPGnpRZ3E0IINdz1SeDBUF0hoMjyRSHgnvbLnM3bA8q8LsvkBUzuo2o8IDZ9/ITf+mjvP1QfXeHUrq2k26+BaM0vBlB1QIxez4WSGuf2BsMOljI0ABeVJdQvHX5Mm6QGLvvL3Sy9bEDLNQAvBSkkAl6ufkMnj6a7JBe7vhuqaLsOOuHv+helnA4SfccnUufx5oQzThPZz6vP3r8mTvq3YuEY/l97E22NVgVWIvMKMqz8nHB3krGvY7GlL4isfTIf7o2G5AwIXYjONZl9/5toK7rfwutxItZDUGwev4= MIME-Version: 1.0 X-OriginatorOrg: Mellanox.com X-MS-Exchange-CrossTenant-Network-Message-Id: bee62673-58ee-45e7-2cee-08d6d0f14ecc X-MS-Exchange-CrossTenant-originalarrivaltime: 05 May 2019 00:33:34.9875 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: a652971c-7d2e-4d9b-a6a4-d149256f461b X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB8PR05MB5881 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Moshe Shemesh Report devlink health on FW fatal issues via fw_fatal_reporter. The driver recover flow for FW fatal error is now being handled by the devlink health. Having the recovery controlled by devlink health, the user has the ability to cancel the auto-recovery for debug session and run it manually. Signed-off-by: Moshe Shemesh Signed-off-by: Saeed Mahameed --- .../net/ethernet/mellanox/mlx5/core/health.c | 42 ++++++++++++------- .../net/ethernet/mellanox/mlx5/core/main.c | 3 +- include/linux/mlx5/driver.h | 2 +- 3 files changed, 29 insertions(+), 18 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c index 5271c88ef64c..e3c4e927945d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c @@ -327,19 +327,6 @@ static int mlx5_health_care(struct mlx5_core_dev *dev) return 0; } -static void health_care_work(struct work_struct *work) -{ - struct mlx5_core_health *health; - struct mlx5_core_dev *dev; - struct mlx5_priv *priv; - - health = container_of(work, struct mlx5_core_health, work); - priv = container_of(health, struct mlx5_priv, health); - dev = container_of(priv, struct mlx5_core_dev, priv); - - mlx5_health_care(dev); -} - static const char *hsynd_str(u8 synd) { switch (synd) { @@ -585,6 +572,29 @@ mlx5_fw_fatal_reporter_dump(struct devlink_health_reporter *reporter, return 0; } +static void mlx5_fw_fatal_reporter_err_work(struct work_struct *work) +{ + struct mlx5_fw_reporter_ctx fw_reporter_ctx; + struct mlx5_core_health *health; + struct mlx5_core_dev *dev; + struct mlx5_priv *priv; + + health = container_of(work, struct mlx5_core_health, fatal_report_work); + priv = container_of(health, struct mlx5_priv, health); + dev = container_of(priv, struct mlx5_core_dev, priv); + + mlx5_enter_error_state(dev, false); + if (IS_ERR_OR_NULL(health->fw_fatal_reporter)) { + if (mlx5_health_care(dev)) + mlx5_core_err(dev, "health recovery failed\n"); + return; + } + fw_reporter_ctx.err_synd = health->synd; + fw_reporter_ctx.miss_counter = health->miss_counter; + devlink_health_report(health->fw_fatal_reporter, + "FW fatal error reported", &fw_reporter_ctx); +} + static const struct devlink_health_reporter_ops mlx5_fw_fatal_reporter_ops = { .name = "fw_fatal", .recover = mlx5_fw_fatal_reporter_recover, @@ -642,7 +652,7 @@ void mlx5_trigger_health_work(struct mlx5_core_dev *dev) spin_lock_irqsave(&health->wq_lock, flags); if (!test_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags)) - queue_work(health->wq, &health->work); + queue_work(health->wq, &health->fatal_report_work); else mlx5_core_err(dev, "new health works are not permitted at this stage\n"); spin_unlock_irqrestore(&health->wq_lock, flags); @@ -728,7 +738,7 @@ void mlx5_drain_health_wq(struct mlx5_core_dev *dev) set_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags); spin_unlock_irqrestore(&health->wq_lock, flags); cancel_work_sync(&health->report_work); - cancel_work_sync(&health->work); + cancel_work_sync(&health->fatal_report_work); } void mlx5_health_flush(struct mlx5_core_dev *dev) @@ -764,7 +774,7 @@ int mlx5_health_init(struct mlx5_core_dev *dev) if (!health->wq) return -ENOMEM; spin_lock_init(&health->wq_lock); - INIT_WORK(&health->work, health_care_work); + INIT_WORK(&health->fatal_report_work, mlx5_fw_fatal_reporter_err_work); INIT_WORK(&health->report_work, mlx5_fw_reporter_err_work); health->crdump = NULL; health->info_buf = kmalloc(HEALTH_INFO_MAX_BUFF, GFP_KERNEL); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index c22ff9a58ec5..b1ad7369e014 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -1367,7 +1367,8 @@ static pci_ers_result_t mlx5_pci_err_detected(struct pci_dev *pdev, mlx5_core_info(dev, "%s was called\n", __func__); - mlx5_enter_error_state(dev, false); + if (state) + mlx5_enter_error_state(dev, false); mlx5_error_sw_reset(dev); mlx5_unload_one(dev, false); /* In case of kernel call drain the health wq */ diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 6f65787bf91b..b5b5baca5aee 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -443,7 +443,7 @@ struct mlx5_core_health { spinlock_t wq_lock; struct workqueue_struct *wq; unsigned long flags; - struct work_struct work; + struct work_struct fatal_report_work; struct work_struct report_work; struct delayed_work recover_work; struct mlx5_fw_crdump *crdump;