From patchwork Mon Oct 16 14:28:28 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiri Pirko X-Patchwork-Id: 826323 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=resnulli-us.20150623.gappssmtp.com header.i=@resnulli-us.20150623.gappssmtp.com header.b="byZ2x9IW"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3yG1010mXwz9sP1 for ; Tue, 17 Oct 2017 01:28:33 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753241AbdJPO2b (ORCPT ); Mon, 16 Oct 2017 10:28:31 -0400 Received: from mail-wm0-f67.google.com ([74.125.82.67]:51300 "EHLO mail-wm0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753173AbdJPO2a (ORCPT ); Mon, 16 Oct 2017 10:28:30 -0400 Received: by mail-wm0-f67.google.com with SMTP id f4so3976012wme.0 for ; Mon, 16 Oct 2017 07:28:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=resnulli-us.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id; bh=4Lv18Yop0b/KOifrDxoKiH/U6EvaAXlhTzc55rdP5Kw=; b=byZ2x9IWd9G2pVzqys4bAtcSfgqTuH85W2HTFTyCMevtbZ14EP/I1anodBBiFQd5iS 9TJFoqtbv9EVt6rF30I7HtgGQI4MTZUokzrVmqD5JVZrxFYbJBsVxM9LOhK9xDPQVXgh RAlmtwY6o9MM21+B1A00jDv6EJYwxxgaoYsOw3PVc6iREDNxYKQdyc6mEByFoyxMPAPL 8wI26JBbbCBWBryAJWevxiAdQXlb2pKr2h4ar67ra64UpPxnSrxwhRy55bqwkt9o1Q1J 5BGTgkWZloR0DoiQkZiySZGX1PqHzcjVLqBzujfXVJWGaMdtmJDuZrTHBLyWsWR6jfsi UOpA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id; bh=4Lv18Yop0b/KOifrDxoKiH/U6EvaAXlhTzc55rdP5Kw=; b=GA5IS6f9UW1VIf4N5fhXFUctqjuEfcPhEzkgsfySuw/1e/SwctHje3OdqhPrhVJWQe cI5Xe5TTkGu3b2kMWoWqGZRljvvtUvDvaoy3Z+TFxbBGCFSiEzmDkZdmiLBx4LrdHU3N zOLxYWSR9U5R2WBdJZ1Ti95frqKu5DuOqNk0C4EZEnw6+xw+4T9n2+RnvnTBHGQyDRa/ 26iOHdFW4LL5gZgjy0/hvRRqCGTr70WreP3FaYkxhe05VCzM4fQykRzBrKYEQ2YYtPMe wOdkYtagCUeXdma9s9WkfuuLj6jE+L/E05a9fs/Yi+8OtOmiGz3bBRfm/R0q04UpMGFV UxCw== X-Gm-Message-State: AMCzsaXVQbJV9ISi3j0fexLCDyYtFOt02LEeGypLS+e+hCZMy5VAReo1 zV74XMSy5ohc4LrTiifc+cnLKzvV X-Google-Smtp-Source: ABhQp+TG6T6Tmy8pmlID1vjN57O/CS71fXgguFt8fvqa9rKKQ4CBMh1hc17zWQ8pfqFkgFgCtyCD3Q== X-Received: by 10.28.127.206 with SMTP id a197mr1011627wmd.12.1508164109087; Mon, 16 Oct 2017 07:28:29 -0700 (PDT) Received: from localhost (ip-89-177-136-69.net.upcbroadband.cz. [89.177.136.69]) by smtp.gmail.com with ESMTPSA id g206sm5086712wme.23.2017.10.16.07.28.28 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 16 Oct 2017 07:28:28 -0700 (PDT) From: Jiri Pirko To: netdev@vger.kernel.org Cc: davem@davemloft.net, idosch@mellanox.com, mlxsw@mellanox.com Subject: [patch net] mlxsw: core: Fix possible deadlock Date: Mon, 16 Oct 2017 16:28:28 +0200 Message-Id: <20171016142828.2742-1-jiri@resnulli.us> X-Mailer: git-send-email 2.9.5 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Ido Schimmel When an EMAD is transmitted, a timeout work item is scheduled with a delay of 200ms, so that another EMAD will be retried until a maximum of five retries. In certain situations, it's possible for the function waiting on the EMAD to be associated with a work item that is queued on the same workqueue (`mlxsw_core`) as the timeout work item. This results in flushing a work item on the same workqueue. According to commit e159489baa71 ("workqueue: relax lockdep annotation on flush_work()") the above may lead to a deadlock in case the workqueue has only one worker active or if the system in under memory pressure and the rescue worker is in use. The latter explains the very rare and random nature of the lockdep splats we have been seeing: [ 52.730240] ============================================ [ 52.736179] WARNING: possible recursive locking detected [ 52.742119] 4.14.0-rc3jiri+ #4 Not tainted [ 52.746697] -------------------------------------------- [ 52.752635] kworker/1:3/599 is trying to acquire lock: [ 52.758378] (mlxsw_core_driver_name){+.+.}, at: [] flush_work+0x3a4/0x5e0 [ 52.767837] but task is already holding lock: [ 52.774360] (mlxsw_core_driver_name){+.+.}, at: [] process_one_work+0x7d4/0x12f0 [ 52.784495] other info that might help us debug this: [ 52.791794] Possible unsafe locking scenario: [ 52.798413] CPU0 [ 52.801144] ---- [ 52.803875] lock(mlxsw_core_driver_name); [ 52.808556] lock(mlxsw_core_driver_name); [ 52.813236] *** DEADLOCK *** [ 52.819857] May be due to missing lock nesting notation [ 52.827450] 3 locks held by kworker/1:3/599: [ 52.832221] #0: (mlxsw_core_driver_name){+.+.}, at: [] process_one_work+0x7d4/0x12f0 [ 52.842846] #1: ((&(&bridge->fdb_notify.dw)->work)){+.+.}, at: [] process_one_work+0x7d4/0x12f0 [ 52.854537] #2: (rtnl_mutex){+.+.}, at: [] rtnl_lock+0x17/0x20 [ 52.863021] stack backtrace: [ 52.867890] CPU: 1 PID: 599 Comm: kworker/1:3 Not tainted 4.14.0-rc3jiri+ #4 [ 52.875773] Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016 [ 52.886267] Workqueue: mlxsw_core mlxsw_sp_fdb_notify_work [mlxsw_spectrum] [ 52.894060] Call Trace: [ 52.909122] __lock_acquire+0xf6f/0x2a10 [ 53.025412] lock_acquire+0x158/0x440 [ 53.047557] flush_work+0x3c4/0x5e0 [ 53.087571] __cancel_work_timer+0x3ca/0x5e0 [ 53.177051] cancel_delayed_work_sync+0x13/0x20 [ 53.182142] mlxsw_reg_trans_bulk_wait+0x12d/0x7a0 [mlxsw_core] [ 53.194571] mlxsw_core_reg_access+0x586/0x990 [mlxsw_core] [ 53.225365] mlxsw_reg_query+0x10/0x20 [mlxsw_core] [ 53.230882] mlxsw_sp_fdb_notify_work+0x2a3/0x9d0 [mlxsw_spectrum] [ 53.237801] process_one_work+0x8f1/0x12f0 [ 53.321804] worker_thread+0x1fd/0x10c0 [ 53.435158] kthread+0x28e/0x370 [ 53.448703] ret_from_fork+0x2a/0x40 [ 53.453017] mlxsw_spectrum 0000:01:00.0: EMAD retries (2/5) (tid=bf4549b100000774) [ 53.453119] mlxsw_spectrum 0000:01:00.0: EMAD retries (5/5) (tid=bf4549b100000770) [ 53.453132] mlxsw_spectrum 0000:01:00.0: EMAD reg access failed (tid=bf4549b100000770,reg_id=200b(sfn),type=query,status=0(operation performed)) [ 53.453143] mlxsw_spectrum 0000:01:00.0: Failed to get FDB notifications Fix this by creating another workqueue for EMAD timeouts, thereby preventing the situation of a work item trying to flush a work item queued on the same workqueue. Fixes: caf7297e7ab5f ("mlxsw: core: Introduce support for asynchronous EMAD register access") Signed-off-by: Ido Schimmel Reported-by: Jiri Pirko Signed-off-by: Jiri Pirko --- drivers/net/ethernet/mellanox/mlxsw/core.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c index 9d5e7cf..f3315bc 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/core.c +++ b/drivers/net/ethernet/mellanox/mlxsw/core.c @@ -96,6 +96,7 @@ struct mlxsw_core { const struct mlxsw_bus *bus; void *bus_priv; const struct mlxsw_bus_info *bus_info; + struct workqueue_struct *emad_wq; struct list_head rx_listener_list; struct list_head event_listener_list; struct { @@ -465,7 +466,7 @@ static void mlxsw_emad_trans_timeout_schedule(struct mlxsw_reg_trans *trans) { unsigned long timeout = msecs_to_jiffies(MLXSW_EMAD_TIMEOUT_MS); - mlxsw_core_schedule_dw(&trans->timeout_dw, timeout); + queue_delayed_work(trans->core->emad_wq, &trans->timeout_dw, timeout); } static int mlxsw_emad_transmit(struct mlxsw_core *mlxsw_core, @@ -587,12 +588,18 @@ static const struct mlxsw_listener mlxsw_emad_rx_listener = static int mlxsw_emad_init(struct mlxsw_core *mlxsw_core) { + struct workqueue_struct *emad_wq; u64 tid; int err; if (!(mlxsw_core->bus->features & MLXSW_BUS_F_TXRX)) return 0; + emad_wq = alloc_workqueue("mlxsw_core_emad", WQ_MEM_RECLAIM, 0); + if (!emad_wq) + return -ENOMEM; + mlxsw_core->emad_wq = emad_wq; + /* Set the upper 32 bits of the transaction ID field to a random * number. This allows us to discard EMADs addressed to other * devices. @@ -619,6 +626,7 @@ static int mlxsw_emad_init(struct mlxsw_core *mlxsw_core) err_emad_trap_set: mlxsw_core_trap_unregister(mlxsw_core, &mlxsw_emad_rx_listener, mlxsw_core); + destroy_workqueue(mlxsw_core->emad_wq); return err; } @@ -631,6 +639,7 @@ static void mlxsw_emad_fini(struct mlxsw_core *mlxsw_core) mlxsw_core->emad.use_emad = false; mlxsw_core_trap_unregister(mlxsw_core, &mlxsw_emad_rx_listener, mlxsw_core); + destroy_workqueue(mlxsw_core->emad_wq); } static struct sk_buff *mlxsw_emad_alloc(const struct mlxsw_core *mlxsw_core,