From patchwork Mon Jul 10 20:23:13 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jacob Keller X-Patchwork-Id: 786377 X-Patchwork-Delegate: jeffrey.t.kirsher@intel.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from hemlock.osuosl.org (smtp2.osuosl.org [140.211.166.133]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3x5xW272hBz9s06 for ; Tue, 11 Jul 2017 06:23:42 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by hemlock.osuosl.org (Postfix) with ESMTP id 531C88873D; Mon, 10 Jul 2017 20:23:41 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from hemlock.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id jqVBiEfQBKgU; Mon, 10 Jul 2017 20:23:38 +0000 (UTC) Received: from ash.osuosl.org (ash.osuosl.org [140.211.166.34]) by hemlock.osuosl.org (Postfix) with ESMTP id 32FCD884A6; Mon, 10 Jul 2017 20:23:36 +0000 (UTC) X-Original-To: intel-wired-lan@lists.osuosl.org Delivered-To: intel-wired-lan@lists.osuosl.org Received: from hemlock.osuosl.org (smtp2.osuosl.org [140.211.166.133]) by ash.osuosl.org (Postfix) with ESMTP id A242F1CEAC7 for ; Mon, 10 Jul 2017 20:23:28 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by hemlock.osuosl.org (Postfix) with ESMTP id 83AC4887E5 for ; Mon, 10 Jul 2017 20:23:28 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from hemlock.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id F0lqv3YMKr6d for ; Mon, 10 Jul 2017 20:23:22 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by hemlock.osuosl.org (Postfix) with ESMTPS id 98B32875DF for ; Mon, 10 Jul 2017 20:23:22 +0000 (UTC) Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 10 Jul 2017 13:23:22 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.40,342,1496127600"; d="scan'208";a="285198867" Received: from jekeller-desk.amr.corp.intel.com (HELO jekeller-desk.jekeller.internal) ([134.134.177.230]) by fmsmga004.fm.intel.com with ESMTP; 10 Jul 2017 13:23:21 -0700 From: Jacob Keller To: jtkirhse@osuosl.org, Intel Wired LAN Date: Mon, 10 Jul 2017 13:23:13 -0700 Message-Id: <20170710202319.22110-10-jacob.e.keller@intel.com> X-Mailer: git-send-email 2.13.0.615.gb09ed6e59a40 In-Reply-To: <20170710202319.22110-1-jacob.e.keller@intel.com> References: <20170710202319.22110-1-jacob.e.keller@intel.com> Cc: jekeller@osuosl.org Subject: [Intel-wired-lan] [PATCH v3 10/16] fm10k: prevent race condition of __FM10K_SERVICE_SCHED X-BeenThere: intel-wired-lan@osuosl.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Intel Wired Ethernet Linux Kernel Driver Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: intel-wired-lan-bounces@osuosl.org Sender: "Intel-wired-lan" Although very unlikely, it is possible that cancel_work_sync() may stop the service_task before it actually started. In this case, the __FM10K_SERVICE_SCHED bit will never be cleared. This results in the service task being unable to reschedule in the future. Add a helper function which sets the service disable bit, waits for the service task to stop and clears the schedule bit, thus avoiding the race condition. We know the schedule bit is safe to clear because the cancel_work_sync() guarantees the service task is not running. Add a helper function also to restart the service task, for symmetry. This is not strictly needed but helps the mental model of how to stop and start the service task. This race could only happen in fm10k_suspend/fm10k_resume as this is the only place where the service task is actually restarted. Thus, suspend/resume testing would be ideal. However, note that the chance of this happening is very slim as the service event is scheduled for immediate execution, and you would have to trigger a suspend at almost the exact same time as the service task was scheduled. Signed-off-by: Jacob Keller Tested-by: Krishneil Singh --- drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 32 ++++++++++++++++++++++------ 1 file changed, 25 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c index 6a7b4c5429ae..cfbe25e90006 100644 --- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c +++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c @@ -118,6 +118,27 @@ static void fm10k_service_event_complete(struct fm10k_intfc *interface) fm10k_service_event_schedule(interface); } +static void fm10k_stop_service_event(struct fm10k_intfc *interface) +{ + set_bit(__FM10K_SERVICE_DISABLE, interface->state); + cancel_work_sync(&interface->service_task); + + /* It's possible that cancel_work_sync stopped the service task from + * running before it could actually start. In this case the + * __FM10K_SERVICE_SCHED bit will never be cleared. Since we know that + * the service task cannot be running at this point, we need to clear + * the scheduled bit, as otherwise the service task may never be + * restarted. + */ + clear_bit(__FM10K_SERVICE_SCHED, interface->state); +} + +static void fm10k_start_service_event(struct fm10k_intfc *interface) +{ + clear_bit(__FM10K_SERVICE_DISABLE, interface->state); + fm10k_service_event_schedule(interface); +} + /** * fm10k_service_timer - Timer Call-back * @data: pointer to interface cast into an unsigned long @@ -2116,8 +2137,7 @@ static void fm10k_remove(struct pci_dev *pdev) del_timer_sync(&interface->service_timer); - set_bit(__FM10K_SERVICE_DISABLE, interface->state); - cancel_work_sync(&interface->service_task); + fm10k_stop_service_event(interface); /* free netdev, this may bounce the interrupts due to setup_tc */ if (netdev->reg_state == NETREG_REGISTERED) @@ -2155,8 +2175,7 @@ static void fm10k_prepare_suspend(struct fm10k_intfc *interface) * stopped. We stop the watchdog task until after we resume software * activity. */ - set_bit(__FM10K_SERVICE_DISABLE, interface->state); - cancel_work_sync(&interface->service_task); + fm10k_stop_service_event(interface); fm10k_prepare_for_reset(interface); } @@ -2183,9 +2202,8 @@ static int fm10k_handle_resume(struct fm10k_intfc *interface) interface->link_down_event = jiffies + (HZ); set_bit(__FM10K_LINK_DOWN, interface->state); - /* clear the service task disable bit to allow service task to start */ - clear_bit(__FM10K_SERVICE_DISABLE, interface->state); - fm10k_service_event_schedule(interface); + /* restart the service task */ + fm10k_start_service_event(interface); return err; }