From patchwork Thu Sep 13 15:03:43 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Stephen Hemminger
X-Patchwork-Id: 969442
X-Patchwork-Delegate: davem@davemloft.net
From: Stephen Hemminger
To: kys@microsoft.com, haiyangz@microsoft.com
Cc: netdev@vger.kernel.org, Stephen Hemminger
Subject: [PATCH] hv_netvsc: fix schedule in RCU context
Date: Thu, 13 Sep 2018 08:03:43 -0700
Message-Id: <20180913150342.32101-1-sthemmin@microsoft.com>
X-Mailer: git-send-email 2.18.0
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

When a netvsc device is removed, it can call reschedule in RCU context.
This happens because canceling the subchannel setup work could
(in theory) cause a reschedule when manipulating the timer.

To reproduce, run with a lockdep-enabled kernel and unbind
a network device from hv_netvsc (via sysfs).

[ 160.682011] WARNING: suspicious RCU usage
[ 160.707466] 4.19.0-rc3-uio+ #2 Not tainted
[ 160.709937] -----------------------------
[ 160.712352] ./include/linux/rcupdate.h:302 Illegal context switch in RCU read-side critical section!
[ 160.723691]
[ 160.723691] other info that might help us debug this:
[ 160.723691]
[ 160.730955]
[ 160.730955] rcu_scheduler_active = 2, debug_locks = 1
[ 160.762813] 5 locks held by rebind-eth.sh/1812:
[ 160.766851] #0: 000000008befa37a (sb_writers#6){.+.+}, at: vfs_write+0x184/0x1b0
[ 160.773416] #1: 00000000b097f236 (&of->mutex){+.+.}, at: kernfs_fop_write+0xe2/0x1a0
[ 160.783766] #2: 0000000041ee6889 (kn->count#3){++++}, at: kernfs_fop_write+0xeb/0x1a0
[ 160.787465] #3: 0000000056d92a74 (&dev->mutex){....}, at: device_release_driver_internal+0x39/0x250
[ 160.816987] #4: 0000000030f6031e (rcu_read_lock){....}, at: netvsc_remove+0x1e/0x250 [hv_netvsc]
[ 160.828629]
[ 160.828629] stack backtrace:
[ 160.831966] CPU: 1 PID: 1812 Comm: rebind-eth.sh Not tainted 4.19.0-rc3-uio+ #2
[ 160.832952] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012
[ 160.832952] Call Trace:
[ 160.832952] dump_stack+0x85/0xcb
[ 160.832952] ___might_sleep+0x1a3/0x240
[ 160.832952] __flush_work+0x57/0x2e0
[ 160.832952] ? __mutex_lock+0x83/0x990
[ 160.832952] ? __kernfs_remove+0x24f/0x2e0
[ 160.832952] ? __kernfs_remove+0x1b2/0x2e0
[ 160.832952] ? mark_held_locks+0x50/0x80
[ 160.832952] ? get_work_pool+0x90/0x90
[ 160.832952] __cancel_work_timer+0x13c/0x1e0
[ 160.832952] ? netvsc_remove+0x1e/0x250 [hv_netvsc]
[ 160.832952] ? __lock_is_held+0x55/0x90
[ 160.832952] netvsc_remove+0x9a/0x250 [hv_netvsc]
[ 160.832952] vmbus_remove+0x26/0x30
[ 160.832952] device_release_driver_internal+0x18a/0x250
[ 160.832952] unbind_store+0xb4/0x180
[ 160.832952] kernfs_fop_write+0x113/0x1a0
[ 160.832952] __vfs_write+0x36/0x1a0
[ 160.832952] ? rcu_read_lock_sched_held+0x6b/0x80
[ 160.832952] ? rcu_sync_lockdep_assert+0x2e/0x60
[ 160.832952] ? __sb_start_write+0x141/0x1a0
[ 160.832952] ? vfs_write+0x184/0x1b0
[ 160.832952] vfs_write+0xbe/0x1b0
[ 160.832952] ksys_write+0x55/0xc0
[ 160.832952] do_syscall_64+0x60/0x1b0
[ 160.832952] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 160.832952] RIP: 0033:0x7fe48f4c8154

Resolve this by getting RTNL earlier. This is safe because the subchannel
work queue does trylock on RTNL and will detect the race.

Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic")
Signed-off-by: Stephen Hemminger
Reviewed-by: Haiyang Zhang
---
 drivers/net/hyperv/netvsc_drv.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 70921bbe0e28..915fbd66a02b 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2272,17 +2272,15 @@ static int netvsc_remove(struct hv_device *dev)
 
 	cancel_delayed_work_sync(&ndev_ctx->dwork);
 
-	rcu_read_lock();
-	nvdev = rcu_dereference(ndev_ctx->nvdev);
-
-	if  (nvdev)
+	rtnl_lock();
+	nvdev = rtnl_dereference(ndev_ctx->nvdev);
+	if (nvdev)
 		cancel_work_sync(&nvdev->subchan_work);
 
 	/*
 	 * Call to the vsc driver to let it know that the device is being
 	 * removed. Also blocks mtu and channel changes.
 	 */
-	rtnl_lock();
 	vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
 	if (vf_netdev)
 		netvsc_unregister_vf(vf_netdev);
@@ -2294,7 +2292,6 @@ static int netvsc_remove(struct hv_device *dev)
 	list_del(&ndev_ctx->list);
 
 	rtnl_unlock();
-	rcu_read_unlock();
 
 	hv_set_drvdata(dev, NULL);
 