From patchwork Wed Aug 7 01:23:54 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Amerigo Wang
X-Patchwork-Id: 265300
X-Patchwork-Delegate: davem@davemloft.net
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by ozlabs.org (Postfix) with ESMTP id A7F2C2C00A4
	for ; Wed, 7 Aug 2013 11:24:12 +1000 (EST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756723Ab3HGBYI (ORCPT ); Tue, 6 Aug 2013 21:24:08 -0400
Received: from mx1.redhat.com ([209.132.183.28]:30414 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1756654Ab3HGBYH (ORCPT ); Tue, 6 Aug 2013 21:24:07 -0400
Received: from int-mx12.intmail.prod.int.phx2.redhat.com
	(int-mx12.intmail.prod.int.phx2.redhat.com [10.5.11.25])
	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r771O6Li009994
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
	Tue, 6 Aug 2013 21:24:06 -0400
Received: from [10.66.115.118] (vpn1-115-118.nay.redhat.com [10.66.115.118])
	by int-mx12.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP
	id r771Nuah024530
	(version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO);
	Tue, 6 Aug 2013 21:24:03 -0400
Message-ID: <1375838634.11370.13.camel@cr0>
Subject: A soft lockup in vxlan module
From: Cong Wang
To: Stephen Hemminger
Cc: netdev@vger.kernel.org
Date: Wed, 07 Aug 2013 09:23:54 +0800
Mime-Version: 1.0
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.25
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

Hi, Stephen

You introduced a soft lockup in the vxlan module in:

commit fe5c3561e6f0ac7c9546209f01351113c1b77ec8
Author: stephen hemminger
Date:   Sat Jul 13 10:18:18 2013 -0700

    vxlan: add necessary locking on device removal

The problem is that vxlan_dellink(), which is called with the RTNL lock
held, flushes the workqueue synchronously, but apparently the igmp_join
and igmp_leave work items need to take the RTNL lock too. The flush
therefore waits for work that can never acquire the lock, and we get a
soft lockup!

This is 100% reproducible on my 2.6.32 backport when running
`modprobe -r vxlan`.

A quick but perhaps ugly fix is to release the RTNL lock around the
flush_workqueue() call, as in the diff below.

However, I think a better way is still what I did, that is, removing
the RTNL lock from ip_mc_join_group() and ip_mc_leave_group().

What do you think? Any other ideas for fixing it?

Thanks.

---
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 8bf31d9..581d3d5 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1837,7 +1837,9 @@ static void vxlan_dellink(struct net_device *dev, struct list_head *head)
 	struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 
+	rtnl_unlock();
 	flush_workqueue(vxlan_wq);
+	rtnl_lock();
 
 	spin_lock(&vn->sock_lock);
 	hlist_del_rcu(&vxlan->hlist);
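
For anyone who wants to see the locking pattern without a kernel tree, here is a rough userspace analogy. It is only a sketch under stated assumptions: a pthread mutex stands in for RTNL, a single thread stands in for the work queued on vxlan_wq, and the worker uses trylock so that the demo reports contention instead of hanging. All names (fake_rtnl, fake_igmp_work, fake_flush, dellink_flush_locked, dellink_flush_unlocked) are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical userspace stand-in for the RTNL mutex. */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;

/*
 * Stand-in for the igmp_join/igmp_leave work item, which needs the lock.
 * trylock is used so the demo returns EBUSY where the real work item
 * would block.
 */
static void *fake_igmp_work(void *arg)
{
	int *result = arg;

	*result = pthread_mutex_trylock(&fake_rtnl);
	if (*result == 0)
		pthread_mutex_unlock(&fake_rtnl);
	return NULL;
}

/*
 * Run the work item and wait for it, like a synchronous flush.
 * Returns 0 if the work got the lock, EBUSY if it was still held.
 */
static int fake_flush(void)
{
	pthread_t worker;
	int result = -1;
	int err;

	err = pthread_create(&worker, NULL, fake_igmp_work, &result);
	assert(err == 0);
	err = pthread_join(worker, NULL);
	assert(err == 0);
	return result;
}

/* Pattern before the patch: flush while still holding the lock. */
static int dellink_flush_locked(void)
{
	int ret;

	pthread_mutex_lock(&fake_rtnl);
	ret = fake_flush();	/* a blocking lock in the worker would hang here */
	pthread_mutex_unlock(&fake_rtnl);
	return ret;
}

/* Pattern from the patch: drop the lock around the flush. */
static int dellink_flush_unlocked(void)
{
	int ret;

	pthread_mutex_lock(&fake_rtnl);		/* caller enters with lock held */
	pthread_mutex_unlock(&fake_rtnl);	/* like rtnl_unlock() */
	ret = fake_flush();
	pthread_mutex_lock(&fake_rtnl);		/* like rtnl_lock() */
	pthread_mutex_unlock(&fake_rtnl);	/* caller releases later */
	return ret;
}
```

Calling dellink_flush_locked() should report that the worker could not take the lock, while dellink_flush_unlocked() lets the worker run. The sketch ignores what makes the real fix "ugly": other code may take the lock and change state during the window in which it is dropped.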