From patchwork Tue Dec 22 17:36:42 2009 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jarek Poplawski X-Patchwork-Id: 41616 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id C15FEB6F1C for ; Wed, 23 Dec 2009 04:37:31 +1100 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754503AbZLVRgx (ORCPT ); Tue, 22 Dec 2009 12:36:53 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1754497AbZLVRgw (ORCPT ); Tue, 22 Dec 2009 12:36:52 -0500 Received: from mail-fx0-f213.google.com ([209.85.220.213]:40590 "EHLO mail-fx0-f213.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753801AbZLVRgv (ORCPT ); Tue, 22 Dec 2009 12:36:51 -0500 Received: by fxm5 with SMTP id 5so6159301fxm.28 for ; Tue, 22 Dec 2009 09:36:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:date:from:to:cc:subject :message-id:references:mime-version:content-type:content-disposition :in-reply-to:user-agent; bh=ZJL1vEaxjujolWuqaqnONv2cRw6ph9sKQe4JDdrKilQ=; b=QJ/gcJskY71FCbsx0TU4bQPar16yN9pv4ZaLAHyDLeRlb3cWFlk8p2h+oOEZH9TPwX ckoyH7OCMsZPQJu2UJ3dRF/M3KWuJhjomfdjPEgEyanVDCCBPNp+9jPWnO38BRvB5tYC ZaiQ1X/gXZxEqfb12MM8XiGXFDpDOEiKnVDDM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; b=ffzZJ6yFDNUFmLGyfZTJ/toFeUPvMw/ZK+BPQ76kIs1yECVXcHa09XuDoiJM/Hq5pX +1qKcPfX9x+OPvbZGaLWQ3AYPajd3S4JGvE7m5FHlyyK/58FnMrudIAgUfXgunGyRxDI /LYmu8hQSeiIsPieyruXHnfCj8zk6jQJDCPkU= Received: by 10.223.27.79 with SMTP id h15mr5179203fac.23.1261503409619; Tue, 22 Dec 2009 09:36:49 -0800 (PST) Received: from del.dom.local (public71827.xdsl.centertel.pl [79.162.152.147]) by mx.google.com with ESMTPS id g28sm10222939fkg.8.2009.12.22.09.36.47 (version=SSLv3 cipher=RC4-MD5); Tue, 22 Dec 2009 09:36:48 -0800 (PST) Date: Tue, 22 Dec 2009 18:36:42 +0100 From: Jarek Poplawski To: David Miller Cc: Roger Luethi , Andrey Rahmatullin , Christian Kujau , LKML , netdev@vger.kernel.org Subject: [PATCH] net/via-rhine: Fix scheduling while atomic bugs Message-ID: <20091222173641.GA3093@del.dom.local> References: <20091222123211.GA8546@ff.dom.local> <20091222132107.GA9060@ff.dom.local> <20091222133817.GA9208@ff.dom.local> <20091222150045.GA5355@wrars-comp.wrarsdomain> <20091222152658.GA16043@core.hellgate.ch> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20091222152658.GA16043@core.hellgate.ch> User-Agent: Mutt/1.5.20 (2009-06-14) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On Tue, Dec 22, 2009 at 04:26:59PM +0100, Roger Luethi wrote: > On Tue, 22 Dec 2009 20:00:45 +0500, Andrey Rahmatullin wrote: > > On Tue, Dec 22, 2009 at 01:38:17PM +0000, Jarek Poplawski wrote: > > > > > It looks like napi_disable() should be illegal in ndo_tx_timeout(). > > > > > Here is a patch which moves most of the timeout work to a workqueue, > > > > > similarly to tg3 etc. It should prevent at least one of reported > > > > > bugs. Alas I can't even check-compile it at the moment, so let me > > > > > know on any problems. > > > > It seems I needlessly changed locking btw, so here it is again. > > > Hmm... On the other hand, it definitely needs at least _bh now... > > I've tried this patch. There are lots of "Transmit timed out", but no > > crashes. > > ACK. Looks like you guys tracked down the crashing and fixed it (thanks!). > I suspect we shouldn't have to reset due to timeouts that often, but that's > another story. > > Roger Thanks everybody, Jarek P. -------------------> There are BUGs "scheduling while atomic" triggered by the timer rhine_tx_timeout(). They are caused by calling napi_disable() (with msleep()). This patch fixes it by moving most of the timer content to the workqueue function (similarly to other drivers, like tg3), with spin_lock() changed to BH version. Additionally, there is spin_lock_irq() moved in rhine_close() to exclude napi_disable() etc., also tg3's way. Reported-by: Andrey Rahmatullin Tested-by: Andrey Rahmatullin Signed-off-by: Jarek Poplawski Cc: Christian Kujau Cc: Roger Luethi --- drivers/net/via-rhine.c | 41 ++++++++++++++++++++++++++++------------- 1 files changed, 28 insertions(+), 13 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/drivers/net/via-rhine.c b/drivers/net/via-rhine.c index 593e01f..125406b 100644 --- a/drivers/net/via-rhine.c +++ b/drivers/net/via-rhine.c @@ -102,6 +102,7 @@ static const int multicast_filter_limit = 32; #include #include #include +#include #include /* Processor type for cache alignment. */ #include #include @@ -389,6 +390,7 @@ struct rhine_private { struct net_device *dev; struct napi_struct napi; spinlock_t lock; + struct work_struct reset_task; /* Frequently used values: keep some adjacent for cache effect. */ u32 quirks; @@ -407,6 +409,7 @@ struct rhine_private { static int mdio_read(struct net_device *dev, int phy_id, int location); static void mdio_write(struct net_device *dev, int phy_id, int location, int value); static int rhine_open(struct net_device *dev); +static void rhine_reset_task(struct work_struct *work); static void rhine_tx_timeout(struct net_device *dev); static netdev_tx_t rhine_start_tx(struct sk_buff *skb, struct net_device *dev); @@ -775,6 +778,8 @@ static int __devinit rhine_init_one(struct pci_dev *pdev, dev->irq = pdev->irq; spin_lock_init(&rp->lock); + INIT_WORK(&rp->reset_task, rhine_reset_task); + rp->mii_if.dev = dev; rp->mii_if.mdio_read = mdio_read; rp->mii_if.mdio_write = mdio_write; @@ -1179,22 +1184,18 @@ static int rhine_open(struct net_device *dev) return 0; } -static void rhine_tx_timeout(struct net_device *dev) +static void rhine_reset_task(struct work_struct *work) { - struct rhine_private *rp = netdev_priv(dev); - void __iomem *ioaddr = rp->base; - - printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status " - "%4.4x, resetting...\n", - dev->name, ioread16(ioaddr + IntrStatus), - mdio_read(dev, rp->mii_if.phy_id, MII_BMSR)); + struct rhine_private *rp = container_of(work, struct rhine_private, + reset_task); + struct net_device *dev = rp->dev; /* protect against concurrent rx interrupts */ disable_irq(rp->pdev->irq); napi_disable(&rp->napi); - spin_lock(&rp->lock); + spin_lock_bh(&rp->lock); /* clear all descriptors */ free_tbufs(dev); @@ -1206,7 +1207,7 @@ static void rhine_tx_timeout(struct net_device *dev) rhine_chip_reset(dev); init_registers(dev); - spin_unlock(&rp->lock); + spin_unlock_bh(&rp->lock); enable_irq(rp->pdev->irq); dev->trans_start = jiffies; @@ -1214,6 +1215,19 @@ static void rhine_tx_timeout(struct net_device *dev) netif_wake_queue(dev); } +static void rhine_tx_timeout(struct net_device *dev) +{ + struct rhine_private *rp = netdev_priv(dev); + void __iomem *ioaddr = rp->base; + + printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status " + "%4.4x, resetting...\n", + dev->name, ioread16(ioaddr + IntrStatus), + mdio_read(dev, rp->mii_if.phy_id, MII_BMSR)); + + schedule_work(&rp->reset_task); +} + static netdev_tx_t rhine_start_tx(struct sk_buff *skb, struct net_device *dev) { @@ -1830,10 +1844,11 @@ static int rhine_close(struct net_device *dev) struct rhine_private *rp = netdev_priv(dev); void __iomem *ioaddr = rp->base; - spin_lock_irq(&rp->lock); - - netif_stop_queue(dev); napi_disable(&rp->napi); + cancel_work_sync(&rp->reset_task); + netif_stop_queue(dev); + + spin_lock_irq(&rp->lock); if (debug > 1) printk(KERN_DEBUG "%s: Shutting down ethercard, "