From patchwork Fri Jan 8 05:26:17 2016
X-Patchwork-Submitter: Conn O'Griofa
X-Patchwork-Id: 564603
From: Conn O'Griofa
To: openwrt-devel@lists.openwrt.org
Date: Fri, 8 Jan 2016 05:26:17 +0000
Message-ID: <568F4879.600@gmail.com>
In-Reply-To: <568F4778.1010803@gmail.com>
References: <568F4778.1010803@gmail.com>
Subject: [OpenWrt-Devel] [PATCH] ar71xx: check for stuck DMA on AR724x & fix sirq storm after recovery

Hi,

I'm
proposing the following patch to fully resolve ticket #18922.

With the current master revision, when a tx timeout occurs, the interface
recovers successfully, but a soft IRQ storm follows: ksoftirqd pegs the CPU
because this goto is taken without end:
https://github.com/openwrt-mirror/openwrt/blob/master/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c#L1073

Forcing the tx and rx rings to be cleaned up and re-initialized in
ag71xx_restart_work_func seems to avoid the sirq storm, but I'd appreciate
feedback on whether there's a more effective workaround.

Additionally, ag71xx_check_dma_stuck *does* successfully detect the stuck DMA
condition on AR7241 (TL-WR842ND v1), so enabling the check for this chipset
series ensures a link adjust occurs *before* an actual tx timeout is detected.
This avoids the brief network interruption that normally occurs during the
DMA stuck -> tx timeout -> link adjust sequence.

Conn

P.S. The sirq storm also occurs when ag71xx_check_dma_stuck is used on this
chipset to avoid the tx timeout, so it appears that both changes are
necessary (or at least, a better way to solve the sirq storm needs to be
found).
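To make the recovery path described above easier to reason about, here is a
minimal standalone sketch. It is a toy model, not the ag71xx driver: every
name in it (toy_ring, toy_queue, toy_tx_poll, toy_restart, TOY_RING_SIZE,
TOY_STUCK_TICKS) is hypothetical, and the age-based check is only an assumed
stand-in for what ag71xx_check_dma_stuck does. It shows one plausible way a
stale descriptor could keep rescheduling the restart work, and why
re-initializing the ring during restart would stop that. It does not claim to
be the actual root cause of the sirq storm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_RING_SIZE   4u    /* hypothetical ring size */
#define TOY_STUCK_TICKS 10ul  /* hypothetical "stuck" age threshold, in ticks */

struct toy_desc {
	bool empty;              /* true once the "hardware" has sent the packet */
	unsigned long timestamp; /* tick at which the packet was queued */
};

struct toy_ring {
	struct toy_desc desc[TOY_RING_SIZE];
	unsigned int curr;               /* next slot to queue into */
	unsigned int dirty;              /* oldest slot not yet reclaimed */
	unsigned int restarts_scheduled; /* how often restart work was requested */
};

/* Mark every descriptor empty and reset both indices (counter is kept). */
static void toy_ring_init(struct toy_ring *ring)
{
	size_t i;

	assert(ring != NULL);
	for (i = 0; i < TOY_RING_SIZE; i++) {
		ring->desc[i].empty = true;
		ring->desc[i].timestamp = 0;
	}
	ring->curr = 0;
	ring->dirty = 0;
}

/* Queue one packet at tick 'now'; returns false if the ring is full. */
static bool toy_queue(struct toy_ring *ring, unsigned long now)
{
	struct toy_desc *d;

	if (ring->curr - ring->dirty >= TOY_RING_SIZE)
		return false;
	d = &ring->desc[ring->curr % TOY_RING_SIZE];
	d->empty = false;
	d->timestamp = now;
	ring->curr++;
	return true;
}

/*
 * Reclaim completed descriptors. If the oldest pending descriptor is older
 * than the threshold, request a restart and stop. Returns the number of
 * descriptors reclaimed.
 */
static unsigned int toy_tx_poll(struct toy_ring *ring, unsigned long now)
{
	unsigned int reclaimed = 0;

	while (ring->dirty != ring->curr) {
		struct toy_desc *d = &ring->desc[ring->dirty % TOY_RING_SIZE];

		if (!d->empty) {
			if (now - d->timestamp > TOY_STUCK_TICKS)
				ring->restarts_scheduled++;
			break;
		}
		ring->dirty++;
		reclaimed++;
	}
	return reclaimed;
}

/* Model of the restart work: optionally re-initialize the ring. */
static void toy_restart(struct toy_ring *ring, bool reinit_rings)
{
	if (reinit_rings)
		toy_ring_init(ring);
}
```

In this model, a restart that leaves the ring untouched keeps finding the
same stale descriptor on every poll and requests another restart each time.
A restart that re-initializes the ring leaves nothing pending, so no further
restart is requested.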
diff --git a/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c b/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c
index 31b38d7..8959701 100644
--- a/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c
+++ b/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c
@@ -183,6 +183,8 @@ static void ag71xx_ring_tx_init(struct ag71xx *ag)
 	ring->curr = 0;
 	ring->dirty = 0;
 	netdev_reset_queue(ag->dev);
+
+	ag71xx_wr(ag, AG71XX_REG_TX_DESC, ag->tx_ring.descs_dma);
 }
 
 static void ag71xx_ring_rx_clean(struct ag71xx *ag)
@@ -272,6 +274,8 @@ static int ag71xx_ring_rx_init(struct ag71xx *ag)
 	ring->curr = 0;
 	ring->dirty = 0;
 
+	ag71xx_wr(ag, AG71XX_REG_RX_DESC, ag->rx_ring.descs_dma);
+
 	return ret;
 }
 
@@ -652,9 +656,6 @@ static int ag71xx_open(struct net_device *dev)
 	netif_carrier_off(dev);
 	ag71xx_phy_start(ag);
 
-	ag71xx_wr(ag, AG71XX_REG_TX_DESC, ag->tx_ring.descs_dma);
-	ag71xx_wr(ag, AG71XX_REG_RX_DESC, ag->rx_ring.descs_dma);
-
 	ag71xx_hw_set_macaddr(ag, dev->dev_addr);
 
 	netif_start_queue(dev);
@@ -873,6 +874,8 @@ static void ag71xx_restart_work_func(struct work_struct *work)
 	if (ag71xx_get_pdata(ag)->is_ar724x) {
 		ag->link = 0;
 		ag71xx_link_adjust(ag);
+		ag71xx_rings_cleanup(ag);
+		ag71xx_rings_init(ag);
 		return;
 	}
 
@@ -919,7 +922,7 @@ static int ag71xx_tx_packets(struct ag71xx *ag, bool flush)
 		struct sk_buff *skb = ring->buf[i].skb;
 
 		if (!flush && !ag71xx_desc_empty(desc)) {
-			if (pdata->is_ar7240 &&
+			if (pdata->is_ar724x &&
 			    ag71xx_check_dma_stuck(ag, ring->buf[i].timestamp))
 				schedule_work(&ag->restart_work);
 			break;