From patchwork Fri Jun 14 21:15:06 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nithin Sujir
X-Patchwork-Id: 251537
X-Patchwork-Delegate: davem@davemloft.net
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by ozlabs.org (Postfix) with ESMTP id 3CE152C007A
	for ; Sat, 15 Jun 2013 07:15:24 +1000 (EST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751837Ab3FNVPT (ORCPT );
	Fri, 14 Jun 2013 17:15:19 -0400
Received: from mms2.broadcom.com ([216.31.210.18]:4203 "EHLO
	mms2.broadcom.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751294Ab3FNVPS (ORCPT );
	Fri, 14 Jun 2013 17:15:18 -0400
Received: from [10.9.208.57] by mms2.broadcom.com with ESMTP (Broadcom
	SMTP Relay (Email Firewall v6.5)); Fri, 14 Jun 2013 14:09:25 -0700
X-Server-Uuid: 4500596E-606A-40F9-852D-14843D8201B2
Received: from IRVEXCHSMTP2.corp.ad.broadcom.com (10.9.207.52) by
	IRVEXCHCAS08.corp.ad.broadcom.com (10.9.208.57) with Microsoft SMTP
	Server (TLS) id 14.1.438.0; Fri, 14 Jun 2013 14:15:08 -0700
Received: from mail-irva-13.broadcom.com (10.10.10.20) by
	IRVEXCHSMTP2.corp.ad.broadcom.com (10.9.207.52) with Microsoft SMTP
	Server id 14.1.438.0; Fri, 14 Jun 2013 14:15:08 -0700
Received: from dl1.broadcom.com (unknown [10.13.104.170]) by
	mail-irva-13.broadcom.com (Postfix) with ESMTP id D8B27F2DE9;
	Fri, 14 Jun 2013 14:15:07 -0700 (PDT)
From: "Nithin Nayak Sujir"
To: davem@davemloft.net
cc: netdev@vger.kernel.org, "Michael Chan", "Nithin Nayak Sujir"
Subject: [PATCH net-next] tg3: Prevent system hang during repeated EEH errors.
Date: Fri, 14 Jun 2013 14:15:06 -0700
Message-ID: <1371244506-18969-1-git-send-email-nsujir@broadcom.com>
X-Mailer: git-send-email 1.8.1.4
MIME-Version: 1.0
X-WSS-ID: 7DA5590F1R031937901-01-01
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

From: Michael Chan

The current tg3 code assumes that the pci_error_handlers are always
called in sequence.  In particular, during ->error_detected(), NAPI is
disabled and the device is shut down.  The device is later reset and
NAPI re-enabled in ->slot_reset() and ->resume().

In EEH, if more than 6 errors are detected in an hour, only
->error_detected() will be called.  This will leave the driver in an
inconsistent state as NAPI is disabled but netif_running state is still
true.  When the device is later closed, we'll try to disable NAPI again
and it will loop forever.

We fix this by closing the device if we encounter any error conditions
during the normal sequence of the pci_error_handlers.

Signed-off-by: Michael Chan
Signed-off-by: Nithin Nayak Sujir
---
 drivers/net/ethernet/broadcom/tg3.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 28a645f..bfe1831 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -17747,10 +17747,13 @@ static pci_ers_result_t tg3_io_error_detected(struct pci_dev *pdev,
 	tg3_full_unlock(tp);
 
 done:
-	if (state == pci_channel_io_perm_failure)
+	if (state == pci_channel_io_perm_failure) {
+		tg3_napi_enable(tp);
+		dev_close(netdev);
 		err = PCI_ERS_RESULT_DISCONNECT;
-	else
+	} else {
 		pci_disable_device(pdev);
+	}
 
 	rtnl_unlock();
 
@@ -17796,6 +17799,10 @@ static pci_ers_result_t tg3_io_slot_reset(struct pci_dev *pdev)
 	rc = PCI_ERS_RESULT_RECOVERED;
 
 done:
+	if (rc != PCI_ERS_RESULT_RECOVERED && netif_running(netdev)) {
+		tg3_napi_enable(tp);
+		dev_close(netdev);
+	}
 	rtnl_unlock();
 
 	return rc;
@@ -17826,6 +17833,8 @@ static void tg3_io_resume(struct pci_dev *pdev)
 	if (err) {
 		tg3_full_unlock(tp);
 		netdev_err(netdev, "Cannot restart hardware after reset.\n");
+		tg3_napi_enable(tp);
+		dev_close(netdev);
 		goto done;
 	}
 