From patchwork Tue Jun 10 01:41:55 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Gavin Shan X-Patchwork-Id: 357662 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 81DDF1400E2 for ; Tue, 10 Jun 2014 11:42:07 +1000 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932262AbaFJBmG (ORCPT ); Mon, 9 Jun 2014 21:42:06 -0400 Received: from e23smtp08.au.ibm.com ([202.81.31.141]:55840 "EHLO e23smtp08.au.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932345AbaFJBmF (ORCPT ); Mon, 9 Jun 2014 21:42:05 -0400 Received: from /spool/local by e23smtp08.au.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 10 Jun 2014 11:42:01 +1000 Received: from d23dlp01.au.ibm.com (202.81.31.203) by e23smtp08.au.ibm.com (202.81.31.205) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Tue, 10 Jun 2014 11:41:59 +1000 Received: from d23relay03.au.ibm.com (d23relay03.au.ibm.com [9.190.235.21]) by d23dlp01.au.ibm.com (Postfix) with ESMTP id 4650C2CE8057 for ; Tue, 10 Jun 2014 11:41:59 +1000 (EST) Received: from d23av03.au.ibm.com (d23av03.au.ibm.com [9.190.234.97]) by d23relay03.au.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id s5A1fhV111927886 for ; Tue, 10 Jun 2014 11:41:43 +1000 Received: from d23av03.au.ibm.com (localhost [127.0.0.1]) by d23av03.au.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id s5A1fvH8007058 for ; Tue, 10 Jun 2014 11:41:58 +1000 Received: from ozlabs.au.ibm.com (ozlabs.au.ibm.com [9.190.163.12]) by d23av03.au.ibm.com (8.14.4/8.14.4/NCO v10.0 AVin) with ESMTP id s5A1fvhv007050; Tue, 10 Jun 2014 11:41:57 +1000 Received: from shangw (haven.au.ibm.com [9.190.164.82]) by ozlabs.au.ibm.com (Postfix) with ESMTP id 6F5BAA022E; Tue, 10 Jun 2014 11:41:57 +1000 (EST) Received: by shangw (Postfix, from userid 1000) id 50CAF3E02B5; Tue, 10 Jun 2014 11:41:59 +1000 (EST) From: Gavin Shan To: kvm-ppc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org Cc: alex.williamson@redhat.com, agraf@suse.de, benh@kernel.crashing.org, aik@ozlabs.ru, qiudayu@linux.vnet.ibm.com, Gavin Shan Subject: [PATCH v10 1/3] powerpc/eeh: Avoid event on passed PE Date: Tue, 10 Jun 2014 11:41:55 +1000 Message-Id: <1402364517-28561-2-git-send-email-gwshan@linux.vnet.ibm.com> X-Mailer: git-send-email 1.8.3.2 In-Reply-To: <1402364517-28561-1-git-send-email-gwshan@linux.vnet.ibm.com> References: <1402364517-28561-1-git-send-email-gwshan@linux.vnet.ibm.com> X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 14061001-5140-0000-0000-000005468553 Sender: kvm-ppc-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm-ppc@vger.kernel.org We must not handle EEH error on devices which are passed to somebody else. Instead, we expect that the frozen device owner detects an EEH error and recovers from it. This avoids EEH error handling on passed through devices so the device owner gets a chance to handle them. Signed-off-by: Gavin Shan Acked-by: Alexander Graf --- arch/powerpc/include/asm/eeh.h | 7 +++++++ arch/powerpc/kernel/eeh.c | 8 ++++++++ arch/powerpc/platforms/powernv/eeh-ioda.c | 3 ++- 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h index 7782056..653d981 100644 --- a/arch/powerpc/include/asm/eeh.h +++ b/arch/powerpc/include/asm/eeh.h @@ -25,6 +25,7 @@ #include #include #include +#include struct pci_dev; struct pci_bus; @@ -84,6 +85,7 @@ struct eeh_pe { int freeze_count; /* Times of froze up */ struct timeval tstamp; /* Time on first-time freeze */ int false_positives; /* Times of reported #ff's */ + atomic_t pass_dev_cnt; /* Count of passed through devs */ struct eeh_pe *parent; /* Parent PE */ struct list_head child_list; /* Link PE to the child list */ struct list_head edevs; /* Link list of EEH devices */ @@ -93,6 +95,11 @@ struct eeh_pe { #define eeh_pe_for_each_dev(pe, edev, tmp) \ list_for_each_entry_safe(edev, tmp, &pe->edevs, list) +static inline bool eeh_pe_passed(struct eeh_pe *pe) +{ + return pe ? !!atomic_read(&pe->pass_dev_cnt) : false; +} + /* * The struct is used to trace EEH state for the associated * PCI device node or PCI device. In future, it might diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index 9c6b899..3bc8b12 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -400,6 +400,14 @@ int eeh_dev_check_failure(struct eeh_dev *edev) if (ret > 0) return ret; + /* + * If the PE isn't owned by us, we shouldn't check the + * state. Instead, let the owner handle it if the PE has + * been frozen. + */ + if (eeh_pe_passed(pe)) + return 0; + /* If we already have a pending isolation event for this * slot, we know it's bad already, we don't need to check. * Do this checking under a lock; as multiple PCI devices diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c index cab3e62..79193eb 100644 --- a/arch/powerpc/platforms/powernv/eeh-ioda.c +++ b/arch/powerpc/platforms/powernv/eeh-ioda.c @@ -892,7 +892,8 @@ static int ioda_eeh_next_error(struct eeh_pe **pe) opal_pci_eeh_freeze_clear(phb->opal_id, frozen_pe_no, OPAL_EEH_ACTION_CLEAR_FREEZE_ALL); ret = EEH_NEXT_ERR_NONE; - } else if ((*pe)->state & EEH_PE_ISOLATED) { + } else if ((*pe)->state & EEH_PE_ISOLATED || + eeh_pe_passed(*pe)) { ret = EEH_NEXT_ERR_NONE; } else { pr_err("EEH: Frozen PHB#%x-PE#%x (%s) detected\n",