diff mbox series

[4/5,V4] PCI: only return true when dev io state is really changed

Message ID 20200927082736.14633-5-haifeng.zhao@intel.com
State New
Headers show
Series Fix DPC hotplug race and enhance error handling | expand

Commit Message

Ethan Zhao Sept. 27, 2020, 8:27 a.m. UTC
When uncorrectable error happens, AER driver and DPC driver interrupt
handlers likely call

   pcie_do_recovery()
   ->pci_walk_bus()
     ->report_frozen_detected()

with pci_channel_io_frozen the same time.
   If pci_dev_set_io_state() return true even if the original state is
pci_channel_io_frozen, that will cause AER or DPC handler re-enter
the error detecting and recovery procedure one after another.
   The result is the recovery flow mixed between AER and DPC.
So simplify the pci_dev_set_io_state() function to only return true
when dev->error_state is changed.

Signed-off-by: Ethan Zhao <haifeng.zhao@intel.com>
Tested-by: Wen Jin <wen.jin@intel.com>
Tested-by: Shanshan Zhang <ShanshanX.Zhang@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Reviewed-by: Joe Perches <joe@perches.com>
---
Changnes:
 V2: revise description and code according to suggestion from Andy.
 V3: change code to simpler.
 V4: no change.
 
 drivers/pci/pci.h | 37 +++++--------------------------------
 1 file changed, 5 insertions(+), 32 deletions(-)

Comments

Joe Perches Sept. 27, 2020, 9:14 a.m. UTC | #1
On Sun, 2020-09-27 at 04:27 -0400, Ethan Zhao wrote:
> When uncorrectable error happens, AER driver and DPC driver interrupt
> handlers likely call
> 
>    pcie_do_recovery()
>    ->pci_walk_bus()
>      ->report_frozen_detected()
> 
> with pci_channel_io_frozen the same time.
>    If pci_dev_set_io_state() return true even if the original state is
> pci_channel_io_frozen, that will cause AER or DPC handler re-enter
> the error detecting and recovery procedure one after another.
>    The result is the recovery flow mixed between AER and DPC.
> So simplify the pci_dev_set_io_state() function to only return true
> when dev->error_state is changed.
> 
> Signed-off-by: Ethan Zhao <haifeng.zhao@intel.com>
> Tested-by: Wen Jin <wen.jin@intel.com>
> Tested-by: Shanshan Zhang <ShanshanX.Zhang@intel.com>
> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Reviewed-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> Reviewed-by: Joe Perches <joe@perches.com>

Hi Ethan/Haifeng.

Like Andy, I did not "review" this patch and sign it.
I merely suggested another simplification.
Please do not add -by: lines unless actually received by you.
Ethan Zhao Sept. 28, 2020, 1:47 a.m. UTC | #2
Sorry for that offence, I should ask for your permission. 

-----Original Message-----
From: Joe Perches <joe@perches.com> 
Sent: Sunday, September 27, 2020 5:14 PM
To: Zhao, Haifeng <haifeng.zhao@intel.com>; bhelgaas@google.com; oohall@gmail.com; ruscur@russell.cc; lukas@wunner.de; andriy.shevchenko@linux.intel.com; stuart.w.hayes@gmail.com; mr.nuke.me@gmail.com; mika.westerberg@linux.intel.com
Cc: linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Jia, Pei P <pei.p.jia@intel.com>; ashok.raj@linux.intel.com; Kuppuswamy, Sathyanarayanan <sathyanarayanan.kuppuswamy@intel.com>; hch@infradead.org
Subject: Re: [PATCH 4/5 V4] PCI: only return true when dev io state is really changed

On Sun, 2020-09-27 at 04:27 -0400, Ethan Zhao wrote:
> When uncorrectable error happens, AER driver and DPC driver interrupt 
> handlers likely call
> 
>    pcie_do_recovery()
>    ->pci_walk_bus()
>      ->report_frozen_detected()
> 
> with pci_channel_io_frozen the same time.
>    If pci_dev_set_io_state() return true even if the original state is 
> pci_channel_io_frozen, that will cause AER or DPC handler re-enter the 
> error detecting and recovery procedure one after another.
>    The result is the recovery flow mixed between AER and DPC.
> So simplify the pci_dev_set_io_state() function to only return true 
> when dev->error_state is changed.
> 
> Signed-off-by: Ethan Zhao <haifeng.zhao@intel.com>
> Tested-by: Wen Jin <wen.jin@intel.com>
> Tested-by: Shanshan Zhang <ShanshanX.Zhang@intel.com>
> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Reviewed-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> Reviewed-by: Joe Perches <joe@perches.com>

Hi Ethan/Haifeng.

Like Andy, I did not "review" this patch and sign it.
I merely suggested another simplification.
Please do not add -by: lines unless actually received by you.
diff mbox series

Patch

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index fa12f7cbc1a0..a2c1c7d5f494 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -359,39 +359,12 @@  struct pci_sriov {
 static inline bool pci_dev_set_io_state(struct pci_dev *dev,
 					pci_channel_state_t new)
 {
-	bool changed = false;
-
 	device_lock_assert(&dev->dev);
-	switch (new) {
-	case pci_channel_io_perm_failure:
-		switch (dev->error_state) {
-		case pci_channel_io_frozen:
-		case pci_channel_io_normal:
-		case pci_channel_io_perm_failure:
-			changed = true;
-			break;
-		}
-		break;
-	case pci_channel_io_frozen:
-		switch (dev->error_state) {
-		case pci_channel_io_frozen:
-		case pci_channel_io_normal:
-			changed = true;
-			break;
-		}
-		break;
-	case pci_channel_io_normal:
-		switch (dev->error_state) {
-		case pci_channel_io_frozen:
-		case pci_channel_io_normal:
-			changed = true;
-			break;
-		}
-		break;
-	}
-	if (changed)
-		dev->error_state = new;
-	return changed;
+	if (dev->error_state == new)
+		return false;
+
+	dev->error_state = new;
+	return true;
 }
 
 static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)