[5/5] powerpc/eeh_sysfs: Make clearing EEH_DEV_SYSFS saner

Message ID 20190715085612.8802-6-oohall@gmail.com
State New
Series
  • [1/5] powerpc/eeh_cache: Don't use pci_dn when inserting new ranges

Checks

Context Check Description
snowpatch_ozlabs/checkpatch success total: 0 errors, 0 warnings, 0 checks, 79 lines checked
snowpatch_ozlabs/build-pmac32 success Build succeeded
snowpatch_ozlabs/build-ppc64e success Build succeeded
snowpatch_ozlabs/build-ppc64be success Build succeeded
snowpatch_ozlabs/build-ppc64le success Build succeeded
snowpatch_ozlabs/apply_patch success Successfully applied on branch next (f5c20693d8edcd665f1159dc941b9e7f87c17647)

Commit Message

Oliver O'Halloran July 15, 2019, 8:56 a.m. UTC
The eeh_sysfs_remove_device() function is supposed to clear the
EEH_DEV_SYSFS flag since it indicates the EEH sysfs entries have been added
for a pci_dev.

When the sysfs files are removed in eeh_remove_device() the eeh_dev and the
pci_dev have already been de-associated. This then causes the
pci_dev_to_eeh_dev() call in eeh_sysfs_remove_device() to return NULL so
the flag can't be cleared from the still-live eeh_dev. This problem is
worked around in the caller by clearing the flag manually. However, this
behaviour doesn't make a whole lot of sense, so this patch fixes it by:

a) Re-ordering eeh_remove_device() so that eeh_sysfs_remove_device() is
   called before de-associating the pci_dev and eeh_dev.

b) Making eeh_sysfs_remove_device() emit a warning if there's no
   corresponding eeh_dev for a pci_dev. The paths where the sysfs
   files are removed are only reachable if EEH was set up for the
   device in the first place, so hitting this warning indicates a
   programming error.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 arch/powerpc/kernel/eeh.c       | 30 +++++++++++++++++-------------
 arch/powerpc/kernel/eeh_sysfs.c | 15 ++++++++-------
 2 files changed, 25 insertions(+), 20 deletions(-)
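
The ordering problem described in the commit message can be sketched as a
small userspace model. This is only an illustration, not kernel code: every
model_* name and MODEL_DEV_SYSFS are hypothetical stand-ins (for pci_dev,
eeh_dev, pci_dev_to_eeh_dev(), eeh_sysfs_remove_device() and EEH_DEV_SYSFS),
and the warning is reduced to a counter instead of WARN_ON(eeh_enabled()).
The "old order" function pairs the old call order with the patched helper,
purely to show when the new warning would fire.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_DEV_SYSFS 0x1u	/* hypothetical stand-in for EEH_DEV_SYSFS */

struct model_edev {
	unsigned int mode;
};

struct model_pdev {
	struct model_edev *edev;	/* stand-in for dev.archdata.edev */
};

int model_warnings;	/* counts would-be warnings in this model */

/* Stand-in for pci_dev_to_eeh_dev(): follow the association, if any. */
struct model_edev *model_to_edev(struct model_pdev *pdev)
{
	return pdev->edev;
}

/* Stand-in for the patched helper: warn if the lookup fails, else clear. */
void model_sysfs_remove(struct model_pdev *pdev)
{
	struct model_edev *edev = model_to_edev(pdev);

	if (!edev) {
		model_warnings++;	/* the caller de-associated too early */
		return;
	}
	edev->mode &= ~MODEL_DEV_SYSFS;
}

/* Old order: de-associate first, so the helper cannot find the edev. */
void model_remove_old_order(struct model_pdev *pdev)
{
	pdev->edev = NULL;
	model_sysfs_remove(pdev);
}

/* New order: run the helper while the lookup still works. */
void model_remove_new_order(struct model_pdev *pdev)
{
	struct model_edev *edev = model_to_edev(pdev);

	if (edev && (edev->mode & MODEL_DEV_SYSFS))
		model_sysfs_remove(pdev);
	pdev->edev = NULL;
}

/* Exercise both orders; returns the number of warnings counted. */
int model_demo(void)
{
	struct model_edev a = { .mode = MODEL_DEV_SYSFS };
	struct model_edev b = { .mode = MODEL_DEV_SYSFS };
	struct model_pdev pa = { .edev = &a };
	struct model_pdev pb = { .edev = &b };

	model_remove_old_order(&pa);
	assert(a.mode & MODEL_DEV_SYSFS);	/* flag left stale */

	model_remove_new_order(&pb);
	assert(!(b.mode & MODEL_DEV_SYSFS));	/* flag cleared by the helper */

	return model_warnings;
}
```

In this model only the old order leaves the flag set on the still-live
object, which is the situation the caller previously papered over by
clearing the flag by hand.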

Comments

Sam Bobroff July 16, 2019, 4 a.m. UTC | #1
On Mon, Jul 15, 2019 at 06:56:12PM +1000, Oliver O'Halloran wrote:
> The eeh_sysfs_remove_device() function is supposed to clear the
> EEH_DEV_SYSFS flag since it indicates the EEH sysfs entries have been added
> for a pci_dev.
> 
> When the sysfs files are removed eeh_remove_device() the eeh_dev and the
> pci_dev have already been de-associated. This then causes the
> pci_dev_to_eeh_dev() call in eeh_sysfs_remove_device() to return NULL so
> the flag can't be cleared from the still-live eeh_dev. This problem is
> worked around in the caller by clearing the flag manually. However, this
> behaviour doesn't make a whole lot of sense, so this patch fixes it by:
> 
> a) Re-ordering eeh_remove_device() so that eeh_sysfs_remove_device() is
>    called before de-associating the pci_dev and eeh_dev.
> 
> b) Making eeh_sysfs_remove_device() emit a warning if there's no
>    corresponding eeh_dev for a pci_dev. The paths where the sysfs
>    files are only reachable if EEH was setup for the device
>    for the device in the first place so hitting this warning
>    indicates a programming error.
> 
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>

Good cleanup, although it looks like "for the device" got duplicated in
the last part of the commit message.

Simple EEH tests still succeed.

Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Tested-by: Sam Bobroff <sbobroff@linux.ibm.com>

> ---
>  arch/powerpc/kernel/eeh.c       | 30 +++++++++++++++++-------------
>  arch/powerpc/kernel/eeh_sysfs.c | 15 ++++++++-------
>  2 files changed, 25 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index f192d57..6e24896 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1203,7 +1203,6 @@ void eeh_add_device_late(struct pci_dev *dev)
>  		eeh_rmv_from_parent_pe(edev);
>  		eeh_addr_cache_rmv_dev(edev->pdev);
>  		eeh_sysfs_remove_device(edev->pdev);
> -		edev->mode &= ~EEH_DEV_SYSFS;
>  
>  		/*
>  		 * We definitely should have the PCI device removed
> @@ -1306,17 +1305,11 @@ void eeh_remove_device(struct pci_dev *dev)
>  	edev->pdev = NULL;
>  
>  	/*
> -	 * The flag "in_error" is used to trace EEH devices for VFs
> -	 * in error state or not. It's set in eeh_report_error(). If
> -	 * it's not set, eeh_report_{reset,resume}() won't be called
> -	 * for the VF EEH device.
> +	 * eeh_sysfs_remove_device() uses pci_dev_to_eeh_dev() so we need to
> +	 * remove the sysfs files before clearing dev.archdata.edev
>  	 */
> -	edev->in_error = false;
> -	dev->dev.archdata.edev = NULL;
> -	if (!(edev->pe->state & EEH_PE_KEEP))
> -		eeh_rmv_from_parent_pe(edev);
> -	else
> -		edev->mode |= EEH_DEV_DISCONNECTED;
> +	if (edev->mode & EEH_DEV_SYSFS)
> +		eeh_sysfs_remove_device(dev);
>  
>  	/*
>  	 * We're removing from the PCI subsystem, that means
> @@ -1327,8 +1320,19 @@ void eeh_remove_device(struct pci_dev *dev)
>  	edev->mode |= EEH_DEV_NO_HANDLER;
>  
>  	eeh_addr_cache_rmv_dev(dev);
> -	eeh_sysfs_remove_device(dev);
> -	edev->mode &= ~EEH_DEV_SYSFS;
> +
> +	/*
> +	 * The flag "in_error" is used to trace EEH devices for VFs
> +	 * in error state or not. It's set in eeh_report_error(). If
> +	 * it's not set, eeh_report_{reset,resume}() won't be called
> +	 * for the VF EEH device.
> +	 */
> +	edev->in_error = false;
> +	dev->dev.archdata.edev = NULL;
> +	if (!(edev->pe->state & EEH_PE_KEEP))
> +		eeh_rmv_from_parent_pe(edev);
> +	else
> +		edev->mode |= EEH_DEV_DISCONNECTED;
>  }
>  
>  int eeh_unfreeze_pe(struct eeh_pe *pe)
> diff --git a/arch/powerpc/kernel/eeh_sysfs.c b/arch/powerpc/kernel/eeh_sysfs.c
> index c4cc8fc..5614fd83 100644
> --- a/arch/powerpc/kernel/eeh_sysfs.c
> +++ b/arch/powerpc/kernel/eeh_sysfs.c
> @@ -175,22 +175,23 @@ void eeh_sysfs_remove_device(struct pci_dev *pdev)
>  {
>  	struct eeh_dev *edev = pci_dev_to_eeh_dev(pdev);
>  
> +	if (!edev) {
> +		WARN_ON(eeh_enabled());
> +		return;
> +	}
> +
> +	edev->mode &= ~EEH_DEV_SYSFS;
> +
>  	/*
>  	 * The parent directory might have been removed. We needn't
>  	 * continue for that case.
>  	 */
> -	if (!pdev->dev.kobj.sd) {
> -		if (edev)
> -			edev->mode &= ~EEH_DEV_SYSFS;
> +	if (!pdev->dev.kobj.sd)
>  		return;
> -	}
>  
>  	device_remove_file(&pdev->dev, &dev_attr_eeh_mode);
>  	device_remove_file(&pdev->dev, &dev_attr_eeh_pe_config_addr);
>  	device_remove_file(&pdev->dev, &dev_attr_eeh_pe_state);
>  
>  	eeh_notify_resume_remove(pdev);
> -
> -	if (edev)
> -		edev->mode &= ~EEH_DEV_SYSFS;
>  }
> -- 
> 2.9.5
>

Patch

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index f192d57..6e24896 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1203,7 +1203,6 @@  void eeh_add_device_late(struct pci_dev *dev)
 		eeh_rmv_from_parent_pe(edev);
 		eeh_addr_cache_rmv_dev(edev->pdev);
 		eeh_sysfs_remove_device(edev->pdev);
-		edev->mode &= ~EEH_DEV_SYSFS;
 
 		/*
 		 * We definitely should have the PCI device removed
@@ -1306,17 +1305,11 @@  void eeh_remove_device(struct pci_dev *dev)
 	edev->pdev = NULL;
 
 	/*
-	 * The flag "in_error" is used to trace EEH devices for VFs
-	 * in error state or not. It's set in eeh_report_error(). If
-	 * it's not set, eeh_report_{reset,resume}() won't be called
-	 * for the VF EEH device.
+	 * eeh_sysfs_remove_device() uses pci_dev_to_eeh_dev() so we need to
+	 * remove the sysfs files before clearing dev.archdata.edev
 	 */
-	edev->in_error = false;
-	dev->dev.archdata.edev = NULL;
-	if (!(edev->pe->state & EEH_PE_KEEP))
-		eeh_rmv_from_parent_pe(edev);
-	else
-		edev->mode |= EEH_DEV_DISCONNECTED;
+	if (edev->mode & EEH_DEV_SYSFS)
+		eeh_sysfs_remove_device(dev);
 
 	/*
 	 * We're removing from the PCI subsystem, that means
@@ -1327,8 +1320,19 @@  void eeh_remove_device(struct pci_dev *dev)
 	edev->mode |= EEH_DEV_NO_HANDLER;
 
 	eeh_addr_cache_rmv_dev(dev);
-	eeh_sysfs_remove_device(dev);
-	edev->mode &= ~EEH_DEV_SYSFS;
+
+	/*
+	 * The flag "in_error" is used to trace EEH devices for VFs
+	 * in error state or not. It's set in eeh_report_error(). If
+	 * it's not set, eeh_report_{reset,resume}() won't be called
+	 * for the VF EEH device.
+	 */
+	edev->in_error = false;
+	dev->dev.archdata.edev = NULL;
+	if (!(edev->pe->state & EEH_PE_KEEP))
+		eeh_rmv_from_parent_pe(edev);
+	else
+		edev->mode |= EEH_DEV_DISCONNECTED;
 }
 
 int eeh_unfreeze_pe(struct eeh_pe *pe)
diff --git a/arch/powerpc/kernel/eeh_sysfs.c b/arch/powerpc/kernel/eeh_sysfs.c
index c4cc8fc..5614fd83 100644
--- a/arch/powerpc/kernel/eeh_sysfs.c
+++ b/arch/powerpc/kernel/eeh_sysfs.c
@@ -175,22 +175,23 @@  void eeh_sysfs_remove_device(struct pci_dev *pdev)
 {
 	struct eeh_dev *edev = pci_dev_to_eeh_dev(pdev);
 
+	if (!edev) {
+		WARN_ON(eeh_enabled());
+		return;
+	}
+
+	edev->mode &= ~EEH_DEV_SYSFS;
+
 	/*
 	 * The parent directory might have been removed. We needn't
 	 * continue for that case.
 	 */
-	if (!pdev->dev.kobj.sd) {
-		if (edev)
-			edev->mode &= ~EEH_DEV_SYSFS;
+	if (!pdev->dev.kobj.sd)
 		return;
-	}
 
 	device_remove_file(&pdev->dev, &dev_attr_eeh_mode);
 	device_remove_file(&pdev->dev, &dev_attr_eeh_pe_config_addr);
 	device_remove_file(&pdev->dev, &dev_attr_eeh_pe_state);
 
 	eeh_notify_resume_remove(pdev);
-
-	if (edev)
-		edev->mode &= ~EEH_DEV_SYSFS;
 }