PCI: rework error checking in the reset path

Message ID 1508794608-15310-1-git-send-email-okaya@codeaurora.org
State Not Applicable
Series PCI: rework error checking in the reset path

Commit Message

Sinan Kaya Oct. 23, 2017, 9:36 p.m. UTC
The return codes from the various reset methods are not consistent. The
code assumes that every reset method returns -ENOTTY when something goes
wrong. Instead of relying on a specific negative error code, bail out as
soon as a reset succeeds.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/pci/pci.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Bjorn Helgaas Oct. 25, 2017, 1:45 p.m. UTC | #1
[+cc Alex]

On Mon, Oct 23, 2017 at 05:36:48PM -0400, Sinan Kaya wrote:
> The return codes from the various reset methods are not consistent. The
> code assumes that every reset method returns -ENOTTY when something goes
> wrong. Instead of relying on a specific negative error code, bail out as
> soon as a reset succeeds.

I like this (no surprise since I suggested something similar at
http://lkml.kernel.org/r/20171011210057.GU25517@bhelgaas-glaptop.roam.corp.google.com),
but I'd like Alex's opinion before merging it.

Previously, we only tried the next reset method if one method failed
with -ENOTTY.  With this patch, we'll try the next reset method if one
method fails for any reason, not just -ENOTTY.
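
The behavioral difference can be modeled in userspace. This is a
hypothetical sketch, not kernel code: reset_fn, try_resets_enotty(),
try_resets_until_success() and the three example methods are invented for
illustration, and the unconditional pcie_flr() step is omitted. Both loops
return the last method's result when nothing stops the chain, much as
__pci_reset_function_locked() returns pci_parent_bus_reset()'s result.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model: a reset method returns 0 on success or a
 * negative errno value on failure. */
typedef int (*reset_fn)(void);

/* Pre-patch policy: only -ENOTTY ("this method doesn't apply") moves on
 * to the next method; success or any other error is returned at once. */
static int try_resets_enotty(const reset_fn *methods, size_t n)
{
	int rc = -ENOTTY;

	for (size_t i = 0; i < n; i++) {
		rc = methods[i]();
		if (rc != -ENOTTY)
			return rc;
	}
	return rc;
}

/* Proposed policy: stop only on success; any error falls through to the
 * next method, and the last method's result is returned. */
static int try_resets_until_success(const reset_fn *methods, size_t n)
{
	int rc = -ENOTTY;

	for (size_t i = 0; i < n; i++) {
		rc = methods[i]();
		if (!rc)
			return 0;
	}
	return rc;
}

/* Example methods for comparing the two policies. */
static int unsupported(void) { return -ENOTTY; }
static int failing(void)     { return -EINVAL; }
static int working(void)     { return 0; }
```

With the chain { unsupported, failing, working }, the pre-patch policy
stops at the failing method and returns -EINVAL, while the patched policy
keeps going and returns 0.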

> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  drivers/pci/pci.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 6078dfc..a753e07 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4200,20 +4200,20 @@ int __pci_reset_function_locked(struct pci_dev *dev)
>  	might_sleep();
>  
>  	rc = pci_dev_specific_reset(dev, 0);
> -	if (rc != -ENOTTY)
> +	if (!rc)
>  		return rc;
>  	if (pcie_has_flr(dev)) {
>  		pcie_flr(dev);
>  		return 0;
>  	}
>  	rc = pci_af_flr(dev, 0);
> -	if (rc != -ENOTTY)
> +	if (!rc)
>  		return rc;
>  	rc = pci_pm_reset(dev, 0);
> -	if (rc != -ENOTTY)
> +	if (!rc)
>  		return rc;
>  	rc = pci_dev_reset_slot_function(dev, 0);
> -	if (rc != -ENOTTY)
> +	if (!rc)
>  		return rc;
>  	return pci_parent_bus_reset(dev, 0);
>  }
> -- 
> 1.9.1
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Alex Williamson Oct. 25, 2017, 9:28 p.m. UTC | #2
On Wed, 25 Oct 2017 08:45:11 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:

> [+cc Alex]
> 
> On Mon, Oct 23, 2017 at 05:36:48PM -0400, Sinan Kaya wrote:
> > The return codes from the various reset methods are not consistent. The
> > code assumes that every reset method returns -ENOTTY when something goes
> > wrong. Instead of relying on a specific negative error code, bail out as
> > soon as a reset succeeds.
> 
> I like this (no surprise since I suggested something similar at
> http://lkml.kernel.org/r/20171011210057.GU25517@bhelgaas-glaptop.roam.corp.google.com),
> but I'd like Alex's opinion before merging it.
> 
> Previously, we only tried the next reset method if one method failed
> with -ENOTTY.  With this patch, we'll try the next reset method if one
> method fails for any reason, not just -ENOTTY.

Hmm, I thought the return codes were pretty consistent.  -ENOTTY means
that the reset callback doesn't handle the device, move on.  Many
ioctls use the same return code to indicate an unknown ioctl.  This
allows us to differentiate success vs error vs unhandled.  In the code
below we lose the ability to, for instance, have a device specific
reset that returns -EINVAL to prevent the PCI core from triggering
further reset mechanisms which might be broken on the device.  So, I
don't see that this patch specifically fixes anything, but it does
remove what seems like useful functionality...  I'd veto it.  Thanks,

Alex
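
A minimal userspace sketch of the success / error / unhandled convention
described above. Everything here is hypothetical: struct fake_dev,
dev_specific_reset(), generic_reset() and reset_device() stand in for the
real PCI core functions. The quirk returns -EINVAL to keep the caller from
falling back to a generic reset that may be broken on the device.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model, not the kernel's struct pci_dev. */
struct fake_dev {
	bool has_quirk;        /* a device-specific reset exists */
	bool quirk_can_reset;  /* the device-specific reset succeeds */
	int  generic_resets;   /* counts how often the fallback ran */
};

/* Device-specific method: -ENOTTY if there is no quirk for this device;
 * otherwise 0 on success, or -EINVAL meaning "reset failed, and don't
 * try the generic methods on this device". */
static int dev_specific_reset(struct fake_dev *d)
{
	if (!d->has_quirk)
		return -ENOTTY;
	return d->quirk_can_reset ? 0 : -EINVAL;
}

/* Generic fallback (stands in for e.g. a bus reset). */
static int generic_reset(struct fake_dev *d)
{
	d->generic_resets++;
	return 0;
}

/* The caller distinguishes success, error, and "not handled". */
static int reset_device(struct fake_dev *d)
{
	int rc = dev_specific_reset(d);

	if (rc != -ENOTTY)
		return rc;	/* 0: done; any other error vetoes the fallback */
	return generic_reset(d);
}
```

For a device without a quirk, the generic reset runs once. For a device
whose quirk fails, reset_device() returns -EINVAL and the generic reset
never runs.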
 
> > Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> > ---
> >  drivers/pci/pci.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 6078dfc..a753e07 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -4200,20 +4200,20 @@ int __pci_reset_function_locked(struct pci_dev *dev)
> >  	might_sleep();
> >  
> >  	rc = pci_dev_specific_reset(dev, 0);
> > -	if (rc != -ENOTTY)
> > +	if (!rc)
> >  		return rc;
> >  	if (pcie_has_flr(dev)) {
> >  		pcie_flr(dev);
> >  		return 0;
> >  	}
> >  	rc = pci_af_flr(dev, 0);
> > -	if (rc != -ENOTTY)
> > +	if (!rc)
> >  		return rc;
> >  	rc = pci_pm_reset(dev, 0);
> > -	if (rc != -ENOTTY)
> > +	if (!rc)
> >  		return rc;
> >  	rc = pci_dev_reset_slot_function(dev, 0);
> > -	if (rc != -ENOTTY)
> > +	if (!rc)
> >  		return rc;
> >  	return pci_parent_bus_reset(dev, 0);
> >  }
> > -- 
> > 1.9.1
Sinan Kaya Oct. 25, 2017, 9:42 p.m. UTC | #3
On 10/25/2017 5:28 PM, Alex Williamson wrote:
>> Previously, we only tried the next reset method if one method failed
>> with -ENOTTY.  With this patch, we'll try the next reset method if one
>> method fails for any reason, not just -ENOTTY.
> Hmm, I thought the return codes were pretty consistent.  -ENOTTY means
> that the reset callback doesn't handle the device, move on.  Many
> ioctls use the same return code to indicate an unknown ioctl.  This
> allows us to differentiate success vs error vs unhandled.  In the code
> below we lose the ability to, for instance, have a device specific
> reset that returns -EINVAL to prevent the PCI core from triggering
> further reset mechanisms which might be broken on the device.  So, I
> don't see that this patch specifically fixes anything, but it does
> remove what seems like useful functionality...  I'd veto it.  Thanks,
> 

OK, it was not obvious from code inspection how -EINVAL and -ENOTTY were
used.

Thank you very much for the clarification. I'm dropping the patch unless
Bjorn has another idea.

> Alex
>
Bjorn Helgaas Oct. 25, 2017, 10:10 p.m. UTC | #4
On Wed, Oct 25, 2017 at 11:28:05PM +0200, Alex Williamson wrote:
> On Wed, 25 Oct 2017 08:45:11 -0500
> Bjorn Helgaas <helgaas@kernel.org> wrote:
> 
> > [+cc Alex]
> > 
> > On Mon, Oct 23, 2017 at 05:36:48PM -0400, Sinan Kaya wrote:
> > > The return codes from the various reset methods are not consistent. The
> > > code assumes that every reset method returns -ENOTTY when something goes
> > > wrong. Instead of relying on a specific negative error code, bail out as
> > > soon as a reset succeeds.
> > 
> > I like this (no surprise since I suggested something similar at
> > http://lkml.kernel.org/r/20171011210057.GU25517@bhelgaas-glaptop.roam.corp.google.com),
> > but I'd like Alex's opinion before merging it.
> > 
> > Previously, we only tried the next reset method if one method failed
> > with -ENOTTY.  With this patch, we'll try the next reset method if one
> > method fails for any reason, not just -ENOTTY.
> 
> Hmm, I thought the return codes were pretty consistent.  -ENOTTY means
> that the reset callback doesn't handle the device, move on.  Many
> ioctls use the same return code to indicate an unknown ioctl.  This
> allows us to differentiate success vs error vs unhandled.  In the code
> below we lose the ability to, for instance, have a device specific
> reset that returns -EINVAL to prevent the PCI core from triggering
> further reset mechanisms which might be broken on the device.  So, I
> don't see that this patch specifically fixes anything, but it does
> remove what seems like useful functionality...  I'd veto it.  Thanks,

I didn't understand the intention of -EINVAL vs -ENOTTY, so
that might be a reasonable argument.  The knowledge about mechanisms
being broken on a specific device seems like it would belong in
pci_dev_specific_reset() and not really applicable to other methods,
though.

But I'm not sure the current usage makes a lot of sense.  The only
places I found that return an error other than -ENOTTY are
reset_ivb_igd() and pci_pm_reset().  In reset_ivb_igd(), we return
-ENOMEM if an ioremap() fails.  That's not a case of "other reset
mechanisms are broken and we shouldn't try them."

pci_pm_reset() returns -EINVAL if the device is not in D0.  Maybe it
makes sense to not try any other reset methods in that case, but I
really don't know.

If we leave it as-is, maybe a comment like the following would be
useful.

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index f0d68066c726..2c98f309bc8a 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4170,6 +4170,13 @@ int __pci_reset_function_locked(struct pci_dev *dev)
 
 	might_sleep();
 
+	/*
+	 * Reset method return values:
+	 *   0:		    Device was successfully reset
+	 *   -ENOTTY:	    Method doesn't support resetting this device;
+	 *		    try the next method
+	 *   anything else: Reset failed; don't try any other mechanisms
+	 */
 	rc = pci_dev_specific_reset(dev, 0);
 	if (rc != -ENOTTY)
 		return rc;
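
The convention in that comment could also be expressed as a small helper.
This is a hypothetical sketch: enum reset_action and classify_reset_rc() do
not exist in drivers/pci/pci.c and are named here only for illustration.

```c
#include <errno.h>

/* What the caller should do with a reset method's return value, following
 * the convention in the proposed comment (names are hypothetical). */
enum reset_action {
	RESET_DONE,      /* 0: device was successfully reset */
	RESET_TRY_NEXT,  /* -ENOTTY: method doesn't support this device */
	RESET_ABORT,     /* anything else: reset failed, stop here */
};

static enum reset_action classify_reset_rc(int rc)
{
	if (rc == 0)
		return RESET_DONE;
	if (rc == -ENOTTY)
		return RESET_TRY_NEXT;
	return RESET_ABORT;
}
```

Under this convention, both the -ENOMEM from reset_ivb_igd() and the
-EINVAL from pci_pm_reset() map to RESET_ABORT.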
Alex Williamson Oct. 25, 2017, 10:34 p.m. UTC | #5
On Wed, 25 Oct 2017 17:10:46 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:

> On Wed, Oct 25, 2017 at 11:28:05PM +0200, Alex Williamson wrote:
> > On Wed, 25 Oct 2017 08:45:11 -0500
> > Bjorn Helgaas <helgaas@kernel.org> wrote:
> >   
> > > [+cc Alex]
> > > 
> > > On Mon, Oct 23, 2017 at 05:36:48PM -0400, Sinan Kaya wrote:  
> > > > The return codes from the various reset methods are not consistent. The
> > > > code assumes that every reset method returns -ENOTTY when something goes
> > > > wrong. Instead of relying on a specific negative error code, bail out as
> > > > soon as a reset succeeds.
> > > 
> > > I like this (no surprise since I suggested something similar at
> > > http://lkml.kernel.org/r/20171011210057.GU25517@bhelgaas-glaptop.roam.corp.google.com),
> > > but I'd like Alex's opinion before merging it.
> > > 
> > > Previously, we only tried the next reset method if one method failed
> > > with -ENOTTY.  With this patch, we'll try the next reset method if one
> > > method fails for any reason, not just -ENOTTY.  
> > 
> > Hmm, I thought the return codes were pretty consistent.  -ENOTTY means
> > that the reset callback doesn't handle the device, move on.  Many
> > ioctls use the same return code to indicate an unknown ioctl.  This
> > allows us to differentiate success vs error vs unhandled.  In the code
> > below we lose the ability to, for instance, have a device specific
> > reset that returns -EINVAL to prevent the PCI core from triggering
> > further reset mechanisms which might be broken on the device.  So, I
> > don't see that this patch specifically fixes anything, but it does
> > remove what seems like useful functionality...  I'd veto it.  Thanks,  
> 
> I didn't understand the intention of -EINVAL vs -ENOTTY, so
> that might be a reasonable argument.  The knowledge about mechanisms
> being broken on a specific device seems like it would belong in
> pci_dev_specific_reset() and not really applicable to other methods,
> though.
> 
> But I'm not sure the current usage makes a lot of sense.  The only
> places I found that return an error other than -ENOTTY are
> reset_ivb_igd() and pci_pm_reset().  In reset_ivb_igd(), we return
> -ENOMEM if an ioremap() fails.  That's not a case of "other reset
> mechanisms are broken and we shouldn't try them."

Well, by the fact that we have a device specific reset here, we can
probably deduce that the standard reset mechanisms do not work or are
undesirable for some reason.  Therefore if we cannot perform the
necessary ioremap in this case, it's probably better to stop and return
error.

> pci_pm_reset() returns -EINVAL if the device is not in D0.  Maybe it
> makes sense to not try any other reset methods in that case, but I
> really don't know.

Yeah, that one could probably be re-worked since it's a standard reset
mechanism.  I wonder if the logic here is to avoid a bus reset for a
device that reports NoSoftRst- but is simply in the wrong state for it.
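
A userspace model of the pci_pm_reset() return values discussed here,
under stated assumptions: struct pm_dev, its fields and pm_reset() are
illustrative names, the model only roughly mirrors the kernel's checks, and
the delays required around the D3hot/D0 transitions are omitted. -ENOTTY
covers devices this mechanism cannot reset (no PM capability, or
NoSoftRst+). -EINVAL covers a device that supports the mechanism but is
not in D0, which keeps the caller from falling through to a bus reset.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model; field names are illustrative, not the kernel's. */
enum power_state { D0, D1, D2, D3HOT };

struct pm_dev {
	bool has_pm_cap;       /* PCI Power Management capability present */
	bool no_soft_reset;    /* PMCSR No_Soft_Reset set (NoSoftRst+) */
	enum power_state state;
};

/* D3hot->D0 transition reset:
 *  -ENOTTY: mechanism can't reset this device (no PM capability, or the
 *           device keeps its state across D3hot->D0, i.e. NoSoftRst+)
 *  -EINVAL: mechanism applies, but the device is not in D0 right now
 *   0:      reset performed */
static int pm_reset(struct pm_dev *d)
{
	if (!d->has_pm_cap || d->no_soft_reset)
		return -ENOTTY;
	if (d->state != D0)
		return -EINVAL;
	d->state = D3HOT;	/* enter D3hot ... */
	d->state = D0;		/* ... and return to D0, resetting the function */
	return 0;
}
```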
 
> If we leave it as-is, maybe a comment like the following would be
> useful.
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index f0d68066c726..2c98f309bc8a 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4170,6 +4170,13 @@ int __pci_reset_function_locked(struct pci_dev *dev)
>  
>  	might_sleep();
>  
> +	/*
> +	 * Reset method return values:
> +	 *   0:		    Device was successfully reset
> +	 *   -ENOTTY:	    Method doesn't support resetting this device;
> +	 *		    try the next method
> +	 *   anything else: Reset failed; don't try any other mechanisms
> +	 */
>  	rc = pci_dev_specific_reset(dev, 0);
>  	if (rc != -ENOTTY)
>  		return rc;

Yep, that's helpful.  The standard reset mechanisms also use the
-ENOTTY convention, but maybe don't have the same authority to indicate
whether to abort or move on to the next method as device specific
resets.  Thanks,

Alex
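
One way to act on that last point, sketched here as a hypothetical that
the thread did not adopt: give each method a flag saying whether its
errors may stop the chain. struct reset_method, run_reset_chain() and the
example methods are invented for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: each method carries a flag saying whether its errors are
 * authoritative (may stop the chain) or advisory (fall through). */
struct reset_method {
	int (*fn)(void);
	bool authoritative;
};

static int run_reset_chain(const struct reset_method *m, size_t n)
{
	int rc = -ENOTTY;

	for (size_t i = 0; i < n; i++) {
		rc = m[i].fn();
		if (rc == 0)
			return 0;		/* reset done */
		if (rc != -ENOTTY && m[i].authoritative)
			return rc;		/* e.g. a device quirk vetoes fallback */
		/* -ENOTTY or an advisory failure: try the next method */
	}
	return rc;
}

static int quirk_unsupported(void) { return -ENOTTY; }
static int quirk_veto(void)        { return -EINVAL; }
static int std_fails(void)         { return -EINVAL; }
static int std_works(void)         { return 0; }
```

Marking only device-specific resets as authoritative keeps the veto
described earlier in the thread, while a failing standard method simply
falls through to the next one.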

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 6078dfc..a753e07 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4200,20 +4200,20 @@  int __pci_reset_function_locked(struct pci_dev *dev)
 	might_sleep();
 
 	rc = pci_dev_specific_reset(dev, 0);
-	if (rc != -ENOTTY)
+	if (!rc)
 		return rc;
 	if (pcie_has_flr(dev)) {
 		pcie_flr(dev);
 		return 0;
 	}
 	rc = pci_af_flr(dev, 0);
-	if (rc != -ENOTTY)
+	if (!rc)
 		return rc;
 	rc = pci_pm_reset(dev, 0);
-	if (rc != -ENOTTY)
+	if (!rc)
 		return rc;
 	rc = pci_dev_reset_slot_function(dev, 0);
-	if (rc != -ENOTTY)
+	if (!rc)
 		return rc;
 	return pci_parent_bus_reset(dev, 0);
 }