diff mbox series

[v2,1/2] driver core: platform: Clarify that IRQ 0 is invalid

Message ID 20200501224042.141366-2-helgaas@kernel.org
State New
Headers show
Series PCI: Check for platform_get_irq() failure consistently | expand

Commit Message

Bjorn Helgaas May 1, 2020, 10:40 p.m. UTC
From: Bjorn Helgaas <bhelgaas@google.com>

These interfaces return a negative error number or an IRQ:

  platform_get_irq()
  platform_get_irq_optional()
  platform_get_irq_byname()
  platform_get_irq_byname_optional()

The function comments suggest checking for error like this:

  irq = platform_get_irq(...);
  if (irq < 0)
    return irq;

which is what most callers (~900 of 1400) do, so it's implicit that IRQ 0
is invalid.  But some callers check for "irq <= 0", and it's not obvious
from the source that we never return an IRQ 0.

Make this more explicit by updating the comments to say that an IRQ number
is always non-zero and adding a WARN() if we ever do return zero.  If we do
return IRQ 0, it likely indicates a bug in the arch-specific parts of
platform_get_irq().

Relevant prior discussion at [1, 2].

[1] https://lore.kernel.org/r/Pine.LNX.4.64.0701250940220.25027@woody.linux-foundation.org/
[2] https://lore.kernel.org/r/Pine.LNX.4.64.0701252029570.25027@woody.linux-foundation.org/
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/base/platform.c | 40 +++++++++++++++++++++++++---------------
 1 file changed, 25 insertions(+), 15 deletions(-)

Comments

Greg Kroah-Hartman May 2, 2020, 6:15 a.m. UTC | #1
On Fri, May 01, 2020 at 05:40:41PM -0500, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> These interfaces return a negative error number or an IRQ:
> 
>   platform_get_irq()
>   platform_get_irq_optional()
>   platform_get_irq_byname()
>   platform_get_irq_byname_optional()
> 
> The function comments suggest checking for error like this:
> 
>   irq = platform_get_irq(...);
>   if (irq < 0)
>     return irq;
> 
> which is what most callers (~900 of 1400) do, so it's implicit that IRQ 0
> is invalid.  But some callers check for "irq <= 0", and it's not obvious
> from the source that we never return an IRQ 0.
> 
> Make this more explicit by updating the comments to say that an IRQ number
> is always non-zero and adding a WARN() if we ever do return zero.  If we do
> return IRQ 0, it likely indicates a bug in the arch-specific parts of
> platform_get_irq().

I worry about adding WARN() as there are systems that do panic_on_warn()
and syzbot trips over this as well.  I don't think that for this issue
it would be a problem, but what really is this warning about that
someone could do anything with?

Other than that minor thing, this looks good to me, thanks for finally
clearing this up.

greg k-h
Bjorn Helgaas May 4, 2020, 6:08 p.m. UTC | #2
On Sat, May 02, 2020 at 08:15:37AM +0200, Greg Kroah-Hartman wrote:
> On Fri, May 01, 2020 at 05:40:41PM -0500, Bjorn Helgaas wrote:
> > From: Bjorn Helgaas <bhelgaas@google.com>
> > 
> > These interfaces return a negative error number or an IRQ:
> > 
> >   platform_get_irq()
> >   platform_get_irq_optional()
> >   platform_get_irq_byname()
> >   platform_get_irq_byname_optional()
> > 
> > The function comments suggest checking for error like this:
> > 
> >   irq = platform_get_irq(...);
> >   if (irq < 0)
> >     return irq;
> > 
> > which is what most callers (~900 of 1400) do, so it's implicit that IRQ 0
> > is invalid.  But some callers check for "irq <= 0", and it's not obvious
> > from the source that we never return an IRQ 0.
> > 
> > Make this more explicit by updating the comments to say that an IRQ number
> > is always non-zero and adding a WARN() if we ever do return zero.  If we do
> > return IRQ 0, it likely indicates a bug in the arch-specific parts of
> > platform_get_irq().
> 
> I worry about adding WARN() as there are systems that do panic_on_warn()
> and syzbot trips over this as well.  I don't think that for this issue
> it would be a problem, but what really is this warning about that
> someone could do anything with?
> 
> Other than that minor thing, this looks good to me, thanks for finally
> clearing this up.

What I'm concerned about is an arch that returns 0.  Most drivers
don't check for 0 so they'll just try to use it, and things will fail
in some obscure way.  My assumption is that if there really is no IRQ,
we should return -ENOENT or similar instead of 0.

I could be convinced that it's not worth warning about at all, or we
could do something like the following:

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 084cf1d23d3f..4afa5875e14d 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -220,7 +220,11 @@ int platform_get_irq_optional(struct platform_device *dev, unsigned int num)
 	ret = -ENXIO;
 #endif
 out:
-	WARN(ret == 0, "0 is an invalid IRQ number\n");
+	/* Returning zero here is likely a bug in the arch IRQ code */
+	if (ret == 0) {
+		pr_warn("0 is an invalid IRQ number\n");
+		dump_stack();
+	}
 	return ret;
 }
 EXPORT_SYMBOL_GPL(platform_get_irq_optional);
@@ -312,7 +316,11 @@ static int __platform_get_irq_byname(struct platform_device *dev,
 
 	r = platform_get_resource_byname(dev, IORESOURCE_IRQ, name);
 	if (r) {
-		WARN(r->start == 0, "0 is an invalid IRQ number\n");
+		/* Returning zero here is likely a bug in the arch IRQ code */
+		if (r->start == 0) {
+			pr_warn("0 is an invalid IRQ number\n");
+			dump_stack();
+		}
 		return r->start;
 	}
Greg Kroah-Hartman May 4, 2020, 7:07 p.m. UTC | #3
On Mon, May 04, 2020 at 01:08:22PM -0500, Bjorn Helgaas wrote:
> On Sat, May 02, 2020 at 08:15:37AM +0200, Greg Kroah-Hartman wrote:
> > On Fri, May 01, 2020 at 05:40:41PM -0500, Bjorn Helgaas wrote:
> > > From: Bjorn Helgaas <bhelgaas@google.com>
> > > 
> > > These interfaces return a negative error number or an IRQ:
> > > 
> > >   platform_get_irq()
> > >   platform_get_irq_optional()
> > >   platform_get_irq_byname()
> > >   platform_get_irq_byname_optional()
> > > 
> > > The function comments suggest checking for error like this:
> > > 
> > >   irq = platform_get_irq(...);
> > >   if (irq < 0)
> > >     return irq;
> > > 
> > > which is what most callers (~900 of 1400) do, so it's implicit that IRQ 0
> > > is invalid.  But some callers check for "irq <= 0", and it's not obvious
> > > from the source that we never return an IRQ 0.
> > > 
> > > Make this more explicit by updating the comments to say that an IRQ number
> > > is always non-zero and adding a WARN() if we ever do return zero.  If we do
> > > return IRQ 0, it likely indicates a bug in the arch-specific parts of
> > > platform_get_irq().
> > 
> > I worry about adding WARN() as there are systems that do panic_on_warn()
> > and syzbot trips over this as well.  I don't think that for this issue
> > it would be a problem, but what really is this warning about that
> > someone could do anything with?
> > 
> > Other than that minor thing, this looks good to me, thanks for finally
> > clearing this up.
> 
> What I'm concerned about is an arch that returns 0.  Most drivers
> don't check for 0 so they'll just try to use it, and things will fail
> in some obscure way.  My assumption is that if there really is no IRQ,
> we should return -ENOENT or similar instead of 0.
> 
> I could be convinced that it's not worth warning about at all, or we
> could do something like the following:
> 
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index 084cf1d23d3f..4afa5875e14d 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -220,7 +220,11 @@ int platform_get_irq_optional(struct platform_device *dev, unsigned int num)
>  	ret = -ENXIO;
>  #endif
>  out:
> -	WARN(ret == 0, "0 is an invalid IRQ number\n");
> +	/* Returning zero here is likely a bug in the arch IRQ code */
> +	if (ret == 0) {
> +		pr_warn("0 is an invalid IRQ number\n");
> +		dump_stack();
> +	}
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(platform_get_irq_optional);
> @@ -312,7 +316,11 @@ static int __platform_get_irq_byname(struct platform_device *dev,
>  
>  	r = platform_get_resource_byname(dev, IORESOURCE_IRQ, name);
>  	if (r) {
> -		WARN(r->start == 0, "0 is an invalid IRQ number\n");
> +		/* Returning zero here is likely a bug in the arch IRQ code */
> +		if (r->start == 0) {
> +			pr_warn("0 is an invalid IRQ number\n");
> +			dump_stack();
> +		}
>  		return r->start;
>  	}
>  

I like that, but you said this is something that the platform people
should only see when bringing up a new system, so maybe the WARN() is
fine.  It's not user-triggerable, so your original is ok.

sorry for the noise,

greg k-h
Bjorn Helgaas May 4, 2020, 10:26 p.m. UTC | #4
On Mon, May 04, 2020 at 09:07:21PM +0200, Greg Kroah-Hartman wrote:
> On Mon, May 04, 2020 at 01:08:22PM -0500, Bjorn Helgaas wrote:
> > On Sat, May 02, 2020 at 08:15:37AM +0200, Greg Kroah-Hartman wrote:
> > > On Fri, May 01, 2020 at 05:40:41PM -0500, Bjorn Helgaas wrote:
> > > > From: Bjorn Helgaas <bhelgaas@google.com>
> > > > 
> > > > These interfaces return a negative error number or an IRQ:
> > > > 
> > > >   platform_get_irq()
> > > >   platform_get_irq_optional()
> > > >   platform_get_irq_byname()
> > > >   platform_get_irq_byname_optional()
> > > > 
> > > > The function comments suggest checking for error like this:
> > > > 
> > > >   irq = platform_get_irq(...);
> > > >   if (irq < 0)
> > > >     return irq;
> > > > 
> > > > which is what most callers (~900 of 1400) do, so it's implicit
> > > > that IRQ 0 is invalid.  But some callers check for "irq <= 0",
> > > > and it's not obvious from the source that we never return an
> > > > IRQ 0.
> > > > 
> > > > Make this more explicit by updating the comments to say that
> > > > an IRQ number is always non-zero and adding a WARN() if we
> > > > ever do return zero.  If we do return IRQ 0, it likely
> > > > indicates a bug in the arch-specific parts of
> > > > platform_get_irq().
> > > 
> > > I worry about adding WARN() as there are systems that do
> > > panic_on_warn() and syzbot trips over this as well.  I don't
> > > think that for this issue it would be a problem, but what really
> > > is this warning about that someone could do anything with?
> > > 
> > > Other than that minor thing, this looks good to me, thanks for
> > > finally clearing this up.
> > 
> > What I'm concerned about is an arch that returns 0.  Most drivers
> > don't check for 0 so they'll just try to use it, and things will
> > fail in some obscure way.  My assumption is that if there really
> > is no IRQ, we should return -ENOENT or similar instead of 0.
> > 
> > I could be convinced that it's not worth warning about at all, or
> > we could do something like the following:
> > 
> > diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> > index 084cf1d23d3f..4afa5875e14d 100644
> > --- a/drivers/base/platform.c
> > +++ b/drivers/base/platform.c
> > @@ -220,7 +220,11 @@ int platform_get_irq_optional(struct platform_device *dev, unsigned int num)
> >  	ret = -ENXIO;
> >  #endif
> >  out:
> > -	WARN(ret == 0, "0 is an invalid IRQ number\n");
> > +	/* Returning zero here is likely a bug in the arch IRQ code */
> > +	if (ret == 0) {
> > +		pr_warn("0 is an invalid IRQ number\n");
> > +		dump_stack();
> > +	}
> >  	return ret;
> >  }
> > ...

> I like that, but you said this is something that the platform people
> should only see when bringing up a new system, so maybe the WARN() is
> fine.  It's not user-triggerable, so your original is ok.

Is that an ack?  Thomas, any thoughts?

I suspect we could see this given a broken DT, too, so I'm not sure
it's strictly a bringup problem.

I would probably argue that even this case would be an arch defect:
the kernel should validate data from a DT at least enough to avoid
giving a bogus, useless IRQ to a driver.

Bjorn
Greg Kroah-Hartman May 5, 2020, 1:35 p.m. UTC | #5
On Mon, May 04, 2020 at 05:26:59PM -0500, Bjorn Helgaas wrote:
> On Mon, May 04, 2020 at 09:07:21PM +0200, Greg Kroah-Hartman wrote:
> > On Mon, May 04, 2020 at 01:08:22PM -0500, Bjorn Helgaas wrote:
> > > On Sat, May 02, 2020 at 08:15:37AM +0200, Greg Kroah-Hartman wrote:
> > > > On Fri, May 01, 2020 at 05:40:41PM -0500, Bjorn Helgaas wrote:
> > > > > From: Bjorn Helgaas <bhelgaas@google.com>
> > > > > 
> > > > > These interfaces return a negative error number or an IRQ:
> > > > > 
> > > > >   platform_get_irq()
> > > > >   platform_get_irq_optional()
> > > > >   platform_get_irq_byname()
> > > > >   platform_get_irq_byname_optional()
> > > > > 
> > > > > The function comments suggest checking for error like this:
> > > > > 
> > > > >   irq = platform_get_irq(...);
> > > > >   if (irq < 0)
> > > > >     return irq;
> > > > > 
> > > > > which is what most callers (~900 of 1400) do, so it's implicit
> > > > > that IRQ 0 is invalid.  But some callers check for "irq <= 0",
> > > > > and it's not obvious from the source that we never return an
> > > > > IRQ 0.
> > > > > 
> > > > > Make this more explicit by updating the comments to say that
> > > > > an IRQ number is always non-zero and adding a WARN() if we
> > > > > ever do return zero.  If we do return IRQ 0, it likely
> > > > > indicates a bug in the arch-specific parts of
> > > > > platform_get_irq().
> > > > 
> > > > I worry about adding WARN() as there are systems that do
> > > > panic_on_warn() and syzbot trips over this as well.  I don't
> > > > think that for this issue it would be a problem, but what really
> > > > is this warning about that someone could do anything with?
> > > > 
> > > > Other than that minor thing, this looks good to me, thanks for
> > > > finally clearing this up.
> > > 
> > > What I'm concerned about is an arch that returns 0.  Most drivers
> > > don't check for 0 so they'll just try to use it, and things will
> > > fail in some obscure way.  My assumption is that if there really
> > > is no IRQ, we should return -ENOENT or similar instead of 0.
> > > 
> > > I could be convinced that it's not worth warning about at all, or
> > > we could do something like the following:
> > > 
> > > diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> > > index 084cf1d23d3f..4afa5875e14d 100644
> > > --- a/drivers/base/platform.c
> > > +++ b/drivers/base/platform.c
> > > @@ -220,7 +220,11 @@ int platform_get_irq_optional(struct platform_device *dev, unsigned int num)
> > >  	ret = -ENXIO;
> > >  #endif
> > >  out:
> > > -	WARN(ret == 0, "0 is an invalid IRQ number\n");
> > > +	/* Returning zero here is likely a bug in the arch IRQ code */
> > > +	if (ret == 0) {
> > > +		pr_warn("0 is an invalid IRQ number\n");
> > > +		dump_stack();
> > > +	}
> > >  	return ret;
> > >  }
> > > ...
> 
> > I like that, but you said this is something that the platform people
> > should only see when bringing up a new system, so maybe the WARN() is
> > fine.  It's not user-triggerable, so your original is ok.
> 
> Is that an ack?  Thomas, any thoughts?

Sorry, yes:

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff mbox series

Patch

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 5255550b7c34..084cf1d23d3f 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -152,23 +152,24 @@  EXPORT_SYMBOL_GPL(devm_platform_ioremap_resource_byname);
  *		if (irq < 0)
  *			return irq;
  *
- * Return: IRQ number on success, negative error number on failure.
+ * Return: non-zero IRQ number on success, negative error number on failure.
  */
 int platform_get_irq_optional(struct platform_device *dev, unsigned int num)
 {
+	int ret;
 #ifdef CONFIG_SPARC
 	/* sparc does not have irqs represented as IORESOURCE_IRQ resources */
 	if (!dev || num >= dev->archdata.num_irqs)
 		return -ENXIO;
-	return dev->archdata.irqs[num];
+	ret = dev->archdata.irqs[num];
+	goto out;
 #else
 	struct resource *r;
-	int ret;
 
 	if (IS_ENABLED(CONFIG_OF_IRQ) && dev->dev.of_node) {
 		ret = of_irq_get(dev->dev.of_node, num);
 		if (ret > 0 || ret == -EPROBE_DEFER)
-			return ret;
+			goto out;
 	}
 
 	r = platform_get_resource(dev, IORESOURCE_IRQ, num);
@@ -176,7 +177,7 @@  int platform_get_irq_optional(struct platform_device *dev, unsigned int num)
 		if (r && r->flags & IORESOURCE_DISABLED) {
 			ret = acpi_irq_get(ACPI_HANDLE(&dev->dev), num, r);
 			if (ret)
-				return ret;
+				goto out;
 		}
 	}
 
@@ -190,13 +191,17 @@  int platform_get_irq_optional(struct platform_device *dev, unsigned int num)
 		struct irq_data *irqd;
 
 		irqd = irq_get_irq_data(r->start);
-		if (!irqd)
-			return -ENXIO;
+		if (!irqd) {
+			ret = -ENXIO;
+			goto out;
+		}
 		irqd_set_trigger_type(irqd, r->flags & IORESOURCE_BITS);
 	}
 
-	if (r)
-		return r->start;
+	if (r) {
+		ret = r->start;
+		goto out;
+	}
 
 	/*
 	 * For the index 0 interrupt, allow falling back to GpioInt
@@ -209,11 +214,14 @@  int platform_get_irq_optional(struct platform_device *dev, unsigned int num)
 		ret = acpi_dev_gpio_irq_get(ACPI_COMPANION(&dev->dev), num);
 		/* Our callers expect -ENXIO for missing IRQs. */
 		if (ret >= 0 || ret == -EPROBE_DEFER)
-			return ret;
+			goto out;
 	}
 
-	return -ENXIO;
+	ret = -ENXIO;
 #endif
+out:
+	WARN(ret == 0, "0 is an invalid IRQ number\n");
+	return ret;
 }
 EXPORT_SYMBOL_GPL(platform_get_irq_optional);
 
@@ -231,7 +239,7 @@  EXPORT_SYMBOL_GPL(platform_get_irq_optional);
  *		if (irq < 0)
  *			return irq;
  *
- * Return: IRQ number on success, negative error number on failure.
+ * Return: non-zero IRQ number on success, negative error number on failure.
  */
 int platform_get_irq(struct platform_device *dev, unsigned int num)
 {
@@ -303,8 +311,10 @@  static int __platform_get_irq_byname(struct platform_device *dev,
 	}
 
 	r = platform_get_resource_byname(dev, IORESOURCE_IRQ, name);
-	if (r)
+	if (r) {
+		WARN(r->start == 0, "0 is an invalid IRQ number\n");
 		return r->start;
+	}
 
 	return -ENXIO;
 }
@@ -316,7 +326,7 @@  static int __platform_get_irq_byname(struct platform_device *dev,
  *
  * Get an IRQ like platform_get_irq(), but then by name rather then by index.
  *
- * Return: IRQ number on success, negative error number on failure.
+ * Return: non-zero IRQ number on success, negative error number on failure.
  */
 int platform_get_irq_byname(struct platform_device *dev, const char *name)
 {
@@ -338,7 +348,7 @@  EXPORT_SYMBOL_GPL(platform_get_irq_byname);
  * Get an optional IRQ by name like platform_get_irq_byname(). Except that it
  * does not print an error message if an IRQ can not be obtained.
  *
- * Return: IRQ number on success, negative error number on failure.
+ * Return: non-zero IRQ number on success, negative error number on failure.
  */
 int platform_get_irq_byname_optional(struct platform_device *dev,
 				     const char *name)