diff mbox series

[v4] gpio: Return EPROBE_DEFER if gc->to_irq is NULL

Message ID 20211116093833.245542-1-shreeya.patel@collabora.com
State New
Headers show
Series [v4] gpio: Return EPROBE_DEFER if gc->to_irq is NULL | expand

Commit Message

Shreeya Patel Nov. 16, 2021, 9:38 a.m. UTC
We are racing the registering of .to_irq when probing the
i2c driver. This results in random failure of touchscreen
devices.

Following errors could be seen in dmesg logs when gc->to_irq is NULL

[2.101857] i2c_hid i2c-FTS3528:00: HID over i2c has not been provided an Int IRQ
[2.101953] i2c_hid: probe of i2c-FTS3528:00 failed with error -22

To avoid this situation, defer probing until to_irq is registered.

This issue has been reported many times in past and people have been
using workarounds like changing the pinctrl_amd to built-in instead
of loading it as a module or by adding a softdep for pinctrl_amd into
the config file.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209413
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Shreeya Patel <shreeya.patel@collabora.com>

---
Changes in v4
  - Remove blank line and make the first letter of the sentence
capital.

Changes in v3
  - Fix the error reported by kernel test robot.

Changes in v2
  - Add a condition to check for irq chip to avoid bogus error.
---
 drivers/gpio/gpiolib.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Gabriel Krisman Bertazi Feb. 10, 2022, 4:36 p.m. UTC | #1
Shreeya Patel <shreeya.patel@collabora.com> writes:

> We are racing the registering of .to_irq when probing the
> i2c driver. This results in random failure of touchscreen
> devices.
>
> Following errors could be seen in dmesg logs when gc->to_irq is NULL
>
> [2.101857] i2c_hid i2c-FTS3528:00: HID over i2c has not been provided an Int IRQ
> [2.101953] i2c_hid: probe of i2c-FTS3528:00 failed with error -22
>
> To avoid this situation, defer probing until to_irq is registered.
>
> This issue has been reported many times in past and people have been
> using workarounds like changing the pinctrl_amd to built-in instead
> of loading it as a module or by adding a softdep for pinctrl_amd into
> the config file.
>
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209413
> Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: Shreeya Patel <shreeya.patel@collabora.com>

Hi guys,

This seems to not have reached the Linus tree on 5.17.  If I'm not
mistaken, it also hasn't reached linux-next as of today. Is there
anything I'm missing here?

This is required to prevent spurious probe crashes of devices like this
FocalTech touchscreen, FT3528, when using pinctrl-amd. We've been
carrying it downstream for quite a while.

Thanks,
Bartosz Golaszewski Feb. 10, 2022, 6 p.m. UTC | #2
On Thu, Feb 10, 2022 at 5:36 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> Shreeya Patel <shreeya.patel@collabora.com> writes:
>
> > We are racing the registering of .to_irq when probing the
> > i2c driver. This results in random failure of touchscreen
> > devices.
> >
> > Following errors could be seen in dmesg logs when gc->to_irq is NULL
> >
> > [2.101857] i2c_hid i2c-FTS3528:00: HID over i2c has not been provided an Int IRQ
> > [2.101953] i2c_hid: probe of i2c-FTS3528:00 failed with error -22
> >
> > To avoid this situation, defer probing until to_irq is registered.
> >
> > This issue has been reported many times in past and people have been
> > using workarounds like changing the pinctrl_amd to built-in instead
> > of loading it as a module or by adding a softdep for pinctrl_amd into
> > the config file.
> >
> > BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209413
> > Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
> > Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
> > Reported-by: kernel test robot <lkp@intel.com>
> > Signed-off-by: Shreeya Patel <shreeya.patel@collabora.com>
>
> Hi guys,
>
> This seems to not have reached the Linus tree on 5.17.  If I'm not
> mistaken, it also hasn't reached linux-next as of today. Is there
> anything I'm missing here?
>
> This is required to prevent spurious probe crashes of devices like this
> FocalTech touchscreen, FT3528, when using pinctrl-amd. We've been
> carrying it downstream for quite a while.
>
> Thanks,
>
> --
> Gabriel Krisman Bertazi

Hi Gabriel!

My email address changed in September, that's why I didn't see the
email you sent in November to my old one.

gpiod_to_irq() can be used in context other than driver probing, I'm
worried existing users would not know how to handle it. Also: how come
you can get the GPIO descriptor from the provider but its interrupts
are not yet set up?

Bart
Gabriel Krisman Bertazi Feb. 11, 2022, 1:26 a.m. UTC | #3
Bartosz Golaszewski <brgl@bgdev.pl> writes:

> My email address changed in September, that's why I didn't see the
> email you sent in November to my old one.

Hi Bart,

thanks for the prompt reply and sorry for the wrong email address.

> gpiod_to_irq() can be used in context other than driver probing, I'm
> worried existing users would not know how to handle it. Also: how come
> you can get the GPIO descriptor from the provider but its interrupts
> are not yet set up?

I'm definitely some context here, as its been quite a while.
Shreeya, feel free to pitch in. :)

This is one of the races we saw in gpiochip_add_irqchip, depending on
the probe order.  The gc is already visible while partially initialized,
if pinctrl-amd hasn't been probed yet.  Another device being probed can
hit an -ENXIO here if to_irq is yet uninitialized or enter .to_irq() and
oops.  Shreeya's patch workarounds the first issue, but is not a
solution for the second.

There is another patch that has been flying around to address the Oops.

https://lkml.org/lkml/2021/11/8/900

She's been working on a proper solution for that one, which might
actually address this too and replace the current patch.  Maybe you
could help us get to a proper solution there?  I'm quite unfamiliar with
this code myself :)
Shreeya Patel Feb. 11, 2022, 10:03 a.m. UTC | #4
On 11/02/22 6:56 am, Gabriel Krisman Bertazi wrote:
> Bartosz Golaszewski <brgl@bgdev.pl> writes:
>
>> My email address changed in September, that's why I didn't see the
>> email you sent in November to my old one.
> Hi Bart,
>
> thanks for the prompt reply and sorry for the wrong email address.
>
>> gpiod_to_irq() can be used in context other than driver probing, I'm
>> worried existing users would not know how to handle it. Also: how come
>> you can get the GPIO descriptor from the provider but its interrupts
>> are not yet set up?
> I'm definitely some context here, as its been quite a while.
> Shreeya, feel free to pitch in. :)


Existing users will probably receive -ENXIO in case to_irq is not
set and wasn't intended to be set.
We are trying to solve the race which happens frequently in cases
where I2C is set as built-in and pinctrl-amd is set as module.
There is no dependency between I2C and pinctrl-amd, while pinctrl-amd is
still trying to set the gc irq members through gpiochip_add_irqchip, I2C
calls gpiod_to_irq() which leads to returning -ENXIO since gc->to_irq is 
still NULL


There have also been cases where gc->to_irq is set successfully but 
other members
are yet to be initalized by gpiochip_add_irqchip like the domain 
variable which is
being used in .to_irq() and ultimately leads to a NULL pointer 
dereference as Gabriel
mentioned. I am working on a fix which would use mutex to not let gc irq 
members
be accessed until they all have been completely initialized.

I2C calls gpiod_to_irq through the following stack trace

kernel: Call Trace:
kernel:  gpiod_to_irq.cold+0x49/0x8f
kernel:  acpi_dev_gpio_irq_get_by+0x113/0x1f0
kernel:  i2c_acpi_get_irq+0xc0/0xd0
kernel:  i2c_device_probe+0x28a/0x2a0
kernel:  really_probe+0xf2/0x460
kernel:  driver_probe_device+0xe8/0x160

and pinctrl-amd makes gc visible through gpiochip_add_data_with_key()


Thanks,
Shreeya Patel


> This is one of the races we saw in gpiochip_add_irqchip, depending on
> the probe order.  The gc is already visible while partially initialized,
> if pinctrl-amd hasn't been probed yet.  Another device being probed can
> hit an -ENXIO here if to_irq is yet uninitialized or enter .to_irq() and
> oops.  Shreeya's patch workarounds the first issue, but is not a
> solution for the second.
>
> There is another patch that has been flying around to address the Oops.
>
> https://lkml.org/lkml/2021/11/8/900
>
> She's been working on a proper solution for that one, which might
> actually address this too and replace the current patch.  Maybe you
> could help us get to a proper solution there?  I'm quite unfamiliar with
> this code myself :)
>
Shreeya Patel Feb. 17, 2022, 8:10 p.m. UTC | #5
On 10/02/22 11:30 pm, Bartosz Golaszewski wrote:
> On Thu, Feb 10, 2022 at 5:36 PM Gabriel Krisman Bertazi
> <krisman@collabora.com> wrote:
>> Shreeya Patel <shreeya.patel@collabora.com> writes:
>>
>>> We are racing the registering of .to_irq when probing the
>>> i2c driver. This results in random failure of touchscreen
>>> devices.
>>>
>>> Following errors could be seen in dmesg logs when gc->to_irq is NULL
>>>
>>> [2.101857] i2c_hid i2c-FTS3528:00: HID over i2c has not been provided an Int IRQ
>>> [2.101953] i2c_hid: probe of i2c-FTS3528:00 failed with error -22
>>>
>>> To avoid this situation, defer probing until to_irq is registered.
>>>
>>> This issue has been reported many times in past and people have been
>>> using workarounds like changing the pinctrl_amd to built-in instead
>>> of loading it as a module or by adding a softdep for pinctrl_amd into
>>> the config file.
>>>
>>> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209413
>>> Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
>>> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
>>> Reported-by: kernel test robot <lkp@intel.com>
>>> Signed-off-by: Shreeya Patel <shreeya.patel@collabora.com>
>> Hi guys,
>>
>> This seems to not have reached the Linus tree on 5.17.  If I'm not
>> mistaken, it also hasn't reached linux-next as of today. Is there
>> anything I'm missing here?
>>
>> This is required to prevent spurious probe crashes of devices like this
>> FocalTech touchscreen, FT3528, when using pinctrl-amd. We've been
>> carrying it downstream for quite a while.
>>
>> Thanks,
>>
>> --
>> Gabriel Krisman Bertazi
> Hi Gabriel!
>
> My email address changed in September, that's why I didn't see the
> email you sent in November to my old one.
>
> gpiod_to_irq() can be used in context other than driver probing, I'm
> worried existing users would not know how to handle it. Also: how come
> you can get the GPIO descriptor from the provider but its interrupts
> are not yet set up?

Hi Bartosz,

We would be only changing the error code here for gpio driver race 
condition.
Anything affected by this patch would have already been broken and might 
be returning
-ENXIO. There is a theoretical chance that a driver exists which uses 
gpiod_to_irq outside of
probe and affected by race condition. The more instructions between 
gpiod_get and gpiod_to_irq
the smaller the chance of hitting the race condition though. Also 
anything hitting the race condition
is broken right now and there hasn't been any issues reported so far. In 
any case that system is not
fixed by this patch, it is still a good idea to apply this patch since 
the proper fix is a lot more
invasive and probably might not be suitable for stable backporting. This 
minimal patch does
not make things worse.

I have sent a v5 for this patch with better commit message.

Thanks,
Shreeya Patel
>
> Bart
>
diff mbox series

Patch

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index abfbf546d159..7b3f7f4d1d06 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -3111,6 +3111,16 @@  int gpiod_to_irq(const struct gpio_desc *desc)
 
 		return retirq;
 	}
+#ifdef CONFIG_GPIOLIB_IRQCHIP
+	if (gc->irq.chip) {
+		/*
+		 * Avoid race condition with other code, which tries to lookup
+		 * an IRQ before the irqchip has been properly registered,
+		 * i.e. while gpiochip is still being brought up.
+		 */
+		return -EPROBE_DEFER;
+	}
+#endif
 	return -ENXIO;
 }
 EXPORT_SYMBOL_GPL(gpiod_to_irq);