diff mbox series

[v3] pinctrl: qcom: Handle broken/missing PDC dual edge IRQs on sc7180

Message ID 20200714080254.v3.1.Ie0d730120b232a86a4eac1e2909bcbec844d1766@changeid
State New
Headers show
Series [v3] pinctrl: qcom: Handle broken/missing PDC dual edge IRQs on sc7180 | expand

Commit Message

Doug Anderson July 14, 2020, 3:04 p.m. UTC
Depending on how you look at it, you can either say that:
a) There is a PDC hardware issue (with the specific IP rev that exists
   on sc7180) that causes the PDC not to work properly when configured
   to handle dual edges.
b) The dual edge feature of the PDC hardware was only added in later
   HW revisions and thus isn't in all hardware.

Regardless of how you look at it, let's work around the lack of dual
edge support by only ever letting our parent see requests for single
edge interrupts on affected hardware.

NOTE: it's possible that a driver requesting a dual edge interrupt
might get several edges coalesced into a single IRQ.  For instance if
a line starts low and then goes high and low again, the driver that
requested the IRQ is not guaranteed to be called twice.  However, it
is guaranteed that once the driver's interrupt handler starts running
its first instruction that any new edges coming in will cause the
interrupt to fire again.  This is relatively commonplace for dual-edge
gpio interrupts (many gpio controllers require software to emulate
dual edge with single edge) so client drivers should be setup to
handle it.

Fixes: e35a6ae0eb3a ("pinctrl/msm: Setup GPIO chip in hierarchy")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
As far as I can tell everything here should work and the limited
testing I'm able to give it shows that, in fact, I can detect both
edges.

I specifically left off Reviewed-by and Tested-by tags from v2 becuase
I felt that the implementation had changed just enough to invalidate
previous reviews / testing.  Hopefully it's not too much of a hassle
for folks to re-review and re-test.

Changes in v3:
- Rate limit the warning.

Changes in v2:
- Use handle_fasteoi_ack_irq() and switch edges in the Ack now.
- If we change types, switch back to the normal handle_fasteoi_irq().
- No extra locking.
- Properly print an error if we hit 100 loops w/ no stability.
- Beefed up the commit message.

 drivers/pinctrl/qcom/Kconfig          |  2 +
 drivers/pinctrl/qcom/pinctrl-msm.c    | 74 ++++++++++++++++++++++++++-
 drivers/pinctrl/qcom/pinctrl-msm.h    |  4 ++
 drivers/pinctrl/qcom/pinctrl-sc7180.c |  1 +
 4 files changed, 79 insertions(+), 2 deletions(-)

Comments

Marc Zyngier July 14, 2020, 3:09 p.m. UTC | #1
On 2020-07-14 16:04, Douglas Anderson wrote:
> Depending on how you look at it, you can either say that:
> a) There is a PDC hardware issue (with the specific IP rev that exists
>    on sc7180) that causes the PDC not to work properly when configured
>    to handle dual edges.
> b) The dual edge feature of the PDC hardware was only added in later
>    HW revisions and thus isn't in all hardware.
> 
> Regardless of how you look at it, let's work around the lack of dual
> edge support by only ever letting our parent see requests for single
> edge interrupts on affected hardware.
> 
> NOTE: it's possible that a driver requesting a dual edge interrupt
> might get several edges coalesced into a single IRQ.  For instance if
> a line starts low and then goes high and low again, the driver that
> requested the IRQ is not guaranteed to be called twice.  However, it
> is guaranteed that once the driver's interrupt handler starts running
> its first instruction that any new edges coming in will cause the
> interrupt to fire again.  This is relatively commonplace for dual-edge
> gpio interrupts (many gpio controllers require software to emulate
> dual edge with single edge) so client drivers should be setup to
> handle it.
> 
> Fixes: e35a6ae0eb3a ("pinctrl/msm: Setup GPIO chip in hierarchy")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Reviewed-by: Marc Zyngier <maz@kernel.org>

Linus, I assume you will get this one via the pinctrl tree once you
are happy with it?

Thanks,

         M.
Linus Walleij July 16, 2020, 1:42 p.m. UTC | #2
On Tue, Jul 14, 2020 at 5:04 PM Douglas Anderson <dianders@chromium.org> wrote:

> Depending on how you look at it, you can either say that:
> a) There is a PDC hardware issue (with the specific IP rev that exists
>    on sc7180) that causes the PDC not to work properly when configured
>    to handle dual edges.
> b) The dual edge feature of the PDC hardware was only added in later
>    HW revisions and thus isn't in all hardware.
>
> Regardless of how you look at it, let's work around the lack of dual
> edge support by only ever letting our parent see requests for single
> edge interrupts on affected hardware.
>
> NOTE: it's possible that a driver requesting a dual edge interrupt
> might get several edges coalesced into a single IRQ.  For instance if
> a line starts low and then goes high and low again, the driver that
> requested the IRQ is not guaranteed to be called twice.  However, it
> is guaranteed that once the driver's interrupt handler starts running
> its first instruction that any new edges coming in will cause the
> interrupt to fire again.  This is relatively commonplace for dual-edge
> gpio interrupts (many gpio controllers require software to emulate
> dual edge with single edge) so client drivers should be setup to
> handle it.
>
> Fixes: e35a6ae0eb3a ("pinctrl/msm: Setup GPIO chip in hierarchy")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> As far as I can tell everything here should work and the limited
> testing I'm able to give it shows that, in fact, I can detect both
> edges.
>
> I specifically left off Reviewed-by and Tested-by tags from v2 becuase
> I felt that the implementation had changed just enough to invalidate
> previous reviews / testing.  Hopefully it's not too much of a hassle
> for folks to re-review and re-test.
>
> Changes in v3:
> - Rate limit the warning.

Tentatively applied this to the fixes branch in the pinctrl tree
so we get some linux-next coverage.

Would be nice to get Bjorn's ACK on it as well!

Yours,
Linus Walleij
John Stultz Aug. 3, 2020, 9:06 p.m. UTC | #3
On Tue, Jul 14, 2020 at 8:08 AM Douglas Anderson <dianders@chromium.org> wrote:
>
> Depending on how you look at it, you can either say that:
> a) There is a PDC hardware issue (with the specific IP rev that exists
>    on sc7180) that causes the PDC not to work properly when configured
>    to handle dual edges.
> b) The dual edge feature of the PDC hardware was only added in later
>    HW revisions and thus isn't in all hardware.
>
> Regardless of how you look at it, let's work around the lack of dual
> edge support by only ever letting our parent see requests for single
> edge interrupts on affected hardware.
>
> NOTE: it's possible that a driver requesting a dual edge interrupt
> might get several edges coalesced into a single IRQ.  For instance if
> a line starts low and then goes high and low again, the driver that
> requested the IRQ is not guaranteed to be called twice.  However, it
> is guaranteed that once the driver's interrupt handler starts running
> its first instruction that any new edges coming in will cause the
> interrupt to fire again.  This is relatively commonplace for dual-edge
> gpio interrupts (many gpio controllers require software to emulate
> dual edge with single edge) so client drivers should be setup to
> handle it.
>
> Fixes: e35a6ae0eb3a ("pinctrl/msm: Setup GPIO chip in hierarchy")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Just as a heads up. I started seeing boot failures (crashes really
early before we get serial output) with db845c when testing with the
android-mainline tree that pulled v5.8 in.
I did some quick bisection and came down to this patch, and sure
enough things boot again with this patch reverted.

In my testing earlier today with v5.8 (+ just a few patches for db845c
support), I didn't see this failure, but the configs in use are
different there.

I'll try to spend a bit of time to understand exactly what is failing,
but if you have any initial suggestions for things to try, I'd
appreciate it.

thanks
-john
Doug Anderson Aug. 3, 2020, 9:57 p.m. UTC | #4
Hi,

On Mon, Aug 3, 2020 at 2:06 PM John Stultz <john.stultz@linaro.org> wrote:
>
> On Tue, Jul 14, 2020 at 8:08 AM Douglas Anderson <dianders@chromium.org> wrote:
> >
> > Depending on how you look at it, you can either say that:
> > a) There is a PDC hardware issue (with the specific IP rev that exists
> >    on sc7180) that causes the PDC not to work properly when configured
> >    to handle dual edges.
> > b) The dual edge feature of the PDC hardware was only added in later
> >    HW revisions and thus isn't in all hardware.
> >
> > Regardless of how you look at it, let's work around the lack of dual
> > edge support by only ever letting our parent see requests for single
> > edge interrupts on affected hardware.
> >
> > NOTE: it's possible that a driver requesting a dual edge interrupt
> > might get several edges coalesced into a single IRQ.  For instance if
> > a line starts low and then goes high and low again, the driver that
> > requested the IRQ is not guaranteed to be called twice.  However, it
> > is guaranteed that once the driver's interrupt handler starts running
> > its first instruction that any new edges coming in will cause the
> > interrupt to fire again.  This is relatively commonplace for dual-edge
> > gpio interrupts (many gpio controllers require software to emulate
> > dual edge with single edge) so client drivers should be setup to
> > handle it.
> >
> > Fixes: e35a6ae0eb3a ("pinctrl/msm: Setup GPIO chip in hierarchy")
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
>
> Just as a heads up. I started seeing boot failures (crashes really
> early before we get serial output) with db845c when testing with the
> android-mainline tree that pulled v5.8 in.

Even before earlycon?  Ick.  For me earlycon comes up way before
pinctrl and I thought that, by design, earlycon came up so dang early
that you could debug almost anything with it.

To confirm, I could even drop into earlycon_kgdb (which starts later
than earlycon), then set a breakpoint on msm_pinctrl_probe() and I'd
hit my breakpoint.  Enabling earlycon should be super easy these
days--just add the "earlycon" command line parameter and the kernel
seems to do the rest of the magic based on the "stdout-path".  I guess
if your bootloader doesn't cooperate and leave the system in an OK
state then you'll be in bad shape, but otherwise it should be nice...

NOTE: if you have earlycon and this is still causing crashes before
earlycon starts, the only things I can think of are side effects of
this patch.  Could it have made your kernel just a little too big and
now you're overflowing some hard limit of the bootloader?  Maybe
you're hitting a ccache bug and using some stale garbage (don't laugh,
this happened to me the other year)?  Maybe there's a pointer bug and
this moves addresses just enough to make it cause havoc?


> I did some quick bisection and came down to this patch, and sure
> enough things boot again with this patch reverted.
>
> In my testing earlier today with v5.8 (+ just a few patches for db845c
> support), I didn't see this failure, but the configs in use are
> different there.
>
> I'll try to spend a bit of time to understand exactly what is failing,
> but if you have any initial suggestions for things to try, I'd
> appreciate it.

So on SDM845 we aren't setting "wakeirq_dual_edge_errata", right?
It's possible that you also need it, but I didn't have an SDM845
device in front of me to test with--I only have remote access to one.
...but in any case, the fact that SDM845 doesn't have
"wakeirq_dual_edge_errata" set should eliminate a bunch of code.

Once you eliminate that there's almost nothing left of this patch.
You could try commenting out:

irq_set_handler_locked(d, handle_fasteoi_irq);

...and see if that helps?

NOTE: I just tried putting kernel 5.8 on my sdm845-cheza device.  It
booted up without crashing...  I'm probably not using the same config
you are, but at least it appears that sdm845 isn't totally broken or
anything...

-Doug
John Stultz Aug. 4, 2020, 12:48 a.m. UTC | #5
On Mon, Aug 3, 2020 at 2:58 PM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Mon, Aug 3, 2020 at 2:06 PM John Stultz <john.stultz@linaro.org> wrote:
> >
> > On Tue, Jul 14, 2020 at 8:08 AM Douglas Anderson <dianders@chromium.org> wrote:
> > >
> > > Depending on how you look at it, you can either say that:
> > > a) There is a PDC hardware issue (with the specific IP rev that exists
> > >    on sc7180) that causes the PDC not to work properly when configured
> > >    to handle dual edges.
> > > b) The dual edge feature of the PDC hardware was only added in later
> > >    HW revisions and thus isn't in all hardware.
> > >
> > > Regardless of how you look at it, let's work around the lack of dual
> > > edge support by only ever letting our parent see requests for single
> > > edge interrupts on affected hardware.
> > >
> > > NOTE: it's possible that a driver requesting a dual edge interrupt
> > > might get several edges coalesced into a single IRQ.  For instance if
> > > a line starts low and then goes high and low again, the driver that
> > > requested the IRQ is not guaranteed to be called twice.  However, it
> > > is guaranteed that once the driver's interrupt handler starts running
> > > its first instruction that any new edges coming in will cause the
> > > interrupt to fire again.  This is relatively commonplace for dual-edge
> > > gpio interrupts (many gpio controllers require software to emulate
> > > dual edge with single edge) so client drivers should be setup to
> > > handle it.
> > >
> > > Fixes: e35a6ae0eb3a ("pinctrl/msm: Setup GPIO chip in hierarchy")
> > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> >
> > Just as a heads up. I started seeing boot failures (crashes really
> > early before we get serial output) with db845c when testing with the
> > android-mainline tree that pulled v5.8 in.
>
> Even before earlycon?  Ick.  For me earlycon comes up way before
> pinctrl and I thought that, by design, earlycon came up so dang early
> that you could debug almost anything with it.
>
> To confirm, I could even drop into earlycon_kgdb (which starts later
> than earlycon), then set a breakpoint on msm_pinctrl_probe() and I'd
> hit my breakpoint.  Enabling earlycon should be super easy these
> days--just add the "earlycon" command line parameter and the kernel
> seems to do the rest of the magic based on the "stdout-path".  I guess
> if your bootloader doesn't cooperate and leave the system in an OK
> state then you'll be in bad shape, but otherwise it should be nice...
>
> NOTE: if you have earlycon and this is still causing crashes before
> earlycon starts, the only things I can think of are side effects of
> this patch.  Could it have made your kernel just a little too big and
> now you're overflowing some hard limit of the bootloader?  Maybe
> you're hitting a ccache bug and using some stale garbage (don't laugh,
> this happened to me the other year)?  Maybe there's a pointer bug and
> this moves addresses just enough to make it cause havoc?
>

Sorry! False positive on this one. The android-mainline tree has
serial drivers as modules, so earlycon doesn't help right off.
I reworked the config so I could use earlycon and realized the trouble
was with the new selected configs in this patch which need to also be
selected in the GKI kernel.

Apologies for the noise.

thanks
-john
diff mbox series

Patch

diff --git a/drivers/pinctrl/qcom/Kconfig b/drivers/pinctrl/qcom/Kconfig
index ff1ee159dca2..f8ff30cdafa6 100644
--- a/drivers/pinctrl/qcom/Kconfig
+++ b/drivers/pinctrl/qcom/Kconfig
@@ -7,6 +7,8 @@  config PINCTRL_MSM
 	select PINCONF
 	select GENERIC_PINCONF
 	select GPIOLIB_IRQCHIP
+	select IRQ_DOMAIN_HIERARCHY
+	select IRQ_FASTEOI_HIERARCHY_HANDLERS
 
 config PINCTRL_APQ8064
 	tristate "Qualcomm APQ8064 pin controller driver"
diff --git a/drivers/pinctrl/qcom/pinctrl-msm.c b/drivers/pinctrl/qcom/pinctrl-msm.c
index 83b7d64bc4c1..c322f30a2064 100644
--- a/drivers/pinctrl/qcom/pinctrl-msm.c
+++ b/drivers/pinctrl/qcom/pinctrl-msm.c
@@ -832,6 +832,52 @@  static void msm_gpio_irq_unmask(struct irq_data *d)
 	msm_gpio_irq_clear_unmask(d, false);
 }
 
+/**
+ * msm_gpio_update_dual_edge_parent() - Prime next edge for IRQs handled by parent.
+ * @d: The irq dta.
+ *
+ * This is much like msm_gpio_update_dual_edge_pos() but for IRQs that are
+ * normally handled by the parent irqchip.  The logic here is slightly
+ * different due to what's easy to do with our parent, but in principle it's
+ * the same.
+ */
+static void msm_gpio_update_dual_edge_parent(struct irq_data *d)
+{
+	struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
+	struct msm_pinctrl *pctrl = gpiochip_get_data(gc);
+	const struct msm_pingroup *g = &pctrl->soc->groups[d->hwirq];
+	int loop_limit = 100;
+	unsigned int val;
+	unsigned int type;
+
+	/* Read the value and make a guess about what edge we need to catch */
+	val = msm_readl_io(pctrl, g) & BIT(g->in_bit);
+	type = val ? IRQ_TYPE_EDGE_FALLING : IRQ_TYPE_EDGE_RISING;
+
+	do {
+		/* Set the parent to catch the next edge */
+		irq_chip_set_type_parent(d, type);
+
+		/*
+		 * Possibly the line changed between when we last read "val"
+		 * (and decided what edge we needed) and when set the edge.
+		 * If the value didn't change (or changed and then changed
+		 * back) then we're done.
+		 */
+		val = msm_readl_io(pctrl, g) & BIT(g->in_bit);
+		if (type == IRQ_TYPE_EDGE_RISING) {
+			if (!val)
+				return;
+			type = IRQ_TYPE_EDGE_FALLING;
+		} else if (type == IRQ_TYPE_EDGE_FALLING) {
+			if (val)
+				return;
+			type = IRQ_TYPE_EDGE_RISING;
+		}
+	} while (loop_limit-- > 0);
+	dev_warn_once(pctrl->dev, "dual-edge irq failed to stabilize\n");
+}
+
 static void msm_gpio_irq_ack(struct irq_data *d)
 {
 	struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
@@ -840,8 +886,11 @@  static void msm_gpio_irq_ack(struct irq_data *d)
 	unsigned long flags;
 	u32 val;
 
-	if (test_bit(d->hwirq, pctrl->skip_wake_irqs))
+	if (test_bit(d->hwirq, pctrl->skip_wake_irqs)) {
+		if (test_bit(d->hwirq, pctrl->dual_edge_irqs))
+			msm_gpio_update_dual_edge_parent(d);
 		return;
+	}
 
 	g = &pctrl->soc->groups[d->hwirq];
 
@@ -860,6 +909,17 @@  static void msm_gpio_irq_ack(struct irq_data *d)
 	raw_spin_unlock_irqrestore(&pctrl->lock, flags);
 }
 
+static bool msm_gpio_needs_dual_edge_parent_workaround(struct irq_data *d,
+						       unsigned int type)
+{
+	struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
+	struct msm_pinctrl *pctrl = gpiochip_get_data(gc);
+
+	return type == IRQ_TYPE_EDGE_BOTH &&
+	       pctrl->soc->wakeirq_dual_edge_errata && d->parent_data &&
+	       test_bit(d->hwirq, pctrl->skip_wake_irqs);
+}
+
 static int msm_gpio_irq_set_type(struct irq_data *d, unsigned int type)
 {
 	struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
@@ -868,11 +928,21 @@  static int msm_gpio_irq_set_type(struct irq_data *d, unsigned int type)
 	unsigned long flags;
 	u32 val;
 
+	if (msm_gpio_needs_dual_edge_parent_workaround(d, type)) {
+		set_bit(d->hwirq, pctrl->dual_edge_irqs);
+		irq_set_handler_locked(d, handle_fasteoi_ack_irq);
+		msm_gpio_update_dual_edge_parent(d);
+		return 0;
+	}
+
 	if (d->parent_data)
 		irq_chip_set_type_parent(d, type);
 
-	if (test_bit(d->hwirq, pctrl->skip_wake_irqs))
+	if (test_bit(d->hwirq, pctrl->skip_wake_irqs)) {
+		clear_bit(d->hwirq, pctrl->dual_edge_irqs);
+		irq_set_handler_locked(d, handle_fasteoi_irq);
 		return 0;
+	}
 
 	g = &pctrl->soc->groups[d->hwirq];
 
diff --git a/drivers/pinctrl/qcom/pinctrl-msm.h b/drivers/pinctrl/qcom/pinctrl-msm.h
index 9452da18a78b..7486fe08eb9b 100644
--- a/drivers/pinctrl/qcom/pinctrl-msm.h
+++ b/drivers/pinctrl/qcom/pinctrl-msm.h
@@ -113,6 +113,9 @@  struct msm_gpio_wakeirq_map {
  * @pull_no_keeper: The SoC does not support keeper bias.
  * @wakeirq_map:    The map of wakeup capable GPIOs and the pin at PDC/MPM
  * @nwakeirq_map:   The number of entries in @wakeirq_map
+ * @wakeirq_dual_edge_errata: If true then GPIOs using the wakeirq_map need
+ *                            to be aware that their parent can't handle dual
+ *                            edge interrupts.
  */
 struct msm_pinctrl_soc_data {
 	const struct pinctrl_pin_desc *pins;
@@ -128,6 +131,7 @@  struct msm_pinctrl_soc_data {
 	const int *reserved_gpios;
 	const struct msm_gpio_wakeirq_map *wakeirq_map;
 	unsigned int nwakeirq_map;
+	bool wakeirq_dual_edge_errata;
 };
 
 extern const struct dev_pm_ops msm_pinctrl_dev_pm_ops;
diff --git a/drivers/pinctrl/qcom/pinctrl-sc7180.c b/drivers/pinctrl/qcom/pinctrl-sc7180.c
index 1b6465a882f2..1d9acad3c1ce 100644
--- a/drivers/pinctrl/qcom/pinctrl-sc7180.c
+++ b/drivers/pinctrl/qcom/pinctrl-sc7180.c
@@ -1147,6 +1147,7 @@  static const struct msm_pinctrl_soc_data sc7180_pinctrl = {
 	.ntiles = ARRAY_SIZE(sc7180_tiles),
 	.wakeirq_map = sc7180_pdc_map,
 	.nwakeirq_map = ARRAY_SIZE(sc7180_pdc_map),
+	.wakeirq_dual_edge_errata = true,
 };
 
 static int sc7180_pinctrl_probe(struct platform_device *pdev)