diff mbox series

[6/9] gpio: Add support for hierarchical IRQ domains

Message ID 20180921102546.12745-7-thierry.reding@gmail.com
State Changes Requested
Headers show
Series Implement wake event support on Tegra186 and later | expand

Commit Message

Thierry Reding Sept. 21, 2018, 10:25 a.m. UTC
From: Thierry Reding <treding@nvidia.com>

Hierarchical IRQ domains can be used to stack different IRQ controllers
on top of each other. One specific use-case where this can be useful is
if a power management controller has top-level controls for wakeup
interrupts. In such cases, the power management controller can be a
parent to other interrupt controllers and program additional registers
when an IRQ has its wake capability enabled or disabled.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpio/gpiolib.c      | 15 +++++++++++----
 include/linux/gpio/driver.h |  6 ++++++
 2 files changed, 17 insertions(+), 4 deletions(-)

Comments

Linus Walleij Sept. 25, 2018, 8:11 a.m. UTC | #1
Hi Thierry!

Thanks for the patch!

I am a bit ignorant about irqdomains so I will probably need an ACK
from some irq maintainer before I can apply this.

On Fri, Sep 21, 2018 at 12:25 PM Thierry Reding
<thierry.reding@gmail.com> wrote:

> From: Thierry Reding <treding@nvidia.com>
>
> Hierarchical IRQ domains can be used to stack different IRQ controllers
> on top of each other. One specific use-case where this can be useful is
> if a power management controller has top-level controls for wakeup
> interrupts. In such cases, the power management controller can be a
> parent to other interrupt controllers and program additional registers
> when an IRQ has its wake capability enabled or disabled.
>
> Signed-off-by: Thierry Reding <treding@nvidia.com>

While I think it is really important that we start supporting hierarchical
irqdomains in the gpiolib core, I want a more complete approach,
so that drivers that need hierarchical handling of irqdomains
can get the same support from gpiolib as they get for simple
domains.

> @@ -1918,7 +1918,9 @@ static int gpiochip_add_irqchip(struct gpio_chip *gpiochip,
>                 type = IRQ_TYPE_NONE;
>         }
>
> -       gpiochip->to_irq = gpiochip_to_irq;
> +       if (!gpiochip->to_irq)
> +               gpiochip->to_irq = gpiochip_to_irq;

So here you let the drivers override the .to_irq() function and that
I think gets confusing as we are asking gpiolib to handle our
irqchip.


> -       gpiochip->irq.domain = irq_domain_add_simple(np, gpiochip->ngpio,
> -                                                    gpiochip->irq.first,
> -                                                    ops, gpiochip);
> +       if (gpiochip->irq.parent_domain)
> +               gpiochip->irq.domain = irq_domain_add_hierarchy(gpiochip->irq.parent_domain,
> +                                                               0, gpiochip->ngpio,
> +                                                               np, ops, gpiochip);
> +       else
> +               gpiochip->irq.domain = irq_domain_add_simple(np, gpiochip->ngpio,
> +                                                            gpiochip->irq.first,
> +                                                            ops, gpiochip);

So this part is great: if we pass in a parent domain the core helps us
create the hierarchy.

But the stuff in .to_irq() should also be handled in the gpiolib core:
the irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &spec) for
example. That way you should not need any external .to_irq() function.

I can't see if you need to pull more stuff into the core to accomplish
that, but I think in essence the core gpiolib needs to be more helpful
with hierarchies.

Yours,
Linus Walleij
Thierry Reding Sept. 25, 2018, 9:33 a.m. UTC | #2
On Tue, Sep 25, 2018 at 10:11:06AM +0200, Linus Walleij wrote:
> Hi Thierry!
> 
> Thanks for the patch!
> 
> I am a bit ignorant about irqdomains so I will probably need an ACK
> from some irq maintainer before I can apply this.
> 
> On Fri, Sep 21, 2018 at 12:25 PM Thierry Reding
> <thierry.reding@gmail.com> wrote:
> 
> > From: Thierry Reding <treding@nvidia.com>
> >
> > Hierarchical IRQ domains can be used to stack different IRQ controllers
> > on top of each other. One specific use-case where this can be useful is
> > if a power management controller has top-level controls for wakeup
> > interrupts. In such cases, the power management controller can be a
> > parent to other interrupt controllers and program additional registers
> > when an IRQ has its wake capability enabled or disabled.
> >
> > Signed-off-by: Thierry Reding <treding@nvidia.com>
> 
> While I think it is really important that we start supporting hierarchical
> irqdomains in the gpiolib core, I want a more complete approach,
> so that drivers that need hierarchical handling of irqdomains
> can get the same support from gpiolib as they get for simple
> domains.
> 
> > @@ -1918,7 +1918,9 @@ static int gpiochip_add_irqchip(struct gpio_chip *gpiochip,
> >                 type = IRQ_TYPE_NONE;
> >         }
> >
> > -       gpiochip->to_irq = gpiochip_to_irq;
> > +       if (!gpiochip->to_irq)
> > +               gpiochip->to_irq = gpiochip_to_irq;
> 
> So here you let the drivers override the .to_irq() function and that
> I think gets confusing as we are asking gpiolib to handle our
> irqchip.
> 
> 
> > -       gpiochip->irq.domain = irq_domain_add_simple(np, gpiochip->ngpio,
> > -                                                    gpiochip->irq.first,
> > -                                                    ops, gpiochip);
> > +       if (gpiochip->irq.parent_domain)
> > +               gpiochip->irq.domain = irq_domain_add_hierarchy(gpiochip->irq.parent_domain,
> > +                                                               0, gpiochip->ngpio,
> > +                                                               np, ops, gpiochip);
> > +       else
> > +               gpiochip->irq.domain = irq_domain_add_simple(np, gpiochip->ngpio,
> > +                                                            gpiochip->irq.first,
> > +                                                            ops, gpiochip);
> 
> So this part is great: if we pass in a parent domain the core helps us
> create the hierarchy.
> 
> But the stuff in .to_irq() should also be handled in the gpiolib core:
> the irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &spec) for
> example. That way you should not need any external .to_irq() function.
> 
> I can't see if you need to pull more stuff into the core to accomplish
> that, but I think in essence the core gpiolib needs to be more helpful
> with hierarchies.

This is not as trivial as it sounds. I think we could probably provide a
simple helper in the core that may work for the majority of GPIO
controllers, and would be similar to irq_domain_xlate_twocell(). The
problem is that ->gpio_to_irq() effectively needs to convert the offset
of the pin in the GPIO controller to an IRQ specifier. If the IRQ domain
can use irq_domain_xlate_twocell(), that should be easy, but if it does
not, then we likely need a custom implementation as well.

For example, as you may remember, the Tegra186 GPIO controller is
somewhat quirky in that it has a number of banks, each of which can have
any number of pins up to 8. However, in order to prevent users from
attempting to use one of the non-existent GPIOs, we resorted to
compacting the GPIO number space so that the GPIO specifier uses
basically a (bank, pin) pair that is converted into a GPIO offset. The
same is done for interrupt specifiers.

Since there is no 1:1 relationship between the value in the specifier
and the GPIO offset, we can't use irq_domain_xlate_twocell(). Similarly
we won't be able to use the standard gpiochip_to_irq() counterpart for
two cell specifiers to construct the IRQ specifier, but instead we'll
have to convert it based on the offset and the number of pins per bank
(that is, the inverse of what we do in tegra186_gpio_of_xlate()).

I think we can probably just implement the simple two-cell version in
gpiochip_to_irq() directly and leave it up to drivers that require
something more to override ->to_irq().

Another alternative that I had pondered about was to add another level
of indirection and have a generic implementation in gpiochip_to_irq()
that calls back into a new ->to_irq_fwspec() function that drivers can
implement to fill in the struct irq_fwspec as required by the
irq_domain_alloc_irqs() function. We could still provide a default
implementation for the common two-cell case, so most drivers wouldn't
have to know about it.

I don't think any of the above has massive advantages over the other.
The latter will be slightly more compact, I would assume, but the former
gives us more flexibility if we need it.

Thierry
Linus Walleij Sept. 25, 2018, 10:33 a.m. UTC | #3
On Tue, Sep 25, 2018 at 11:33 AM Thierry Reding
<thierry.reding@gmail.com> wrote:
> On Tue, Sep 25, 2018 at 10:11:06AM +0200, Linus Walleij wrote:

> > While I think it is really important that we start supporting hierarchical
> > irqdomains in the gpiolib core, I want a more complete approach,
> > so that drivers that need hierarchical handling of irqdomains
> > can get the same support from gpiolib as they get for simple
> > domains.
(...)
> > I can't see if you need to pull more stuff into the core to accomplish
> > that, but I think in essence the core gpiolib needs to be more helpful
> > with hierarchies.
>
> This is not as trivial as it sounds. I think we could probably provide a
> simple helper in the core that may work for the majority of GPIO
> controllers, and would be similar to irq_domain_xlate_twocell(). The
> problem is that ->gpio_to_irq() effectively needs to convert the offset
> of the pin in the GPIO controller to an IRQ specifier. If the IRQ domain
> can use irq_domain_xlate_twocell(), that should be easy, but if it does
> not, then we likely need a custom implementation as well.

This sounds a lot like the "gpio-ranges" we have in the
gpiochip DT bindings, mapping gpio offsets to pin offsets.

I assume that we could just introduce a cross-mapping
array from IRQ to IRQ in struct gpio_irq_chip for the
hierarchical irqchip? Is it any
more complicated than an array of [(int, int)] tuples?

I guess this is what you have in mind for twocell?

> For example, as you may remember, the Tegra186 GPIO controller is
> somewhat quirky in that it has a number of banks, each of which can have
> any number of pins up to 8. However, in order to prevent users from
> attempting to use one of the non-existent GPIOs, we resorted to
> compacting the GPIO number space so that the GPIO specifier uses
> basically a (bank, pin) pair that is converted into a GPIO offset. The
> same is done for interrupt specifiers.

I guess this stuff is what you refer to?

#define TEGRA_MAIN_GPIO_PORT(port, base, count, controller)     \
        [TEGRA_MAIN_GPIO_PORT_##port] = {                       \
                .name = #port,                                  \
                .offset = base,                                 \
                .pins = count,                                  \
                .irq = controller,                              \
        }

static const struct tegra_gpio_port tegra186_main_ports[] = {
        TEGRA_MAIN_GPIO_PORT( A, 0x2000, 7, 2),
        TEGRA_MAIN_GPIO_PORT( B, 0x3000, 7, 3),
        TEGRA_MAIN_GPIO_PORT( C, 0x3200, 7, 3),
        TEGRA_MAIN_GPIO_PORT( D, 0x3400, 6, 3),
(...)

Maybe things have changed slightly.

As I see it there are some ways to go about this:

1. Create one gpiochip per bank and just register the number of
GPIOs actually accessible per bank offset from 0. This works
if one does not insist on having one gpiochip covering all pins,
and as long as all usable pins are stacked from offset 0..n.
(Tegra186 doesn't do this, it is registering one chip for all.)

2. If the above cannot be met, register enough pins to cover all
(e.g. 32 pins for a 32bit GPIO register) then mask off the
unused pins in .valid_mask in the gpio_chip. This was fairly
recently introduced to add ACPI support for Qualcomm, as
there were valid, but unusable GPIO offsets, but it can be
used to cut arbitrary holes in any range of offsets.

3. Some driver-specific way. Which seems to be what Tegra186
is doing.

Would you consider to move over to using method (2) to
get a more linear numberspace? I.e. register 8 GPIOs/IRQs
per port/bank and then just mask off the invalid ones?
.valid_mask in gpio_chip can be used for the GPIOs and
.valid_mask in the gpio_irq_chip can be used for IRQs.

Or do you think it is nonelegant?

> Since there is no 1:1 relationship between the value in the specifier
> and the GPIO offset, we can't use irq_domain_xlate_twocell().

Am I right that if you switch to method (2) above this is solved
and we get rid of the custom tegra186 xlate function == big win?

> I think we can probably just implement the simple two-cell version in
> gpiochip_to_irq() directly and leave it up to drivers that require
> something more to override ->to_irq().

And if what I assume (in my naive thinking) you can do with
.valid_mask is correct, then you can convert tegra186 to use
common twocell translation.

Sorry for being a pest, I just have a feeling we are reinventing
wheels here, I really want to pull as many fringe cases as
possible into gpiolib if I can so the maintenance gets
simpler.

Yours,
Linus Walleij
Thierry Reding Sept. 25, 2018, 11:17 a.m. UTC | #4
On Tue, Sep 25, 2018 at 12:33:54PM +0200, Linus Walleij wrote:
> On Tue, Sep 25, 2018 at 11:33 AM Thierry Reding
> <thierry.reding@gmail.com> wrote:
> > On Tue, Sep 25, 2018 at 10:11:06AM +0200, Linus Walleij wrote:
> 
> > > While I think it is really important that we start supporting hierarchical
> > > irqdomains in the gpiolib core, I want a more complete approach,
> > > so that drivers that need hierarchical handling of irqdomains
> > > can get the same support from gpiolib as they get for simple
> > > domains.
> (...)
> > > I can't see if you need to pull more stuff into the core to accomplish
> > > that, but I think in essence the core gpiolib needs to be more helpful
> > > with hierarchies.
> >
> > This is not as trivial as it sounds. I think we could probably provide a
> > simple helper in the core that may work for the majority of GPIO
> > controllers, and would be similar to irq_domain_xlate_twocell(). The
> > problem is that ->gpio_to_irq() effectively needs to convert the offset
> > of the pin in the GPIO controller to an IRQ specifier. If the IRQ domain
> > can use irq_domain_xlate_twocell(), that should be easy, but if it does
> > not, then we likely need a custom implementation as well.
> 
> This sounds a lot like the "gpio-ranges" we have in the
> gpiochip DT bindings, mapping gpio offsets to pin offsets.
> 
> I assume that we could just introduce a cross-mapping
> array from IRQ to IRQ in struct gpio_irq_chip for the
> hierarchical irqchip? Is it any
> more complicated than an array of [(int, int)] tuples?
> 
> I guess this is what you have in mind for twocell?

For twocell I think it would be even easier, because the IRQ specifier
can just be reconstructed from (offset, irq_type). That's all the simple
two-cell does, right? It's a 1:1 mapping of specifier to GPIO offset to
IRQ offset.

> > For example, as you may remember, the Tegra186 GPIO controller is
> > somewhat quirky in that it has a number of banks, each of which can have
> > any number of pins up to 8. However, in order to prevent users from
> > attempting to use one of the non-existent GPIOs, we resorted to
> > compacting the GPIO number space so that the GPIO specifier uses
> > basically a (bank, pin) pair that is converted into a GPIO offset. The
> > same is done for interrupt specifiers.
> 
> I guess this stuff is what you refer to?
> 
> #define TEGRA_MAIN_GPIO_PORT(port, base, count, controller)     \
>         [TEGRA_MAIN_GPIO_PORT_##port] = {                       \
>                 .name = #port,                                  \
>                 .offset = base,                                 \
>                 .pins = count,                                  \
>                 .irq = controller,                              \
>         }
> 
> static const struct tegra_gpio_port tegra186_main_ports[] = {
>         TEGRA_MAIN_GPIO_PORT( A, 0x2000, 7, 2),
>         TEGRA_MAIN_GPIO_PORT( B, 0x3000, 7, 3),
>         TEGRA_MAIN_GPIO_PORT( C, 0x3200, 7, 3),
>         TEGRA_MAIN_GPIO_PORT( D, 0x3400, 6, 3),
> (...)
> 
> Maybe things have changed slightly.
> 
> As I see it there are some ways to go about this:
> 
> 1. Create one gpiochip per bank and just register the number of
> GPIOs actually accessible per bank offset from 0. This works
> if one does not insist on having one gpiochip covering all pins,
> and as long as all usable pins are stacked from offset 0..n.
> (Tegra186 doesn't do this, it is registering one chip for all.)
> 
> 2. If the above cannot be met, register enough pins to cover all
> (e.g. 32 pins for a 32bit GPIO register) then mask off the
> unused pins in .valid_mask in the gpio_chip. This was fairly
> recently introduced to add ACPI support for Qualcomm, as
> there were valid, but unusable GPIO offsets, but it can be
> used to cut arbitrary holes in any range of offsets.
> 
> 3. Some driver-specific way. Which seems to be what Tegra186
> is doing.
> 
> Would you consider to move over to using method (2) to
> get a more linear numberspace? I.e. register 8 GPIOs/IRQs
> per port/bank and then just mask off the invalid ones?
> .valid_mask in gpio_chip can be used for the GPIOs and
> .valid_mask in the gpio_irq_chip can be used for IRQs.
> 
> Or do you think it is nonelegant?

This is all pretty much the same discussion that I remember we had
earlier this year. Nothing's changed so far.

Back at the time I had pointed out that we'd be wasting a lot of memory
by registering 8 GPIOs/IRQs per bank, because on average only about 60-
75% of the GPIOs are actually used. In addition we waste processing
resources by having to check the GPIO offset against the valid mask
every time we want to access a GPIO.

I think that's inelegant, but from the rest of what you're saying you
don't see it that way.

> > Since there is no 1:1 relationship between the value in the specifier
> > and the GPIO offset, we can't use irq_domain_xlate_twocell().
> 
> Am I right that if you switch to method (2) above this is solved
> and we get rid of the custom tegra186 xlate function == big win?
> 
> > I think we can probably just implement the simple two-cell version in
> > gpiochip_to_irq() directly and leave it up to drivers that require
> > something more to override ->to_irq().
> 
> And if what I assume (in my naive thinking) you can do with
> .valid_mask is correct, then you can convert tegra186 to use
> common twocell translation.
> 
> Sorry for being a pest, I just have a feeling we are reinventing
> wheels here, I really want to pull as many fringe cases as
> possible into gpiolib if I can so the maintenance gets
> simpler.

Like I said, we had this very discussion a couple of months ago, and I
don't think I want to go through all of it again. I think, yes, I could
make Tegra186 work with .valid_mask, even if I consider it wasteful. So
if that's what you want, I'll go rewrite the driver so we'll never have
to repeat this again.

Thierry
Linus Walleij Oct. 3, 2018, 7:52 a.m. UTC | #5
On Tue, Sep 25, 2018 at 1:17 PM Thierry Reding <thierry.reding@gmail.com> wrote:
> On Tue, Sep 25, 2018 at 12:33:54PM +0200, Linus Walleij wrote:
> > On Tue, Sep 25, 2018 at 11:33 AM Thierry Reding
> > <thierry.reding@gmail.com> wrote:
> > > On Tue, Sep 25, 2018 at 10:11:06AM +0200, Linus Walleij wrote:

> > As I see it there are some ways to go about this:
> >
> > 1. Create one gpiochip per bank and just register the number of
> > GPIOs actually accessible per bank offset from 0. This works
> > if one does not insist on having one gpiochip covering all pins,
> > and as long as all usable pins are stacked from offset 0..n.
> > (Tegra186 doesn't do this, it is registering one chip for all.)
> >
> > 2. If the above cannot be met, register enough pins to cover all
> > (e.g. 32 pins for a 32bit GPIO register) then mask off the
> > unused pins in .valid_mask in the gpio_chip. This was fairly
> > recently introduced to add ACPI support for Qualcomm, as
> > there were valid, but unusable GPIO offsets, but it can be
> > used to cut arbitrary holes in any range of offsets.
> >
> > 3. Some driver-specific way. Which seems to be what Tegra186
> > is doing.
> >
> > Would you consider to move over to using method (2) to
> > get a more linear numberspace? I.e. register 8 GPIOs/IRQs
> > per port/bank and then just mask off the invalid ones?
> > .valid_mask in gpio_chip can be used for the GPIOs and
> > .valid_mask in the gpio_irq_chip can be used for IRQs.
> >
> > Or do you think it is nonelegant?
>
> This is all pretty much the same discussion that I remember we had
> earlier this year. Nothing's changed so far.
>
> Back at the time I had pointed out that we'd be wasting a lot of memory
> by registering 8 GPIOs/IRQs per bank, because on average only about 60-
> 75% of the GPIOs are actually used. In addition we waste processing
> resources by having to check the GPIO offset against the valid mask
> every time we want to access a GPIO.
>
> I think that's inelegant, but from the rest of what you're saying you
> don't see it that way.

(...)

> > Sorry for being a pest, I just have a feeling we are reinventing
> > wheels here, I really want to pull as many fringe cases as
> > possible into gpiolib if I can so the maintenance gets
> > simpler.
>
> Like I said, we had this very discussion a couple of months ago, and I
> don't think I want to go through all of it again. I think, yes, I could
> make Tegra186 work with .valid_mask, even if I consider it wasteful. So
> if that's what you want, I'll go rewrite the driver so we'll never have
> to repeat this again.

Well, IMO rolling custom code into every driver that needs a
hierarchical interrupt is pretty inelegant too, so I guess it is one
of those choose the lesser evil situations for me as a maintainer.

I am very happy if this hairy hierarchical irqdomain handling can
be standardized and pushed into the core, and as it seems this
memory optimization stops that and forces a local solution with
some necessarilily different code, also for xlate since a while
back and now for the hierarchical irqdomain.

So if I let this pass then I will just be grumpy again the next
time another local kludge needs to be added.

I am worried that there will be more and more of this special
code just to account for the nonlinear blocks of GPIOs.

I agree memory footprint matters, as does performance,
as does maintainability and reuse. So it's not like there is
a hard factor determining this.

How much memory are we talking about for a tegra186, and
how much is that in proportion to how much memory a
typical tegra186 system has connected to it? If it is significant
and hurting the kernel and/or applications on that platform
I would agree we need to look into memory optimizations.

Yours,
Linus Walleij
diff mbox series

Patch

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index a53d17745d21..94146093ee95 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -1918,7 +1918,9 @@  static int gpiochip_add_irqchip(struct gpio_chip *gpiochip,
 		type = IRQ_TYPE_NONE;
 	}
 
-	gpiochip->to_irq = gpiochip_to_irq;
+	if (!gpiochip->to_irq)
+		gpiochip->to_irq = gpiochip_to_irq;
+
 	gpiochip->irq.default_type = type;
 	gpiochip->irq.lock_key = lock_key;
 	gpiochip->irq.request_key = request_key;
@@ -1928,9 +1930,14 @@  static int gpiochip_add_irqchip(struct gpio_chip *gpiochip,
 	else
 		ops = &gpiochip_domain_ops;
 
-	gpiochip->irq.domain = irq_domain_add_simple(np, gpiochip->ngpio,
-						     gpiochip->irq.first,
-						     ops, gpiochip);
+	if (gpiochip->irq.parent_domain)
+		gpiochip->irq.domain = irq_domain_add_hierarchy(gpiochip->irq.parent_domain,
+								0, gpiochip->ngpio,
+								np, ops, gpiochip);
+	else
+		gpiochip->irq.domain = irq_domain_add_simple(np, gpiochip->ngpio,
+							     gpiochip->irq.first,
+							     ops, gpiochip);
 	if (!gpiochip->irq.domain)
 		return -EINVAL;
 
diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h
index d8dcd0e44cab..fcd09a396d76 100644
--- a/include/linux/gpio/driver.h
+++ b/include/linux/gpio/driver.h
@@ -47,6 +47,12 @@  struct gpio_irq_chip {
 	 */
 	const struct irq_domain_ops *domain_ops;
 
+	/**
+	 * @parent_domain:
+	 *
+	 */
+	struct irq_domain *parent_domain;
+
 	/**
 	 * @handler:
 	 *