Patchwork [RFCv1,07/11] irqchip: armada-370-xp: add MSI support to interrupt controller driver

login
register
mail settings
Submitter Thomas Petazzoni
Date March 26, 2013, 4:52 p.m.
Message ID <1364316746-8702-8-git-send-email-thomas.petazzoni@free-electrons.com>
Download mbox | patch
Permalink /patch/231465/
State Not Applicable
Headers show

Comments

Thomas Petazzoni - March 26, 2013, 4:52 p.m.
This commit introduces the support for the MSI interrupts in the
armada-370-xp interrupt controller driver. It registers an IRQ domain
associated with the 'msi' node in the Device Tree, which the PCI
controller will refer to in a followup commit.

The MSI interrupts use the 16 high doorbells, and are therefore
notified using IRQ1 of the main interrupt controller.

The Device Tree binding documentation for the armada-370-xp driver is
also updated in the process.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 .../devicetree/bindings/arm/armada-370-xp-mpic.txt |   39 +++++++--
 arch/arm/boot/dts/armada-370-xp.dtsi               |    6 ++
 arch/arm/mach-mvebu/Kconfig                        |    1 +
 drivers/irqchip/irq-armada-370-xp.c                |   91 +++++++++++++++++++-
 4 files changed, 130 insertions(+), 7 deletions(-)
Arnd Bergmann - March 26, 2013, 5:07 p.m.
On Tuesday 26 March 2013, Thomas Petazzoni wrote:
> +              mpic: main-intc@d0020000 {
> +                    #interrupt-cells = <1>;
> +                    interrupt-controller;
> +              };
> +
> +              msi: msi-intc@d0020000 {
> +                    #interrupt-cells = <1>;
> +                    interrupt-controller;
> +                    marvell,doorbell = <0xd0020a04>;
> +              };

I think the @d0020000 part needs to be removed for the nodes that
have no reg property.

I think I did not follow the entire discussion. What has led to having
two subnodes in the end, rather than a single node like this?


       interrupt-controller@d0020000 {
              compatible = "marvell,mpic";
              #interrupt-cells = <1>;
              #msi-cells = <1>;
              #address-cells = <1>;
              #size-cells = <1>;
              interrupt-controller;
              msi-controller;
              reg = <0xd0020a00 0x1d0>,
                     <0xd0021070 0x58>;
              marvell,doorbell = <0xd0020a04>;
       };


	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Petazzoni - March 26, 2013, 5:17 p.m.
Dear Arnd Bergmann,

On Tue, 26 Mar 2013 17:07:41 +0000, Arnd Bergmann wrote:

> I think the @d0020000 part needs to be removed for the nodes that
> have no reg property.

Sure, will fix.

> I think I did not follow the entire discussion. What has led to having
> two subnodes in the end, rather than a single node like this?
> 
> 
>        interrupt-controller@d0020000 {
>               compatible = "marvell,mpic";
>               #interrupt-cells = <1>;
>               #msi-cells = <1>;
>               #address-cells = <1>;
>               #size-cells = <1>;
>               interrupt-controller;
>               msi-controller;
>               reg = <0xd0020a00 0x1d0>,
>                      <0xd0021070 0x58>;
>               marvell,doorbell = <0xd0020a04>;
>        };

I've tried to explain that in the commit log of PATCH 6, which says:

    However, we need the driver to expose two different IRQ domains: one
    for the main interrupt controller itself, and one for the MSI
    interrupt controller. In order to achieve this, we will create two
    subnodes in the interrupt-controller@d0020000 node: one subnode for
    the main interrupt controller, and one subnode for the MSI interrupt
    controller. The two irq domains can't be registered on the same DT
    node, otherwise when irq_find_host() gets used by of_irq_map_one()
    to resolve IRQs of devices, they may find the MSI interrupt
    controller instead of the main interrupt controller.
    
    Note that both the parent and the child node need to have the
    'interrupt-controller' empty property:
    
      * The interrupt-controller property is needed in the main
    interrupt controller node (interrupt-controller@d0020000) because
    the of_irq_init() function skips nodes that are matching the given
       compatible string, but that don't have the interrupt-controller
       property.
    
     * The interrupt-controller property is needed in the child
    interrupt controller node (main-intc@d0020000) otherwise the
    resolution done by of_irq_map_one() doesn't work.

So really, the thing is that irq_domain_add_linear() registers an IRQ
domain on a specific DT node, and then irq_find_host() finds back an
IRQ domain from a given DT node. So if you have two IRQ domains
registered on the same DT node, then you don't know which one will be
used.

So if I do the two irq_domain_add_linear() (one for MPIC, one for MSI)
on one single DT node, when the timer driver will request its
interrupt, it turns out that the MSI IRQ domain is used and not the
MPIC IRQ domain, even though the timer has <&mpic> as its interrupt
parent.

Best regards,

Thomas
Jason Gunthorpe - March 26, 2013, 6:02 p.m.
On Tue, Mar 26, 2013 at 05:52:22PM +0100, Thomas Petazzoni wrote:

> This commit introduces the support for the MSI interrupts in the
> armada-370-xp interrupt controller driver. It registers an IRQ domain
> associated with the 'msi' node in the Device Tree, which the PCI
> controller will refer to in a followup commit.

I've got some MSI stuff working on Kirkwood using the doorbells, so I
can confirm this general approach works in HW.

What do you think it would take to extend this to other Marvell SOCs?

BTW, in my testing on Kirkwood I found that register ..40010 in the
PEX had to be set to the internal register base to make this work. Did
you find something similar?

> The MSI interrupts use the 16 high doorbells, and are therefore
> notified using IRQ1 of the main interrupt controller.

I was initially a bit surprised this was the high doorbell bits, but I
checked and it is right, the 16 bit MSI maps to the high two bytes in
the 32 bit MemWr TLP.

FWIW, MSI-X is not restricted to 16 bits, so if you can detect from
the PCI layer if it is setting up MSI or MSI-X you could allocate low
bits first to MSI-X and high bits first to MSI, increasing the number
of available MSI/MSI-X vectors.

> +   - marvell,doorbell: Gives the physical address at which PCIe
> +     devices should write to signal an MSI interrupt.

Why is this necessary? Can't the doorbell register physical address be
computed by the driver? AFAIK there is no possibility for address
translation on SOC inbound TLPs.

Thinking about it a bit, maybe less magic code is needed here, be
explicit about the available interrupts in the DT:

pcie-controller {
   msi-interrupts = <0xd0020a04 (1<<16) &msi 16
                     0xd0020a04 (1<<17) &msi 17
                     [..]
   msi-x-interrupts = <0xd0020a04 (1<<1) &msi 1
                       0xd0020a04 (1<<2) &msi 2
                       [..]

There is a better chance of that supporting other Marvell SOCs.. Not
sure, just throwing it out there.

Regarding the sub node, I'm not sure it should be necessary. You
don't need to differentiate the MSI vs non-MSI case in the doorbell
register at the interrupt controller level. Eg this ..
  
> +#ifdef CONFIG_PCI_MSI
> +static struct irq_chip armada_370_xp_msi_irq_chip = {
> +	.name = "armada_370_xp_msi_irq",
> +	.irq_enable = unmask_msi_irq,
> +	.irq_disable = mask_msi_irq,
> +	.irq_mask = mask_msi_irq,
> +	.irq_unmask = unmask_msi_irq,
> +};

.. isn't necessary for doorbell. With MSI cases on other archs there
is no local MSI interrupt controller, the MSI MemWr TLP directly
creates an interrupt at the CPU. This requires using the *_msi_irq
functions to control the interrupt at the source, since there is no
control at the destination.

However, Marvell's doorbell can be controlled at the destination, so
it is better to handle it that way, especially since it creates
symmetry with the IPI usage.

I used:
       priv->gc = irq_alloc_generic_chip("kirkwood_doorbell_irq", 1,
                                   irq_base, priv->base,
                                   handle_fasteoi_irq);
       [..]				  
       ct->regs.mask = 4;
       ct->regs.eoi = 0;
       ct->chip.irq_eoi = irq_gc_eoi_inv;
       ct->chip.irq_mask = irq_gc_mask_clr_bit;
       ct->chip.irq_unmask = irq_gc_mask_set_bit;

Which should work correctly for the IPI and MSI cases. The handle_fasteoi_irq
is important.

>  #ifdef CONFIG_SMP
>  void armada_mpic_send_doorbell(const struct cpumask *mask, unsigned int irq)
>  {
> @@ -220,12 +280,39 @@ armada_370_xp_handle_irq(struct pt_regs *regs)
>  		if (irqnr > 1022)
>  			break;
>  
> -		if (irqnr > 0) {
> +		if (irqnr > 1) {
>  			irqnr =	irq_find_mapping(armada_370_xp_mpic_domain,
>  					irqnr);
>  			handle_IRQ(irqnr, regs);
>  			continue;
>  		}
> +
> +#ifdef CONFIG_PCI_MSI
> +		/* MSI handling */
> +		if (irqnr == 1) {
> +			u32 msimask, msinr;
> +
> +			msimask = readl_relaxed(per_cpu_int_base +
> +					ARMADA_370_XP_IN_DRBEL_CAUSE_OFFS)
> +				& PCI_MSI_DOORBELL_MASK;
> +
> +			writel(~PCI_MSI_DOORBELL_MASK, per_cpu_int_base +
> +			       ARMADA_370_XP_IN_DRBEL_CAUSE_OFFS);

Be careful here, the ordering of the EOI in relation to the handler is
important. If you use my stuff from above then the MSI and IPI cases
are identical and the core code handles ack'ing the doorbell via
irq_eoi at the right moment, you can just uniformly iterate over all
32 bits and there is no need to care if it is MSI or IPI.

Also, I'm not super familiar with the irq stuff, but is
irq_find_mapping the best way? Most of the drivers I looked at used
irq_alloc_descs to get a contiguous range of irq numbers and then just
used a simple offset in the handle_irq...

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann - March 26, 2013, 6:38 p.m.
On Tuesday 26 March 2013, Thomas Petazzoni wrote:

> 
> I've tried to explain that in the commit log of PATCH 6, which says:
> 
>     However, we need the driver to expose two different IRQ domains: one
>     for the main interrupt controller itself, and one for the MSI
>     interrupt controller. In order to achieve this, we will create two
>     subnodes in the interrupt-controller@d0020000 node: one subnode for
>     the main interrupt controller, and one subnode for the MSI interrupt
>     controller. The two irq domains can't be registered on the same DT
>     node, otherwise when irq_find_host() gets used by of_irq_map_one()
>     to resolve IRQs of devices, they may find the MSI interrupt
>     controller instead of the main interrupt controller.

Right, I should have read the commit log better ...

>     Note that both the parent and the child node need to have the
>     'interrupt-controller' empty property:
>     
>       * The interrupt-controller property is needed in the main
>     interrupt controller node (interrupt-controller@d0020000) because
>     the of_irq_init() function skips nodes that are matching the given
>        compatible string, but that don't have the interrupt-controller
>        property.
>     
>      * The interrupt-controller property is needed in the child
>     interrupt controller node (main-intc@d0020000) otherwise the
>     resolution done by of_irq_map_one() doesn't work.

If you add compatible properties to the children, that would work
I suppose.

> So really, the thing is that irq_domain_add_linear() registers an IRQ
> domain on a specific DT node, and then irq_find_host() finds back an
> IRQ domain from a given DT node. So if you have two IRQ domains
> registered on the same DT node, then you don't know which one will be
> used.
> 
> So if I do the two irq_domain_add_linear() (one for MPIC, one for MSI)
> on one single DT node, when the timer driver will request its
> interrupt, it turns out that the MSI IRQ domain is used and not the
> MPIC IRQ domain, even though the timer has <&mpic> as its interrupt
> parent.

I still wonder if the real solution shouldn't instead be to make the
irq domain code MSI aware. For instance, you don't really need a
cell to describe an interrupt because the interrupt number is
not a hardware property. So an MSI using device doesn't really
needs an "msis" or "interrupts" property, just an "msi-parent",
and we can add code to handle as a separate domain even if you
have a single device node that can do both level and message
interrupts.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Petazzoni - March 26, 2013, 8:46 p.m.
Dear Arnd Bergmann,

On Tue, 26 Mar 2013 18:38:22 +0000, Arnd Bergmann wrote:

> >     Note that both the parent and the child node need to have the
> >     'interrupt-controller' empty property:
> >     
> >       * The interrupt-controller property is needed in the main
> >     interrupt controller node (interrupt-controller@d0020000)
> > because the of_irq_init() function skips nodes that are matching
> > the given compatible string, but that don't have the
> > interrupt-controller property.
> >     
> >      * The interrupt-controller property is needed in the child
> >     interrupt controller node (main-intc@d0020000) otherwise the
> >     resolution done by of_irq_map_one() doesn't work.
> 
> If you add compatible properties to the children, that would work
> I suppose.

To which children? Only to the main-intc children? If so,
armada_370_xp_mpic_of_init() would be called with a 'device_node *'
that points to the main-intc, correct? Then it would have to go back up
in the Device Tree to find the msi node? It's doable of course, but
sounds strange, no?

To me, the topology of the hardware is really that a single device
provides two features: the main interrupt controller and the MSI
interrupt controller. But I will adapt to whatever DT binding you
propose.

> > So really, the thing is that irq_domain_add_linear() registers an
> > IRQ domain on a specific DT node, and then irq_find_host() finds
> > back an IRQ domain from a given DT node. So if you have two IRQ
> > domains registered on the same DT node, then you don't know which
> > one will be used.
> > 
> > So if I do the two irq_domain_add_linear() (one for MPIC, one for
> > MSI) on one single DT node, when the timer driver will request its
> > interrupt, it turns out that the MSI IRQ domain is used and not the
> > MPIC IRQ domain, even though the timer has <&mpic> as its interrupt
> > parent.
> 
> I still wonder if the real solution shouldn't instead be to make the
> irq domain code MSI aware. For instance, you don't really need a
> cell to describe an interrupt because the interrupt number is
> not a hardware property. So an MSI using device doesn't really
> needs an "msis" or "interrupts" property, just an "msi-parent",
> and we can add code to handle as a separate domain even if you
> have a single device node that can do both level and message
> interrupts.

So the msi-parent property should point to the single mpic node? But
then the IRQ domain for MSI cannot be registered on this single MPIC
node. Then, what would be the first argument of:

        armada_370_xp_msi_domain =
                irq_domain_add_linear(msi_node, PCI_MSI_DOORBELL_NR,
                                      &armada_370_xp_msi_irq_ops, NULL);

and how would the PCIe driver get a pointer to this IRQ domain? (In the
currently proposed code, it just does a irq_find_host(), which sounded
very simple and straightforward).

To clarify: I really don't mind reworking the code, but unfortunately
"make the IRQ domain code MSI aware" is too vague for me to understand
the direction you're thinking of.

Thanks for your review and comments!

Thomas
Arnd Bergmann - March 26, 2013, 9:10 p.m.
On Tuesday 26 March 2013, Thomas Petazzoni wrote:
> Dear Arnd Bergmann,
> 
> On Tue, 26 Mar 2013 18:38:22 +0000, Arnd Bergmann wrote:
> 
> > >     Note that both the parent and the child node need to have the
> > >     'interrupt-controller' empty property:
> > >     
> > >       * The interrupt-controller property is needed in the main
> > >     interrupt controller node (interrupt-controller@d0020000)
> > > because the of_irq_init() function skips nodes that are matching
> > > the given compatible string, but that don't have the
> > > interrupt-controller property.
> > >     
> > >      * The interrupt-controller property is needed in the child
> > >     interrupt controller node (main-intc@d0020000) otherwise the
> > >     resolution done by of_irq_map_one() doesn't work.
> > 
> > If you add compatible properties to the children, that would work
> > I suppose.
> 
> To which children? Only to the main-intc children? If so,
> armada_370_xp_mpic_of_init() would be called with a 'device_node *'
> that points to the main-intc, correct? Then it would have to go back up
> in the Device Tree to find the msi node? It's doable of course, but
> sounds strange, no?

I was thinking of registering two init functions as well.

> To me, the topology of the hardware is really that a single device
> provides two features: the main interrupt controller and the MSI
> interrupt controller. But I will adapt to whatever DT binding you
> propose.

My preference would be to have no sub-nodes but rather make the
code deal with an interrupt controller that has an interrupt domain
for regular IRQs but can also handle MSI.

> > I still wonder if the real solution shouldn't instead be to make the
> > irq domain code MSI aware. For instance, you don't really need a
> > cell to describe an interrupt because the interrupt number is
> > not a hardware property. So an MSI using device doesn't really
> > needs an "msis" or "interrupts" property, just an "msi-parent",
> > and we can add code to handle as a separate domain even if you
> > have a single device node that can do both level and message
> > interrupts.
> 
> So the msi-parent property should point to the single mpic node? But
> then the IRQ domain for MSI cannot be registered on this single MPIC
> node. Then, what would be the first argument of:
> 
>         armada_370_xp_msi_domain =
>                 irq_domain_add_linear(msi_node, PCI_MSI_DOORBELL_NR,
>                                       &armada_370_xp_msi_irq_ops, NULL);
> 
> and how would the PCIe driver get a pointer to this IRQ domain? (In the
> currently proposed code, it just does a irq_find_host(), which sounded
> very simple and straightforward).

I guess one way would be to have a single domain that is responsible
for the controller and handles both MSI and legacy interrupts. That
could probably be done without changes to the interrupt domain code.

Another option would be to add an irq_domain_add_msi() function that
creates a domain which is also tied to the same device node but does
not interact with it when going through the legacy interrupts.
You could then add a matching msi_find_host() or irq_find_msi_host()
function to return the domain.

> To clarify: I really don't mind reworking the code, but unfortunately
> "make the IRQ domain code MSI aware" is too vague for me to understand
> the direction you're thinking of.

I don't have specific code in mind yet, mainly playing around with the
possibilities for now.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe - March 26, 2013, 9:14 p.m.
On Tue, Mar 26, 2013 at 09:46:13PM +0100, Thomas Petazzoni wrote:

> To me, the topology of the hardware is really that a single device
> provides two features: the main interrupt controller and the MSI
> interrupt controller. But I will adapt to whatever DT binding you
> propose.

No.. the HW is a single device that provides an interrupt on register
write capability, so it ideally should be a single DT node..

The need to distinguish MSI vs IPI vs other usage is completely a side
effect of how Linux's IRQ and PCI layers are hooked together today.

> > I still wonder if the real solution shouldn't instead be to make the
> > irq domain code MSI aware. For instance, you don't really need a
> > cell to describe an interrupt because the interrupt number is

Some kind of generic way for an irq chip driver to say 'here, I can
provide some MSI interrupts' and then for the PCI layer to say 'irq
layer, find me a driver that can provision a MSI with XXX properties' ?

This need to stack an irq chip under a MSI is not something I think
the kernel has had to support before, so new common code is probably
needed...

The interrupt chip should not need to know what the ultimate consumer
of the interrupt capability will be, it just needs to tell the
consumer 'write D to physical address A and irq I will trigger'.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann - March 26, 2013, 9:15 p.m.
On Tuesday 26 March 2013, Thomas Petazzoni wrote:
> +                       msimask = readl_relaxed(per_cpu_int_base +
> +                                       ARMADA_370_XP_IN_DRBEL_CAUSE_OFFS)
> +                               & PCI_MSI_DOORBELL_MASK;
> +
> +                       writel(~PCI_MSI_DOORBELL_MASK, per_cpu_int_base +
> +                              ARMADA_370_XP_IN_DRBEL_CAUSE_OFFS);
> +

Upon reading this code again, I stumbled over the barriers. You use
a readl_relaxed() without barrier but a writel() with barrier. Is
that intentional? Are you sure that you don't need a full readl()
to guarantee that all inbound DMA that was sent by the device before
the MSI message has arrived by the time the interrupt handler function
is called? It depends on the implementation of the MSI controller whether
that guarantee is already made by the fact that you are handling the
interrupt.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Petazzoni - March 26, 2013, 9:16 p.m.
Dear Jason Gunthorpe,

On Tue, 26 Mar 2013 12:02:45 -0600, Jason Gunthorpe wrote:

> > This commit introduces the support for the MSI interrupts in the
> > armada-370-xp interrupt controller driver. It registers an IRQ
> > domain associated with the 'msi' node in the Device Tree, which the
> > PCI controller will refer to in a followup commit.
> 
> I've got some MSI stuff working on Kirkwood using the doorbells, so I
> can confirm this general approach works in HW.
> 
> What do you think it would take to extend this to other Marvell SOCs?

A bit of time. It is on my list to use the PCIe driver on Kirkwood and
other Marvell SoCs as well.

> BTW, in my testing on Kirkwood I found that register ..40010 in the
> PEX had to be set to the internal register base to make this work. Did
> you find something similar?

The original code I used was prepared by Lior Amsalem, and it seems
like it was not needed, maybe because the bootloader had already
configured the PCIe interfaces. I'll have a look to see if those
registers already contain the right values, but certainly it would be
safer if the PCIe driver was setting their value. Not a big deal to
fix, I believe.

> > The MSI interrupts use the 16 high doorbells, and are therefore
> > notified using IRQ1 of the main interrupt controller.
> 
> I was initially a bit surprised this was the high doorbell bits, but I
> checked and it is right, the 16 bit MSI maps to the high two bytes in
> the 32 bit MemWr TLP.

Right. The low 8 bits are used for the IPIs, and the high 16 bits are
used for MSIs. I think it has been chosen to be like this because then
we have two different interrupt numbers for the IPIs and the MSIs,
which make the thing easy to handle in the IRQ controller driver.

> FWIW, MSI-X is not restricted to 16 bits, so if you can detect from
> the PCI layer if it is setting up MSI or MSI-X you could allocate low
> bits first to MSI-X and high bits first to MSI, increasing the number
> of available MSI/MSI-X vectors.

This could be an improvement. There are also other, non-per-CPU,
doorbell interrupts that could potentially be used. Can we consider
this a possible improvement, and not something that is fundamentally
necessary? For now, I'm trying to get the current feature set merged,
and not necessarily to extend it to cover all possible features of the
hardware.

> > +   - marvell,doorbell: Gives the physical address at which PCIe
> > +     devices should write to signal an MSI interrupt.
> 
> Why is this necessary? Can't the doorbell register physical address be
> computed by the driver? AFAIK there is no possibility for address
> translation on SOC inbound TLPs.

It is the responsibility of the PCIe driver to prepare the 'struct
msi_msg', which contains the physical address at which the PCIe device
should write to trigger an MSI. But this physical address is part of
the interrupt controller registers, so there is no way for the PCIe
driver to magically know about it.

Please see the code where we're using this:
https://github.com/MISL-EBU-System-SW/mainline-public/blob/marvell-pcie-msi-v1/drivers/pci/host/pci-mvebu.c#L844.

If you think this physical address can be "computed" by the driver, I'm
all open to suggestions.

> Thinking about it a bit, maybe less magic code is needed here, be
> explicit about the available interrupts in the DT:
> 
> pcie-controller {
>    msi-interrupts = <0xd0020a04 (1<<16) &msi 16
>                      0xd0020a04 (1<<17) &msi 17
>                      [..]
>    msi-x-interrupts = <0xd0020a04 (1<<1) &msi 1
>                        0xd0020a04 (1<<2) &msi 2
>                        [..]
> 
> There is a better chance of that supporting other Marvell SOCs.. Not
> sure, just throwing it out there.

Isn't that very verbose, to list each and every MSI interrupt, bit per
bit? I'm fine with doing that (except maybe implement both MSI and
MSI-X support, I'd like to stick with the current feature set for now),
but it sounds like a lot more code in the DT and a lot more code in the
driver to parse this... just to get the exact same feature.

Arnd, what is your feeling about this suggestion?

> Regarding the sub node, I'm not sure it should be necessary. You
> don't need to differentiate the MSI vs non-MSI case in the doorbell
> register at the interrupt controller level. Eg this ..
>   
> > +#ifdef CONFIG_PCI_MSI
> > +static struct irq_chip armada_370_xp_msi_irq_chip = {
> > +	.name = "armada_370_xp_msi_irq",
> > +	.irq_enable = unmask_msi_irq,
> > +	.irq_disable = mask_msi_irq,
> > +	.irq_mask = mask_msi_irq,
> > +	.irq_unmask = unmask_msi_irq,
> > +};
> 
> .. isn't necessary for doorbell. With MSI cases on other archs there
> is no local MSI interrupt controller, the MSI MemWr TLP directly
> creates an interrupt at the CPU. This requires using the *_msi_irq
> functions to control the interrupt at the source, since there is no
> control at the destination.
> 
> However, Marvell's doorbell can be controlled at the destination, so
> it is better to handle it that way, especially since it creates
> symmetry with the IPI usage.

Hum, ok. But the MSI and IPI are handled quite differently: MSIs have
to call handle_IRQ(), while IPIs have to call handle_IPI(), so you'd
still have to distinguish between IPIs and MSIs. In the current driver,
IPIs don't have an associated irq_chip structure.

> I used:
>        priv->gc = irq_alloc_generic_chip("kirkwood_doorbell_irq", 1,
>                                    irq_base, priv->base,
>                                    handle_fasteoi_irq);
>        [..]				  
>        ct->regs.mask = 4;
>        ct->regs.eoi = 0;
>        ct->chip.irq_eoi = irq_gc_eoi_inv;
>        ct->chip.irq_mask = irq_gc_mask_clr_bit;
>        ct->chip.irq_unmask = irq_gc_mask_set_bit;
> 
> Which should work correctly for the IPI and MSI cases. The
> handle_fasteoi_irq is important.

I don't understand how this can end up calling handle_IPI() when IPIs
are raised. There are no IPIs on Kirkwood, so are you sure the Kirkwood
logic can apply to the SMP case of Armada XP, which requires IPIs?

> >  #ifdef CONFIG_SMP
> >  void armada_mpic_send_doorbell(const struct cpumask *mask,
> > unsigned int irq) {
> > @@ -220,12 +280,39 @@ armada_370_xp_handle_irq(struct pt_regs *regs)
> >  		if (irqnr > 1022)
> >  			break;
> >  
> > -		if (irqnr > 0) {
> > +		if (irqnr > 1) {
> >  			irqnr =
> > irq_find_mapping(armada_370_xp_mpic_domain, irqnr);
> >  			handle_IRQ(irqnr, regs);
> >  			continue;
> >  		}
> > +
> > +#ifdef CONFIG_PCI_MSI
> > +		/* MSI handling */
> > +		if (irqnr == 1) {
> > +			u32 msimask, msinr;
> > +
> > +			msimask = readl_relaxed(per_cpu_int_base +
> > +
> > ARMADA_370_XP_IN_DRBEL_CAUSE_OFFS)
> > +				& PCI_MSI_DOORBELL_MASK;
> > +
> > +			writel(~PCI_MSI_DOORBELL_MASK,
> > per_cpu_int_base +
> > +			       ARMADA_370_XP_IN_DRBEL_CAUSE_OFFS);
> 
> Be careful here, the ordering of the EOI in relation to the handler is
> important. If you use my stuff from above then the MSI and IPI cases
> are identical and the core code handles ack'ing the doorbell via
> irq_eoi at the right moment, you can just uniformly iterate over all
> 32 bits and there is no need to care if it is MSI or IPI.

Again, I don't see how it's possible to not care whether it's MSI or
IPI. IPIs have to call handle_IPI() which is a ARM-specific call, and I
don't understand how handle_fasteoi_irq() would end up calling
handle_IPI().

> Also, I'm not super familiar with the irq stuff, but is
> irq_find_mapping the best way? Most of the drivers I looked at used
> irq_alloc_descs to get a contiguous range of irq numbers and then just
> used a simple offset in the handle_irq...

I'll let Arnd answer this one, but I'm pretty sure that using IRQ
domains is the way to go. The fact that a number of drivers don't yet
use IRQ domains is maybe just because they haven't been converted yet.

Thanks,

Thomas
Arnd Bergmann - March 26, 2013, 9:31 p.m.
On Tuesday 26 March 2013, Thomas Petazzoni wrote:
> > FWIW, MSI-X is not restricted to 16 bits, so if you can detect from
> > the PCI layer if it is setting up MSI or MSI-X you could allocate low
> > bits first to MSI-X and high bits first to MSI, increasing the number
> > of available MSI/MSI-X vectors.
> 
> This could be an improvement. There are also other, non-per-CPU,
> doorbell interrupts that could potentially be used. Can we consider
> this a possible improvement, and not something that is fundamentally
> necessary? For now, I'm trying to get the current feature set merged,
> and not necessarily to extend it to cover all possible features of the
> hardware.

If we are extending the DT binding for the current feature, we should
at least think about how it would look like for future extensions, to
make sure it won't be fundamentally incompatible.

> > > +   - marvell,doorbell: Gives the physical address at which PCIe
> > > +     devices should write to signal an MSI interrupt.
> > 
> > Why is this necessary? Can't the doorbell register physical address be
> > computed by the driver? AFAIK there is no possibility for address
> > translation on SOC inbound TLPs.
> 
> It is the responsibility of the PCIe driver to prepare the 'struct
> msi_msg', which contains the physical address at which the PCIe device
> should write to trigger an MSI. But this physical address is part of
> the interrupt controller registers, so there is no way for the PCIe
> driver to magically know about it.

If we introduce an irq_find_msi_host() interface, we can also introduce
an interface to return the doorbell register, or more. I suppose
we could actually have a generic version of your mvebu_pcie_setup_msi_irq()
function that looks up the domain from the device node and calls
a new irq_domain_ops function, which allocates a free MSI hwirq number,
creates a mapping for it, and fills out a struct msi_msg with the
doorbell register and data.

> > Thinking about it a bit, maybe less magic code is needed here, be
> > explicit about the available interrupts in the DT:
> > 
> > pcie-controller {
> >    msi-interrupts = <0xd0020a04 (1<<16) &msi 16
> >                      0xd0020a04 (1<<17) &msi 17
> >                      [..]
> >    msi-x-interrupts = <0xd0020a04 (1<<1) &msi 1
> >                        0xd0020a04 (1<<2) &msi 2
> >                        [..]
> > 
> > There is a better chance of that supporting other Marvell SOCs.. Not
> > sure, just throwing it out there.
> 
> Isn't that very verbose, to list each and every MSI interrupt, bit per
> bit? I'm fine with doing that (except maybe implement both MSI and
> MSI-X support, I'd like to stick with the current feature set for now),
> but it sounds like a lot more code in the DT and a lot more code in the
> driver to parse this... just to get the exact same feature.
> 
> Arnd, what is your feeling about this suggestion?

I think we should only need an msi-parent property and let the details
be handled by the irq driver.
 
> > Also, I'm not super familiar with the irq stuff, but is
> > irq_find_mapping the best way? Most of the drivers I looked at used
> > irq_alloc_descs to get a contiguous range of irq numbers and then just
> > used a simple offset in the handle_irq...
> 
> I'll let Arnd answer this one, but I'm pretty sure that using IRQ
> domains is the way to go. The fact that a number of drivers don't yet
> use IRQ domains is maybe just because they haven't been converted yet.

Yes, irq_find_mapping is what we should be using here.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Petazzoni - March 26, 2013, 9:37 p.m.
Dear Arnd Bergmann,

On Tue, 26 Mar 2013 21:10:15 +0000, Arnd Bergmann wrote:

> > To which children? Only to the main-intc children? If so,
> > armada_370_xp_mpic_of_init() would be called with a 'device_node *'
> > that points to the main-intc, correct? Then it would have to go back up
> > in the Device Tree to find the msi node? It's doable of course, but
> > sounds strange, no?
> 
> I was thinking of registering two init functions as well.

But then which of the two init functions would do the of_iomap() (or
whatever ioremap()ing function you use) ?

Remember, the very reason what we have *one* driver for both interrupt
controllers is because the registers are completely mixed. So to me
it's really the case of *one* device providing *two* features, like a
device that would be both an Ethernet interface and a SPI controller.
You have a single ->probe() function that gets called when the device
is detected, and this ->probe() function registers both an Ethernet
interface and a SPI controller.

To me, the case we have here is really identical: we have one single set
of registers that provide multiple features.

> > To me, the topology of the hardware is really that a single device
> > provides two features: the main interrupt controller and the MSI
> > interrupt controller. But I will adapt to whatever DT binding you
> > propose.
> 
> My preference would be to have no sub-nodes but rather make the
> code deal with an interrupt controller that has an interrupt domain
> for regular IRQs but can also handle MSI.

Hum, ok.

> > > I still wonder if the real solution shouldn't instead be to make the
> > > irq domain code MSI aware. For instance, you don't really need a
> > > cell to describe an interrupt because the interrupt number is
> > > not a hardware property. So an MSI using device doesn't really
> > > needs an "msis" or "interrupts" property, just an "msi-parent",
> > > and we can add code to handle as a separate domain even if you
> > > have a single device node that can do both level and message
> > > interrupts.
> > 
> > So the msi-parent property should point to the single mpic node? But
> > then the IRQ domain for MSI cannot be registered on this single MPIC
> > node. Then, what would be the first argument of:
> > 
> >         armada_370_xp_msi_domain =
> >                 irq_domain_add_linear(msi_node, PCI_MSI_DOORBELL_NR,
> >                                       &armada_370_xp_msi_irq_ops, NULL);
> > 
> > and how would the PCIe driver get a pointer to this IRQ domain? (In the
> > currently proposed code, it just does a irq_find_host(), which sounded
> > very simple and straightforward).
> 
> I guess one way would be to have a single domain that is responsible
> for the controller and handles both MSI and legacy interrupts. That
> could probably be done without changes to the interrupt domain code.
> 
> Another option would be to add an irq_domain_add_msi() function that
> creates a domain which is also tied to the same device node but does
> not interact with it when going through the legacy interrupts.
> You could then add a matching msi_find_host() or irq_find_msi_host()
> function to return the domain.

This option seems doable. Not sure yet how to do it exactly, but at
least I understand the idea.

> > To clarify: I really don't mind reworking the code, but unfortunately
> > "make the IRQ domain code MSI aware" is too vague for me to understand
> > the direction you're thinking of.
> 
> I don't have specific code in mind yet, mainly playing around with the
> possibilities for now.

Sure, I understand. But I also don't have specific ideas: the current
code works fine for me, and I don't find it really ugly. So if you don't
point me in the direction that you think would make the code less ugly,
then I'm lost.

Best regards,

Thomas
Thomas Petazzoni - March 26, 2013, 9:41 p.m.
Dear Jason Gunthorpe,

On Tue, 26 Mar 2013 15:14:03 -0600, Jason Gunthorpe wrote:
> On Tue, Mar 26, 2013 at 09:46:13PM +0100, Thomas Petazzoni wrote:
> 
> > To me, the topology of the hardware is really that a single device
> > provides two features: the main interrupt controller and the MSI
> > interrupt controller. But I will adapt to whatever DT binding you
> > propose.
> 
> No.. the HW is a single device that provides an interrupt on register
> write capability, so it ideally should be a single DT node..

No, here you're only talking about the interrupt on register write
capability (used for doorbells), but the interrupt controller is also
used for all other devices. Not only IPIs and MSIs are handled by this
piece of code.

> The need to distinguish MSI vs IPI vs other usage is completely a side
> effect of how Linux's IRQ and PCI layers are hooked together today.

Yes. And since I live in today's world, I try to adapt somewhat to
it :-)

> > > I still wonder if the real solution shouldn't instead be to make the
> > > irq domain code MSI aware. For instance, you don't really need a
> > > cell to describe an interrupt because the interrupt number is
> 
> Some kind of generic way for an irq chip driver to say 'here, I can
> provide some MSI interrupts' and then for the PCI layer to say 'irq
> layer, find me a driver that can provision a MSI with XXX properties' ?
> 
> This need to stack an irq chip under a MSI is not something I think
> the kernel has had to support before, so new common code is probably
> needed...
> 
> The interrupt chip should not need to know what the ultimate consumer
> of the interrupt capability will be, it just needs to tell the
> consumer 'write D to physical address A and irq I will trigger'.

I honestly don't see much difference between what you're saying here
and what the proposed code is doing. The IRQ driver says "hay, I
provide two IRQ domains, one is named mpic, one is named msi". And then
all the regular devices have interrupt-parent = <&mpic> to say "I use
legacy interrupts from that controller", and the PCIe controller has
msi-parent = <&msi> to say "I use MSI interrupt from that controller".

Best regards,

Thomas
Thomas Petazzoni - March 26, 2013, 9:42 p.m.
Dear Arnd Bergmann,

On Tue, 26 Mar 2013 21:15:40 +0000, Arnd Bergmann wrote:
> On Tuesday 26 March 2013, Thomas Petazzoni wrote:
> > +                       msimask = readl_relaxed(per_cpu_int_base +
> > +                                       ARMADA_370_XP_IN_DRBEL_CAUSE_OFFS)
> > +                               & PCI_MSI_DOORBELL_MASK;
> > +
> > +                       writel(~PCI_MSI_DOORBELL_MASK, per_cpu_int_base +
> > +                              ARMADA_370_XP_IN_DRBEL_CAUSE_OFFS);
> > +
> 
> Upon reading this code again, I stumbled over the barriers. You use
> a readl_relaxed() without barrier but a writel() with barrier. Is
> that intentional? Are you sure that you don't need a full readl()
> to guarantee that all inbound DMA that was sent by the device before
> the MSI message has arrived by the time the interrupt handler function
> is called? It depends on the implementation of the MSI controller whether
> that guarantee is already made by the fact that you are handling the
> interrupt.

This question I will have to raise to the Marvell HW engineers, the
datasheet does not go into these considerations. For now, I'm rather
trying to validate the 'architecture' of the code: DT binding,
interaction between the IRQ controller driver and the PCIe driver, etc.

Thanks,

Thomas
Thomas Petazzoni - March 26, 2013, 9:47 p.m.
Dear Arnd Bergmann,

On Tue, 26 Mar 2013 21:31:45 +0000, Arnd Bergmann wrote:
> On Tuesday 26 March 2013, Thomas Petazzoni wrote:
> > > FWIW, MSI-X is not restricted to 16 bits, so if you can detect from
> > > the PCI layer if it is setting up MSI or MSI-X you could allocate low
> > > bits first to MSI-X and high bits first to MSI, increasing the number
> > > of available MSI/MSI-X vectors.
> > 
> > This could be an improvement. There are also other, non-per-CPU,
> > doorbell interrupts that could potentially be used. Can we consider
> > this a possible improvement, and not something that is fundamentally
> > necessary? For now, I'm trying to get the current feature set merged,
> > and not necessarily to extend it to cover all possible features of the
> > hardware.
> 
> If we are extending the DT binding for the current feature, we should
> at least think about how it would look like for future extensions, to
> make sure it won't be fundamentally incompatible.

Sure.

> > > > +   - marvell,doorbell: Gives the physical address at which PCIe
> > > > +     devices should write to signal an MSI interrupt.
> > > 
> > > Why is this necessary? Can't the doorbell register physical address be
> > > computed by the driver? AFAIK there is no possibility for address
> > > translation on SOC inbound TLPs.
> > 
> > It is the responsibility of the PCIe driver to prepare the 'struct
> > msi_msg', which contains the physical address at which the PCIe device
> > should write to trigger an MSI. But this physical address is part of
> > the interrupt controller registers, so there is no way for the PCIe
> > driver to magically know about it.
> 
> If we introduce an irq_find_msi_host() interface, we can also introduce
> an interface to return the doorbell register, or more. I suppose
> we could actually have a generic version of your mvebu_pcie_setup_msi_irq()
> function that looks up the domain from the device node and calls
> a new irq_domain_ops function, which allocates a free MSI hwirq number,
> creates a mapping for it, and fills out a struct msi_msg with the
> doorbell register and data.

Ok, sounds like a plan. I must admit I'm not very familiar with the IRQ
domain code, but I guess I should take this as an opportunity to become
a little bit more familiar :)

> > > Thinking about it a bit, maybe less magic code is needed here, be
> > > explicit about the available interrupts in the DT:
> > > 
> > > pcie-controller {
> > >    msi-interrupts = <0xd0020a04 (1<<16) &msi 16
> > >                      0xd0020a04 (1<<17) &msi 17
> > >                      [..]
> > >    msi-x-interrupts = <0xd0020a04 (1<<1) &msi 1
> > >                        0xd0020a04 (1<<2) &msi 2
> > >                        [..]
> > > 
> > > There is a better chance of that supporting other Marvell SOCs.. Not
> > > sure, just throwing it out there.
> > 
> > Isn't that very verbose, to list each and every MSI interrupt, bit per
> > bit? I'm fine with doing that (except maybe implement both MSI and
> > MSI-X support, I'd like to stick with the current feature set for now),
> > but it sounds like a lot more code in the DT and a lot more code in the
> > driver to parse this... just to get the exact same feature.
> > 
> > Arnd, what is your feeling about this suggestion?
> 
> I think we should only need an msi-parent property and let the details
> be handled by the irq driver.

Ok. Having the irq driver allocate the MSI interrupt number as you
suggested above seems a good idea.

> > > Also, I'm not super familiar with the irq stuff, but is
> > > irq_find_mapping the best way? Most of the drivers I looked at used
> > > irq_alloc_descs to get a contiguous range of irq numbers and then just
> > > used a simple offset in the handle_irq...
> > 
> > I'll let Arnd answer this one, but I'm pretty sure that using IRQ
> > domains is the way to go. The fact that a number of drivers don't yet
> > use IRQ domains is maybe just because they haven't been converted yet.
> 
> Yes, irq_find_mapping is what we should be using here.

Ok, thanks.

Thomas
Arnd Bergmann - March 26, 2013, 9:53 p.m.
On Tuesday 26 March 2013, Thomas Petazzoni wrote:
> On Tue, 26 Mar 2013 21:10:15 +0000, Arnd Bergmann wrote:
> 
> > > To which children? Only to the main-intc children? If so,
> > > armada_370_xp_mpic_of_init() would be called with a 'device_node *'
> > > that points to the main-intc, correct? Then it would have to go back up
> > > in the Device Tree to find the msi node? It's doable of course, but
> > > sounds strange, no?
> > 
> > I was thinking of registering two init functions as well.
> 
> But then which of the two init functions would do the of_iomap() (or
> whatever ioremap()ing function you use) ?

I agree this is where it get a little fishy, hence my preference to
use a single node.

> Remember, the very reason what we have one driver for both interrupt
> controllers is because the registers are completely mixed. So to me
> it's really the case of one device providing two features, like a
> device that would be both an Ethernet interface and a SPI controller.
> You have a single ->probe() function that gets called when the device
> is detected, and this ->probe() function registers both an Ethernet
> interface and a SPI controller.b
> 
> To me, the case we have here is really identical: we have one single set
> of registers that provide multiple features.

The problem here is that the irq subsystem makes certain assumptions
about one device node being the controller and mapping to a single
irq domain, so you'd have to cheat one way or another. Using the
sub-nodes the way you do in your patch is bending the rules for
the device tree binding, which I think is more problematic than
bending the rules for the code. Regarding the call to of_iomap(),
you could work around it by having a static variable in the driver
that remembers the mmio address and gets set by whichever device node
init function gets called first.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe - March 26, 2013, 9:55 p.m.
On Tue, Mar 26, 2013 at 10:16:54PM +0100, Thomas Petazzoni wrote:

> > FWIW, MSI-X is not restricted to 16 bits, so if you can detect from
> > the PCI layer if it is setting up MSI or MSI-X you could allocate low
> > bits first to MSI-X and high bits first to MSI, increasing the number
> > of available MSI/MSI-X vectors.
> 
> This could be an improvement. There are also other, non-per-CPU,
> doorbell interrupts that could potentially be used. Can we consider
> this a possible improvement, and not something that is fundamentally
> necessary? For now, I'm trying to get the current feature set merged,
> and not necessarily to extend it to cover all possible features of the
> hardware.

A major point of MSI is to be able to direct interrupts on a CPU by
CPU basis. Looks like XP has per-cpu and all-cpu doorbell bits?

Combined with this:

> It is the responsibility of the PCIe driver to prepare the 'struct
> msi_msg', which contains the physical address at which the PCIe device
> should write to trigger an MSI. But this physical address is part of

Makes me think the split of responsibility created by moving the MSI
ops into the PCI host structure is not correct.

The PCIe host driver just seems to get in the way, it has no knowledge
it is adding to the process.

irqchip knows:
 - what the physical address of the doorbell is
 - how to construct an address that is per-cpu or all-cpu
 - which bits in the doorbell registers are allocated and which are
   free

pci has none of that info.

Looking at this some more, there is tonnes of stuff in linux that when
a PCI MSI is allocated a special IRQ number is created for it that has
special properties - eg set_affinity on that number actually goes into
the MSI table and changes it.

The cleanest would be to keep the doorbell driver purely related to
the doorbell and when a request for a PCI MSI comes in allocate a new
irq_chip (like arch/x86/kernel/apic/io_apic.c does) that has all the
special PCI stuff and chain it to the proper bit in the doorbell. 

Optimizing to remove function calls from the interrupt stack could
happen later.

> > However, Marvell's doorbell can be controlled at the destination, so
> > it is better to handle it that way, especially since it creates
> > symmetry with the IPI usage.
> 
> Hum, ok. But the MSI and IPI are handled quite differently: MSIs have
> to call handle_IRQ(), while IPIs have to call handle_IPI(), so you'd
> still have to distinguish between IPIs and MSIs. In the current driver,
> IPIs don't have an associated irq_chip structure.

Once you have a proper generic interrupt driver you can go ahead and
use request_irq to grab N bits of the doorbell register and assign
them to a handler that only calls handle_IPI(ipinr,get_irq_regs()).

It is not necessary to keep IPI and the irqchip driver convoluted
together.

> Again, I don't see how it's possible to not care whether it's MSI or
> IPI. IPIs have to call handle_IPI() which is a ARM-specific call, and I
> don't understand how handle_fasteoi_irq() would end up calling
> handle_IPI().

- The main IRQ vector is entered, it decodes the main cause register
  and calls the handler for the doorbell
- The doorbell handler is setup as a chained handler and it uses
  handle_fasteoi_irq to enter into armada_370_xp_handle_irq
- armada_370_xp_handle_irq then runs through all bits and calls their
  handlers
- the handler for IPI bits is associated with the IPI handler that
  simply calls handle_IPI(...)

handle_fasteoi_irq acks's and clears the handled bits in the doorbell
register at the proper time.

> I'll let Arnd answer this one, but I'm pretty sure that using IRQ
> domains is the way to go. The fact that a number of drivers don't yet
> use IRQ domains is maybe just because they haven't been converted yet.

Maybe, but they have irq domain code as well.. I'm curious about the
answer too :)

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann - March 26, 2013, 10:04 p.m.
On Tuesday 26 March 2013, Jason Gunthorpe wrote:
> > I'll let Arnd answer this one, but I'm pretty sure that using IRQ
> > domains is the way to go. The fact that a number of drivers don't yet
> > use IRQ domains is maybe just because they haven't been converted yet.
> 
> Maybe, but they have irq domain code as well.. I'm curious about the
> answer too :)

To expand on my previous answer: irq_alloc_descs can require a lot of
memory if you have a lot of MSIs. If the driver only has 16 MSIs,
there is probably no reason to not assign all of them at once, but if
you have 2048 MSIs, creating the mapping only when you need it is
much more space efficient.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Petazzoni - March 26, 2013, 10:06 p.m.
Dear Jason Gunthorpe,

On Tue, 26 Mar 2013 15:55:46 -0600, Jason Gunthorpe wrote:

> > > FWIW, MSI-X is not restricted to 16 bits, so if you can detect from
> > > the PCI layer if it is setting up MSI or MSI-X you could allocate low
> > > bits first to MSI-X and high bits first to MSI, increasing the number
> > > of available MSI/MSI-X vectors.
> > 
> > This could be an improvement. There are also other, non-per-CPU,
> > doorbell interrupts that could potentially be used. Can we consider
> > this a possible improvement, and not something that is fundamentally
> > necessary? For now, I'm trying to get the current feature set merged,
> > and not necessarily to extend it to cover all possible features of the
> > hardware.
> 
> A major point of MSI is to be able to direct interrupts on a CPU by
> CPU basis. Looks like XP has per-cpu and all-cpu doorbell bits?

Yes. Two per-CPU interrupts, IRQ 0 for the low 16 bits of the first
doorbell register, IRQ 1 for the high 16 bits of the first doorbell
register. And then, three global interrupts, each with a 32 bits
register associated to trigger the interrupt. At least, that's my
understanding of the datasheet.

> Combined with this:
> 
> > It is the responsibility of the PCIe driver to prepare the 'struct
> > msi_msg', which contains the physical address at which the PCIe device
> > should write to trigger an MSI. But this physical address is part of
> 
> Makes me think the split of responsibility created by moving the MSI
> ops into the PCI host structure is not correct.
> 
> The PCIe host driver just seems to get in the way, it has no knowledge
> it is adding to the process.
> 
> irqchip knows:
>  - what the physical address of the doorbell is
>  - how to construct an address that is per-cpu or all-cpu
>  - which bits in the doorbell registers are allocated and which are
>    free
> 
> pci has none of that info.

Wait, wait. Did you look at the Tegra PCIe driver? In the Tegra case,
the MSI stuff is handled completely by the PCIe unit, and does not need
the special interaction with the IRQ controller driver that I need.

The fact that PCIe and MSI handling are completely separate matters on
Marvell may just be a specific situation. On other platforms, the
physical register that gets written to by PCIe devices to trigger a MSI
may well be located within the PCIe interface registers, and therefore
the MSI thing should be handled by the PCIe driver.

> Looking at this some more, there is tonnes of stuff in linux that when
> a PCI MSI is allocated a special IRQ number is created for it that has
> special properties - eg set_affinity on that number actually goes into
> the MSI table and changes it.
> 
> The cleanest would be to keep the doorbell driver purely related to
> the doorbell and when a request for a PCI MSI comes in allocate a new
> irq_chip (like arch/x86/kernel/apic/io_apic.c does) that has all the
> special PCI stuff and chain it to the proper bit in the doorbell. 
> 
> Optimizing to remove function calls from the interrupt stack could
> happen later.
> 
> > > However, Marvell's doorbell can be controlled at the destination, so
> > > it is better to handle it that way, especially since it creates
> > > symmetry with the IPI usage.
> > 
> > Hum, ok. But the MSI and IPI are handled quite differently: MSIs have
> > to call handle_IRQ(), while IPIs have to call handle_IPI(), so you'd
> > still have to distinguish between IPIs and MSIs. In the current driver,
> > IPIs don't have an associated irq_chip structure.
> 
> Once you have a proper generic interrupt driver you can go ahead and
> use request_irq to grab N bits of the doorbell register and assign
> them to a handler that only calls handle_IPI(ipinr,get_irq_regs()).
> 
> It is not necessary to keep IPI and the irqchip driver convoluted
> together.
> 
> > Again, I don't see how it's possible to not care whether it's MSI or
> > IPI. IPIs have to call handle_IPI() which is a ARM-specific call, and I
> > don't understand how handle_fasteoi_irq() would end up calling
> > handle_IPI().
> 
> - The main IRQ vector is entered, it decodes the main cause register
>   and calls the handler for the doorbell
> - The doorbell handler is setup as a chained handler and it uses
>   handle_fasteoi_irq to enter into armada_370_xp_handle_irq
> - armada_370_xp_handle_irq then runs through all bits and calls their
>   handlers
> - the handler for IPI bits is associated with the IPI handler that
>   simply calls handle_IPI(...)
> 
> handle_fasteoi_irq acks's and clears the handled bits in the doorbell
> register at the proper time.

Could you propose some code that implements this? The existing code
works fine, and is modeled after what the GIC IRQ driver is doing. I
don't say what you're proposing is not interesting, but just that it
looks like every time I come with some code, you're suggesting me to
'reinvent' the whole world, and you've never come up with some code to
help in a direction or another.

Thanks,

Thomas
Jason Gunthorpe - March 26, 2013, 10:26 p.m.
On Tue, Mar 26, 2013 at 11:06:56PM +0100, Thomas Petazzoni wrote:

> > The PCIe host driver just seems to get in the way, it has no knowledge
> > it is adding to the process.
> > 
> > irqchip knows:
> >  - what the physical address of the doorbell is
> >  - how to construct an address that is per-cpu or all-cpu
> >  - which bits in the doorbell registers are allocated and which are
> >    free
> > 
> > pci has none of that info.
> 
> Wait, wait. Did you look at the Tegra PCIe driver? In the Tegra case,
> the MSI stuff is handled completely by the PCIe unit, and does not need
> the special interaction with the IRQ controller driver that I need.

Yes, but only briefly. It doesn't matter if the mechanisms are in the
PCI-E register block or someplace else. If irqchip is providing the
interface then the PCI-E driver would have to also instantiate an
irqchip block.

This is no different from what would be required to handle INTx on
Marvell, the PCI-E driver would have to create a chained irqchip from
the PEX interrupt and decode the cause register into INTA/B/C/D.

> > > Again, I don't see how it's possible to not care whether it's MSI or
> > > IPI. IPIs have to call handle_IPI() which is a ARM-specific call, and I
> > > don't understand how handle_fasteoi_irq() would end up calling
> > > handle_IPI().
> > 
> > - The main IRQ vector is entered, it decodes the main cause register
> >   and calls the handler for the doorbell
> > - The doorbell handler is setup as a chained handler and it uses
> >   handle_fasteoi_irq to enter into armada_370_xp_handle_irq
> > - armada_370_xp_handle_irq then runs through all bits and calls their
> >   handlers
> > - the handler for IPI bits is associated with the IPI handler that
> >   simply calls handle_IPI(...)
> > 
> > handle_fasteoi_irq acks's and clears the handled bits in the doorbell
> > register at the proper time.
> 
> Could you propose some code that implements this? The existing code
> works fine, and is modeled after what the GIC IRQ driver is doing. I
> don't say what you're proposing is not interesting, but just that it
> looks like every time I come with some code, you're suggesting me to
> 'reinvent' the whole world, and you've never come up with some code to
> help in a direction or another.

I'll send you my doorbell driver for Kirkwood - it doesn't attempt to
generically solve the MSI problem, but the basic template shows how to
wire up the doorbell concept to irqchip's generic code, and it works
in the MSI role with lots of rather ugly hacking..

The XP driver is taking some small shortcuts because it only uses the
doorbell for IPI..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/Documentation/devicetree/bindings/arm/armada-370-xp-mpic.txt b/Documentation/devicetree/bindings/arm/armada-370-xp-mpic.txt
index 61df564..2233d20 100644
--- a/Documentation/devicetree/bindings/arm/armada-370-xp-mpic.txt
+++ b/Documentation/devicetree/bindings/arm/armada-370-xp-mpic.txt
@@ -4,8 +4,6 @@  Marvell Armada 370 and Armada XP Interrupt Controller
 Required properties:
 - compatible: Should be "marvell,mpic"
 - interrupt-controller: Identifies the node as an interrupt controller.
-- #interrupt-cells: The number of cells to define the interrupts. Should be 1.
-  The cell is the IRQ number
 
 - reg: Should contain PMIC registers location and length. First pair
   for the main interrupt registers, second pair for the per-CPU
@@ -14,16 +12,45 @@  Required properties:
   automatically map to the interrupt controller registers of the
   current CPU)
 
+The node should have two sub-nodes, each describing one aspect of the
+interrupt controller:
 
+ * A first node named main-intc that describes the main interrupt
+   controller itself, receiving interrupts from all devices.
+
+   This sub-node should have the following properties:
+   - interrupt-controller: identifies the code as an interrupt
+     controller.
+   - #interrupt-cells: The number of cells to define the
+     interrupts. Should be 1.  The cell is the IRQ number
+
+ * A second node named msi-intc that is used for Message Signaled
+   Interrupts of the PCIe bus.
+
+   This sub-node should have the following properties:
+   - interrupt-controller: identifies the code as an interrupt
+     controller.
+   - #interrupt-cells: The number of cells to define the
+     interrupts. Should be 1.  The cell is the IRQ number
+   - marvell,doorbell: Gives the physical address at which PCIe
+     devices should write to signal an MSI interrupt.
 
 Example:
 
-        mpic: interrupt-controller@d0020000 {
+        interrupt-controller@d0020000 {
               compatible = "marvell,mpic";
-              #interrupt-cells = <1>;
-              #address-cells = <1>;
-              #size-cells = <1>;
               interrupt-controller;
               reg = <0xd0020a00 0x1d0>,
                     <0xd0021070 0x58>;
+
+              mpic: main-intc@d0020000 {
+                    #interrupt-cells = <1>;
+                    interrupt-controller;
+              };
+
+              msi: msi-intc@d0020000 {
+                    #interrupt-cells = <1>;
+                    interrupt-controller;
+                    marvell,doorbell = <0xd0020a04>;
+              };
         };
diff --git a/arch/arm/boot/dts/armada-370-xp.dtsi b/arch/arm/boot/dts/armada-370-xp.dtsi
index de054ed..de6569cd 100644
--- a/arch/arm/boot/dts/armada-370-xp.dtsi
+++ b/arch/arm/boot/dts/armada-370-xp.dtsi
@@ -36,6 +36,12 @@ 
 		    #interrupt-cells = <1>;
 		    interrupt-controller;
 	      };
+
+	      msi: msi-intc@d0020000 {
+		    #interrupt-cells = <1>;
+		    interrupt-controller;
+		    marvell,doorbell = <0xd0020a04>;
+	      };
 	};
 
 	coherency-fabric@d0020200 {
diff --git a/arch/arm/mach-mvebu/Kconfig b/arch/arm/mach-mvebu/Kconfig
index d353249..3254b77 100644
--- a/arch/arm/mach-mvebu/Kconfig
+++ b/arch/arm/mach-mvebu/Kconfig
@@ -16,6 +16,7 @@  config ARCH_MVEBU
 	select MVEBU_MBUS
 	select MIGHT_HAVE_PCI
 	select PCI_QUIRKS if PCI
+	select ARCH_SUPPORTS_MSI if PCI
 
 if ARCH_MVEBU
 
diff --git a/drivers/irqchip/irq-armada-370-xp.c b/drivers/irqchip/irq-armada-370-xp.c
index 4b2a6d7..3180797 100644
--- a/drivers/irqchip/irq-armada-370-xp.c
+++ b/drivers/irqchip/irq-armada-370-xp.c
@@ -22,6 +22,7 @@ 
 #include <linux/of_address.h>
 #include <linux/of_irq.h>
 #include <linux/irqdomain.h>
+#include <linux/msi.h>
 #include <asm/mach/arch.h>
 #include <asm/exception.h>
 #include <asm/smp_plat.h>
@@ -51,6 +52,10 @@ 
 #define IPI_DOORBELL_START                      (0)
 #define IPI_DOORBELL_END                        (8)
 #define IPI_DOORBELL_MASK                       0xFF
+#define PCI_MSI_DOORBELL_START                  (16)
+#define PCI_MSI_DOORBELL_NR                     (16)
+#define PCI_MSI_DOORBELL_END                    (32)
+#define PCI_MSI_DOORBELL_MASK                   0xFFFF0000
 
 static DEFINE_RAW_SPINLOCK(irq_controller_lock);
 
@@ -58,6 +63,10 @@  static void __iomem *per_cpu_int_base;
 static void __iomem *main_int_base;
 static struct irq_domain *armada_370_xp_mpic_domain;
 
+#ifdef CONFIG_PCI_MSI
+static struct irq_domain *armada_370_xp_msi_domain;
+#endif
+
 /*
  * In SMP mode:
  * For shared global interrupts, mask/unmask global enable bit
@@ -167,6 +176,57 @@  static int armada_370_xp_mpic_irq_map(struct irq_domain *h,
 	return 0;
 }
 
+#ifdef CONFIG_PCI_MSI
+static struct irq_chip armada_370_xp_msi_irq_chip = {
+	.name = "armada_370_xp_msi_irq",
+	.irq_enable = unmask_msi_irq,
+	.irq_disable = mask_msi_irq,
+	.irq_mask = mask_msi_irq,
+	.irq_unmask = unmask_msi_irq,
+};
+
+static int armada_370_xp_msi_map(struct irq_domain *domain, unsigned int virq,
+				 irq_hw_number_t hw)
+{
+	irq_set_chip_and_handler(virq, &armada_370_xp_msi_irq_chip,
+				 handle_simple_irq);
+	set_irq_flags(virq, IRQF_VALID);
+
+	return 0;
+}
+
+static const struct irq_domain_ops armada_370_xp_msi_irq_ops = {
+	.map = armada_370_xp_msi_map,
+};
+
+void armada_370_xp_msi_init(struct device_node *node)
+{
+	struct device_node *msi_node;
+	u32 reg;
+
+	msi_node = of_get_child_by_name(node, "msi-intc");
+	if (!msi_node)
+		return;
+
+	armada_370_xp_msi_domain =
+		irq_domain_add_linear(msi_node, PCI_MSI_DOORBELL_NR,
+				      &armada_370_xp_msi_irq_ops, NULL);
+	if (!armada_370_xp_msi_domain)
+		panic("Unable to add Armada 370/XP MSI irq domain\n");
+
+	reg = readl(per_cpu_int_base + ARMADA_370_XP_IN_DRBEL_MSK_OFFS)
+		| PCI_MSI_DOORBELL_MASK;
+
+	writel(reg, per_cpu_int_base +
+	       ARMADA_370_XP_IN_DRBEL_MSK_OFFS);
+
+	/* Unmask IPI interrupt */
+	writel(1, per_cpu_int_base + ARMADA_370_XP_INT_CLEAR_MASK_OFFS);
+}
+#else
+static void armada_370_xp_msi_init(struct device_node *node) { }
+#endif  /* CONFIG_PCI_MSI */
+
 #ifdef CONFIG_SMP
 void armada_mpic_send_doorbell(const struct cpumask *mask, unsigned int irq)
 {
@@ -220,12 +280,39 @@  armada_370_xp_handle_irq(struct pt_regs *regs)
 		if (irqnr > 1022)
 			break;
 
-		if (irqnr > 0) {
+		if (irqnr > 1) {
 			irqnr =	irq_find_mapping(armada_370_xp_mpic_domain,
 					irqnr);
 			handle_IRQ(irqnr, regs);
 			continue;
 		}
+
+#ifdef CONFIG_PCI_MSI
+		/* MSI handling */
+		if (irqnr == 1) {
+			u32 msimask, msinr;
+
+			msimask = readl_relaxed(per_cpu_int_base +
+					ARMADA_370_XP_IN_DRBEL_CAUSE_OFFS)
+				& PCI_MSI_DOORBELL_MASK;
+
+			writel(~PCI_MSI_DOORBELL_MASK, per_cpu_int_base +
+			       ARMADA_370_XP_IN_DRBEL_CAUSE_OFFS);
+
+			for (msinr = PCI_MSI_DOORBELL_START;
+			     msinr < PCI_MSI_DOORBELL_END; msinr++) {
+				int irq;
+
+				if (!(msimask & BIT(msinr)))
+					continue;
+
+				irq = irq_find_mapping(armada_370_xp_msi_domain,
+						       msinr - 16);
+				handle_IRQ(irq, regs);
+			}
+		}
+#endif
+
 #ifdef CONFIG_SMP
 		/* IPI Handling */
 		if (irqnr == 0) {
@@ -291,6 +378,8 @@  static int __init armada_370_xp_mpic_of_init(struct device_node *node,
 
 #endif
 
+	armada_370_xp_msi_init(node);
+
 	set_handle_irq(armada_370_xp_handle_irq);
 
 	return 0;