[1/2] of_pci_irq: add a check to fallback to standard device tree parsing

Message ID 31c765c53e85e41bfc001d110d69e46c9967f4e7.1516961656.git.ryder.lee@mediatek.com
State Not Applicable
Delegated to: Lorenzo Pieralisi
Headers show
Series
  • [1/2] of_pci_irq: add a check to fallback to standard device tree parsing
Related show

Commit Message

Ryder Lee Jan. 31, 2018, 7:41 a.m.
A root complex usually consist of a host bridge and multiple P2P bridges,
and someone may express that in the form of a root node with many subnodes
and list all four interrupts for each slot (child node) in the root node
like this:

	pcie-controller {
		...
		interrupt-map-mask = <0xf800 0 0 7>;
		interrupt-map = <0x0000 0 0 {INTx} &{interrupt parent} ...>
				 0x0800 0 0 {INTx} &{interrupt parent} ...>;

		pcie@0,0 {
			reg = <0x0000 0 0 0 0>;
			...
		};

		pcie@1,0 {
			reg = <0x0800 0 0 0 0>;
			...
		};
	};

As shown above, we'd like to propagate IRQs from a root port to the devices
in the hierarchy below it in this way.  However, it seems that the current
parser couldn't handle such cases and will get something unexpected below:

	pcieport 0000:00:01.0: assign IRQ: got 213
	igb 0000:01:00.0: assign IRQ: got 212

There is a device which is connected to 2nd slot, but the port doesn't share
the same IRQ with its downstream devices.  The problem here is that, if the
loop found a P2P bridge, it wouldn't check whether the reg property exists
in ppnode or not but just pass the subordinate devfn to of_irq_parse_raw(),
thus the subsequent flow couldn't correctly resolve them.

Fix this by adding a check to fallback to standard device tree parsing.

Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
---
Please refer to the previous discussion thread: https://patchwork.ozlabs.org/patch/829108/
---
 drivers/of/of_pci_irq.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

Comments

Rob Herring Jan. 31, 2018, 4:02 p.m. | #1
On Wed, Jan 31, 2018 at 1:41 AM, Ryder Lee <ryder.lee@mediatek.com> wrote:
> A root complex usually consist of a host bridge and multiple P2P bridges,
> and someone may express that in the form of a root node with many subnodes
> and list all four interrupts for each slot (child node) in the root node
> like this:
>
>         pcie-controller {
>                 ...
>                 interrupt-map-mask = <0xf800 0 0 7>;
>                 interrupt-map = <0x0000 0 0 {INTx} &{interrupt parent} ...>
>                                  0x0800 0 0 {INTx} &{interrupt parent} ...>;
>
>                 pcie@0,0 {
>                         reg = <0x0000 0 0 0 0>;
>                         ...
>                 };
>
>                 pcie@1,0 {
>                         reg = <0x0800 0 0 0 0>;
>                         ...
>                 };
>         };
>
> As shown above, we'd like to propagate IRQs from a root port to the devices
> in the hierarchy below it in this way.  However, it seems that the current
> parser couldn't handle such cases and will get something unexpected below:
>
>         pcieport 0000:00:01.0: assign IRQ: got 213
>         igb 0000:01:00.0: assign IRQ: got 212
>
> There is a device which is connected to 2nd slot, but the port doesn't share
> the same IRQ with its downstream devices.  The problem here is that, if the
> loop found a P2P bridge, it wouldn't check whether the reg property exists
> in ppnode or not but just pass the subordinate devfn to of_irq_parse_raw(),
> thus the subsequent flow couldn't correctly resolve them.
>
> Fix this by adding a check to fallback to standard device tree parsing.
>
> Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
> ---
> Please refer to the previous discussion thread: https://patchwork.ozlabs.org/patch/829108/
> ---
>  drivers/of/of_pci_irq.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/of/of_pci_irq.c b/drivers/of/of_pci_irq.c
> index 3a05568..e445866 100644
> --- a/drivers/of/of_pci_irq.c
> +++ b/drivers/of/of_pci_irq.c
> @@ -86,8 +86,18 @@ int of_irq_parse_pci(const struct pci_dev *pdev, struct of_phandle_args *out_irq
>         out_irq->np = ppnode;
>         out_irq->args_count = 1;
>         out_irq->args[0] = pin;
> -       laddr[0] = cpu_to_be32((pdev->bus->number << 16) | (pdev->devfn << 8));
> -       laddr[1] = laddr[2] = cpu_to_be32(0);
> +
> +       if (!dn && ppnode) {

I would think whether you have a child device in DT or not is
irrelevant. If it's the bridge address you need to look at for
resolving interrupts, that would be true regardless.

> +               const __be32 *addr;
> +
> +               addr = of_get_property(ppnode, "reg", NULL);
> +               if (addr)
> +                       memcpy(laddr, addr, 3);

Can't you just adjust pdev to be ppdev in this case and then use the
existing code to set laddr?

Please copy the powerpc list on this. I worry that touching this
function will break something.

BTW, this code is moving to drivers/pci/ in 4.16.

> +       } else {
> +               laddr[0] = cpu_to_be32((pdev->bus->number << 16) | (pdev->devfn << 8));
> +               laddr[1] = laddr[2] = cpu_to_be32(0);
> +       }
> +
>         rc = of_irq_parse_raw(laddr, out_irq);
>         if (rc)
>                 goto err;
> --
> 1.9.1
>
Ryder Lee Feb. 2, 2018, 9:32 a.m. | #2
On Wed, 2018-01-31 at 10:02 -0600, Rob Herring wrote:
> On Wed, Jan 31, 2018 at 1:41 AM, Ryder Lee <ryder.lee@mediatek.com> wrote:
> > A root complex usually consist of a host bridge and multiple P2P bridges,
> > and someone may express that in the form of a root node with many subnodes
> > and list all four interrupts for each slot (child node) in the root node
> > like this:
> >
> >         pcie-controller {
> >                 ...
> >                 interrupt-map-mask = <0xf800 0 0 7>;
> >                 interrupt-map = <0x0000 0 0 {INTx} &{interrupt parent} ...>
> >                                  0x0800 0 0 {INTx} &{interrupt parent} ...>;
> >
> >                 pcie@0,0 {
> >                         reg = <0x0000 0 0 0 0>;
> >                         ...
> >                 };
> >
> >                 pcie@1,0 {
> >                         reg = <0x0800 0 0 0 0>;
> >                         ...
> >                 };
> >         };
> >
> > As shown above, we'd like to propagate IRQs from a root port to the devices
> > in the hierarchy below it in this way.  However, it seems that the current
> > parser couldn't handle such cases and will get something unexpected below:
> >
> >         pcieport 0000:00:01.0: assign IRQ: got 213
> >         igb 0000:01:00.0: assign IRQ: got 212
> >
> > There is a device which is connected to 2nd slot, but the port doesn't share
> > the same IRQ with its downstream devices.  The problem here is that, if the
> > loop found a P2P bridge, it wouldn't check whether the reg property exists
> > in ppnode or not but just pass the subordinate devfn to of_irq_parse_raw(),
> > thus the subsequent flow couldn't correctly resolve them.
> >
> > Fix this by adding a check to fallback to standard device tree parsing.
> >
> > Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
> > ---
> > Please refer to the previous discussion thread: https://patchwork.ozlabs.org/patch/829108/
> > ---
> >  drivers/of/of_pci_irq.c | 14 ++++++++++++--
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/of/of_pci_irq.c b/drivers/of/of_pci_irq.c
> > index 3a05568..e445866 100644
> > --- a/drivers/of/of_pci_irq.c
> > +++ b/drivers/of/of_pci_irq.c
> > @@ -86,8 +86,18 @@ int of_irq_parse_pci(const struct pci_dev *pdev, struct of_phandle_args *out_irq
> >         out_irq->np = ppnode;
> >         out_irq->args_count = 1;
> >         out_irq->args[0] = pin;
> > -       laddr[0] = cpu_to_be32((pdev->bus->number << 16) | (pdev->devfn << 8));
> > -       laddr[1] = laddr[2] = cpu_to_be32(0);
> > +
> > +       if (!dn && ppnode) {
> 
> I would think whether you have a child device in DT or not is
> irrelevant. If it's the bridge address you need to look at for
> resolving interrupts, that would be true regardless.
> 
> > +               const __be32 *addr;
> > +
> > +               addr = of_get_property(ppnode, "reg", NULL);
> > +               if (addr)
> > +                       memcpy(laddr, addr, 3);
> 
> Can't you just adjust pdev to be ppdev in this case and then use the
> existing code to set laddr?

Okay, I will try it out and and see if the code gets better or worse.
 
> Please copy the powerpc list on this. I worry that touching this
> function will break something.
> BTW, this code is moving to drivers/pci/ in 4.16.

Sure. I will loop more people in next version.

Thanks

> _______________________________________________
> Linux-mediatek mailing list
> Linux-mediatek@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-mediatek
Benjamin Herrenschmidt Feb. 5, 2018, 9:36 p.m. | #3
On Fri, 2018-02-02 at 17:32 +0800, Ryder Lee wrote:
> On Wed, 2018-01-31 at 10:02 -0600, Rob Herring wrote:
> > On Wed, Jan 31, 2018 at 1:41 AM, Ryder Lee <ryder.lee@mediatek.com> wrote:
> > > A root complex usually consist of a host bridge and multiple P2P bridges,
> > > and someone may express that in the form of a root node with many subnodes
> > > and list all four interrupts for each slot (child node) in the root node
> > > like this:
> > > 
> > >          pcie-controller {
> > >                  ...
> > >                  interrupt-map-mask = <0xf800 0 0 7>;
> > >                  interrupt-map = <0x0000 0 0 {INTx} &{interrupt parent} ...>
> > >                                   0x0800 0 0 {INTx} &{interrupt parent} ...>;
> > > 
> > >                  pcie@0,0 {
> > >                          reg = <0x0000 0 0 0 0>;
> > >                          ...
> > >                  };
> > > 
> > >                  pcie@1,0 {
> > >                          reg = <0x0800 0 0 0 0>;
> > >                          ...
> > >                  };
> > >          };
> > > 
> > > As shown above, we'd like to propagate IRQs from a root port to the devices
> > > in the hierarchy below it in this way.  However, it seems that the current
> > > parser couldn't handle such cases and will get something unexpected below:
> > > 
> > >          pcieport 0000:00:01.0: assign IRQ: got 213
> > >          igb 0000:01:00.0: assign IRQ: got 212
> > > 
> > > There is a device which is connected to 2nd slot, but the port doesn't share
> > > the same IRQ with its downstream devices.  The problem here is that, if the
> > > loop found a P2P bridge, it wouldn't check whether the reg property exists
> > > in ppnode or not but just pass the subordinate devfn to of_irq_parse_raw(),
> > > thus the subsequent flow couldn't correctly resolve them.

I don't really understand the problem explanation here. Something
doesn't look right as you shouldn't have to change that function, but I
just don't get what you a

Cheers,
Ben.
Ryder Lee Feb. 6, 2018, 2:38 a.m. | #4
On Tue, 2018-02-06 at 08:36 +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2018-02-02 at 17:32 +0800, Ryder Lee wrote:
> > On Wed, 2018-01-31 at 10:02 -0600, Rob Herring wrote:
> > > On Wed, Jan 31, 2018 at 1:41 AM, Ryder Lee <ryder.lee@mediatek.com> wrote:
> > > > A root complex usually consist of a host bridge and multiple P2P bridges,
> > > > and someone may express that in the form of a root node with many subnodes
> > > > and list all four interrupts for each slot (child node) in the root node
> > > > like this:
> > > > 
> > > >          pcie-controller {
> > > >                  ...
> > > >                  interrupt-map-mask = <0xf800 0 0 7>;
> > > >                  interrupt-map = <0x0000 0 0 {INTx} &{interrupt parent} ...>
> > > >                                   0x0800 0 0 {INTx} &{interrupt parent} ...>;
> > > > 
> > > >                  pcie@0,0 {
> > > >                          reg = <0x0000 0 0 0 0>;
> > > >                          ...
> > > >                  };
> > > > 
> > > >                  pcie@1,0 {
> > > >                          reg = <0x0800 0 0 0 0>;
> > > >                          ...
> > > >                  };
> > > >          };
> > > > 
> > > > As shown above, we'd like to propagate IRQs from a root port to the devices
> > > > in the hierarchy below it in this way.  However, it seems that the current
> > > > parser couldn't handle such cases and will get something unexpected below:
> > > > 
> > > >          pcieport 0000:00:01.0: assign IRQ: got 213
> > > >          igb 0000:01:00.0: assign IRQ: got 212
> > > > 
> > > > There is a device which is connected to 2nd slot, but the port doesn't share
> > > > the same IRQ with its downstream devices.  The problem here is that, if the
> > > > loop found a P2P bridge, it wouldn't check whether the reg property exists
> > > > in ppnode or not but just pass the subordinate devfn to of_irq_parse_raw(),
> > > > thus the subsequent flow couldn't correctly resolve them.
> 
> I don't really understand the problem explanation here. Something
> doesn't look right as you shouldn't have to change that function, but I
> just don't get what you a
> 
> Cheers,
> Ben.
> 

I think the code should look at the bridge address <0x0800 ...> we list
in bindings for resolving interrupts in this case, but it seems like it
use the 'pdev->defvn << 8' which is not really we want and will lead to
mismatch.

		interrupt-map-mask = <0xf800 0 0 7>;
		interrupt-map = <0x0000 0 0 1 ...>,
				<0x0000 0 0 2 ...>,
				<0x0000 0 0 3 ...>,
				<0x0000 0 0 4 ...>,

				 0x0800 0 0 1 ...>,
				 0x0800 0 0 2 ...>,
				 0x0800 0 0 3 ...>,
				 0x0800 0 0 4 ...>;
		...
		pcie@1,0 {
			reg = <0x0800 0 0 0 0>;
			...
		};


Or, alternatively, we could add a interrupt-map property in both child
and root node to solve this. The below example is my original version as
I don't want to change that function either.

		interrupt-map-mask = <0xf800 0 0 0>;
		interrupt-map = <0x0000 0 0 0 ...>,
				 0x0800 0 0 0 ...>;
		...
		pcie@1,0 {
			reg = <0x0800 0 0 0 0>;
			#interrupt-cells = <1>;
			interrupt-map-mask = <0 0 0 0>;
			interrupt-map = <0 0 0 0 ...>;
			...
		};

However, I can't find any other similar case in documentation.

Thanks.
Benjamin Herrenschmidt Feb. 6, 2018, 4:05 a.m. | #5
On Tue, 2018-02-06 at 10:38 +0800, Ryder Lee wrote:
> 
> I think the code should look at the bridge address <0x0800 ...> we list
> in bindings for resolving interrupts in this case, but it seems like it
> use the 'pdev->defvn << 8' which is not really we want and will lead to
> mismatch.
> 
> 		interrupt-map-mask = <0xf800 0 0 7>;
> 		interrupt-map = <0x0000 0 0 1 ...>,
> 				<0x0000 0 0 2 ...>,
> 				<0x0000 0 0 3 ...>,
> 				<0x0000 0 0 4 ...>,
> 
> 				 0x0800 0 0 1 ...>,
> 				 0x0800 0 0 2 ...>,
> 				 0x0800 0 0 3 ...>,
> 				 0x0800 0 0 4 ...>;
> 		...
> 		pcie@1,0 {
> 			reg = <0x0800 0 0 0 0>;
> 			...
> 		};
> 
> 
> Or, alternatively, we could add a interrupt-map property in both child
> and root node to solve this. The below example is my original version as
> I don't want to change that function either.

The code looks at devfn because it's meant to work for PCI including
when the devices dont have a device node in the DT.

What I'm trying to figure out is what is it that your parent and
children are representing here. Which is/are the root complex ?

What is the actual topology as visible on the PCIe bus (is lspci output
basically) and how does that map to your representation ?

> 		interrupt-map-mask = <0xf800 0 0 0>;
> 		interrupt-map = <0x0000 0 0 0 ...>,
> 				 0x0800 0 0 0 ...>;
> 		...
> 		pcie@1,0 {
> 			reg = <0x0800 0 0 0 0>;
> 			#interrupt-cells = <1>;
> 			interrupt-map-mask = <0 0 0 0>;
> 			interrupt-map = <0 0 0 0 ...>;
> 			...
> 		};
> 
> However, I can't find any other similar case in documentation.
> 
> Thanks.
Ryder Lee Feb. 6, 2018, 4:31 a.m. | #6
On Tue, 2018-02-06 at 15:05 +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2018-02-06 at 10:38 +0800, Ryder Lee wrote:
> > 
> > I think the code should look at the bridge address <0x0800 ...> we list
> > in bindings for resolving interrupts in this case, but it seems like it
> > use the 'pdev->defvn << 8' which is not really we want and will lead to
> > mismatch.
> > 
> > 		interrupt-map-mask = <0xf800 0 0 7>;
> > 		interrupt-map = <0x0000 0 0 1 ...>,
> > 				<0x0000 0 0 2 ...>,
> > 				<0x0000 0 0 3 ...>,
> > 				<0x0000 0 0 4 ...>,
> > 
> > 				 0x0800 0 0 1 ...>,
> > 				 0x0800 0 0 2 ...>,
> > 				 0x0800 0 0 3 ...>,
> > 				 0x0800 0 0 4 ...>;
> > 		...
> > 		pcie@1,0 {
> > 			reg = <0x0800 0 0 0 0>;
> > 			...
> > 		};
> > 
> > 
> > Or, alternatively, we could add a interrupt-map property in both child
> > and root node to solve this. The below example is my original version as
> > I don't want to change that function either.
> 
> The code looks at devfn because it's meant to work for PCI including
> when the devices dont have a device node in the DT.
> 
> What I'm trying to figure out is what is it that your parent and
> children are representing here. Which is/are the root complex ?

This is a single root complex with 2 root port (children in DT).

> What is the actual topology as visible on the PCIe bus (is lspci output
> basically) and how does that map to your representation ?

# lspci
00:00.0 Class 0604: 14c3:0801 //1st slot - pcie@0,0
00:01.0 Class 0604: 14c3:0801 //2nd slot - pcie@1,0

01:00.0 Class 0280: 14c3:7603 //A device which is connected to 1st slot
02:00.0 Class 0200: 8086:1521 //A 4 func device which is connected to
2nd slot
02:00.1 Class 0200: 8086:1521
02:00.2 Class 0200: 8086:1521
02:00.3 Class 0200: 8086:1521

> _______________________________________________
> Linux-mediatek mailing list
> Linux-mediatek@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-mediatek
Benjamin Herrenschmidt Feb. 6, 2018, 4:50 a.m. | #7
On Tue, 2018-02-06 at 12:31 +0800, Ryder Lee wrote:
> On Tue, 2018-02-06 at 15:05 +1100, Benjamin Herrenschmidt wrote:
> > On Tue, 2018-02-06 at 10:38 +0800, Ryder Lee wrote:
> > > 
> > > I think the code should look at the bridge address <0x0800 ...> we list
> > > in bindings for resolving interrupts in this case, but it seems like it
> > > use the 'pdev->defvn << 8' which is not really we want and will lead to
> > > mismatch.
> > > 
> > > 		interrupt-map-mask = <0xf800 0 0 7>;
> > > 		interrupt-map = <0x0000 0 0 1 ...>,
> > > 				<0x0000 0 0 2 ...>,
> > > 				<0x0000 0 0 3 ...>,
> > > 				<0x0000 0 0 4 ...>,
> > > 
> > > 				 0x0800 0 0 1 ...>,
> > > 				 0x0800 0 0 2 ...>,
> > > 				 0x0800 0 0 3 ...>,
> > > 				 0x0800 0 0 4 ...>;
> > > 		...
> > > 		pcie@1,0 {
> > > 			reg = <0x0800 0 0 0 0>;
> > > 			...
> > > 		};
> > > 
> > > 
> > > Or, alternatively, we could add a interrupt-map property in both child
> > > and root node to solve this. The below example is my original version as
> > > I don't want to change that function either.
> > 
> > The code looks at devfn because it's meant to work for PCI including
> > when the devices dont have a device node in the DT.
> > 
> > What I'm trying to figure out is what is it that your parent and
> > children are representing here. Which is/are the root complex ?
> 
> This is a single root complex with 2 root port (children in DT).
> 
> > What is the actual topology as visible on the PCIe bus (is lspci output
> > basically) and how does that map to your representation ?
> 
> # lspci
> 00:00.0 Class 0604: 14c3:0801 //1st slot - pcie@0,0
> 00:01.0 Class 0604: 14c3:0801 //2nd slot - pcie@1,0
> 
> 01:00.0 Class 0280: 14c3:7603 //A device which is connected to 1st slot
> 02:00.0 Class 0200: 8086:1521 //A 4 func device which is connected to
> 2nd slot
> 02:00.1 Class 0200: 8086:1521
> 02:00.2 Class 0200: 8086:1521
> 02:00.3 Class 0200: 8086:1521

Ok so that's a rather standard setup. The "devfn << 8" of your root
ports should be the exact same thing as their first reg property entry,
I'm not sure why you have a mismatch here.

However, that map only represents the INTA..D lines going to the root
ports, not how these get mapped to children of the root ports.

of_irq_parse_pci() will implement standard swizzling if you don't have
nodes for your devices at all. If you do, however, the code assumes
you have a correct and complete interrupt tree in the device-tree.

That means that you need in each "p2p bridge", including your root
ports, an interrupt-map that will map the children INTA...D of that
bridge to the parent INTA...D of that bridge.

Alternatively, you can make the maps in the root ports point directly
to the parent PIC. If you chose to do that, then the interrupt-map in
your root complex becomes only useful to represent the root ports own
interrutps (hotplug, AER,...) and could be replaced by just having
interrupt-parent & interrupts in those root port nodes.

Cheers,
Ben.
Ryder Lee Feb. 6, 2018, 5:42 a.m. | #8
On Tue, 2018-02-06 at 15:50 +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2018-02-06 at 12:31 +0800, Ryder Lee wrote:
> > On Tue, 2018-02-06 at 15:05 +1100, Benjamin Herrenschmidt wrote:
> > > On Tue, 2018-02-06 at 10:38 +0800, Ryder Lee wrote:
> > > > 
> > > > I think the code should look at the bridge address <0x0800 ...> we list
> > > > in bindings for resolving interrupts in this case, but it seems like it
> > > > use the 'pdev->defvn << 8' which is not really we want and will lead to
> > > > mismatch.
> > > > 
> > > > 		interrupt-map-mask = <0xf800 0 0 7>;
> > > > 		interrupt-map = <0x0000 0 0 1 ...>,
> > > > 				<0x0000 0 0 2 ...>,
> > > > 				<0x0000 0 0 3 ...>,
> > > > 				<0x0000 0 0 4 ...>,
> > > > 
> > > > 				 0x0800 0 0 1 ...>,
> > > > 				 0x0800 0 0 2 ...>,
> > > > 				 0x0800 0 0 3 ...>,
> > > > 				 0x0800 0 0 4 ...>;
> > > > 		...
> > > > 		pcie@1,0 {
> > > > 			reg = <0x0800 0 0 0 0>;
> > > > 			...
> > > > 		};
> > > > 
> > > > 
> > > > Or, alternatively, we could add a interrupt-map property in both child
> > > > and root node to solve this. The below example is my original version as
> > > > I don't want to change that function either.
> > > 
> > > The code looks at devfn because it's meant to work for PCI including
> > > when the devices dont have a device node in the DT.
> > > 
> > > What I'm trying to figure out is what is it that your parent and
> > > children are representing here. Which is/are the root complex ?
> > 
> > This is a single root complex with 2 root port (children in DT).
> > 
> > > What is the actual topology as visible on the PCIe bus (is lspci output
> > > basically) and how does that map to your representation ?
> > 
> > # lspci
> > 00:00.0 Class 0604: 14c3:0801 //1st slot - pcie@0,0
> > 00:01.0 Class 0604: 14c3:0801 //2nd slot - pcie@1,0
> > 
> > 01:00.0 Class 0280: 14c3:7603 //A device which is connected to 1st slot
> > 02:00.0 Class 0200: 8086:1521 //A 4 func device which is connected to
> > 2nd slot
> > 02:00.1 Class 0200: 8086:1521
> > 02:00.2 Class 0200: 8086:1521
> > 02:00.3 Class 0200: 8086:1521
> 
> Ok so that's a rather standard setup. The "devfn << 8" of your root
> ports should be the exact same thing as their first reg property entry,
> I'm not sure why you have a mismatch here.

I've added some log after 'for loop':

pr_err("busn=0x%x, devfn=0x%x, reg=0x%x\n", pdev->bus->number, pdev->devfn, of_pci_get_devfn(ppnode));

and got these:

[    5.651836] busn=0x0, devfn=0x0, reg=0x0
[    5.651936] pcieport 0000:00:00.0: assign IRQ: got 213
...
[    5.652398] busn=0x0, devfn=0x8, reg=0x0
[    5.652487] pcieport 0000:00:01.0: assign IRQ: got 214

...
[    5.653025] busn=0x2, devfn=0x0, reg=0x8
[    5.653058] igb 0000:02:00.0: assign IRQ: got 213

[    5.724582] busn=0x2, devfn=0x1, reg=0x8
[    5.724634] igb 0000:02:00.1: assign IRQ: got 213

> However, that map only represents the INTA..D lines going to the root
> ports, not how these get mapped to children of the root ports.
> 
> of_irq_parse_pci() will implement standard swizzling if you don't have
> nodes for your devices at all. If you do, however, the code assumes
> you have a correct and complete interrupt tree in the device-tree.
> 
> That means that you need in each "p2p bridge", including your root
> ports, an interrupt-map that will map the children INTA...D of that
> bridge to the parent INTA...D of that bridge.
> 
> Alternatively, you can make the maps in the root ports point directly
> to the parent PIC. If you chose to do that, then the interrupt-map in
> your root complex becomes only useful to represent the root ports own
> interrutps (hotplug, AER,...) and could be replaced by just having
> interrupt-parent & interrupts in those root port nodes.
> 

Thanks for explanation.

So I guess the better way to achieve my aim - one IRQ per slot that is
connected to all INTx and get propagated through the bridges (and for
those root ports own interrupts (PME ..)} is to add interrupt-map
properties in both parent and root port nodes.

Something like this: https://patchwork.kernel.org/patch/9970923/ ,right?

Ryder
Benjamin Herrenschmidt Feb. 6, 2018, 10:31 p.m. | #9
On Tue, 2018-02-06 at 13:42 +0800, Ryder Lee wrote:
> Thanks for explanation.
> 
> So I guess the better way to achieve my aim - one IRQ per slot that is
> connected to all INTx and get propagated through the bridges (and for
> those root ports own interrupts (PME ..)} is to add interrupt-map
> properties in both parent and root port nodes.
> 
> Something like this: https://patchwork.kernel.org/patch/9970923// ,right?

Yup.

Cheers,
Ben.
Ryder Lee Feb. 7, 2018, 1:58 a.m. | #10
Hi, Arnd

On Wed, 2018-02-07 at 09:31 +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2018-02-06 at 13:42 +0800, Ryder Lee wrote:
> > Thanks for explanation.
> > 
> > So I guess the better way to achieve my aim - one IRQ per slot that is
> > connected to all INTx and get propagated through the bridges (and for
> > those root ports own interrupts (PME ..)} is to add interrupt-map
> > properties in both parent and root port nodes.
> > 
> > Something like this: https://patchwork.kernel.org/patch/9970923// ,right?
> 
> Yup.
> 
> Cheers,
> Ben.
> 

Do you have any thoughts on the original approach? If you are okay with
it, I will resend the DT patch.

Thanks.
Lorenzo Pieralisi March 15, 2018, 5:43 p.m. | #11
On Wed, Jan 31, 2018 at 03:41:49PM +0800, Ryder Lee wrote:
> A root complex usually consist of a host bridge and multiple P2P bridges,
> and someone may express that in the form of a root node with many subnodes
> and list all four interrupts for each slot (child node) in the root node
> like this:
> 
> 	pcie-controller {
> 		...
> 		interrupt-map-mask = <0xf800 0 0 7>;
> 		interrupt-map = <0x0000 0 0 {INTx} &{interrupt parent} ...>
> 				 0x0800 0 0 {INTx} &{interrupt parent} ...>;
> 
> 		pcie@0,0 {
> 			reg = <0x0000 0 0 0 0>;
> 			...
> 		};
> 
> 		pcie@1,0 {
> 			reg = <0x0800 0 0 0 0>;
> 			...
> 		};
> 	};
> 
> As shown above, we'd like to propagate IRQs from a root port to the devices
> in the hierarchy below it in this way.  However, it seems that the current
> parser couldn't handle such cases and will get something unexpected below:
> 
> 	pcieport 0000:00:01.0: assign IRQ: got 213
> 	igb 0000:01:00.0: assign IRQ: got 212
> 
> There is a device which is connected to 2nd slot, but the port doesn't share
> the same IRQ with its downstream devices.  The problem here is that, if the
> loop found a P2P bridge, it wouldn't check whether the reg property exists
> in ppnode or not but just pass the subordinate devfn to of_irq_parse_raw(),
> thus the subsequent flow couldn't correctly resolve them.
> 
> Fix this by adding a check to fallback to standard device tree parsing.
> 
> Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
> ---
> Please refer to the previous discussion thread: https://patchwork.ozlabs.org/patch/829108/
> ---
>  drivers/of/of_pci_irq.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)

Hi Ryder,

from the thread discussion I gather I can drop this series from the PCI
queue and you will update the DT as agreed with Ben, that looks like
the most reasonable solution to the problem you are facing, please
let me know if there is anything I am missing.

Thanks,
Lorenzo

> diff --git a/drivers/of/of_pci_irq.c b/drivers/of/of_pci_irq.c
> index 3a05568..e445866 100644
> --- a/drivers/of/of_pci_irq.c
> +++ b/drivers/of/of_pci_irq.c
> @@ -86,8 +86,18 @@ int of_irq_parse_pci(const struct pci_dev *pdev, struct of_phandle_args *out_irq
>  	out_irq->np = ppnode;
>  	out_irq->args_count = 1;
>  	out_irq->args[0] = pin;
> -	laddr[0] = cpu_to_be32((pdev->bus->number << 16) | (pdev->devfn << 8));
> -	laddr[1] = laddr[2] = cpu_to_be32(0);
> +
> +	if (!dn && ppnode) {
> +		const __be32 *addr;
> +
> +		addr = of_get_property(ppnode, "reg", NULL);
> +		if (addr)
> +			memcpy(laddr, addr, 3);
> +	} else {
> +		laddr[0] = cpu_to_be32((pdev->bus->number << 16) | (pdev->devfn << 8));
> +		laddr[1] = laddr[2] = cpu_to_be32(0);
> +	}
> +
>  	rc = of_irq_parse_raw(laddr, out_irq);
>  	if (rc)
>  		goto err;
> -- 
> 1.9.1
>
Ryder Lee March 16, 2018, 12:58 a.m. | #12
On Thu, 2018-03-15 at 17:43 +0000, Lorenzo Pieralisi wrote:
> On Wed, Jan 31, 2018 at 03:41:49PM +0800, Ryder Lee wrote:
> > A root complex usually consist of a host bridge and multiple P2P bridges,
> > and someone may express that in the form of a root node with many subnodes
> > and list all four interrupts for each slot (child node) in the root node
> > like this:
> > 
> > 	pcie-controller {
> > 		...
> > 		interrupt-map-mask = <0xf800 0 0 7>;
> > 		interrupt-map = <0x0000 0 0 {INTx} &{interrupt parent} ...>
> > 				 0x0800 0 0 {INTx} &{interrupt parent} ...>;
> > 
> > 		pcie@0,0 {
> > 			reg = <0x0000 0 0 0 0>;
> > 			...
> > 		};
> > 
> > 		pcie@1,0 {
> > 			reg = <0x0800 0 0 0 0>;
> > 			...
> > 		};
> > 	};
> > 
> > As shown above, we'd like to propagate IRQs from a root port to the devices
> > in the hierarchy below it in this way.  However, it seems that the current
> > parser couldn't handle such cases and will get something unexpected below:
> > 
> > 	pcieport 0000:00:01.0: assign IRQ: got 213
> > 	igb 0000:01:00.0: assign IRQ: got 212
> > 
> > There is a device which is connected to 2nd slot, but the port doesn't share
> > the same IRQ with its downstream devices.  The problem here is that, if the
> > loop found a P2P bridge, it wouldn't check whether the reg property exists
> > in ppnode or not but just pass the subordinate devfn to of_irq_parse_raw(),
> > thus the subsequent flow couldn't correctly resolve them.
> > 
> > Fix this by adding a check to fallback to standard device tree parsing.
> > 
> > Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
> > ---
> > Please refer to the previous discussion thread: https://patchwork.ozlabs.org/patch/829108/
> > ---
> >  drivers/of/of_pci_irq.c | 14 ++++++++++++--
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> Hi Ryder,
> 
> from the thread discussion I gather I can drop this series from the PCI
> queue and you will update the DT as agreed with Ben, that looks like
> the most reasonable solution to the problem you are facing, please
> let me know if there is anything I am missing.
> 
> Thanks,
> Lorenzo
> 

Yes, please drop the series.

Thanks

Patch

diff --git a/drivers/of/of_pci_irq.c b/drivers/of/of_pci_irq.c
index 3a05568..e445866 100644
--- a/drivers/of/of_pci_irq.c
+++ b/drivers/of/of_pci_irq.c
@@ -86,8 +86,18 @@  int of_irq_parse_pci(const struct pci_dev *pdev, struct of_phandle_args *out_irq
 	out_irq->np = ppnode;
 	out_irq->args_count = 1;
 	out_irq->args[0] = pin;
-	laddr[0] = cpu_to_be32((pdev->bus->number << 16) | (pdev->devfn << 8));
-	laddr[1] = laddr[2] = cpu_to_be32(0);
+
+	if (!dn && ppnode) {
+		const __be32 *addr;
+
+		addr = of_get_property(ppnode, "reg", NULL);
+		if (addr)
+			memcpy(laddr, addr, 3);
+	} else {
+		laddr[0] = cpu_to_be32((pdev->bus->number << 16) | (pdev->devfn << 8));
+		laddr[1] = laddr[2] = cpu_to_be32(0);
+	}
+
 	rc = of_irq_parse_raw(laddr, out_irq);
 	if (rc)
 		goto err;