Patchwork [v2,22/27] arm: mvebu: add PCIe Device Tree informations for Armada XP

login
register
mail settings
Submitter Thomas Petazzoni
Date Jan. 28, 2013, 6:56 p.m.
Message ID <1359399397-29729-23-git-send-email-thomas.petazzoni@free-electrons.com>
Download mbox | patch
Permalink /patch/216322/
State Not Applicable
Headers show

Comments

Thomas Petazzoni - Jan. 28, 2013, 6:56 p.m.
The Armada XP SoCs have multiple PCIe interfaces. The MV78230 has 2
PCIe units (one 4x or quad 1x, the other 1x only), the MV78260 has 3
PCIe units (two 4x or quad 1x and one 4x/1x), the MV78460 has 4 PCIe
units (two 4x or quad 1x and two 4x/1x). We therefore add the
necessary Device Tree informations to make those PCIe interfaces
usable.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 arch/arm/boot/dts/armada-xp-mv78230.dtsi |   92 ++++++++++++++++++
 arch/arm/boot/dts/armada-xp-mv78260.dtsi |  105 +++++++++++++++++++++
 arch/arm/boot/dts/armada-xp-mv78460.dtsi |  150 ++++++++++++++++++++++++++++++
 3 files changed, 347 insertions(+)
Arnd Bergmann - Feb. 6, 2013, 10:41 p.m.
On Monday 28 January 2013, Thomas Petazzoni wrote:
> +
> +               /*
> +                * MV78230 has 2 PCIe units Gen2.0: One unit can be
> +                * configured as x4 or quad x1 lanes. One unit is
> +                * x4/x1.
> +                */
> +               pcie-controller {
> +                       compatible = "marvell,armada-370-xp-pcie";
> +                       status = "disabled";
> +
> +                       #address-cells = <3>;
> +                       #size-cells = <2>;
> +
> +
> +                       bus-range = <0x00 0xff>;
> +
> +                       ranges = <0x00000800 0 0xd0040000 0xd0040000 0 0x00002000   /* port 0.0 registers */
> +                                 0x00004800 0 0xd0042000 0xd0042000 0 0x00002000   /* port 2.0 registers */
> +                                 0x00001000 0 0xd0044000 0xd0044000 0 0x00002000   /* port 0.1 registers */
> +                                 0x00001800 0 0xd0048000 0xd0048000 0 0x00002000   /* port 0.2 registers */
> +                                 0x00002000 0 0xd004C000 0xd004C000 0 0x00002000   /* port 0.3 registers */
> +                                 0x81000000 0 0          0xc0000000 0 0x00010000   /* downstream I/O */
> +                                 0x82000000 0 0          0xc1000000 0 0x08000000>; /* non-prefetchable memory */
> +

I've been thinking some more about this, and I wonder if it would
make more sense to describe the address remapping correctly as
a node on top of the pcie-controller node.

This would mean that rather than putting the mapped physical address
(0xc0000000, 0xc1000000, ...) in here, you would actually have 64-bit
address as the destination as well, in whatever format the
address map hardware uses, I assume using a numbered 32 bit
address space for each object that can be remapped.

This would also let you do the PCI memory address assignment for
each port separately, starting at bus address 0, followed by
finding a location in the CPU address space and passing
the start as the sys->mem_offset argument to
pci_add_resource_offset.

> +                       pcie@0,0 {
> +                               device_type = "pciex";
> +                               reg = <0x0800 0 0xd0040000 0 0x2000>;
> +                               #address-cells = <3>;
> +                               #size-cells = <2>;
> +                               marvell,pcie-port = <0>;
> +                               marvell,pcie-lane = <0>;
> +                               interrupts = <1>;
> +                               clocks = <&gateclk 5>;
> +                               status = "disabled";
> +                       };

I think you are missing a "ranges" property here, at least an empty
one, which is required by the standard but not currently enforced
in the code.
	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Petazzoni - Feb. 6, 2013, 11:07 p.m.
Dear Arnd Bergmann,

On Wed, 6 Feb 2013 22:41:14 +0000, Arnd Bergmann wrote:

> I've been thinking some more about this, and I wonder if it would
> make more sense to describe the address remapping correctly as
> a node on top of the pcie-controller node.
> 
> This would mean that rather than putting the mapped physical address
> (0xc0000000, 0xc1000000, ...) in here, you would actually have 64-bit
> address as the destination as well, in whatever format the
> address map hardware uses, I assume using a numbered 32 bit
> address space for each object that can be remapped.
> 
> This would also let you do the PCI memory address assignment for
> each port separately, starting at bus address 0, followed by
> finding a location in the CPU address space and passing
> the start as the sys->mem_offset argument to
> pci_add_resource_offset.

Hum, good you give a skeleton example, because I'm not sure to
understand your suggestion.


> 
> > +                       pcie@0,0 {
> > +                               device_type = "pciex";
> > +                               reg = <0x0800 0 0xd0040000 0
> > 0x2000>;
> > +                               #address-cells = <3>;
> > +                               #size-cells = <2>;
> > +                               marvell,pcie-port = <0>;
> > +                               marvell,pcie-lane = <0>;
> > +                               interrupts = <1>;
> > +                               clocks = <&gateclk 5>;
> > +                               status = "disabled";
> > +                       };
> 
> I think you are missing a "ranges" property here, at least an empty
> one, which is required by the standard but not currently enforced
> in the code.

Is it really wise to have DT properties that are not used by anything,
and therefore have a very high chance of either being incorrect, or
becoming incorrect?

Best regards,

Thomas
Jason Gunthorpe - Feb. 7, 2013, 1:05 a.m.
On Wed, Feb 06, 2013 at 10:41:14PM +0000, Arnd Bergmann wrote:

> This would mean that rather than putting the mapped physical address
> (0xc0000000, 0xc1000000, ...) in here, you would actually have 64-bit
> address as the destination as well, in whatever format the
> address map hardware uses, I assume using a numbered 32 bit
> address space for each object that can be remapped.

Do you mean like:

mbus_matrix {
   /* At this level addresses are
       <32 bit MBUS target ID> <32 bit target address offset>
      target_id is an internal MBUS specification
     */
   #address-cells = <2>;
   #size-cells = <1>;

   pex0 {
       ranges = <
      // CPU window bridged to PCI MMIO
      0x82000000 0x00000000 0x00000000  0x<target_id> 0x0 0x0 0x8000000
      // CPU IO window bridged to PCI IO
      0x81000000 0x00000000 0x00000000  0x<target_id> 0x0  0x0 0xa0000
      ...>;
   }

   nand {
       reg = <0x<target_id> 0 0x200>;
   };

   internal_regs {
       ranges = <0 0x<target_id> 0 0x10000>;

       timer {
           reg = <0x20300 0x20>;
      };
   };
}

?

I think the big issue would go back to how to pool all the link
decoders together in a way that fits OF and Linux's PCI core will
understand?

How does of_translate_address work in a world like this?

> This would also let you do the PCI memory address assignment for
> each port separately, starting at bus address 0, followed by
> finding a location in the CPU address space and passing
> the start as the sys->mem_offset argument to
> pci_add_resource_offset.

That goes back to the original problem - the goal is to have only one
pci_sys_data, not one for every link.

The host driver would have to request a large region of physical
address space and still dole it out on a link by link basis. Not sure
how to model that in DT??

In any event, changing how all the dynamic windows are configured in
DT is a big job (there was another thread about this) it seems
orthogonal to the PCI host driver..

> > +                       pcie@0,0 {
> > +                               device_type = "pciex";
> > +                               reg = <0x0800 0 0xd0040000 0 0x2000>;
> > +                               #address-cells = <3>;
> > +                               #size-cells = <2>;
> > +                               marvell,pcie-port = <0>;
> > +                               marvell,pcie-lane = <0>;
> > +                               interrupts = <1>;
> > +                               clocks = <&gateclk 5>;
> > +                               status = "disabled";
> > +                       };
> 
> I think you are missing a "ranges" property here, at least an empty
> one, which is required by the standard but not currently enforced
> in the code.

Maybe.. according to the standard the ranges in this stanza should
reflect the bridge configuration, but that isn't known when the DT is
written. An empty ranges means identity and that isn't really right
either.

Also, what should 'reg' be so that the PCI core binds the OF nodes
properly?  The standard says reg should have the configuration space
address of the bridge, and I noticed Thierry was using something that
almost looked like a config space address in his driver..

But that seems overly tricky.. When using the link stanzas, shouldn't
this scheme be more like this:

// The PCI-E host bridge
pex@0 {
   device_type = "pciex"; // <<-- Important!!

   ranges = <
      // Driver internal control registers in MMIO space
      0x82000000 0x10000000 0xd0040000  0xd0040000  0x0 0x8000000

      // CPU 0xe0000000 -> e8000000 bridged to PCI MMIO
      0x82000000 0x00000000 0xe0000000  0xe0000000  0x0 0x8000000
      // CPU IO bus 0x0000000 -> 0xa0000 bridged to PCI IO
      0x81000000 0x00000000 0x00000000  0x00000000  0x0 0xa0000
      // CPU 0xe8000000 -> f0000000 bridged to PCI MMIO, prefetchable
      0xC2000000 0x00000000 0xe8000000  0xe8000000  0x0 0x8000000
      >;

   //...

   // PCI-PCI bridge to Physical link 0
   pcie@0,0 {
     device_type = "pciex";
     // The configuration space address of the PCI-PCI bridge required by OF
     reg = <0x80 0 0  0 0>; // Bus 0, Dev 0x10, Fn 0

     /* The 'bar' on the PCI-PCI bridge, maps to internal control
        registers, required by the driver. */
     assigned-addresses = <0x82000000 0x10000000 0xd0040000  0 0x2000>;
     // ..
  }
}

Thierry, what do you think?

I struggled with this particular area of the OF system recently and
Grant Likely was very helpful - the above is based on that
discussion..

======

Thomas, here is one possible alternate idea, not sure on the merit,
just including it because it is close to what I've got in my DT right
now..:

// The entire multi-lane controller and PCI root bus
pex@0 {
   device_type = "pciex";

   /* This section configures the Linux PCI host driver. Each line is a
      physical PCI-Link. (Erorrs included :) */
   compatible = "marvell,armada-370-xp-pcie";
   regs = <0xd0040000 0x00002000   // port 0.0
           0xd0042000 0x00002000   // port 2.0
           0xd0044000 0x00002000   // port 0.1
           0xd0048000 0x00002000   // port 0.2
           0xd004C000 0x00002000>; // port 0.3
   io-cpu-window = <0xc0000000 0xa0000>;
   interrupts = <58 59 60 61 99>;
   clocks = <&gateclk 5
             &gateclk 6
             &gateclk 7
             &gateclk 8
             &gateclk 26>
   marvell,port-lane <0 0
                      0 1
                      0 2
                      0 3
                      2 0>;

   /* Below here configures the aggregate PCI bus the PEX0 stanza
      describes */

   #address-cells = <3>;
   #size-cells = <2>;

   // CPU resources allocated to this PCI host bridge
   bus-range = <0x00 0xff>;
   ranges = <
      // CPU 0xe0000000 -> e8000000 bridged to PCI MMIO
      0x82000000 0x00000000 0xe0000000  0xe0000000  0x0 0x8000000
      // CPU IO bus 0x0000000 -> 0xa0000 bridged to PCI IO
      0x81000000 0x00000000 0x00000000  0x00000000  0x0 0xa0000
      // CPU 0xe8000000 -> f0000000 bridged to PCI MMIO, prefetchable
      0xC2000000 0x00000000 0xe8000000  0xe8000000  0x0 0x8000000
      >;

   /* Any PCI devices associated with this bus go here, relative to
      the above ranges. As example: */
   link@0 {
   	device_type = "pciex";
        // Root port at bus 0, device 0x10, function 0
	reg = <0x00000080 0 0  0 0>;

       ep {
   	 device_type = "pciex";
         // End port on bus 1, device 0, function 0
	 reg = <0x00000100 0 0  0 0>;
       } 
   }
}

Thoughts:
 - 'regs' in the main stanza, keep to CPU addresses instead of
   confusing translated fake ranges address
 - each regs line is matches to an interrupt, clock and port-lane
   line to describe a link. The above describes 5 links.
 - The CPU physical address window to use for the IO space
   is set via io-cpu-window, not much choice here, the PCI
   format ranges must be 0 based.
 - It is not necessary to have per-root port stanzas at all
   with the above.
 - There is only one PCI bus stanza, the top level.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thierry Reding - Feb. 7, 2013, 7:28 a.m.
On Wed, Feb 06, 2013 at 06:05:02PM -0700, Jason Gunthorpe wrote:
[...]
> Also, what should 'reg' be so that the PCI core binds the OF nodes
> properly?  The standard says reg should have the configuration space
> address of the bridge, and I noticed Thierry was using something that
> almost looked like a config space address in his driver..
> 
> But that seems overly tricky.. When using the link stanzas, shouldn't
> this scheme be more like this:
> 
> // The PCI-E host bridge
> pex@0 {
>    device_type = "pciex"; // <<-- Important!!

That might actually work but is somewhat dangerous. I originally tried
to trick the OF code into parsing the ranges properly by setting this to
"pci". However that breaks horribly because of_bus_pci_match() in
drivers/of/address.c will cause the parent bus of the pex@0 controller
to be PCI, which will cause #address-cells == 3 and #size-cells == 2,
and thus messing up the address translation because you actually have
#address-cells == 1 and #size-cells == 1.

The code doesn't seem to match on "pciex", so you might get away with
it, but I don't see why exactly it would be necessary. For one it is
actually wrong as the device isn't a PCI device but a platform device.
What I did to solve the ranges parsing problem is to use of_find_bus()
instead of of_match_bus() in of_pci_process_ranges(), so that the cell
count is correct.

>    ranges = <
>       // Driver internal control registers in MMIO space
>       0x82000000 0x10000000 0xd0040000  0xd0040000  0x0 0x8000000

I'm not sure I understand what you're doing here. Where does the
0x10000000 in cell 2 come from?

I also just noticed that I used 0x00000800 in the first cell, maybe that
should be 0x02000800, though I think that didn't quite work for some
reason. I'll need to check that. The odd part about this is that the
address is in fact not within PCI MMIO space at all, so I'm not sure
this is even a correct way to represent it. However I find it quite
intuitive to do so and it is really the only way to make the translation
from the PCI-PCI bridge's reg property work while at the same time
having a proper PCI address for the port.

Note that 0x82000800 would probably not be correct as it'd indicate that
the address is relocatable, which it really isn't.

>       // CPU 0xe0000000 -> e8000000 bridged to PCI MMIO
>       0x82000000 0x00000000 0xe0000000  0xe0000000  0x0 0x8000000
>       // CPU IO bus 0x0000000 -> 0xa0000 bridged to PCI IO
>       0x81000000 0x00000000 0x00000000  0x00000000  0x0 0xa0000
>       // CPU 0xe8000000 -> f0000000 bridged to PCI MMIO, prefetchable
>       0xC2000000 0x00000000 0xe8000000  0xe8000000  0x0 0x8000000
>       >;

These look good.

>    // PCI-PCI bridge to Physical link 0
>    pcie@0,0 {
>      device_type = "pciex";
>      // The configuration space address of the PCI-PCI bridge required by OF
>      reg = <0x80 0 0  0 0>; // Bus 0, Dev 0x10, Fn 0

I think the first cell should be 0x800.

> 
>      /* The 'bar' on the PCI-PCI bridge, maps to internal control
>         registers, required by the driver. */
>      assigned-addresses = <0x82000000 0x10000000 0xd0040000  0 0x2000>;
>      // ..
>   }

The PCI DT binding says that each entry in the assigned-addresses
property is to correspond to one of the PCI device's base address
registers. So unless this is actually the value that ends up being
written to one of the BARs I don't think this is correct.

Thierry
Arnd Bergmann - Feb. 7, 2013, 8:04 a.m.
On Wednesday 06 February 2013, Thomas Petazzoni wrote:
> Dear Arnd Bergmann,
> 
> On Wed, 6 Feb 2013 22:41:14 +0000, Arnd Bergmann wrote:
> 
> > I've been thinking some more about this, and I wonder if it would
> > make more sense to describe the address remapping correctly as
> > a node on top of the pcie-controller node.
> > 
> > This would mean that rather than putting the mapped physical address
> > (0xc0000000, 0xc1000000, ...) in here, you would actually have 64-bit
> > address as the destination as well, in whatever format the
> > address map hardware uses, I assume using a numbered 32 bit
> > address space for each object that can be remapped.
> > 
> > This would also let you do the PCI memory address assignment for
> > each port separately, starting at bus address 0, followed by
> > finding a location in the CPU address space and passing
> > the start as the sys->mem_offset argument to
> > pci_add_resource_offset.
> 
> Hum, good you give a skeleton example, because I'm not sure to
> understand your suggestion.

I mean (roughly, since I don't know how that hardware defines it)

/ {
	#address-cells = <1>;
	#size-cells = <1>;

	memory@0 {
		/* this node can not get remapped */
		reg = <0x0 0x40000000>;
	};

	address-map {
		/* this device translates 64 bit MMIO bus addresses into 32 bit CPU addresses */
		compatible = "marvell,armada-addr-decoding-controller";
		reg = <0xd0020000 0x258>;
		#addres-cells = <2>;
		#address-cells = <1>;

		/* each remapped window has one entry here */
		ranges = <0xa 0 0xc0000000 0x10000>,     /* map window a to 0xc0000000 */
		         <0xb 0 0xc1000000 0x08000000>,  /* map window b to 0xc1000000 */
			 <...>; /* more windows */

		pciex {
			#addres-cells = <3>;
			#size-cells = <2>;
			ranges = <0x81000000 0 0 0xa 0 0 0x00010000,  /* I/O is window a */
				  0x82000000 0 0 0xb 0 0 0x08000000>; /* non-prefetchable memory */
			...
		};


		something-else {
			...
			reg = <0xc 0 0x10000>; /* window c */
		};
	};
};

> > 
> > > +                       pcie@0,0 {
> > > +                               device_type = "pciex";
> > > +                               reg = <0x0800 0 0xd0040000 0
> > > 0x2000>;
> > > +                               #address-cells = <3>;
> > > +                               #size-cells = <2>;
> > > +                               marvell,pcie-port = <0>;
> > > +                               marvell,pcie-lane = <0>;
> > > +                               interrupts = <1>;
> > > +                               clocks = <&gateclk 5>;
> > > +                               status = "disabled";
> > > +                       };
> > 
> > I think you are missing a "ranges" property here, at least an empty
> > one, which is required by the standard but not currently enforced
> > in the code.
> 
> Is it really wise to have DT properties that are not used by anything,
> and therefore have a very high chance of either being incorrect, or
> becoming incorrect?

In this case, definitely. It's mandated by an IEEE standard and the only reason
why we let some mistakes slip here is that some ancient PowerMac systems
get it wrong in their firmware. There are lots of cases where the difference
between an empty "ranges" property and an absent one is very significant
and I would really prefer to change the kernel to be standard compliant
here.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann - Feb. 7, 2013, 8:24 a.m.
On Thursday 07 February 2013, Jason Gunthorpe wrote:
> On Wed, Feb 06, 2013 at 10:41:14PM +0000, Arnd Bergmann wrote:
> 
> > This would mean that rather than putting the mapped physical address
> > (0xc0000000, 0xc1000000, ...) in here, you would actually have 64-bit
> > address as the destination as well, in whatever format the
> > address map hardware uses, I assume using a numbered 32 bit
> > address space for each object that can be remapped.
> 
> Do you mean like:
> 
> mbus_matrix {
>    /* At this level addresses are
>        <32 bit MBUS target ID> <32 bit target address offset>
>       target_id is an internal MBUS specification
>      */
>    #address-cells = <2>;
>    #size-cells = <1>;
> 
>    pex0 {
>        ranges = <
>       // CPU window bridged to PCI MMIO
>       0x82000000 0x00000000 0x00000000  0x<target_id> 0x0 0x0 0x8000000
>       // CPU IO window bridged to PCI IO
>       0x81000000 0x00000000 0x00000000  0x<target_id> 0x0  0x0 0xa0000
>       ...>;
>    }
> 
>    nand {
>        reg = <0x<target_id> 0 0x200>;
>    };
> 
>    internal_regs {
>        ranges = <0 0x<target_id> 0 0x10000>;
> 
>        timer {
>            reg = <0x20300 0x20>;
>       };
>    };
> }
> 
> ?

Yes, but the mbus-matrix node in the example would need a ranges property
to map the addresses according to how the windows are set up.

> I think the big issue would go back to how to pool all the link
> decoders together in a way that fits OF and Linux's PCI core will
> understand?
> 
> How does of_translate_address work in a world like this?

The trick is that the root node must still be #address-cells=<1> and
refer to the CPU's translated view of the world, rather the raw
view with individul windows. When we set up the address map in
Linux, we would initially read the windows from the ranges property,
but if we make changes to the windows, that property has to be
adapted on the fly. of_translate_address will then for any
device go through the intermediate 64 bit address but end up with
the correct physical address at the root node.

> > This would also let you do the PCI memory address assignment for
> > each port separately, starting at bus address 0, followed by
> > finding a location in the CPU address space and passing
> > the start as the sys->mem_offset argument to
> > pci_add_resource_offset.
> 
> That goes back to the original problem - the goal is to have only one
> pci_sys_data, not one for every link.
> 
> The host driver would have to request a large region of physical
> address space and still dole it out on a link by link basis. Not sure
> how to model that in DT??
> 
> In any event, changing how all the dynamic windows are configured in
> DT is a big job (there was another thread about this) it seems
> orthogonal to the PCI host driver..

Yes, it is orthogonal, you are right, but I think it would make it
easier to understand what we are trying to do here with the PCI
node. 

I'm not sure why you say "the goal is to have only one pci_sys_data",
as far as I'm concerned, the goal is to have a working system that
is both as sensible and as simple as possible. When in hardware
you have a bunch of independent PCIe host bridges, each with their
own address translation, that means to me that (if possible) we 
should in the device tree show multiple independent PCIe host bridges
and how they are set up with address translation, and Linux should
see multiple independent host bridges as well.
Having a single pci_sys_data is a hack to defeat some of the problems
associated with getting bus probing to work in practice, but it's
not necessarily the best solution.

> > > +                       pcie@0,0 {
> > > +                               device_type = "pciex";
> > > +                               reg = <0x0800 0 0xd0040000 0 0x2000>;
> > > +                               #address-cells = <3>;
> > > +                               #size-cells = <2>;
> > > +                               marvell,pcie-port = <0>;
> > > +                               marvell,pcie-lane = <0>;
> > > +                               interrupts = <1>;
> > > +                               clocks = <&gateclk 5>;
> > > +                               status = "disabled";
> > > +                       };
> > 
> > I think you are missing a "ranges" property here, at least an empty
> > one, which is required by the standard but not currently enforced
> > in the code.
> 
> Maybe.. according to the standard the ranges in this stanza should
> reflect the bridge configuration, but that isn't known when the DT is
> written. An empty ranges means identity and that isn't really right
> either.

Ok, I thought it was an identity mapping here.

> Also, what should 'reg' be so that the PCI core binds the OF nodes
> properly?  The standard says reg should have the configuration space
> address of the bridge, and I noticed Thierry was using something that
> almost looked like a config space address in his driver..

Well, that assumes that a bridge is addressed using configuration space,
which IIRC is normally the case but not here.

> But that seems overly tricky.. When using the link stanzas, shouldn't
> this scheme be more like this:
> 
> // The PCI-E host bridge
> pex@0 {
>    device_type = "pciex"; // <<-- Important!!
> 
>    ranges = <
>       // Driver internal control registers in MMIO space
>       0x82000000 0x10000000 0xd0040000  0xd0040000  0x0 0x8000000
> 
>       // CPU 0xe0000000 -> e8000000 bridged to PCI MMIO
>       0x82000000 0x00000000 0xe0000000  0xe0000000  0x0 0x8000000
>       // CPU IO bus 0x0000000 -> 0xa0000 bridged to PCI IO
>       0x81000000 0x00000000 0x00000000  0x00000000  0x0 0xa0000
>       // CPU 0xe8000000 -> f0000000 bridged to PCI MMIO, prefetchable
>       0xC2000000 0x00000000 0xe8000000  0xe8000000  0x0 0x8000000
>       >;
> 
>    //...
> 
>    // PCI-PCI bridge to Physical link 0
>    pcie@0,0 {
>      device_type = "pciex";
>      // The configuration space address of the PCI-PCI bridge required by OF
>      reg = <0x80 0 0  0 0>; // Bus 0, Dev 0x10, Fn 0
> 
>      /* The 'bar' on the PCI-PCI bridge, maps to internal control
>         registers, required by the driver. */
>      assigned-addresses = <0x82000000 0x10000000 0xd0040000  0 0x2000>;
>      // ..
>   }
> }
> 
> Thierry, what do you think?
> 
> I struggled with this particular area of the OF system recently and
> Grant Likely was very helpful - the above is based on that
> discussion..

I never really understood the 'assigned-addresses' property, but it looks
sensible.

> ======
> 
> Thomas, here is one possible alternate idea, not sure on the merit,
> just including it because it is close to what I've got in my DT right
> now..:
> 
> // The entire multi-lane controller and PCI root bus
> pex@0 {
>    device_type = "pciex";
> 
>    /* This section configures the Linux PCI host driver. Each line is a
>       physical PCI-Link. (Erorrs included :) */
>    compatible = "marvell,armada-370-xp-pcie";
>    regs = <0xd0040000 0x00002000   // port 0.0
>            0xd0042000 0x00002000   // port 2.0
>            0xd0044000 0x00002000   // port 0.1
>            0xd0048000 0x00002000   // port 0.2
>            0xd004C000 0x00002000>; // port 0.3
>    io-cpu-window = <0xc0000000 0xa0000>;
>    interrupts = <58 59 60 61 99>;
>    clocks = <&gateclk 5
>              &gateclk 6
>              &gateclk 7
>              &gateclk 8
>              &gateclk 26>
>    marvell,port-lane <0 0
>                       0 1
>                       0 2
>                       0 3
>                       2 0>;
> 
>    /* Below here configures the aggregate PCI bus the PEX0 stanza
>       describes */
> 
>    #address-cells = <3>;
>    #size-cells = <2>;
> 
>    // CPU resources allocated to this PCI host bridge
>    bus-range = <0x00 0xff>;
>    ranges = <
>       // CPU 0xe0000000 -> e8000000 bridged to PCI MMIO
>       0x82000000 0x00000000 0xe0000000  0xe0000000  0x0 0x8000000
>       // CPU IO bus 0x0000000 -> 0xa0000 bridged to PCI IO
>       0x81000000 0x00000000 0x00000000  0x00000000  0x0 0xa0000
>       // CPU 0xe8000000 -> f0000000 bridged to PCI MMIO, prefetchable
>       0xC2000000 0x00000000 0xe8000000  0xe8000000  0x0 0x8000000
>       >;
> 
>    /* Any PCI devices associated with this bus go here, relative to
>       the above ranges. As example: */
>    link@0 {
>    	device_type = "pciex";
>         // Root port at bus 0, device 0x10, function 0
> 	reg = <0x00000080 0 0  0 0>;
> 
>        ep {
>    	 device_type = "pciex";
>          // End port on bus 1, device 0, function 0
> 	 reg = <0x00000100 0 0  0 0>;
>        } 
>    }
> }
> 
> Thoughts:
>  - 'regs' in the main stanza, keep to CPU addresses instead of
>    confusing translated fake ranges address

Yes, I like that. If we follow the address translation method I suggested,
this would be a 64-bit address of course, but still easier to understand
than what we have now.

>  - each regs line is matches to an interrupt, clock and port-lane
>    line to describe a link. The above describes 5 links.

nice.

>  - The CPU physical address window to use for the IO space
>    is set via io-cpu-window, not much choice here, the PCI
>    format ranges must be 0 based.

I don't think I understand this part. Why can't you put this into
ranges as before?

-       0x81000000 0x00000000 0x00000000  0x00000000  0x0 0xa0000
+	0x81000000 0x00000000 0x00000000  0xc0000000  0x0 0xa0000

>  - It is not necessary to have per-root port stanzas at all
>    with the above.
>  - There is only one PCI bus stanza, the top level.

Ok.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Petazzoni - Feb. 7, 2013, 8:45 a.m.
Dear Arnd Bergmann,

On Thu, 7 Feb 2013 08:04:12 +0000, Arnd Bergmann wrote:
> 	address-map {
> 		/* this device translates 64 bit MMIO bus addresses into 32 bit CPU addresses */
> 		compatible = "marvell,armada-addr-decoding-controller";
> 		reg = <0xd0020000 0x258>;
> 		#addres-cells = <2>;
> 		#address-cells = <1>;
> 
> 		/* each remapped window has one entry here */
> 		ranges = <0xa 0 0xc0000000 0x10000>,     /* map window a to 0xc0000000 */
> 		         <0xb 0 0xc1000000 0x08000000>,  /* map window b to 0xc1000000 */
> 			 <...>; /* more windows */
> 
> 		pciex {
> 			#addres-cells = <3>;
> 			#size-cells = <2>;
> 			ranges = <0x81000000 0 0 0xa 0 0 0x00010000,  /* I/O is window a */
> 				  0x82000000 0 0 0xb 0 0 0x08000000>; /* non-prefetchable memory */
> 			...
> 		};
> 
> 
> 		something-else {
> 			...
> 			reg = <0xc 0 0x10000>; /* window c */
> 		};
> 	};
> };

Unfortunately, this means the windows are statically defined in the DT,
which is simply not possible for PCIe, as we have already explained
several times in this thread.

Any solution where the PCIe windows are statically described is
simply /not/ acceptable.

We have 10 PCIe interfaces, each requiring up to two windows, and we
have on the system a *total* of 20 windows. Doing static assignments of
windows is simply not an option.

Of course, you'll tell me that it's up to each board .dts to have a
number of windows that matches the number of actually existing PCIe
interface. But it means that each and every developer adding the
support for a new board must understand this complex problem, which is
something we do not want. We have a solution that makes all of this
PCIe window assignment dynamic, so it surprises me that we have to
continue to explain why a static solution is not appropriate.

Best regards,

Thomas
Arnd Bergmann - Feb. 7, 2013, 9:09 a.m.
On Thursday 07 February 2013, Thomas Petazzoni wrote:
> Unfortunately, this means the windows are statically defined in the DT,
> which is simply not possible for PCIe, as we have already explained
> several times in this thread.
> 
> Any solution where the PCIe windows are statically described is
> simply not acceptable.
> 
> We have 10 PCIe interfaces, each requiring up to two windows, and we
> have on the system a total of 20 windows. Doing static assignments of
> windows is simply not an option.
> 
> Of course, you'll tell me that it's up to each board .dts to have a
> number of windows that matches the number of actually existing PCIe
> interface. But it means that each and every developer adding the
> support for a new board must understand this complex problem, which is
> something we do not want. We have a solution that makes all of this
> PCIe window assignment dynamic, so it surprises me that we have to
> continue to explain why a static solution is not appropriate.

No, the idea here was actually to leave out any of the dynamic mappings
from the device tree and do the PCI bus probe for each port based
on a local bus address starting at 0 for each port. After all
BARs are assigned, you can then map each port to a convenient physical
address and register it by passing the start offset so that the
pci resources are adapted correctly.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe - Feb. 7, 2013, 5 p.m.
On Thu, Feb 07, 2013 at 08:24:50AM +0000, Arnd Bergmann wrote:

> Yes, but the mbus-matrix node in the example would need a ranges property
> to map the addresses according to how the windows are set up.

I'll hang on to this for next time the moving windows config to DT
discussion comes up..


> > Maybe.. according to the standard the ranges in this stanza should
> > reflect the bridge configuration, but that isn't known when the DT is
> > written. An empty ranges means identity and that isn't really right
> > either.
> 
> Ok, I thought it was an identity mapping here.
> 
> > Also, what should 'reg' be so that the PCI core binds the OF nodes
> > properly?  The standard says reg should have the configuration space
> > address of the bridge, and I noticed Thierry was using something that
> > almost looked like a config space address in his driver..
> 
> Well, that assumes that a bridge is addressed using configuration space,
> which IIRC is normally the case but not here.

With Thomas's driver each link has a PCI-PCI bridge in config space, and
'configuration space address' is that wonky format OF defines for
encoding the bus/device/function number into the 3 dword address. So
the correct thing is to put the bus/device/function of the PCI-PCI
bridge for the link in the reg value.

> I never really understood the 'assigned-addresses' property, but it looks
> sensible.

assigned-addresses does the same thing as reg in simple bus, but
handles all the wonky PCI address translation
 
> >  - The CPU physical address window to use for the IO space
> >    is set via io-cpu-window, not much choice here, the PCI
> >    format ranges must be 0 based.
> 
> I don't think I understand this part. Why can't you put this into
> ranges as before?
> 
> -       0x81000000 0x00000000 0x00000000  0x00000000  0x0 0xa0000
> +	0x81000000 0x00000000 0x00000000  0xc0000000  0x0 0xa0000

The OF PCI core translates 0x81000000 IO space addresess into a 'struct
resource' tagged with IORESOURCE_IO.

But 0xc0000000 is not an IORESOURCE_IO address, it is an
IORESOURCE_MEM address..

So, I think with the current OF code this has to be 0, otherwise your
IORESOURCE_IO's end up starting at 0xc000000 - but maybe there is some
trickyness to work with in here? (Although none of this matters when
Linux does resource assignment, the OF translation code is never
enganged)

But I agree, 0xc0000000 seems much better...

To think about it from a different angle, what would you put in the
4th dword on x86? How do you describe the IO numberspace in DT on x86?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe - Feb. 7, 2013, 5:49 p.m.
On Thu, Feb 07, 2013 at 08:28:24AM +0100, Thierry Reding wrote:
> On Wed, Feb 06, 2013 at 06:05:02PM -0700, Jason Gunthorpe wrote:
> [...]
> > Also, what should 'reg' be so that the PCI core binds the OF nodes
> > properly?  The standard says reg should have the configuration space
> > address of the bridge, and I noticed Thierry was using something that
> > almost looked like a config space address in his driver..
> > 
> > But that seems overly tricky.. When using the link stanzas, shouldn't
> > this scheme be more like this:
> > 
> > // The PCI-E host bridge
> > pex@0 {
> >    device_type = "pciex"; // <<-- Important!!
> 
> That might actually work but is somewhat dangerous. I originally tried
> to trick the OF code into parsing the ranges properly by setting this to
> "pci". However that breaks horribly because of_bus_pci_match() in
> drivers/of/address.c will cause the parent bus of the pex@0 controller
> to be PCI, which will cause #address-cells == 3 and #size-cells == 2,
> and thus messing up the address translation because you actually have
> #address-cells == 1 and #size-cells == 1.

Oh right, yes, I ment it to be "pci", specifically to engage that
code. Ditto for the other instance.

Not sure about the side effect you are talking about, I am using that
pattern in my DT and it is OK, but the PEX node is off the root node
directly.. It shouldn't affect the parent. Other DT examples using
device_type = pci (ie in PPC) also put it on the top PCI node and have
that node buried, so it should be correct, but all the stuff below has
to be correct first..
 
> >    ranges = <
> >       // Driver internal control registers in MMIO space
> >       0x82000000 0x10000000 0xd0040000  0xd0040000  0x0 0x8000000
> 
> I'm not sure I understand what you're doing here. Where does the
> 0x10000000 in cell 2 come from?

It is just a marker to keep the internal regs distinct from the PCI
MMIO window. I don't really like it, but something is necessary to
pass the non-PCI items into the link stanza. It would probably be
better to just place them outside the link as I did in my other
example..

> I also just noticed that I used 0x00000800 in the first cell, maybe that
> should be 0x02000800, though I think that didn't quite work for some
> reason. I'll need to check that. The odd part about this is that the

I think that is the trickyness you have, 0x00000800 is probably the
correct config space address number for your root port bridge. You are
overriding it to be both a config space address and something that
translates back to a CPU MMIO address.

If you use 0x02000800 then it is no longer a config space address, it is
an MMIO address and your 'reg' won't match anymore.

> address is in fact not within PCI MMIO space at all, so I'm not sure
> this is even a correct way to represent it. However I find it quite
> intuitive to do so and it is really the only way to make the translation
> from the PCI-PCI bridge's reg property work while at the same time
> having a proper PCI address for the port.

Your problem is that in PCI devices 'reg' is not supposed to be
translated to a CPU address. It is the config space address of the PCI
device followed by the *sizes* of all the BARs.

assigned-addresses is the MMIO location (and size) of the 'BARs', and
is intended to be translated to a CPU address.

When the bus is properly in PCI mode the OF address code automatically
uses assigned-addresses instead of regs, that is why you need
device_type = pci

> >    // PCI-PCI bridge to Physical link 0
> >    pcie@0,0 {
> >      device_type = "pciex";
> >      // The configuration space address of the PCI-PCI bridge required by OF
> >      reg = <0x80 0 0  0 0>; // Bus 0, Dev 0x10, Fn 0
> 
> I think the first cell should be 0x800.

Right.

> >      /* The 'bar' on the PCI-PCI bridge, maps to internal control
> >         registers, required by the driver. */
> >      assigned-addresses = <0x82000000 0x10000000 0xd0040000  0 0x2000>;
> >      // ..
> >   }
> 
> The PCI DT binding says that each entry in the assigned-addresses
> property is to correspond to one of the PCI device's base address
> registers. So unless this is actually the value that ends up being
> written to one of the BARs I don't think this is correct.

Yes, but, it is a compromise, of sorts. DT has no way to pass a
non-PCI described resource into this stanza. It would ideal if the
driver didn't require that at all, but if it does I think it has to be
through assigned-addresses...

Now, perhaps this is better:

assigned-addresses = <0 0 0  0 0 // BAR 0 of the bridge, unused
                      0 0 0  0 0 // BAR 1 of the bridge, unused
                      0x82000000 0x10000000 0xd0040000  0 0x2000>; // Extended

Mark the only two BARs in the bridge as unused and put your non-BAR
registers after them.

Certainly, trying to use reg to convey that the link has an internal
MMIO region at CPU address 0xd0040000 would be even worse..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann - Feb. 7, 2013, 11:44 p.m.
On Thursday 07 February 2013, Jason Gunthorpe wrote:
> On Thu, Feb 07, 2013 at 08:24:50AM +0000, Arnd Bergmann wrote:
> > >  - The CPU physical address window to use for the IO space
> > >    is set via io-cpu-window, not much choice here, the PCI
> > >    format ranges must be 0 based.
> > 
> > I don't think I understand this part. Why can't you put this into
> > ranges as before?
> > 
> > -       0x81000000 0x00000000 0x00000000  0x00000000  0x0 0xa0000
> > +	0x81000000 0x00000000 0x00000000  0xc0000000  0x0 0xa0000
> 
> The OF PCI core translates 0x81000000 IO space addresess into a 'struct
> resource' tagged with IORESOURCE_IO.
> 
> But 0xc0000000 is not an IORESOURCE_IO address, it is an
> IORESOURCE_MEM address..
> 
> So, I think with the current OF code this has to be 0, otherwise your
> IORESOURCE_IO's end up starting at 0xc000000 - but maybe there is some
> trickyness to work with in here? (Although none of this matters when
> Linux does resource assignment, the OF translation code is never
> enganged)

Yes, I think this is for historic reasons: the PCI binding far
predates the Linux implementation, and I'm sure on MacOS, AIX
and Solaris the PCI drivers did not actually have the same kind
of wrappers for PIO functions that we have on Linux because of
the x86 legacy.

> But I agree, 0xc0000000 seems much better...
> 
> To think about it from a different angle, what would you put in the
> 4th dword on x86? How do you describe the IO numberspace in DT on x86?

I think the best option is to have no translation at all on x86,
leaving 0x81000000 out of the ranges property. I'm not sure if
the authors of the binding actually considered that case though.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/arch/arm/boot/dts/armada-xp-mv78230.dtsi b/arch/arm/boot/dts/armada-xp-mv78230.dtsi
index e041f42..6abb0ff 100644
--- a/arch/arm/boot/dts/armada-xp-mv78230.dtsi
+++ b/arch/arm/boot/dts/armada-xp-mv78230.dtsi
@@ -70,5 +70,97 @@ 
 			#interrupts-cells = <2>;
 			interrupts = <87>, <88>, <89>;
 		};
+
+		/*
+		 * MV78230 has 2 PCIe units Gen2.0: One unit can be
+		 * configured as x4 or quad x1 lanes. One unit is
+		 * x4/x1.
+		 */
+		pcie-controller {
+			compatible = "marvell,armada-370-xp-pcie";
+			status = "disabled";
+
+			#address-cells = <3>;
+			#size-cells = <2>;
+
+
+			bus-range = <0x00 0xff>;
+
+			ranges = <0x00000800 0 0xd0040000 0xd0040000 0 0x00002000   /* port 0.0 registers */
+			          0x00004800 0 0xd0042000 0xd0042000 0 0x00002000   /* port 2.0 registers */
+			          0x00001000 0 0xd0044000 0xd0044000 0 0x00002000   /* port 0.1 registers */
+			          0x00001800 0 0xd0048000 0xd0048000 0 0x00002000   /* port 0.2 registers */
+			          0x00002000 0 0xd004C000 0xd004C000 0 0x00002000   /* port 0.3 registers */
+				  0x81000000 0 0	  0xc0000000 0 0x00010000   /* downstream I/O */
+				  0x82000000 0 0	  0xc1000000 0 0x08000000>; /* non-prefetchable memory */
+
+			#interrupt-cells = <1>;
+			interrupt-map-mask = <0xf800 0 0 1>;
+			interrupt-map = <0x0800 0 0 1 &mpic 58 1
+				         0x1000 0 0 1 &mpic 59 1
+					 0x1800 0 0 1 &mpic 60 1
+					 0x2000 0 0 1 &mpic 61 1
+					 0x4800 0 0 1 &mpic 99 1>;
+
+			pcie@0,0 {
+				device_type = "pciex";
+				reg = <0x0800 0 0xd0040000 0 0x2000>;
+				#address-cells = <3>;
+				#size-cells = <2>;
+				marvell,pcie-port = <0>;
+				marvell,pcie-lane = <0>;
+				interrupts = <1>;
+				clocks = <&gateclk 5>;
+				status = "disabled";
+			};
+
+			pcie@0,1 {
+				device_type = "pciex";
+				reg = <0x1000 0 0xd0044000 0 0x2000>;
+				#address-cells = <3>;
+				#size-cells = <2>;
+				marvell,pcie-port = <0>;
+				marvell,pcie-lane = <1>;
+				interrupts = <1>;
+				clocks = <&gateclk 6>;
+				status = "disabled";
+			};
+
+			pcie@0,2 {
+				device_type = "pciex";
+				reg = <0x1800 0 0xd0048000 0 0x2000>;
+				#address-cells = <3>;
+				#size-cells = <2>;
+				marvell,pcie-port = <0>;
+				marvell,pcie-lane = <2>;
+				interrupts = <1>;
+				clocks = <&gateclk 7>;
+				status = "disabled";
+			};
+
+			pcie@0,3 {
+				device_type = "pciex";
+				reg = <0x2000 0 0xd004C000 0 0xC000>;
+				#address-cells = <3>;
+				#size-cells = <2>;
+				marvell,pcie-port = <0>;
+				marvell,pcie-lane = <3>;
+				interrupts = <1>;
+				clocks = <&gateclk 8>;
+				status = "disabled";
+			};
+
+			pcie@2,0 {
+				device_type = "pciex";
+				reg = <0x4800 0 0xd0042000 0 0x2000>;
+				#address-cells = <3>;
+				#size-cells = <2>;
+				marvell,pcie-port = <2>;
+				marvell,pcie-lane = <0>;
+				interrupts = <1>;
+				clocks = <&gateclk 26>;
+				status = "disabled";
+			};
+		};
 	};
 };
diff --git a/arch/arm/boot/dts/armada-xp-mv78260.dtsi b/arch/arm/boot/dts/armada-xp-mv78260.dtsi
index 9e23bd8..ab8c593 100644
--- a/arch/arm/boot/dts/armada-xp-mv78260.dtsi
+++ b/arch/arm/boot/dts/armada-xp-mv78260.dtsi
@@ -90,5 +90,110 @@ 
 				clocks = <&gateclk 1>;
 				status = "disabled";
 		};
+
+		/*
+		 * MV78260 has 3 PCIe units Gen2.0: Two units can be
+		 * configured as x4 or quad x1 lanes. One unit is
+		 * x4/x1.
+		 */
+		pcie-controller {
+			compatible = "marvell,armada-370-xp-pcie";
+			status = "disabled";
+
+			#address-cells = <3>;
+			#size-cells = <2>;
+
+			bus-range = <0x00 0xff>;
+
+			ranges = <0x00000800 0 0xd0040000 0xd0040000 0 0x00002000   /* port 0.0 registers */
+			          0x00004800 0 0xd0042000 0xd0042000 0 0x00002000   /* port 2.0 registers */
+			          0x00001000 0 0xd0044000 0xd0044000 0 0x00002000   /* port 0.1 registers */
+			          0x00001800 0 0xd0048000 0xd0048000 0 0x00002000   /* port 0.2 registers */
+			          0x00002000 0 0xd004C000 0xd004C000 0 0x00002000   /* port 0.3 registers */
+			          0x00005000 0 0xd0082000 0xd0082000 0 0x00002000   /* port 3.0 registers */
+				  0x81000000 0 0	  0xc0000000 0 0x00010000   /* downstream I/O */
+				  0x82000000 0 0	  0xc1000000 0 0x08000000>; /* non-prefetchable memory */
+
+			#interrupt-cells = <1>;
+			interrupt-map-mask = <0xf800 0 0 1>;
+			interrupt-map = <0x0800 0 0 1 &mpic 58 1
+				         0x1000 0 0 1 &mpic 59 1
+					 0x1800 0 0 1 &mpic 60 1
+					 0x2000 0 0 1 &mpic 61 1
+					 0x4800 0 0 1 &mpic 99 1
+					 0x5000 0 0 1 &mpic 103 1>;
+
+			pcie@0,0 {
+				device_type = "pciex";
+				reg = <0x0800 0 0xd0040000 0 0x2000>;
+				#address-cells = <3>;
+				#size-cells = <2>;
+				marvell,pcie-port = <0>;
+				marvell,pcie-lane = <0>;
+				interrupts = <1>;
+				clocks = <&gateclk 5>;
+				status = "disabled";
+			};
+
+			pcie@0,1 {
+				device_type = "pciex";
+				reg = <0x1000 0 0xd0044000 0 0x2000>;
+				#address-cells = <3>;
+				#size-cells = <2>;
+				marvell,pcie-port = <0>;
+				marvell,pcie-lane = <1>;
+				interrupts = <1>;
+				clocks = <&gateclk 6>;
+				status = "disabled";
+			};
+
+			pcie@0,2 {
+				device_type = "pciex";
+				reg = <0x1800 0 0xd0048000 0 0x2000>;
+				#address-cells = <3>;
+				#size-cells = <2>;
+				marvell,pcie-port = <0>;
+				marvell,pcie-lane = <2>;
+				interrupts = <1>;
+				clocks = <&gateclk 7>;
+				status = "disabled";
+			};
+
+			pcie@0,3 {
+				device_type = "pciex";
+				reg = <0x2000 0 0xd004C000 0 0xC000>;
+				#address-cells = <3>;
+				#size-cells = <2>;
+				marvell,pcie-port = <0>;
+				marvell,pcie-lane = <3>;
+				interrupts = <1>;
+				clocks = <&gateclk 8>;
+				status = "disabled";
+			};
+
+			pcie@2,0 {
+				device_type = "pciex";
+				reg = <0x4800 0 0xd0042000 0 0x2000>;
+				#address-cells = <3>;
+				#size-cells = <2>;
+				marvell,pcie-port = <2>;
+				marvell,pcie-lane = <0>;
+				interrupts = <1>;
+				clocks = <&gateclk 26>;
+				status = "disabled";
+			};
+
+			pcie@3,0 {
+				device_type = "pciex";
+				reg = <0x5000 0 0xd0082000 0 0x2000>;
+				#address-cells = <3>;
+				#size-cells = <2>;
+				marvell,pcie-port = <3>;
+				marvell,pcie-lane = <0>;
+				interrupts = <1>;
+				clocks = <&gateclk 27>;
+				status = "disabled";
+			};
+		};
 	};
 };
diff --git a/arch/arm/boot/dts/armada-xp-mv78460.dtsi b/arch/arm/boot/dts/armada-xp-mv78460.dtsi
index 9659661..00c69aa 100644
--- a/arch/arm/boot/dts/armada-xp-mv78460.dtsi
+++ b/arch/arm/boot/dts/armada-xp-mv78460.dtsi
@@ -105,5 +105,155 @@ 
 				clocks = <&gateclk 1>;
 				status = "disabled";
 		};
+
+		/*
+		 * MV78460 has 4 PCIe units Gen2.0: Two units can be
+		 * configured as x4 or quad x1 lanes. Two units are
+		 * x4/x1.
+		 */
+		pcie-controller {
+			compatible = "marvell,armada-370-xp-pcie";
+			status = "disabled";
+
+			#address-cells = <3>;
+			#size-cells = <2>;
+
+			bus-range = <0x00 0xff>;
+
+			ranges = <0x00000800 0 0xd0040000 0xd0040000 0 0x00002000   /* port 0.0 registers */
+			          0x00004800 0 0xd0042000 0xd0042000 0 0x00002000   /* port 2.0 registers */
+			          0x00001000 0 0xd0044000 0xd0044000 0 0x00002000   /* port 0.1 registers */
+			          0x00001800 0 0xd0048000 0xd0048000 0 0x00002000   /* port 0.2 registers */
+			          0x00002000 0 0xd004C000 0xd004C000 0 0x00002000   /* port 0.3 registers */
+				  0x00002800 0 0xd0080000 0xd0080000 0 0x00002000   /* port 1.0 registers */
+			          0x00005000 0 0xd0082000 0xd0082000 0 0x00002000   /* port 3.0 registers */
+				  0x00003000 0 0xd0084000 0xd0084000 0 0x00002000   /* port 1.1 registers */
+				  0x00003800 0 0xd0088000 0xd0088000 0 0x00002000   /* port 1.2 registers */
+				  0x00004000 0 0xd008C000 0xd008C000 0 0x00002000   /* port 1.3 registers */
+				  0x81000000 0 0	  0xc0000000 0 0x00100000   /* downstream I/O */
+				  0x82000000 0 0	  0xc1000000 0 0x08000000>; /* non-prefetchable memory */
+
+			#interrupt-cells = <1>;
+			interrupt-map-mask = <0xf800 0 0 1>;
+			interrupt-map = <0x0800 0 0 1 &mpic 58
+				         0x1000 0 0 1 &mpic 59
+					 0x1800 0 0 1 &mpic 60
+					 0x2000 0 0 1 &mpic 61
+					 0x2800 0 0 1 &mpic 62
+				         0x3000 0 0 1 &mpic 63
+					 0x3800 0 0 1 &mpic 64
+					 0x4000 0 0 1 &mpic 65
+					 0x4800 0 0 1 &mpic 99
+					 0x5000 0 0 1 &mpic 103>;
+
+			pcie@0,0 {
+				device_type = "pciex";
+				reg = <0x0800 0 0xd0040000 0 0x2000>;
+				#address-cells = <3>;
+				#size-cells = <2>;
+				marvell,pcie-port = <0>;
+				marvell,pcie-lane = <0>;
+				clocks = <&gateclk 5>;
+				status = "disabled";
+			};
+
+			pcie@0,1 {
+				device_type = "pciex";
+				reg = <0x1000 0 0xd0044000 0 0x2000>;
+				#address-cells = <3>;
+				#size-cells = <2>;
+				marvell,pcie-port = <0>;
+				marvell,pcie-lane = <1>;
+				clocks = <&gateclk 6>;
+				status = "disabled";
+			};
+
+			pcie@0,2 {
+				device_type = "pciex";
+				reg = <0x1800 0 0xd0048000 0 0x2000>;
+				#address-cells = <3>;
+				#size-cells = <2>;
+				marvell,pcie-port = <0>;
+				marvell,pcie-lane = <2>;
+				clocks = <&gateclk 7>;
+				status = "disabled";
+			};
+
+			pcie@0,3 {
+				device_type = "pciex";
+				reg = <0x2000 0 0xd004C000 0 0xC000>;
+				#address-cells = <3>;
+				#size-cells = <2>;
+				marvell,pcie-port = <0>;
+				marvell,pcie-lane = <3>;
+				clocks = <&gateclk 8>;
+				status = "disabled";
+			};
+
+			pcie@1,0 {
+				device_type = "pciex";
+				reg = <0x2800 0 0xd0080000 0 0x2000>;
+				#address-cells = <3>;
+				#size-cells = <2>;
+				marvell,pcie-port = <1>;
+				marvell,pcie-lane = <0>;
+				clocks = <&gateclk 9>;
+				status = "disabled";
+			};
+
+			pcie@1,1 {
+				device_type = "pciex";
+				reg = <0x3000 0 0xd0084000 0 0x2000>;
+				#address-cells = <3>;
+				#size-cells = <2>;
+				marvell,pcie-port = <1>;
+				marvell,pcie-lane = <1>;
+				clocks = <&gateclk 10>;
+				status = "disabled";
+			};
+
+			pcie@1,2 {
+				device_type = "pciex";
+				reg = <0x3800 0 0xd0088000 0 0x2000>;
+				#address-cells = <3>;
+				#size-cells = <2>;
+				marvell,pcie-port = <1>;
+				marvell,pcie-lane = <2>;
+				clocks = <&gateclk 11>;
+				status = "disabled";
+			};
+
+			pcie@1,3 {
+				device_type = "pciex";
+				reg = <0x4000 0 0xd008C000 0 0x2000>;
+				#address-cells = <3>;
+				#size-cells = <2>;
+				marvell,pcie-port = <1>;
+				marvell,pcie-lane = <3>;
+				clocks = <&gateclk 12>;
+				status = "disabled";
+			};
+			pcie@2,0 {
+				device_type = "pciex";
+				reg = <0x4800 0 0xd0042000 0 0x2000>;
+				#address-cells = <3>;
+				#size-cells = <2>;
+				marvell,pcie-port = <2>;
+				marvell,pcie-lane = <0>;
+				clocks = <&gateclk 26>;
+				status = "disabled";
+			};
+
+			pcie@3,0 {
+				device_type = "pciex";
+				reg = <0x5000 0 0xd0082000 0 0x2000>;
+				#address-cells = <3>;
+				#size-cells = <2>;
+				marvell,pcie-port = <3>;
+				marvell,pcie-lane = <0>;
+				clocks = <&gateclk 27>;
+				status = "disabled";
+			};
+		};
 	};
  };