[00/10] Add Tegra194 PCIe support

Message ID 1553613207-3988-1-git-send-email-vidyas@nvidia.com

Message

Vidya Sagar March 26, 2019, 3:13 p.m. UTC
Tegra194 has six PCIe controllers based on the Synopsys DesignWare core.
There are two Universal PHY (UPHY) bricks supporting 12 (HSIO: High Speed
IO) and 8 (NVHS: NVIDIA High Speed) lanes respectively. Controllers 0-4
use UPHY lanes from the HSIO brick, whereas controller 5 uses UPHY lanes
from the NVHS brick. The mapping of HSIO UPHY lanes to each PCIe
controller (0-4) is handled in the XBAR module, which is programmed by
BPMP-FW. Since the PCIe core has a PIPE interface, a glue module called
PIPE-to-UPHY (P2U) is used to connect each UPHY lane (in both the HSIO
and NVHS bricks) to its PCIe controller.
This patch series
- Adds support for the P2U PHY driver
- Adds support for the PCIe host controller
- Adds device tree nodes for each PCIe controller
- Enables the nodes applicable to the P2972-0000 platform
- Adds helper APIs in the DesignWare core driver to get capability register offsets
- Adds #defines for new feature registers from PCIe spec r4.0
- Makes the DesignWare core driver changes needed to get Tegra194 PCIe working

Testing done on the P2972-0000 platform
- Able to get PCIe link up with the on-board Marvell eSATA controller
- Able to get PCIe link up with NVMe cards connected to the M.2 Key-M slot
- Able to do data transfers with both SATA drives and NVMe cards

Note
- Enabling the x8 slot on the P2972-0000 platform requires a pinmux driver for
  Tegra194, which is still being worked on; hence Controller:5 (i.e. the x8
  slot) is disabled in this patch series. A future patch series will enable it.
Vidya Sagar (10):
  PCI: save pci_bus pointer in pcie_port structure
  PCI: perform dbi regs write lock towards the end
  PCI: dwc: Move config space capability search API
  PCI: Add #defines for PCIe spec r4.0 features
  dt-bindings: PCI: tegra: Add device tree support for T194
  arm64: tegra: Add P2U and PCIe controller nodes to Tegra194 DT
  arm64: tegra: Enable PCIe slots in P2972-0000 board
  phy: tegra: Add PCIe PIPE2UPHY support
  PCI: tegra: Add Tegra194 PCIe support
  arm64: Add Tegra194 PCIe driver to defconfig

 .../bindings/pci/nvidia,tegra194-pcie.txt          |  209 +++
 .../devicetree/bindings/phy/phy-tegra194-p2u.txt   |   34 +
 arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi     |    2 +-
 arch/arm64/boot/dts/nvidia/tegra194-p2972-0000.dts |   50 +
 arch/arm64/boot/dts/nvidia/tegra194.dtsi           |  473 +++++
 arch/arm64/configs/defconfig                       |    1 +
 drivers/pci/controller/dwc/Kconfig                 |   10 +
 drivers/pci/controller/dwc/Makefile                |    1 +
 drivers/pci/controller/dwc/pcie-designware-ep.c    |   37 +-
 drivers/pci/controller/dwc/pcie-designware-host.c  |    4 +-
 drivers/pci/controller/dwc/pcie-designware.c       |   73 +
 drivers/pci/controller/dwc/pcie-designware.h       |    4 +
 drivers/pci/controller/dwc/pcie-tegra194.c         | 1862 ++++++++++++++++++++
 drivers/phy/tegra/Kconfig                          |    7 +
 drivers/phy/tegra/Makefile                         |    1 +
 drivers/phy/tegra/pcie-p2u-tegra194.c              |  138 ++
 include/uapi/linux/pci_regs.h                      |   22 +-
 17 files changed, 2888 insertions(+), 40 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/pci/nvidia,tegra194-pcie.txt
 create mode 100644 Documentation/devicetree/bindings/phy/phy-tegra194-p2u.txt
 create mode 100644 drivers/pci/controller/dwc/pcie-tegra194.c
 create mode 100644 drivers/phy/tegra/pcie-p2u-tegra194.c

Comments

Jon Hunter March 27, 2019, 10:07 a.m. UTC | #1
On 26/03/2019 15:13, Vidya Sagar wrote:
> Add support for Synopsys DesignWare core IP based PCIe host controller
> present in Tegra194 SoC.
> 
> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> ---
>  drivers/pci/controller/dwc/Kconfig         |   10 +
>  drivers/pci/controller/dwc/Makefile        |    1 +
>  drivers/pci/controller/dwc/pcie-tegra194.c | 1862 ++++++++++++++++++++++++++++
>  3 files changed, 1873 insertions(+)
>  create mode 100644 drivers/pci/controller/dwc/pcie-tegra194.c
> 
> diff --git a/drivers/pci/controller/dwc/Kconfig b/drivers/pci/controller/dwc/Kconfig
> index 6ea74b1c0d94..d80f2d77892a 100644
> --- a/drivers/pci/controller/dwc/Kconfig
> +++ b/drivers/pci/controller/dwc/Kconfig
> @@ -213,4 +213,14 @@ config PCIE_UNIPHIER
>  	  Say Y here if you want PCIe controller support on UniPhier SoCs.
>  	  This driver supports LD20 and PXs3 SoCs.
>  
> +config PCIE_TEGRA194
> +	bool "NVIDIA Tegra (T194) PCIe controller"
> +	depends on TEGRA_BPMP && (ARCH_TEGRA || COMPILE_TEST)
> +	depends on PCI_MSI_IRQ_DOMAIN
> +	select PCIE_DW_HOST
> +	select PHY_TEGRA194_PCIE_P2U
> +	help
> +	  Say Y here if you want support for DesignWare core based PCIe host
> +	  controller found in NVIDIA Tegra T194 SoC.
> +

Don't we want tristate here? You have a removal function.

Cheers
Jon
Jon Hunter March 27, 2019, 10:08 a.m. UTC | #2
On 26/03/2019 15:13, Vidya Sagar wrote:
> Add PCIe host controller driver for DesignWare core based
> PCIe controller IP present in Tegra194.
> 
> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> ---
>  arch/arm64/configs/defconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> index 2d9c39033c1a..2ddea5c4e87d 100644
> --- a/arch/arm64/configs/defconfig
> +++ b/arch/arm64/configs/defconfig
> @@ -87,6 +87,7 @@ CONFIG_PCIE_QCOM=y
>  CONFIG_PCIE_ARMADA_8K=y
>  CONFIG_PCIE_KIRIN=y
>  CONFIG_PCIE_HISI_STB=y
> +CONFIG_PCIE_TEGRA194=y
>  CONFIG_ARM64_VA_BITS_48=y
>  CONFIG_SCHED_MC=y
>  CONFIG_NUMA=y

Maybe building as a module is more appropriate here as I am not sure
that anyone else will want this built-in and it is not critical to
booting AFAIK.

Cheers
Jon
Vidya Sagar March 27, 2019, 10:12 a.m. UTC | #3
On 3/27/2019 3:38 PM, Jon Hunter wrote:
> 
> On 26/03/2019 15:13, Vidya Sagar wrote:
>> Add PCIe host controller driver for DesignWare core based
>> PCIe controller IP present in Tegra194.
>>
>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>> ---
>>   arch/arm64/configs/defconfig | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
>> index 2d9c39033c1a..2ddea5c4e87d 100644
>> --- a/arch/arm64/configs/defconfig
>> +++ b/arch/arm64/configs/defconfig
>> @@ -87,6 +87,7 @@ CONFIG_PCIE_QCOM=y
>>   CONFIG_PCIE_ARMADA_8K=y
>>   CONFIG_PCIE_KIRIN=y
>>   CONFIG_PCIE_HISI_STB=y
>> +CONFIG_PCIE_TEGRA194=y
>>   CONFIG_ARM64_VA_BITS_48=y
>>   CONFIG_SCHED_MC=y
>>   CONFIG_NUMA=y
> 
> Maybe building as a module is more appropriate here as I am not sure
> that anyone else will want this built-in and it is not critical to
> booting AFAIK.
Since the DesignWare core framework doesn't yet have support for building
this as a module, I added it as a built-in driver. I'll switch it to a module
once that support is available.

> 
> Cheers
> Jon
>
Jon Hunter March 27, 2019, 12:26 p.m. UTC | #4
On 27/03/2019 10:12, Vidya Sagar wrote:
> On 3/27/2019 3:38 PM, Jon Hunter wrote:
>>
>> On 26/03/2019 15:13, Vidya Sagar wrote:
>>> Add PCIe host controller driver for DesignWare core based
>>> PCIe controller IP present in Tegra194.
>>>
>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>>> ---
>>>   arch/arm64/configs/defconfig | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
>>> index 2d9c39033c1a..2ddea5c4e87d 100644
>>> --- a/arch/arm64/configs/defconfig
>>> +++ b/arch/arm64/configs/defconfig
>>> @@ -87,6 +87,7 @@ CONFIG_PCIE_QCOM=y
>>>   CONFIG_PCIE_ARMADA_8K=y
>>>   CONFIG_PCIE_KIRIN=y
>>>   CONFIG_PCIE_HISI_STB=y
>>> +CONFIG_PCIE_TEGRA194=y
>>>   CONFIG_ARM64_VA_BITS_48=y
>>>   CONFIG_SCHED_MC=y
>>>   CONFIG_NUMA=y
>>
>> Maybe building as a module is more appropriate here as I am not sure
>> that anyone else will want this built-in and it is not critical to
>> booting AFAIK.
> Since the DesignWare core framework doesn't yet have support for making
> this
> as a module, I added it as a built-in driver. I'll switch it to a module
> once support is available.

Ah I see. We often get reports/patches if a driver has a removal
function but is defined in the Kconfig to only support being built-in.

Does the DesignWare core just need to export some symbols to support it
being a module? If so, it may be worth adding these as part of the series
to see if that is acceptable; otherwise it might not get done and there is
no point in us supporting it as a module.
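
For illustration, the module-support change being hinted at might look roughly
like the following in pcie-tegra194.c, together with switching the Kconfig
entry from bool to tristate. Which DesignWare host core symbols would actually
need EXPORT_SYMBOL_GPL() is an assumption here, and this simplified of_match
table omits the mode data carried by the real one:

static const struct of_device_id tegra_pcie_dw_of_match[] = {
	{ .compatible = "nvidia,tegra194-pcie", },
	{ },
};
MODULE_DEVICE_TABLE(of, tegra_pcie_dw_of_match);

static struct platform_driver tegra_pcie_dw_driver = {
	.probe = tegra_pcie_dw_probe,
	.remove = tegra_pcie_dw_remove,
	.driver = {
		.name = "tegra194-pcie",
		.of_match_table = tegra_pcie_dw_of_match,
	},
};
module_platform_driver(tegra_pcie_dw_driver);

MODULE_AUTHOR("Vidya Sagar <vidyas@nvidia.com>");
MODULE_DESCRIPTION("NVIDIA Tegra194 PCIe host controller driver");
MODULE_LICENSE("GPL v2");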

Cheers
Jon
Thierry Reding March 28, 2019, 12:33 p.m. UTC | #5
On Tue, Mar 26, 2019 at 08:43:20PM +0530, Vidya Sagar wrote:
> move PCIe config space capability search API to common designware file
> as this can be used by both host and ep mode codes.
> It also adds extended capability search APIs.
> 
> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> ---
>  drivers/pci/controller/dwc/pcie-designware-ep.c | 37 +------------
>  drivers/pci/controller/dwc/pcie-designware.c    | 73 +++++++++++++++++++++++++
>  drivers/pci/controller/dwc/pcie-designware.h    |  3 +
>  3 files changed, 78 insertions(+), 35 deletions(-)

Just out of curiosity: is there any reason why this driver needs to
reimplement this? Couldn't this be made to work using the standard
pci_find_next_capability() function?

Other than that, it might be a good idea to split this into two patches:
one that moves the existing functionality to the common code and another
that adds the extra functionality.

Thierry
Thierry Reding March 28, 2019, 4:59 p.m. UTC | #6
On Tue, Mar 26, 2019 at 08:43:23PM +0530, Vidya Sagar wrote:
> Add P2U (PIPE to UPHY) and PCIe controller nodes to device tree.
> The Tegra194 SoC contains six PCIe controllers and twenty P2U instances
> grouped into two different PHY bricks namely High-Speed IO (HSIO-12 P2Us)
> and NVIDIA High Speed (NVHS-8 P2Us) respectively.
> 
> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> ---
>  arch/arm64/boot/dts/nvidia/tegra194.dtsi | 473 +++++++++++++++++++++++++++++++
>  1 file changed, 473 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
> index c77ca211fa8f..266a3058fa66 100644
> --- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
> +++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
> @@ -1054,4 +1054,477 @@
>  				(GIC_CPU_MASK_SIMPLE(4) | IRQ_TYPE_LEVEL_LOW)>;
>  		interrupt-parent = <&gic>;
>  	};
> +
> +	hsio-p2u {
> +		compatible = "simple-bus";
> +		#address-cells = <2>;
> +		#size-cells = <2>;
> +		ranges;

It doesn't look to me like there's really a bus here. Perhaps just leave
out that top-level hsio-p2u node? If you only want to somehow organize
these, perhaps a better way would be to add a comment.

Or: the address map lists something called PIPE2UPHY_XBAR at addresses
0x03e00000-0x03e0ffff. Perhaps that really ought to be the "bus" in this
case?

Also, please leave a blank line between the properties and the nodes
for readability.

> +		p2u_0: p2u@03e10000 {
> +			compatible = "nvidia,tegra194-phy-p2u";
> +			reg = <0x0 0x03e10000 0x0 0x00010000>;
> +			reg-names = "base";
> +
> +			#phy-cells = <0>;
> +		};
> +		p2u_1: p2u@03e20000 {

Please leave blank lines between nodes for readability.

> +			compatible = "nvidia,tegra194-phy-p2u";
> +			reg = <0x0 0x03e20000 0x0 0x00010000>;
> +			reg-names = "base";
> +
> +			#phy-cells = <0>;
> +		};
> +		p2u_2: p2u@03e30000 {
> +			compatible = "nvidia,tegra194-phy-p2u";
> +			reg = <0x0 0x03e30000 0x0 0x00010000>;
> +			reg-names = "base";
> +
> +			#phy-cells = <0>;
> +		};
> +		p2u_3: p2u@03e40000 {
> +			compatible = "nvidia,tegra194-phy-p2u";
> +			reg = <0x0 0x03e40000 0x0 0x00010000>;
> +			reg-names = "base";
> +
> +			#phy-cells = <0>;
> +		};
> +		p2u_4: p2u@03e50000 {
> +			compatible = "nvidia,tegra194-phy-p2u";
> +			reg = <0x0 0x03e50000 0x0 0x00010000>;
> +			reg-names = "base";
> +
> +			#phy-cells = <0>;
> +		};
> +		p2u_5: p2u@03e60000 {
> +			compatible = "nvidia,tegra194-phy-p2u";
> +			reg = <0x0 0x03e60000 0x0 0x00010000>;
> +			reg-names = "base";
> +
> +			#phy-cells = <0>;
> +		};
> +		p2u_6: p2u@03e70000 {
> +			compatible = "nvidia,tegra194-phy-p2u";
> +			reg = <0x0 0x03e70000 0x0 0x00010000>;
> +			reg-names = "base";
> +
> +			#phy-cells = <0>;
> +		};
> +		p2u_7: p2u@03e80000 {
> +			compatible = "nvidia,tegra194-phy-p2u";
> +			reg = <0x0 0x03e80000 0x0 0x00010000>;
> +			reg-names = "base";
> +
> +			#phy-cells = <0>;
> +		};
> +		p2u_8: p2u@03e90000 {
> +			compatible = "nvidia,tegra194-phy-p2u";
> +			reg = <0x0 0x03e90000 0x0 0x00010000>;
> +			reg-names = "base";
> +
> +			#phy-cells = <0>;
> +		};
> +		p2u_9: p2u@03ea0000 {
> +			compatible = "nvidia,tegra194-phy-p2u";
> +			reg = <0x0 0x03ea0000 0x0 0x00010000>;
> +			reg-names = "base";
> +
> +			#phy-cells = <0>;
> +		};
> +		p2u_10: p2u@03f30000 {
> +			compatible = "nvidia,tegra194-phy-p2u";
> +			reg = <0x0 0x03f30000 0x0 0x00010000>;
> +			reg-names = "base";
> +
> +			#phy-cells = <0>;
> +		};
> +		p2u_11: p2u@03f40000 {
> +			compatible = "nvidia,tegra194-phy-p2u";
> +			reg = <0x0 0x03f40000 0x0 0x00010000>;
> +			reg-names = "base";
> +
> +			#phy-cells = <0>;
> +		};
> +	};
> +
> +	nvhs-p2u {
> +		compatible = "simple-bus";
> +		#address-cells = <2>;
> +		#size-cells = <2>;
> +		ranges;
> +		p2u_12: p2u@03eb0000 {
> +			compatible = "nvidia,tegra194-phy-p2u";
> +			reg = <0x0 0x03eb0000 0x0 0x00010000>;
> +			reg-names = "base";
> +
> +			#phy-cells = <0>;
> +		};
> +		p2u_13: p2u@03ec0000 {
> +			compatible = "nvidia,tegra194-phy-p2u";
> +			reg = <0x0 0x03ec0000 0x0 0x00010000>;
> +			reg-names = "base";
> +
> +			#phy-cells = <0>;
> +		};
> +		p2u_14: p2u@03ed0000 {
> +			compatible = "nvidia,tegra194-phy-p2u";
> +			reg = <0x0 0x03ed0000 0x0 0x00010000>;
> +			reg-names = "base";
> +
> +			#phy-cells = <0>;
> +		};
> +		p2u_15: p2u@03ee0000 {
> +			compatible = "nvidia,tegra194-phy-p2u";
> +			reg = <0x0 0x03ee0000 0x0 0x00010000>;
> +			reg-names = "base";
> +
> +			#phy-cells = <0>;
> +		};
> +		p2u_16: p2u@03ef0000 {
> +			compatible = "nvidia,tegra194-phy-p2u";
> +			reg = <0x0 0x03ef0000 0x0 0x00010000>;
> +			reg-names = "base";
> +
> +			#phy-cells = <0>;
> +		};
> +		p2u_17: p2u@03f00000 {
> +			compatible = "nvidia,tegra194-phy-p2u";
> +			reg = <0x0 0x03f00000 0x0 0x00010000>;
> +			reg-names = "base";
> +
> +			#phy-cells = <0>;
> +		};
> +		p2u_18: p2u@03f10000 {
> +			compatible = "nvidia,tegra194-phy-p2u";
> +			reg = <0x0 0x03f10000 0x0 0x00010000>;
> +			reg-names = "base";
> +
> +			#phy-cells = <0>;
> +		};
> +		p2u_19: p2u@03f20000 {
> +			compatible = "nvidia,tegra194-phy-p2u";
> +			reg = <0x0 0x03f20000 0x0 0x00010000>;
> +			reg-names = "base";
> +
> +			#phy-cells = <0>;
> +		};
> +	};

It's not clear to me why NVHS PHYs are listed as a separate bus. Also,
these really should be sorted by unit-address. If that means that HSIO
and NVHS PHYs are mixed, so be it. We can use comments to highlight
which PHYs are of which type. Or perhaps we really should be using
different compatible strings for them?

Same comments on the below as for the bindings earlier.

Thierry

> +
> +	pcie@14180000 {
> +		compatible = "nvidia,tegra194-pcie", "snps,dw-pcie";
> +		power-domains = <&bpmp TEGRA194_POWER_DOMAIN_PCIEX8B>;
> +		reg = <0x00 0x14180000 0x0 0x00020000   /* appl registers (128K)      */
> +		       0x00 0x38000000 0x0 0x02000000   /* window1 (32M)              */
> +		       0x00 0x38000000 0x0 0x00040000   /* configuration space (256K) */
> +		       0x00 0x38040000 0x0 0x00040000   /* iATU_DMA reg space (256K)  */
> +		       0x00 0x38080000 0x0 0x00040000   /* DBI reg space (256K)       */
> +		       0x18 0x00000000 0x4 0x00000000>; /* window2 (16G)              */
> +		reg-names = "appl", "window1", "config", "atu_dma", "dbi", "window2";
> +
> +		status = "disabled";
> +
> +		#address-cells = <3>;
> +		#size-cells = <2>;
> +		device_type = "pci";
> +		num-lanes = <8>;
> +		num-viewport = <8>;
> +		linux,pci-domain = <0>;
> +
> +		clocks = <&bpmp TEGRA194_CLK_PEX0_CORE_0>;
> +		clock-names = "core_clk";
> +
> +		resets = <&bpmp TEGRA194_RESET_PEX0_CORE_0_APB>,
> +			 <&bpmp TEGRA194_RESET_PEX0_CORE_0>;
> +		reset-names = "core_apb_rst", "core_rst";
> +
> +		interrupts = <GIC_SPI 72 IRQ_TYPE_LEVEL_HIGH>,	/* controller interrupt */
> +			     <GIC_SPI 73 IRQ_TYPE_LEVEL_HIGH>;	/* MSI interrupt */
> +		interrupt-names = "intr", "msi";
> +
> +		#interrupt-cells = <1>;
> +		interrupt-map-mask = <0 0 0 0>;
> +		interrupt-map = <0 0 0 0 &gic 0 72 0x04>;
> +
> +		nvidia,bpmp = <&bpmp>;
> +
> +		nvidia,max-speed = <4>;
> +		nvidia,disable-aspm-states = <0xf>;
> +		nvidia,controller-id = <&bpmp 0x0>;
> +		nvidia,aspm-cmrt = <0x3C>;
> +		nvidia,aspm-pwr-on-t = <0x14>;
> +		nvidia,aspm-l0s-entrance-latency = <0x3>;
> +
> +		bus-range = <0x0 0xff>;
> +		ranges = <0x81000000 0x0 0x38100000 0x0 0x38100000 0x0 0x00100000    /* downstream I/O (1MB) */
> +			  0xc2000000 0x18 0x00000000 0x18 0x00000000 0x3 0x40000000  /* prefetchable memory (13GB) */
> +			  0x82000000 0x0 0x40000000 0x1B 0x40000000 0x0 0xC0000000>; /* non-prefetchable memory (3GB) */
> +
> +		nvidia,event-cntr-ctrl = <0x1d8>;
> +		nvidia,event-cntr-data = <0x1dc>;
> +	};
> +
> +	pcie@14100000 {
> +		compatible = "nvidia,tegra194-pcie", "snps,dw-pcie";
> +		power-domains = <&bpmp TEGRA194_POWER_DOMAIN_PCIEX1A>;
> +		reg = <0x00 0x14100000 0x0 0x00020000   /* appl registers (128K)      */
> +		       0x00 0x30000000 0x0 0x02000000   /* window1 (32M)              */
> +		       0x00 0x30000000 0x0 0x00040000   /* configuration space (256K) */
> +		       0x00 0x30040000 0x0 0x00040000   /* iATU_DMA reg space (256K)  */
> +		       0x00 0x30080000 0x0 0x00040000   /* DBI reg space (256K)       */
> +		       0x12 0x00000000 0x0 0x40000000>; /* window2 (1G)               */
> +		reg-names = "appl", "window1", "config", "atu_dma", "dbi", "window2";
> +
> +		status = "disabled";
> +
> +		#address-cells = <3>;
> +		#size-cells = <2>;
> +		device_type = "pci";
> +		num-lanes = <1>;
> +		num-viewport = <8>;
> +		linux,pci-domain = <1>;
> +
> +		clocks = <&bpmp TEGRA194_CLK_PEX0_CORE_1>;
> +		clock-names = "core_clk";
> +
> +		resets = <&bpmp TEGRA194_RESET_PEX0_CORE_1_APB>,
> +			 <&bpmp TEGRA194_RESET_PEX0_CORE_1>;
> +		reset-names = "core_apb_rst", "core_rst";
> +
> +		interrupts = <GIC_SPI 45 IRQ_TYPE_LEVEL_HIGH>,	/* controller interrupt */
> +			     <GIC_SPI 46 IRQ_TYPE_LEVEL_HIGH>;	/* MSI interrupt */
> +		interrupt-names = "intr", "msi";
> +
> +		#interrupt-cells = <1>;
> +		interrupt-map-mask = <0 0 0 0>;
> +		interrupt-map = <0 0 0 0 &gic 0 45 0x04>;
> +
> +		nvidia,bpmp = <&bpmp>;
> +
> +		nvidia,max-speed = <4>;
> +		nvidia,disable-aspm-states = <0xf>;
> +		nvidia,controller-id = <&bpmp 0x1>;
> +		nvidia,aspm-cmrt = <0x3C>;
> +		nvidia,aspm-pwr-on-t = <0x14>;
> +		nvidia,aspm-l0s-entrance-latency = <0x3>;
> +
> +		bus-range = <0x0 0xff>;
> +		ranges = <0x81000000 0x0 0x30100000 0x0 0x30100000 0x0 0x00100000    /* downstream I/O (1MB) */
> +			  0xc2000000 0x12 0x00000000 0x12 0x00000000 0x0 0x30000000  /* prefetchable memory (768MB) */
> +			  0x82000000 0x0 0x40000000 0x12 0x30000000 0x0 0x10000000>; /* non-prefetchable memory (256MB) */
> +
> +		nvidia,event-cntr-ctrl = <0x1a8>;
> +		nvidia,event-cntr-data = <0x1ac>;
> +	};
> +
> +	pcie@14120000 {
> +		compatible = "nvidia,tegra194-pcie", "snps,dw-pcie";
> +		power-domains = <&bpmp TEGRA194_POWER_DOMAIN_PCIEX1A>;
> +		reg = <0x00 0x14120000 0x0 0x00020000   /* appl registers (128K)      */
> +		       0x00 0x32000000 0x0 0x02000000   /* window1 (32M)              */
> +		       0x00 0x32000000 0x0 0x00040000   /* configuration space (256K) */
> +		       0x00 0x32040000 0x0 0x00040000   /* iATU_DMA reg space (256K)  */
> +		       0x00 0x32080000 0x0 0x00040000   /* DBI reg space (256K)       */
> +		       0x12 0x40000000 0x0 0x40000000>; /* window2 (1G)               */
> +		reg-names = "appl", "window1", "config", "atu_dma", "dbi", "window2";
> +
> +		status = "disabled";
> +
> +		#address-cells = <3>;
> +		#size-cells = <2>;
> +		device_type = "pci";
> +		num-lanes = <1>;
> +		num-viewport = <8>;
> +		linux,pci-domain = <2>;
> +
> +		clocks = <&bpmp TEGRA194_CLK_PEX0_CORE_2>;
> +		clock-names = "core_clk";
> +
> +		resets = <&bpmp TEGRA194_RESET_PEX0_CORE_2_APB>,
> +			 <&bpmp TEGRA194_RESET_PEX0_CORE_2>;
> +		reset-names = "core_apb_rst", "core_rst";
> +
> +		interrupts = <GIC_SPI 47 IRQ_TYPE_LEVEL_HIGH>,	/* controller interrupt */
> +			     <GIC_SPI 48 IRQ_TYPE_LEVEL_HIGH>;	/* MSI interrupt */
> +		interrupt-names = "intr", "msi";
> +
> +		#interrupt-cells = <1>;
> +		interrupt-map-mask = <0 0 0 0>;
> +		interrupt-map = <0 0 0 0 &gic 0 47 0x04>;
> +
> +		nvidia,bpmp = <&bpmp>;
> +
> +		nvidia,max-speed = <4>;
> +		nvidia,disable-aspm-states = <0xf>;
> +		nvidia,controller-id = <&bpmp 0x2>;
> +		nvidia,aspm-cmrt = <0x3C>;
> +		nvidia,aspm-pwr-on-t = <0x14>;
> +		nvidia,aspm-l0s-entrance-latency = <0x3>;
> +
> +		bus-range = <0x0 0xff>;
> +		ranges = <0x81000000 0x0 0x32100000 0x0 0x32100000 0x0 0x00100000    /* downstream I/O (1MB) */
> +			  0xc2000000 0x12 0x40000000 0x12 0x40000000 0x0 0x30000000  /* prefetchable memory (768MB) */
> +			  0x82000000 0x0 0x40000000 0x12 0x70000000 0x0 0x10000000>; /* non-prefetchable memory (256MB) */
> +
> +		nvidia,event-cntr-ctrl = <0x1a8>;
> +		nvidia,event-cntr-data = <0x1ac>;
> +	};
> +
> +	pcie@14140000 {
> +		compatible = "nvidia,tegra194-pcie", "snps,dw-pcie";
> +		power-domains = <&bpmp TEGRA194_POWER_DOMAIN_PCIEX1A>;
> +		reg = <0x00 0x14140000 0x0 0x00020000   /* appl registers (128K)      */
> +		       0x00 0x34000000 0x0 0x02000000   /* window1 (32M)              */
> +		       0x00 0x34000000 0x0 0x00040000   /* configuration space (256K) */
> +		       0x00 0x34040000 0x0 0x00040000   /* iATU_DMA reg space (256K)  */
> +		       0x00 0x34080000 0x0 0x00040000   /* DBI reg space (256K)       */
> +		       0x12 0x80000000 0x0 0x40000000>; /* window2 (1G)               */
> +		reg-names = "appl", "window1", "config", "atu_dma", "dbi", "window2";
> +
> +		status = "disabled";
> +
> +		#address-cells = <3>;
> +		#size-cells = <2>;
> +		device_type = "pci";
> +		num-lanes = <1>;
> +		num-viewport = <8>;
> +		linux,pci-domain = <3>;
> +
> +		clocks = <&bpmp TEGRA194_CLK_PEX0_CORE_3>;
> +		clock-names = "core_clk";
> +
> +		resets = <&bpmp TEGRA194_RESET_PEX0_CORE_3_APB>,
> +			 <&bpmp TEGRA194_RESET_PEX0_CORE_3>;
> +		reset-names = "core_apb_rst", "core_rst";
> +
> +		interrupts = <GIC_SPI 49 IRQ_TYPE_LEVEL_HIGH>,	/* controller interrupt */
> +			     <GIC_SPI 50 IRQ_TYPE_LEVEL_HIGH>;	/* MSI interrupt */
> +		interrupt-names = "intr", "msi";
> +
> +		#interrupt-cells = <1>;
> +		interrupt-map-mask = <0 0 0 0>;
> +		interrupt-map = <0 0 0 0 &gic 0 49 0x04>;
> +
> +		nvidia,bpmp = <&bpmp>;
> +
> +		nvidia,max-speed = <4>;
> +		nvidia,disable-aspm-states = <0xf>;
> +		nvidia,controller-id = <&bpmp 0x3>;
> +		nvidia,aspm-cmrt = <0x3C>;
> +		nvidia,aspm-pwr-on-t = <0x14>;
> +		nvidia,aspm-l0s-entrance-latency = <0x3>;
> +
> +		bus-range = <0x0 0xff>;
> +		ranges = <0x81000000 0x0 0x34100000 0x0 0x34100000 0x0 0x00100000    /* downstream I/O (1MB) */
> +			  0xc2000000 0x12 0x80000000 0x12 0x80000000 0x0 0x30000000  /* prefetchable memory (768MB) */
> +			  0x82000000 0x0 0x40000000 0x12 0xB0000000 0x0 0x10000000>; /* non-prefetchable memory (256MB) */
> +
> +		nvidia,event-cntr-ctrl = <0x1a8>;
> +		nvidia,event-cntr-data = <0x1ac>;
> +	};
> +
> +	pcie@14160000 {
> +		compatible = "nvidia,tegra194-pcie", "snps,dw-pcie";
> +		power-domains = <&bpmp TEGRA194_POWER_DOMAIN_PCIEX4A>;
> +		reg = <0x00 0x14160000 0x0 0x00020000   /* appl registers (128K)      */
> +		       0x00 0x36000000 0x0 0x02000000   /* window1 (32M)              */
> +		       0x00 0x36000000 0x0 0x00040000   /* configuration space (256K) */
> +		       0x00 0x36040000 0x0 0x00040000   /* iATU_DMA reg space (256K)  */
> +		       0x00 0x36080000 0x0 0x00040000   /* DBI reg space (256K)       */
> +		       0x14 0x00000000 0x4 0x00000000>; /* window2 (16G)              */
> +		reg-names = "appl", "window1", "config", "atu_dma", "dbi", "window2";
> +
> +		status = "disabled";
> +
> +		#address-cells = <3>;
> +		#size-cells = <2>;
> +		device_type = "pci";
> +		num-lanes = <4>;
> +		num-viewport = <8>;
> +		linux,pci-domain = <4>;
> +
> +		clocks = <&bpmp TEGRA194_CLK_PEX0_CORE_4>;
> +		clock-names = "core_clk";
> +
> +		resets = <&bpmp TEGRA194_RESET_PEX0_CORE_4_APB>,
> +			 <&bpmp TEGRA194_RESET_PEX0_CORE_4>;
> +		reset-names = "core_apb_rst", "core_rst";
> +
> +		interrupts = <GIC_SPI 51 IRQ_TYPE_LEVEL_HIGH>,	/* controller interrupt */
> +			     <GIC_SPI 52 IRQ_TYPE_LEVEL_HIGH>;	/* MSI interrupt */
> +		interrupt-names = "intr", "msi";
> +
> +		#interrupt-cells = <1>;
> +		interrupt-map-mask = <0 0 0 0>;
> +		interrupt-map = <0 0 0 0 &gic 0 51 0x04>;
> +
> +		nvidia,bpmp = <&bpmp>;
> +
> +		nvidia,max-speed = <4>;
> +		nvidia,disable-aspm-states = <0xf>;
> +		nvidia,controller-id = <&bpmp 0x4>;
> +		nvidia,aspm-cmrt = <0x3C>;
> +		nvidia,aspm-pwr-on-t = <0x14>;
> +		nvidia,aspm-l0s-entrance-latency = <0x3>;
> +
> +		bus-range = <0x0 0xff>;
> +		ranges = <0x81000000 0x0 0x36100000 0x0 0x36100000 0x0 0x00100000    /* downstream I/O (1MB) */
> +			  0xc2000000 0x14 0x00000000 0x14 0x00000000 0x3 0x40000000  /* prefetchable memory (13GB) */
> +			  0x82000000 0x0 0x40000000 0x17 0x40000000 0x0 0xC0000000>; /* non-prefetchable memory (3GB) */
> +
> +		nvidia,event-cntr-ctrl = <0x1c4>;
> +		nvidia,event-cntr-data = <0x1c8>;
> +	};
> +
> +	pcie@141a0000 {
> +		compatible = "nvidia,tegra194-pcie", "snps,dw-pcie";
> +		power-domains = <&bpmp TEGRA194_POWER_DOMAIN_PCIEX8A>;
> +		reg = <0x00 0x141a0000 0x0 0x00020000   /* appl registers (128K)      */
> +		       0x00 0x3a000000 0x0 0x02000000   /* window1 (32M)              */
> +		       0x00 0x3a000000 0x0 0x00040000   /* configuration space (256K) */
> +		       0x00 0x3a040000 0x0 0x00040000   /* iATU_DMA reg space (256K)  */
> +		       0x00 0x3a080000 0x0 0x00040000   /* DBI reg space (256K)       */
> +		       0x1c 0x00000000 0x4 0x00000000>; /* window2 (16G)              */
> +		reg-names = "appl", "window1", "config", "atu_dma", "dbi", "window2";
> +
> +		status = "disabled";
> +
> +		#address-cells = <3>;
> +		#size-cells = <2>;
> +		device_type = "pci";
> +		num-lanes = <8>;
> +		num-viewport = <8>;
> +		linux,pci-domain = <5>;
> +
> +		clocks = <&bpmp TEGRA194_CLK_PEX1_CORE_5>,
> +			<&bpmp TEGRA194_CLK_PEX1_CORE_5M>;
> +		clock-names = "core_clk", "core_clk_m";
> +
> +		resets = <&bpmp TEGRA194_RESET_PEX1_CORE_5_APB>,
> +			 <&bpmp TEGRA194_RESET_PEX1_CORE_5>;
> +		reset-names = "core_apb_rst", "core_rst";
> +
> +		interrupts = <GIC_SPI 53 IRQ_TYPE_LEVEL_HIGH>,	/* controller interrupt */
> +			     <GIC_SPI 54 IRQ_TYPE_LEVEL_HIGH>;	/* MSI interrupt */
> +		interrupt-names = "intr", "msi";
> +
> +		nvidia,bpmp = <&bpmp>;
> +
> +		#interrupt-cells = <1>;
> +		interrupt-map-mask = <0 0 0 0>;
> +		interrupt-map = <0 0 0 0 &gic 0 53 0x04>;
> +
> +		nvidia,max-speed = <4>;
> +		nvidia,disable-aspm-states = <0xf>;
> +		nvidia,controller-id = <&bpmp 0x5>;
> +		nvidia,aspm-cmrt = <0x3C>;
> +		nvidia,aspm-pwr-on-t = <0x14>;
> +		nvidia,aspm-l0s-entrance-latency = <0x3>;
> +
> +		bus-range = <0x0 0xff>;
> +		ranges = <0x81000000 0x0 0x3a100000 0x0 0x3a100000 0x0 0x00100000    /* downstream I/O (1MB) */
> +			  0xc2000000 0x1c 0x00000000 0x1c 0x00000000 0x3 0x40000000  /* prefetchable memory (13GB) */
> +			  0x82000000 0x0 0x40000000 0x1f 0x40000000 0x0 0xC0000000>; /* non-prefetchable memory (3GB) */
> +
> +		nvidia,event-cntr-ctrl = <0x1d8>;
> +		nvidia,event-cntr-data = <0x1dc>;
> +	};
>  };
> -- 
> 2.7.4
>
Bjorn Helgaas March 29, 2019, 8:52 p.m. UTC | #7
Hi Vidya,

Wow, there's a lot of nice work here!  Thanks for that!

On Tue, Mar 26, 2019 at 08:43:26PM +0530, Vidya Sagar wrote:
> Add support for Synopsys DesignWare core IP based PCIe host controller
> present in Tegra194 SoC.

General comments:

  - There are a few multi-line comments that don't match the
    prevailing style:

        /*
	 * Text...
	 */

  - Comments and dev_info()/dev_err() messages are inconsistent about
    starting with upper-case or lower-case letters.

  - Use "MSI", "IRQ", "PCIe", "CPU", etc in comments and messages.

  - There are a few functions that use "&pdev->dev" many times; can
    you add a "struct device *dev = &pdev->dev" to reduce the
    redundancy?

> +#include "../../pcie/portdrv.h"

What's this for?  I didn't see any obvious uses of things from
portdrv.h, but I didn't actually try to build without it.

> +struct tegra_pcie_dw {
> +	struct device		*dev;
> +	struct resource		*appl_res;
> +	struct resource		*dbi_res;
> +	struct resource		*atu_dma_res;
> +	void __iomem		*appl_base;
> +	struct clk		*core_clk;
> +	struct reset_control	*core_apb_rst;
> +	struct reset_control	*core_rst;
> +	struct dw_pcie		pci;
> +	enum dw_pcie_device_mode mode;
> +
> +	bool disable_clock_request;
> +	bool power_down_en;
> +	u8 init_link_width;
> +	bool link_state;
> +	u32 msi_ctrl_int;
> +	u32 num_lanes;
> +	u32 max_speed;
> +	u32 init_speed;
> +	bool cdm_check;
> +	u32 cid;
> +	int pex_wake;
> +	bool update_fc_fixup;
> +	int n_gpios;
> +	int *gpios;
> +#if defined(CONFIG_PCIEASPM)
> +	u32 cfg_link_cap_l1sub;
> +	u32 event_cntr_ctrl;
> +	u32 event_cntr_data;
> +	u32 aspm_cmrt;
> +	u32 aspm_pwr_on_t;
> +	u32 aspm_l0s_enter_lat;
> +	u32 disabled_aspm_states;
> +#endif

The above could be indented the same as the rest of the struct?

> +	struct regulator	*pex_ctl_reg;
> +
> +	int			phy_count;
> +	struct phy		**phy;
> +
> +	struct dentry		*debugfs;
> +};

> +static void apply_bad_link_workaround(struct pcie_port *pp)
> +{
> +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> +	struct tegra_pcie_dw *pcie = dw_pcie_to_tegra_pcie(pci);
> +	u16 val;
> +
> +	/*
> +	 * NOTE:- Since this scenario is uncommon and link as
> +	 * such is not stable anyway, not waiting to confirm
> +	 * if link is really transiting to Gen-2 speed

s/transiting/transitioning/

I think there are other uses of "transit" when you mean "transition".

> +static int tegra_pcie_dw_rd_own_conf(struct pcie_port *pp, int where, int size,
> +				     u32 *val)
> +{
> +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> +
> +	/*
> +	 * This is an endpoint mode specific register happen to appear even
> +	 * when controller is operating in root port mode and system hangs
> +	 * when it is accessed with link being in ASPM-L1 state.
> +	 * So skip accessing it altogether
> +	 */
> +	if (where == PORT_LOGIC_MSIX_DOORBELL) {
> +		*val = 0x00000000;
> +		return PCIBIOS_SUCCESSFUL;
> +	} else {
> +		return dw_pcie_read(pci->dbi_base + where, size, val);
> +	}
> +}
> +
> +static int tegra_pcie_dw_wr_own_conf(struct pcie_port *pp, int where, int size,
> +				     u32 val)
> +{
> +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> +
> +	/* This is EP specific register and system hangs when it is
> +	 * accessed with link being in ASPM-L1 state.
> +	 * So skip accessing it altogether
> +	 */
> +	if (where == PORT_LOGIC_MSIX_DOORBELL)
> +		return PCIBIOS_SUCCESSFUL;
> +	else
> +		return dw_pcie_write(pci->dbi_base + where, size, val);

These two functions are almost identical and they could look more
similar.  This one has the wrong multi-line comment style, uses "EP"
instead of "endpoint", etc.  Use this style for the "if" since the
first case is really an error case:

  if (where == PORT_LOGIC_MSIX_DOORBELL) {
    ...
    return ...;
  }

  return dw_pcie_...();
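
Applied to the write path, that shape would look roughly like this (an
illustrative sketch based on the quoted code, not the revised patch):

static int tegra_pcie_dw_wr_own_conf(struct pcie_port *pp, int where, int size,
				     u32 val)
{
	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);

	/*
	 * This is an endpoint-mode specific register that happens to appear
	 * even in root port mode; accessing it while the link is in ASPM L1
	 * hangs the system, so skip it altogether.
	 */
	if (where == PORT_LOGIC_MSIX_DOORBELL)
		return PCIBIOS_SUCCESSFUL;

	return dw_pcie_write(pci->dbi_base + where, size, val);
}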

> +static int tegra_pcie_dw_host_init(struct pcie_port *pp)
> +{
> +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> +	struct tegra_pcie_dw *pcie = dw_pcie_to_tegra_pcie(pci);
> +	int count = 200;
> +	u32 val, tmp, offset;
> +	u16 val_w;
> +
> +#if defined(CONFIG_PCIEASPM)
> +	pcie->cfg_link_cap_l1sub =
> +		dw_pcie_find_ext_capability(pci, PCI_EXT_CAP_ID_L1SS) +
> +		PCI_L1SS_CAP;
> +#endif
> +	val = dw_pcie_readl_dbi(pci, PCI_IO_BASE);
> +	val &= ~(IO_BASE_IO_DECODE | IO_BASE_IO_DECODE_BIT8);
> +	dw_pcie_writel_dbi(pci, PCI_IO_BASE, val);
> +
> +	val = dw_pcie_readl_dbi(pci, PCI_PREF_MEMORY_BASE);
> +	val |= CFG_PREF_MEM_LIMIT_BASE_MEM_DECODE;
> +	val |= CFG_PREF_MEM_LIMIT_BASE_MEM_LIMIT_DECODE;
> +	dw_pcie_writel_dbi(pci, PCI_PREF_MEMORY_BASE, val);
> +
> +	dw_pcie_writel_dbi(pci, PCI_BASE_ADDRESS_0, 0);
> +
> +	/* Configure FTS */
> +	val = dw_pcie_readl_dbi(pci, PORT_LOGIC_ACK_F_ASPM_CTRL);
> +	val &= ~(N_FTS_MASK << N_FTS_SHIFT);
> +	val |= N_FTS_VAL << N_FTS_SHIFT;
> +	dw_pcie_writel_dbi(pci, PORT_LOGIC_ACK_F_ASPM_CTRL, val);
> +
> +	val = dw_pcie_readl_dbi(pci, PORT_LOGIC_GEN2_CTRL);
> +	val &= ~FTS_MASK;
> +	val |= FTS_VAL;
> +	dw_pcie_writel_dbi(pci, PORT_LOGIC_GEN2_CTRL, val);
> +
> +	/* Enable as 0xFFFF0001 response for CRS */
> +	val = dw_pcie_readl_dbi(pci, PORT_LOGIC_AMBA_ERROR_RESPONSE_DEFAULT);
> +	val &= ~(AMBA_ERROR_RESPONSE_CRS_MASK << AMBA_ERROR_RESPONSE_CRS_SHIFT);
> +	val |= (AMBA_ERROR_RESPONSE_CRS_OKAY_FFFF0001 <<
> +		AMBA_ERROR_RESPONSE_CRS_SHIFT);
> +	dw_pcie_writel_dbi(pci, PORT_LOGIC_AMBA_ERROR_RESPONSE_DEFAULT, val);
> +
> +	/* Set MPS to 256 in DEV_CTL */
> +	val = dw_pcie_readl_dbi(pci, CFG_DEV_STATUS_CONTROL);
> +	val &= ~PCI_EXP_DEVCTL_PAYLOAD;
> +	val |= (1 << CFG_DEV_STATUS_CONTROL_MPS_SHIFT);
> +	dw_pcie_writel_dbi(pci, CFG_DEV_STATUS_CONTROL, val);
> +
> +	/* Configure Max Speed from DT */
> +	val = dw_pcie_readl_dbi(pci, CFG_LINK_CAP);
> +	val &= ~PCI_EXP_LNKCAP_SLS;
> +	val |= pcie->max_speed;
> +	dw_pcie_writel_dbi(pci, CFG_LINK_CAP, val);
> +
> +	val = dw_pcie_readw_dbi(pci, CFG_LINK_CONTROL_2);
> +	val &= ~PCI_EXP_LNKCTL2_TLS;
> +	val |= pcie->init_speed;
> +	dw_pcie_writew_dbi(pci, CFG_LINK_CONTROL_2, val);
> +
> +	/* Configure Max lane width from DT */
> +	val = dw_pcie_readl_dbi(pci, CFG_LINK_CAP);
> +	val &= ~PCI_EXP_LNKCAP_MLW;
> +	val |= (pcie->num_lanes << PCI_EXP_LNKSTA_NLW_SHIFT);
> +	dw_pcie_writel_dbi(pci, CFG_LINK_CAP, val);
> +
> +	config_gen3_gen4_eq_presets(pcie);
> +
> +#if defined(CONFIG_PCIEASPM)
> +	/* Enable ASPM counters */
> +	val = EVENT_COUNTER_ENABLE_ALL << EVENT_COUNTER_ENABLE_SHIFT;
> +	val |= EVENT_COUNTER_GROUP_5 << EVENT_COUNTER_GROUP_SEL_SHIFT;
> +	dw_pcie_writel_dbi(pci, pcie->event_cntr_ctrl, val);
> +
> +	/* Program T_cmrt and T_pwr_on values */
> +	val = dw_pcie_readl_dbi(pci, pcie->cfg_link_cap_l1sub);
> +	val &= ~(PCI_L1SS_CAP_CM_RESTORE_TIME | PCI_L1SS_CAP_P_PWR_ON_VALUE);
> +	val |= (pcie->aspm_cmrt << PCI_L1SS_CAP_CM_RTM_SHIFT);
> +	val |= (pcie->aspm_pwr_on_t << PCI_L1SS_CAP_PWRN_VAL_SHIFT);
> +	dw_pcie_writel_dbi(pci, pcie->cfg_link_cap_l1sub, val);
> +
> +	/* Program L0s and L1 entrance latencies */
> +	val = dw_pcie_readl_dbi(pci, PORT_LOGIC_ACK_F_ASPM_CTRL);
> +	val &= ~L0S_ENTRANCE_LAT_MASK;
> +	val |= (pcie->aspm_l0s_enter_lat << L0S_ENTRANCE_LAT_SHIFT);
> +	val |= ENTER_ASPM;
> +	dw_pcie_writel_dbi(pci, PORT_LOGIC_ACK_F_ASPM_CTRL, val);
> +
> +	/* Program what ASPM states sould get advertised */

s/sould/should/

> +	if (pcie->disabled_aspm_states & 0x1)
> +		disable_aspm_l0s(pcie); /* Disable L0s */
> +	if (pcie->disabled_aspm_states & 0x2) {
> +		disable_aspm_l10(pcie); /* Disable L1 */
> +		disable_aspm_l11(pcie); /* Disable L1.1 */
> +		disable_aspm_l12(pcie); /* Disable L1.2 */
> +	}
> +	if (pcie->disabled_aspm_states & 0x4)
> +		disable_aspm_l11(pcie); /* Disable L1.1 */
> +	if (pcie->disabled_aspm_states & 0x8)
> +		disable_aspm_l12(pcie); /* Disable L1.2 */
> +#endif
> +	val = dw_pcie_readl_dbi(pci, GEN3_RELATED_OFF);
> +	val &= ~GEN3_RELATED_OFF_GEN3_ZRXDC_NONCOMPL;
> +	dw_pcie_writel_dbi(pci, GEN3_RELATED_OFF, val);
> +
> +	if (pcie->update_fc_fixup) {
> +		val = dw_pcie_readl_dbi(pci, CFG_TIMER_CTRL_MAX_FUNC_NUM_OFF);
> +		val |= 0x1 << CFG_TIMER_CTRL_ACK_NAK_SHIFT;
> +		dw_pcie_writel_dbi(pci, CFG_TIMER_CTRL_MAX_FUNC_NUM_OFF, val);
> +	}
> +
> +	/* CDM check enable */
> +	if (pcie->cdm_check) {
> +		val = dw_pcie_readl_dbi(pci,
> +					PORT_LOGIC_PL_CHK_REG_CONTROL_STATUS);
> +		val |= PORT_LOGIC_PL_CHK_REG_CHK_REG_CONTINUOUS;
> +		val |= PORT_LOGIC_PL_CHK_REG_CHK_REG_START;
> +		dw_pcie_writel_dbi(pci, PORT_LOGIC_PL_CHK_REG_CONTROL_STATUS,
> +				   val);
> +	}
> +
> +	dw_pcie_setup_rc(pp);
> +
> +	clk_set_rate(pcie->core_clk, GEN4_CORE_CLK_FREQ);
> +
> +	/* assert RST */
> +	val = readl(pcie->appl_base + APPL_PINMUX);
> +	val &= ~APPL_PINMUX_PEX_RST;
> +	writel(val, pcie->appl_base + APPL_PINMUX);
> +
> +	usleep_range(100, 200);
> +
> +	/* enable LTSSM */
> +	val = readl(pcie->appl_base + APPL_CTRL);
> +	val |= APPL_CTRL_LTSSM_EN;
> +	writel(val, pcie->appl_base + APPL_CTRL);
> +
> +	/* de-assert RST */
> +	val = readl(pcie->appl_base + APPL_PINMUX);
> +	val |= APPL_PINMUX_PEX_RST;
> +	writel(val, pcie->appl_base + APPL_PINMUX);
> +
> +	msleep(100);
> +
> +	val_w = dw_pcie_readw_dbi(pci, CFG_LINK_STATUS);
> +	while (!(val_w & PCI_EXP_LNKSTA_DLLLA)) {
> +		if (!count) {
> +			val = readl(pcie->appl_base + APPL_DEBUG);
> +			val &= APPL_DEBUG_LTSSM_STATE_MASK;
> +			val >>= APPL_DEBUG_LTSSM_STATE_SHIFT;
> +			tmp = readl(pcie->appl_base + APPL_LINK_STATUS);
> +			tmp &= APPL_LINK_STATUS_RDLH_LINK_UP;
> +			if (val == 0x11 && !tmp) {
> +				dev_info(pci->dev, "link is down in DLL");
> +				dev_info(pci->dev,
> +					 "trying again with DLFE disabled\n");
> +				/* disable LTSSM */
> +				val = readl(pcie->appl_base + APPL_CTRL);
> +				val &= ~APPL_CTRL_LTSSM_EN;
> +				writel(val, pcie->appl_base + APPL_CTRL);
> +
> +				reset_control_assert(pcie->core_rst);
> +				reset_control_deassert(pcie->core_rst);
> +
> +				offset =
> +				dw_pcie_find_ext_capability(pci,
> +							    PCI_EXT_CAP_ID_DLF)
> +				+ PCI_DLF_CAP;

This capability offset doesn't change, does it?  Could it be computed
outside the loop?

> +				val = dw_pcie_readl_dbi(pci, offset);
> +				val &= ~DL_FEATURE_EXCHANGE_EN;
> +				dw_pcie_writel_dbi(pci, offset, val);
> +
> +				tegra_pcie_dw_host_init(&pcie->pci.pp);

This looks like some sort of "wait for link up" retry loop, but a
recursive call seems a little unusual.  My 5-second analysis is that
the loop could run this 200 times, and you sure don't want the
possibility of a 200-deep call chain.  Is there a way to split out the
host init from the link-up polling?
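
One possible shape for such a split is sketched below; tegra_pcie_enable_ltssm()
and tegra_pcie_disable_dlfe() are hypothetical helpers standing in for the
LTSSM-enable and DLF-disable sequences in the quoted code:

static bool tegra_pcie_dw_wait_for_link(struct dw_pcie *pci)
{
	int count = 200;

	while (count--) {
		if (dw_pcie_readw_dbi(pci, CFG_LINK_STATUS) &
		    PCI_EXP_LNKSTA_DLLLA)
			return true;
		usleep_range(1000, 2000);
	}

	return false;
}

static void tegra_pcie_dw_start_link(struct tegra_pcie_dw *pcie)
{
	int retries;

	for (retries = 0; retries < 2; retries++) {
		tegra_pcie_enable_ltssm(pcie);
		if (tegra_pcie_dw_wait_for_link(&pcie->pci))
			return;
		/* retry once with Data Link Feature exchange disabled */
		tegra_pcie_disable_dlfe(pcie);
	}

	dev_info(pcie->pci.dev, "link is down\n");
}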

> +				return 0;
> +			}
> +			dev_info(pci->dev, "link is down\n");
> +			return 0;
> +		}
> +		dev_dbg(pci->dev, "polling for link up\n");
> +		usleep_range(1000, 2000);
> +		val_w = dw_pcie_readw_dbi(pci, CFG_LINK_STATUS);
> +		count--;
> +	}
> +	dev_info(pci->dev, "link is up\n");
> +
> +	tegra_pcie_enable_interrupts(pp);
> +
> +	return 0;
> +}

> +static void tegra_pcie_dw_scan_bus(struct pcie_port *pp)
> +{
> +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> +	struct tegra_pcie_dw *pcie = dw_pcie_to_tegra_pcie(pci);
> +	u32 speed;
> +
> +	if (!tegra_pcie_dw_link_up(pci))
> +		return;
> +
> +	speed = (dw_pcie_readw_dbi(pci, CFG_LINK_STATUS) & PCI_EXP_LNKSTA_CLS);
> +	clk_set_rate(pcie->core_clk, pcie_gen_freq[speed - 1]);

I don't understand what's happening here.  This is named
tegra_pcie_dw_scan_bus(), but it doesn't actually scan anything.
Maybe it's just a bad name for the dw_pcie_host_ops hook
(ks_pcie_v3_65_scan_bus() is the only other .scan_bus()
implementation, and it doesn't scan anything either).

dw_pcie_host_init() calls pci_scan_root_bus_bridge(), which actually
*does* scan the bus, but it does it before calling
pp->ops->scan_bus().  I'd say by the time we get to
pci_scan_root_bus_bridge(), the device-specific init should be all
done and we should be using only generic PCI core interfaces.

Maybe this stuff could/should be done in the ->host_init() hook?  The
code between ->host_init() and ->scan_bus() is all generic code with
no device-specific stuff, so I don't know why we need both hooks.
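
For example, the speed-dependent clock bump done in the ->scan_bus() hook
above could presumably run at the end of ->host_init() once the link is
reported up; a sketch, assuming pcie_gen_freq is a plain array in the driver:

	/* at the end of tegra_pcie_dw_host_init(), after "link is up" */
	speed = dw_pcie_readw_dbi(pci, CFG_LINK_STATUS) & PCI_EXP_LNKSTA_CLS;
	if (speed >= 1 && speed <= ARRAY_SIZE(pcie_gen_freq))
		clk_set_rate(pcie->core_clk, pcie_gen_freq[speed - 1]);

	tegra_pcie_enable_interrupts(pp);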

> +static int tegra_pcie_enable_phy(struct tegra_pcie_dw *pcie)
> +{
> +	int phy_count = pcie->phy_count;
> +	int ret;
> +	int i;
> +
> +	for (i = 0; i < phy_count; i++) {
> +		ret = phy_init(pcie->phy[i]);
> +		if (ret < 0)
> +			goto err_phy_init;
> +
> +		ret = phy_power_on(pcie->phy[i]);
> +		if (ret < 0) {
> +			phy_exit(pcie->phy[i]);
> +			goto err_phy_power_on;
> +		}
> +	}
> +
> +	return 0;
> +
> +	while (i >= 0) {
> +		phy_power_off(pcie->phy[i]);
> +err_phy_power_on:
> +		phy_exit(pcie->phy[i]);
> +err_phy_init:
> +		i--;
> +	}

Wow, jumping into the middle of that loop is clever ;)  Can't decide
what I think of it, but it's certainly clever!

> +	return ret;
> +}
> +
> +static int tegra_pcie_dw_parse_dt(struct tegra_pcie_dw *pcie)
> +{
> +	struct device_node *np = pcie->dev->of_node;
> +	int ret;
> +
> +#if defined(CONFIG_PCIEASPM)
> +	ret = of_property_read_u32(np, "nvidia,event-cntr-ctrl",
> +				   &pcie->event_cntr_ctrl);
> +	if (ret < 0) {
> +		dev_err(pcie->dev, "fail to read event-cntr-ctrl: %d\n", ret);
> +		return ret;
> +	}

The fact that you return error here if DT lacks the
"nvidia,event-cntr-ctrl" property, but only if CONFIG_PCIEASPM=y,
means that you have a revlock between the DT and the kernel: if you
update the kernel to enable CONFIG_PCIEASPM, you may also have to
update your DT.

Maybe that's OK, but I think it'd be nicer if you always required the
presence of these properties, even if you only actually *use* them
when CONFIG_PCIEASPM=y.
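
A sketch of that alternative, using the property and field names from the
quoted code (this implies dropping the #if around the corresponding struct
fields too; the ASPM-only consumer named below is hypothetical):

	ret = of_property_read_u32(np, "nvidia,event-cntr-ctrl",
				   &pcie->event_cntr_ctrl);
	if (ret < 0) {
		dev_err(pcie->dev, "fail to read event-cntr-ctrl: %d\n", ret);
		return ret;
	}

	/*
	 * The value is then only consumed when ASPM is enabled, e.g.:
	 *
	 *	if (IS_ENABLED(CONFIG_PCIEASPM))
	 *		tegra_pcie_enable_aspm_counters(pcie);
	 */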

> +static int tegra_pcie_dw_probe(struct platform_device *pdev)
> +{
> +	struct tegra_pcie_dw *pcie;
> +	struct pcie_port *pp;
> +	struct dw_pcie *pci;
> +	struct phy **phy;
> +	struct resource	*dbi_res;
> +	struct resource	*atu_dma_res;
> +	const struct of_device_id *match;
> +	const struct tegra_pcie_of_data *data;
> +	char *name;
> +	int ret, i;
> +
> +	pcie = devm_kzalloc(&pdev->dev, sizeof(*pcie), GFP_KERNEL);
> +	if (!pcie)
> +		return -ENOMEM;
> +
> +	pci = &pcie->pci;
> +	pci->dev = &pdev->dev;
> +	pci->ops = &tegra_dw_pcie_ops;
> +	pp = &pci->pp;
> +	pcie->dev = &pdev->dev;
> +
> +	match = of_match_device(of_match_ptr(tegra_pcie_dw_of_match),
> +				&pdev->dev);
> +	if (!match)
> +		return -EINVAL;

Logically could be the first thing in the function since it doesn't
depend on anything.

> +	data = (struct tegra_pcie_of_data *)match->data;
> +	pcie->mode = (enum dw_pcie_device_mode)data->mode;
> +
> +	ret = tegra_pcie_dw_parse_dt(pcie);
> +	if (ret < 0) {
> +		dev_err(pcie->dev, "device tree parsing failed: %d\n", ret);
> +		return ret;
> +	}
> +
> +	if (gpio_is_valid(pcie->pex_wake)) {
> +		ret = devm_gpio_request(pcie->dev, pcie->pex_wake,
> +					"pcie_wake");
> +		if (ret < 0) {
> +			if (ret == -EBUSY) {
> +				dev_err(pcie->dev,
> +					"pex_wake already in use\n");
> +				pcie->pex_wake = -EINVAL;

This looks strange.  "pex_wake == -EINVAL" doesn't look right, and
you're about to pass it to gpio_direction_input(), which looks wrong.

> +			} else {
> +				dev_err(pcie->dev,
> +					"pcie_wake gpio_request failed %d\n",
> +					ret);
> +				return ret;
> +			}
> +		}
> +
> +		ret = gpio_direction_input(pcie->pex_wake);
> +		if (ret < 0) {
> +			dev_err(pcie->dev,
> +				"setting pcie_wake input direction failed %d\n",
> +				ret);
> +			return ret;
> +		}
> +		device_init_wakeup(pcie->dev, true);
> +	}
> +
> +	pcie->pex_ctl_reg = devm_regulator_get(&pdev->dev, "vddio-pex-ctl");
> +	if (IS_ERR(pcie->pex_ctl_reg)) {
> +		dev_err(&pdev->dev, "fail to get regulator: %ld\n",
> +			PTR_ERR(pcie->pex_ctl_reg));
> +		return PTR_ERR(pcie->pex_ctl_reg);
> +	}
> +
> +	pcie->core_clk = devm_clk_get(&pdev->dev, "core_clk");
> +	if (IS_ERR(pcie->core_clk)) {
> +		dev_err(&pdev->dev, "Failed to get core clock\n");
> +		return PTR_ERR(pcie->core_clk);
> +	}
> +
> +	pcie->appl_res = platform_get_resource_byname(pdev, IORESOURCE_MEM,
> +						      "appl");
> +	if (!pcie->appl_res) {
> +		dev_err(&pdev->dev, "missing appl space\n");
> +		return PTR_ERR(pcie->appl_res);
> +	}
> +	pcie->appl_base = devm_ioremap_resource(&pdev->dev, pcie->appl_res);
> +	if (IS_ERR(pcie->appl_base)) {
> +		dev_err(&pdev->dev, "mapping appl space failed\n");
> +		return PTR_ERR(pcie->appl_base);
> +	}
> +
> +	pcie->core_apb_rst = devm_reset_control_get(pcie->dev, "core_apb_rst");
> +	if (IS_ERR(pcie->core_apb_rst)) {
> +		dev_err(pcie->dev, "PCIE : core_apb_rst reset is missing\n");

This error message looks different from the others ("PCIE :" prefix).

> +		return PTR_ERR(pcie->core_apb_rst);
> +	}
> +
> +	phy = devm_kcalloc(pcie->dev, pcie->phy_count, sizeof(*phy),
> +			   GFP_KERNEL);
> +	if (!phy)
> +		return PTR_ERR(phy);
> +
> +	for (i = 0; i < pcie->phy_count; i++) {
> +		name = kasprintf(GFP_KERNEL, "pcie-p2u-%u", i);
> +		if (!name) {
> +			dev_err(pcie->dev, "failed to create p2u string\n");
> +			return -ENOMEM;
> +		}
> +		phy[i] = devm_phy_get(pcie->dev, name);
> +		kfree(name);
> +		if (IS_ERR(phy[i])) {
> +			ret = PTR_ERR(phy[i]);
> +			dev_err(pcie->dev, "phy_get error: %d\n", ret);
> +			return ret;
> +		}
> +	}
> +
> +	pcie->phy = phy;
> +
> +	dbi_res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "dbi");
> +	if (!dbi_res) {
> +		dev_err(&pdev->dev, "missing config space\n");
> +		return PTR_ERR(dbi_res);
> +	}
> +	pcie->dbi_res = dbi_res;
> +
> +	pci->dbi_base = devm_ioremap_resource(&pdev->dev, dbi_res);
> +	if (IS_ERR(pci->dbi_base)) {
> +		dev_err(&pdev->dev, "mapping dbi space failed\n");
> +		return PTR_ERR(pci->dbi_base);
> +	}
> +
> +	/* Tegra HW locates DBI2 at a fixed offset from DBI */
> +	pci->dbi_base2 = pci->dbi_base + 0x1000;
> +
> +	atu_dma_res = platform_get_resource_byname(pdev, IORESOURCE_MEM,
> +						   "atu_dma");
> +	if (!atu_dma_res) {
> +		dev_err(&pdev->dev, "missing atu_dma space\n");
> +		return PTR_ERR(atu_dma_res);
> +	}
> +	pcie->atu_dma_res = atu_dma_res;
> +	pci->atu_base = devm_ioremap_resource(&pdev->dev, atu_dma_res);
> +	if (IS_ERR(pci->atu_base)) {
> +		dev_err(&pdev->dev, "mapping atu space failed\n");
> +		return PTR_ERR(pci->atu_base);
> +	}
> +
> +	pcie->core_rst = devm_reset_control_get(pcie->dev, "core_rst");
> +	if (IS_ERR(pcie->core_rst)) {
> +		dev_err(pcie->dev, "PCIE : core_rst reset is missing\n");

Different message format again.

> +		return PTR_ERR(pcie->core_rst);
> +	}
> +
> +	pp->irq = platform_get_irq_byname(pdev, "intr");
> +	if (!pp->irq) {
> +		dev_err(pcie->dev, "failed to get intr interrupt\n");
> +		return -ENODEV;
> +	}
> +
> +	ret = devm_request_irq(&pdev->dev, pp->irq, tegra_pcie_irq_handler,
> +			       IRQF_SHARED, "tegra-pcie-intr", pcie);
> +	if (ret) {
> +		dev_err(pcie->dev, "failed to request \"intr\" irq\n");

It'd be nice to include the actual IRQ, i.e., "IRQ %d", pp->irq.

> +		return ret;
> +	}
> +
> +	platform_set_drvdata(pdev, pcie);
> +
> +	if (pcie->mode == DW_PCIE_RC_TYPE) {
> +		ret = tegra_pcie_config_rp(pcie);
> +		if (ret == -ENOMEDIUM)
> +			ret = 0;
> +	}
> +
> +	return ret;
> +}

> +static void tegra_pcie_downstream_dev_to_D0(struct tegra_pcie_dw *pcie)
> +{
> +	struct pci_dev *pdev = NULL;

Unnecessary initialization.

> +	struct pci_bus *child;
> +	struct pcie_port *pp = &pcie->pci.pp;
> +
> +	list_for_each_entry(child, &pp->bus->children, node) {
> +		/* Bring downstream devices to D0 if they are not already in */
> +		if (child->parent == pp->bus) {
> +			pdev = pci_get_slot(child, PCI_DEVFN(0, 0));
> +			pci_dev_put(pdev);
> +			if (!pdev)
> +				break;

I don't really like this dance with iterating over the bus children,
comparing parent to pp->bus, pci_get_slot(), pci_dev_put(), etc.

I guess the idea is to bring only the directly-downstream devices to
D0, not to do it for things deeper in the hierarchy?

Is this some Tegra-specific wrinkle?  I don't think other drivers do
this.

I see that an earlier patch added "bus" to struct pcie_port.  I think
it would be better to somehow connect to the pci_host_bridge struct.
Several other drivers already do this; see uses of
pci_host_bridge_from_priv().

That would give you the bus, as well as flags like no_ext_tags,
native_aer, etc, which this driver, being a host bridge driver that's
responsible for this part of the firmware/OS interface, may
conceivably need.

Rather than pci_get_slot(), couldn't you iterate over bus->devices and
just skip the non-PCI_DEVFN(0, 0) devices?
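
A sketch of the iteration being suggested; it still takes the root bus from
pp->bus, since whether this driver can reach the pci_host_bridge private data
without further DWC core changes is left open here:

static void tegra_pcie_downstream_dev_to_D0(struct tegra_pcie_dw *pcie)
{
	struct pci_bus *child, *root_bus = pcie->pci.pp.bus;
	struct pci_dev *pdev;

	list_for_each_entry(child, &root_bus->children, node) {
		/* only buses directly below the root port */
		if (child->parent != root_bus)
			continue;

		list_for_each_entry(pdev, &child->devices, bus_list) {
			if (pdev->devfn != PCI_DEVFN(0, 0))
				continue;

			if (pci_set_power_state(pdev, PCI_D0))
				dev_err(pcie->dev, "D0 transition failed\n");
		}
	}
}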

> +
> +			if (pci_set_power_state(pdev, PCI_D0))
> +				dev_err(pcie->dev, "D0 transition failed\n");
> +		}
> +	}
> +}

> +static int tegra_pcie_dw_remove(struct platform_device *pdev)
> +{
> +	struct tegra_pcie_dw *pcie = platform_get_drvdata(pdev);
> +
> +	if (pcie->mode == DW_PCIE_RC_TYPE) {

Return early if it's not RC and unindent the rest of the function.

> +		if (!pcie->link_state && pcie->power_down_en)
> +			return 0;
> +
> +		debugfs_remove_recursive(pcie->debugfs);
> +		pm_runtime_put_sync(pcie->dev);
> +		pm_runtime_disable(pcie->dev);
> +	}
> +
> +	return 0;
> +}

> +static int tegra_pcie_dw_runtime_suspend(struct device *dev)
> +{
> +	struct tegra_pcie_dw *pcie = dev_get_drvdata(dev);
> +
> +	tegra_pcie_downstream_dev_to_D0(pcie);
> +
> +	pci_stop_root_bus(pcie->pci.pp.bus);
> +	pci_remove_root_bus(pcie->pci.pp.bus);

Why are you calling these?  No other drivers do this except in their
.remove() methods.  Is there something special about Tegra, or is this
something the other drivers *should* be doing?

> +	tegra_pcie_dw_pme_turnoff(pcie);
> +
> +	reset_control_assert(pcie->core_rst);
> +	tegra_pcie_disable_phy(pcie);
> +	reset_control_assert(pcie->core_apb_rst);
> +	clk_disable_unprepare(pcie->core_clk);
> +	regulator_disable(pcie->pex_ctl_reg);
> +	config_plat_gpio(pcie, 0);
> +
> +	if (pcie->cid != CTRL_5)
> +		tegra_pcie_bpmp_set_ctrl_state(pcie, false);
> +
> +	return 0;
> +}
> +
> +static int tegra_pcie_dw_runtime_resume(struct device *dev)
> +{
> +	struct tegra_pcie_dw *pcie = dev_get_drvdata(dev);
> +	struct dw_pcie *pci = &pcie->pci;
> +	struct pcie_port *pp = &pci->pp;
> +	int ret = 0;
> +
> +	ret = tegra_pcie_config_controller(pcie, false);
> +	if (ret < 0)
> +		return ret;
> +
> +	/* program to use MPS of 256 whereever possible */

s/whereever/wherever/

> +	pcie_bus_config = PCIE_BUS_SAFE;
> +
> +	pp->root_bus_nr = -1;
> +	pp->ops = &tegra_pcie_dw_host_ops;
> +
> +	/* Disable MSI interrupts for PME messages */

Superfluous comment; it repeats the function name.

> +static int tegra_pcie_dw_suspend_noirq(struct device *dev)
> +{
> +	struct tegra_pcie_dw *pcie = dev_get_drvdata(dev);
> +	int ret = 0;
> +
> +	if (!pcie->link_state)
> +		return 0;
> +
> +	/* save MSI interrutp vector*/

s/interrutp/interrupt/
s/vector/vector /

> +static int tegra_pcie_dw_resume_noirq(struct device *dev)
> +{
> +	struct tegra_pcie_dw *pcie = dev_get_drvdata(dev);
> +	int ret;
> +
> +	if (!pcie->link_state)
> +		return 0;
> +
> +	if (gpio_is_valid(pcie->pex_wake) && device_may_wakeup(dev)) {
> +		ret = disable_irq_wake(gpio_to_irq(pcie->pex_wake));
> +		if (ret < 0)
> +			dev_err(dev, "disable wake irq failed: %d\n", ret);
> +	}
> +
> +	ret = tegra_pcie_config_controller(pcie, true);
> +	if (ret < 0)
> +		return ret;
> +
> +	ret = tegra_pcie_dw_host_init(&pcie->pci.pp);
> +	if (ret < 0) {
> +		dev_err(dev, "failed to init host: %d\n", ret);
> +		goto fail_host_init;
> +	}
> +
> +	/* restore MSI interrutp vector*/

s/interrutp/interrupt/
s/vector/vector /

> +static void tegra_pcie_dw_shutdown(struct platform_device *pdev)
> +{
> +	struct tegra_pcie_dw *pcie = platform_get_drvdata(pdev);
> +
> +	if (pcie->mode == DW_PCIE_RC_TYPE) {

  if (pcie->mode != DW_PCIE_RC_TYPE)
    return;

Then you can unindent the whole function.

> +		if (!pcie->link_state && pcie->power_down_en)
> +			return;
> +
> +		debugfs_remove_recursive(pcie->debugfs);
> +		tegra_pcie_downstream_dev_to_D0(pcie);
> +
> +		/* Disable interrupts */

Superfluous comment.

> +		disable_irq(pcie->pci.pp.irq);
Vidya Sagar April 1, 2019, 11:46 a.m. UTC | #8
On 3/28/2019 6:03 PM, Thierry Reding wrote:
> On Tue, Mar 26, 2019 at 08:43:20PM +0530, Vidya Sagar wrote:
>> move PCIe config space capability search API to common designware file
>> as this can be used by both host and ep mode codes.
>> It also adds extended capability search APIs.
>>
>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>> ---
>>   drivers/pci/controller/dwc/pcie-designware-ep.c | 37 +------------
>>   drivers/pci/controller/dwc/pcie-designware.c    | 73 +++++++++++++++++++++++++
>>   drivers/pci/controller/dwc/pcie-designware.h    |  3 +
>>   3 files changed, 78 insertions(+), 35 deletions(-)
> 
> Just out of curiosity: is there any reason why this driver needs to
> reimplement this? Couldn't this be made to work using the standard
> pci_find_next_capability() function?
> 
> Other than that it might be a good idea to split this into two patches,
> one that moves the existing functionality to the common code and another
> that adds the extra functionality.
> 
> Thierry
> 
The pci_find_next_capability() API expects a struct pci_dev * pointer, so it
can only be used after PCIe devices have been enumerated. The APIs added in
this patch solve the problem of getting capability offsets before PCIe
enumeration; FWIW, they take a struct dw_pcie * pointer as input.
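
For reference, such a helper walks the capability list through the DBI
aperture rather than through accessors tied to a struct pci_dev; a rough
sketch, modelled on the endpoint-mode code this patch moves (names are
approximate):

static u8 __dw_pcie_find_next_cap(struct dw_pcie *pci, u8 cap_ptr, u8 cap)
{
	u8 cap_id, next_cap_ptr;
	u16 reg;

	if (!cap_ptr)
		return 0;

	reg = dw_pcie_readw_dbi(pci, cap_ptr);
	cap_id = reg & 0x00ff;
	if (cap_id == cap)
		return cap_ptr;

	next_cap_ptr = (reg & 0xff00) >> 8;
	return __dw_pcie_find_next_cap(pci, next_cap_ptr, cap);
}

u8 dw_pcie_find_capability(struct dw_pcie *pci, u8 cap)
{
	u8 next_cap_ptr;
	u16 reg;

	reg = dw_pcie_readw_dbi(pci, PCI_CAPABILITY_LIST);
	next_cap_ptr = reg & 0x00ff;

	return __dw_pcie_find_next_cap(pci, next_cap_ptr, cap);
}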
As you suggested, I'll split this into two patches in my next series.
Vidya Sagar April 1, 2019, 12:37 p.m. UTC | #9
On 3/28/2019 10:29 PM, Thierry Reding wrote:
> On Tue, Mar 26, 2019 at 08:43:23PM +0530, Vidya Sagar wrote:
>> Add P2U (PIPE to UPHY) and PCIe controller nodes to device tree.
>> The Tegra194 SoC contains six PCIe controllers and twenty P2U instances
>> grouped into two different PHY bricks namely High-Speed IO (HSIO-12 P2Us)
>> and NVIDIA High Speed (NVHS-8 P2Us) respectively.
>>
>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>> ---
>>   arch/arm64/boot/dts/nvidia/tegra194.dtsi | 473 +++++++++++++++++++++++++++++++
>>   1 file changed, 473 insertions(+)
>>
>> diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
>> index c77ca211fa8f..266a3058fa66 100644
>> --- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
>> +++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
>> @@ -1054,4 +1054,477 @@
>>   				(GIC_CPU_MASK_SIMPLE(4) | IRQ_TYPE_LEVEL_LOW)>;
>>   		interrupt-parent = <&gic>;
>>   	};
>> +
>> +	hsio-p2u {
>> +		compatible = "simple-bus";
>> +		#address-cells = <2>;
>> +		#size-cells = <2>;
>> +		ranges;
> 
> It doesn't look to me like there's really a bus here. Perhaps just leave
> out that top-level hsio-p2u node? If you only want to somehow organize
> these, perhaps a better way would be to add a comment.
Yes. 'hsio-p2u' and 'nvhs-p2u' are not real buses as such. I'll drop both
the hsio-p2u and nvhs-p2u nodes and add comments to indicate which P2U brick
a particular P2U block belongs to.

> 
> Or: the address map lists something called PIPE2UPHY_XBAR at addresses
> 0x03e00000-0x03e0ffff. Perhaps that really ought to be the "bus" in this
> case?
Not really. That is different.

> 
> Also, please leave a blank line between the properties and the nodes
> for readability.
> 
>> +		p2u_0: p2u@03e10000 {
>> +			compatible = "nvidia,tegra194-phy-p2u";
>> +			reg = <0x0 0x03e10000 0x0 0x00010000>;
>> +			reg-names = "base";
>> +
>> +			#phy-cells = <0>;
>> +		};
>> +		p2u_1: p2u@03e20000 {
> 
> Please leave blank lines between nodes for readability.
Done.

> 
>> +			compatible = "nvidia,tegra194-phy-p2u";
>> +			reg = <0x0 0x03e20000 0x0 0x00010000>;
>> +			reg-names = "base";
>> +
>> +			#phy-cells = <0>;
>> +		};
>> +		p2u_2: p2u@03e30000 {
>> +			compatible = "nvidia,tegra194-phy-p2u";
>> +			reg = <0x0 0x03e30000 0x0 0x00010000>;
>> +			reg-names = "base";
>> +
>> +			#phy-cells = <0>;
>> +		};
>> +		p2u_3: p2u@03e40000 {
>> +			compatible = "nvidia,tegra194-phy-p2u";
>> +			reg = <0x0 0x03e40000 0x0 0x00010000>;
>> +			reg-names = "base";
>> +
>> +			#phy-cells = <0>;
>> +		};
>> +		p2u_4: p2u@03e50000 {
>> +			compatible = "nvidia,tegra194-phy-p2u";
>> +			reg = <0x0 0x03e50000 0x0 0x00010000>;
>> +			reg-names = "base";
>> +
>> +			#phy-cells = <0>;
>> +		};
>> +		p2u_5: p2u@03e60000 {
>> +			compatible = "nvidia,tegra194-phy-p2u";
>> +			reg = <0x0 0x03e60000 0x0 0x00010000>;
>> +			reg-names = "base";
>> +
>> +			#phy-cells = <0>;
>> +		};
>> +		p2u_6: p2u@03e70000 {
>> +			compatible = "nvidia,tegra194-phy-p2u";
>> +			reg = <0x0 0x03e70000 0x0 0x00010000>;
>> +			reg-names = "base";
>> +
>> +			#phy-cells = <0>;
>> +		};
>> +		p2u_7: p2u@03e80000 {
>> +			compatible = "nvidia,tegra194-phy-p2u";
>> +			reg = <0x0 0x03e80000 0x0 0x00010000>;
>> +			reg-names = "base";
>> +
>> +			#phy-cells = <0>;
>> +		};
>> +		p2u_8: p2u@03e90000 {
>> +			compatible = "nvidia,tegra194-phy-p2u";
>> +			reg = <0x0 0x03e90000 0x0 0x00010000>;
>> +			reg-names = "base";
>> +
>> +			#phy-cells = <0>;
>> +		};
>> +		p2u_9: p2u@03ea0000 {
>> +			compatible = "nvidia,tegra194-phy-p2u";
>> +			reg = <0x0 0x03ea0000 0x0 0x00010000>;
>> +			reg-names = "base";
>> +
>> +			#phy-cells = <0>;
>> +		};
>> +		p2u_10: p2u@03f30000 {
>> +			compatible = "nvidia,tegra194-phy-p2u";
>> +			reg = <0x0 0x03f30000 0x0 0x00010000>;
>> +			reg-names = "base";
>> +
>> +			#phy-cells = <0>;
>> +		};
>> +		p2u_11: p2u@03f40000 {
>> +			compatible = "nvidia,tegra194-phy-p2u";
>> +			reg = <0x0 0x03f40000 0x0 0x00010000>;
>> +			reg-names = "base";
>> +
>> +			#phy-cells = <0>;
>> +		};
>> +	};
>> +
>> +	nvhs-p2u {
>> +		compatible = "simple-bus";
>> +		#address-cells = <2>;
>> +		#size-cells = <2>;
>> +		ranges;
>> +		p2u_12: p2u@03eb0000 {
>> +			compatible = "nvidia,tegra194-phy-p2u";
>> +			reg = <0x0 0x03eb0000 0x0 0x00010000>;
>> +			reg-names = "base";
>> +
>> +			#phy-cells = <0>;
>> +		};
>> +		p2u_13: p2u@03ec0000 {
>> +			compatible = "nvidia,tegra194-phy-p2u";
>> +			reg = <0x0 0x03ec0000 0x0 0x00010000>;
>> +			reg-names = "base";
>> +
>> +			#phy-cells = <0>;
>> +		};
>> +		p2u_14: p2u@03ed0000 {
>> +			compatible = "nvidia,tegra194-phy-p2u";
>> +			reg = <0x0 0x03ed0000 0x0 0x00010000>;
>> +			reg-names = "base";
>> +
>> +			#phy-cells = <0>;
>> +		};
>> +		p2u_15: p2u@03ee0000 {
>> +			compatible = "nvidia,tegra194-phy-p2u";
>> +			reg = <0x0 0x03ee0000 0x0 0x00010000>;
>> +			reg-names = "base";
>> +
>> +			#phy-cells = <0>;
>> +		};
>> +		p2u_16: p2u@03ef0000 {
>> +			compatible = "nvidia,tegra194-phy-p2u";
>> +			reg = <0x0 0x03ef0000 0x0 0x00010000>;
>> +			reg-names = "base";
>> +
>> +			#phy-cells = <0>;
>> +		};
>> +		p2u_17: p2u@03f00000 {
>> +			compatible = "nvidia,tegra194-phy-p2u";
>> +			reg = <0x0 0x03f00000 0x0 0x00010000>;
>> +			reg-names = "base";
>> +
>> +			#phy-cells = <0>;
>> +		};
>> +		p2u_18: p2u@03f10000 {
>> +			compatible = "nvidia,tegra194-phy-p2u";
>> +			reg = <0x0 0x03f10000 0x0 0x00010000>;
>> +			reg-names = "base";
>> +
>> +			#phy-cells = <0>;
>> +		};
>> +		p2u_19: p2u@03f20000 {
>> +			compatible = "nvidia,tegra194-phy-p2u";
>> +			reg = <0x0 0x03f20000 0x0 0x00010000>;
>> +			reg-names = "base";
>> +
>> +			#phy-cells = <0>;
>> +		};
>> +	};
> 
> It's not clear to me why NVHS PHYs are listed as a separate bus. Also,
> these really should be sorted by unit-address. If that means that HSIO
> and NVHS PHYs are mixed, so be it. We can use comments to highlight
> which PHYs are of which type. Or perhaps we really should be using
> different compatible strings for them?
I listed the NVHS P2Us separately just to distinguish them from the HSIO P2Us.
As part of addressing your comment above, I'm going to remove the 'hsio-p2u'
and 'nvhs-p2u' nodes and list all P2Us in unit-address order.

> 
> Same comments on the below as for the bindings earlier.
> 
> Thierry
> 
>> +
>> +	pcie@14180000 {
>> +		compatible = "nvidia,tegra194-pcie", "snps,dw-pcie";
>> +		power-domains = <&bpmp TEGRA194_POWER_DOMAIN_PCIEX8B>;
>> +		reg = <0x00 0x14180000 0x0 0x00020000   /* appl registers (128K)      */
>> +		       0x00 0x38000000 0x0 0x02000000   /* window1 (32M)              */
>> +		       0x00 0x38000000 0x0 0x00040000   /* configuration space (256K) */
>> +		       0x00 0x38040000 0x0 0x00040000   /* iATU_DMA reg space (256K)  */
>> +		       0x00 0x38080000 0x0 0x00040000   /* DBI reg space (256K)       */
>> +		       0x18 0x00000000 0x4 0x00000000>; /* window2 (16G)              */
>> +		reg-names = "appl", "window1", "config", "atu_dma", "dbi", "window2";
>> +
>> +		status = "disabled";
>> +
>> +		#address-cells = <3>;
>> +		#size-cells = <2>;
>> +		device_type = "pci";
>> +		num-lanes = <8>;
>> +		num-viewport = <8>;
>> +		linux,pci-domain = <0>;
>> +
>> +		clocks = <&bpmp TEGRA194_CLK_PEX0_CORE_0>;
>> +		clock-names = "core_clk";
>> +
>> +		resets = <&bpmp TEGRA194_RESET_PEX0_CORE_0_APB>,
>> +			 <&bpmp TEGRA194_RESET_PEX0_CORE_0>;
>> +		reset-names = "core_apb_rst", "core_rst";
>> +
>> +		interrupts = <GIC_SPI 72 IRQ_TYPE_LEVEL_HIGH>,	/* controller interrupt */
>> +			     <GIC_SPI 73 IRQ_TYPE_LEVEL_HIGH>;	/* MSI interrupt */
>> +		interrupt-names = "intr", "msi";
>> +
>> +		#interrupt-cells = <1>;
>> +		interrupt-map-mask = <0 0 0 0>;
>> +		interrupt-map = <0 0 0 0 &gic 0 72 0x04>;
>> +
>> +		nvidia,bpmp = <&bpmp>;
>> +
>> +		nvidia,max-speed = <4>;
>> +		nvidia,disable-aspm-states = <0xf>;
>> +		nvidia,controller-id = <&bpmp 0x0>;
>> +		nvidia,aspm-cmrt = <0x3C>;
>> +		nvidia,aspm-pwr-on-t = <0x14>;
>> +		nvidia,aspm-l0s-entrance-latency = <0x3>;
>> +
>> +		bus-range = <0x0 0xff>;
>> +		ranges = <0x81000000 0x0 0x38100000 0x0 0x38100000 0x0 0x00100000    /* downstream I/O (1MB) */
>> +			  0xc2000000 0x18 0x00000000 0x18 0x00000000 0x3 0x40000000  /* prefetchable memory (13GB) */
>> +			  0x82000000 0x0 0x40000000 0x1B 0x40000000 0x0 0xC0000000>; /* non-prefetchable memory (3GB) */
>> +
>> +		nvidia,event-cntr-ctrl = <0x1d8>;
>> +		nvidia,event-cntr-data = <0x1dc>;
>> +	};
>> +
>> +	pcie@14100000 {
>> +		compatible = "nvidia,tegra194-pcie", "snps,dw-pcie";
>> +		power-domains = <&bpmp TEGRA194_POWER_DOMAIN_PCIEX1A>;
>> +		reg = <0x00 0x14100000 0x0 0x00020000   /* appl registers (128K)      */
>> +		       0x00 0x30000000 0x0 0x02000000   /* window1 (32M)              */
>> +		       0x00 0x30000000 0x0 0x00040000   /* configuration space (256K) */
>> +		       0x00 0x30040000 0x0 0x00040000   /* iATU_DMA reg space (256K)  */
>> +		       0x00 0x30080000 0x0 0x00040000   /* DBI reg space (256K)       */
>> +		       0x12 0x00000000 0x0 0x40000000>; /* window2 (1G)               */
>> +		reg-names = "appl", "window1", "config", "atu_dma", "dbi", "window2";
>> +
>> +		status = "disabled";
>> +
>> +		#address-cells = <3>;
>> +		#size-cells = <2>;
>> +		device_type = "pci";
>> +		num-lanes = <1>;
>> +		num-viewport = <8>;
>> +		linux,pci-domain = <1>;
>> +
>> +		clocks = <&bpmp TEGRA194_CLK_PEX0_CORE_1>;
>> +		clock-names = "core_clk";
>> +
>> +		resets = <&bpmp TEGRA194_RESET_PEX0_CORE_1_APB>,
>> +			 <&bpmp TEGRA194_RESET_PEX0_CORE_1>;
>> +		reset-names = "core_apb_rst", "core_rst";
>> +
>> +		interrupts = <GIC_SPI 45 IRQ_TYPE_LEVEL_HIGH>,	/* controller interrupt */
>> +			     <GIC_SPI 46 IRQ_TYPE_LEVEL_HIGH>;	/* MSI interrupt */
>> +		interrupt-names = "intr", "msi";
>> +
>> +		#interrupt-cells = <1>;
>> +		interrupt-map-mask = <0 0 0 0>;
>> +		interrupt-map = <0 0 0 0 &gic 0 45 0x04>;
>> +
>> +		nvidia,bpmp = <&bpmp>;
>> +
>> +		nvidia,max-speed = <4>;
>> +		nvidia,disable-aspm-states = <0xf>;
>> +		nvidia,controller-id = <&bpmp 0x1>;
>> +		nvidia,aspm-cmrt = <0x3C>;
>> +		nvidia,aspm-pwr-on-t = <0x14>;
>> +		nvidia,aspm-l0s-entrance-latency = <0x3>;
>> +
>> +		bus-range = <0x0 0xff>;
>> +		ranges = <0x81000000 0x0 0x30100000 0x0 0x30100000 0x0 0x00100000    /* downstream I/O (1MB) */
>> +			  0xc2000000 0x12 0x00000000 0x12 0x00000000 0x0 0x30000000  /* prefetchable memory (768MB) */
>> +			  0x82000000 0x0 0x40000000 0x12 0x30000000 0x0 0x10000000>; /* non-prefetchable memory (256MB) */
>> +
>> +		nvidia,event-cntr-ctrl = <0x1a8>;
>> +		nvidia,event-cntr-data = <0x1ac>;
>> +	};
>> +
>> +	pcie@14120000 {
>> +		compatible = "nvidia,tegra194-pcie", "snps,dw-pcie";
>> +		power-domains = <&bpmp TEGRA194_POWER_DOMAIN_PCIEX1A>;
>> +		reg = <0x00 0x14120000 0x0 0x00020000   /* appl registers (128K)      */
>> +		       0x00 0x32000000 0x0 0x02000000   /* window1 (32M)              */
>> +		       0x00 0x32000000 0x0 0x00040000   /* configuration space (256K) */
>> +		       0x00 0x32040000 0x0 0x00040000   /* iATU_DMA reg space (256K)  */
>> +		       0x00 0x32080000 0x0 0x00040000   /* DBI reg space (256K)       */
>> +		       0x12 0x40000000 0x0 0x40000000>; /* window2 (1G)               */
>> +		reg-names = "appl", "window1", "config", "atu_dma", "dbi", "window2";
>> +
>> +		status = "disabled";
>> +
>> +		#address-cells = <3>;
>> +		#size-cells = <2>;
>> +		device_type = "pci";
>> +		num-lanes = <1>;
>> +		num-viewport = <8>;
>> +		linux,pci-domain = <2>;
>> +
>> +		clocks = <&bpmp TEGRA194_CLK_PEX0_CORE_2>;
>> +		clock-names = "core_clk";
>> +
>> +		resets = <&bpmp TEGRA194_RESET_PEX0_CORE_2_APB>,
>> +			 <&bpmp TEGRA194_RESET_PEX0_CORE_2>;
>> +		reset-names = "core_apb_rst", "core_rst";
>> +
>> +		interrupts = <GIC_SPI 47 IRQ_TYPE_LEVEL_HIGH>,	/* controller interrupt */
>> +			     <GIC_SPI 48 IRQ_TYPE_LEVEL_HIGH>;	/* MSI interrupt */
>> +		interrupt-names = "intr", "msi";
>> +
>> +		#interrupt-cells = <1>;
>> +		interrupt-map-mask = <0 0 0 0>;
>> +		interrupt-map = <0 0 0 0 &gic 0 47 0x04>;
>> +
>> +		nvidia,bpmp = <&bpmp>;
>> +
>> +		nvidia,max-speed = <4>;
>> +		nvidia,disable-aspm-states = <0xf>;
>> +		nvidia,controller-id = <&bpmp 0x2>;
>> +		nvidia,aspm-cmrt = <0x3C>;
>> +		nvidia,aspm-pwr-on-t = <0x14>;
>> +		nvidia,aspm-l0s-entrance-latency = <0x3>;
>> +
>> +		bus-range = <0x0 0xff>;
>> +		ranges = <0x81000000 0x0 0x32100000 0x0 0x32100000 0x0 0x00100000    /* downstream I/O (1MB) */
>> +			  0xc2000000 0x12 0x40000000 0x12 0x40000000 0x0 0x30000000  /* prefetchable memory (768MB) */
>> +			  0x82000000 0x0 0x40000000 0x12 0x70000000 0x0 0x10000000>; /* non-prefetchable memory (256MB) */
>> +
>> +		nvidia,event-cntr-ctrl = <0x1a8>;
>> +		nvidia,event-cntr-data = <0x1ac>;
>> +	};
>> +
>> +	pcie@14140000 {
>> +		compatible = "nvidia,tegra194-pcie", "snps,dw-pcie";
>> +		power-domains = <&bpmp TEGRA194_POWER_DOMAIN_PCIEX1A>;
>> +		reg = <0x00 0x14140000 0x0 0x00020000   /* appl registers (128K)      */
>> +		       0x00 0x34000000 0x0 0x02000000   /* window1 (32M)              */
>> +		       0x00 0x34000000 0x0 0x00040000   /* configuration space (256K) */
>> +		       0x00 0x34040000 0x0 0x00040000   /* iATU_DMA reg space (256K)  */
>> +		       0x00 0x34080000 0x0 0x00040000   /* DBI reg space (256K)       */
>> +		       0x12 0x80000000 0x0 0x40000000>; /* window2 (1G)               */
>> +		reg-names = "appl", "window1", "config", "atu_dma", "dbi", "window2";
>> +
>> +		status = "disabled";
>> +
>> +		#address-cells = <3>;
>> +		#size-cells = <2>;
>> +		device_type = "pci";
>> +		num-lanes = <1>;
>> +		num-viewport = <8>;
>> +		linux,pci-domain = <3>;
>> +
>> +		clocks = <&bpmp TEGRA194_CLK_PEX0_CORE_3>;
>> +		clock-names = "core_clk";
>> +
>> +		resets = <&bpmp TEGRA194_RESET_PEX0_CORE_3_APB>,
>> +			 <&bpmp TEGRA194_RESET_PEX0_CORE_3>;
>> +		reset-names = "core_apb_rst", "core_rst";
>> +
>> +		interrupts = <GIC_SPI 49 IRQ_TYPE_LEVEL_HIGH>,	/* controller interrupt */
>> +			     <GIC_SPI 50 IRQ_TYPE_LEVEL_HIGH>;	/* MSI interrupt */
>> +		interrupt-names = "intr", "msi";
>> +
>> +		#interrupt-cells = <1>;
>> +		interrupt-map-mask = <0 0 0 0>;
>> +		interrupt-map = <0 0 0 0 &gic 0 49 0x04>;
>> +
>> +		nvidia,bpmp = <&bpmp>;
>> +
>> +		nvidia,max-speed = <4>;
>> +		nvidia,disable-aspm-states = <0xf>;
>> +		nvidia,controller-id = <&bpmp 0x3>;
>> +		nvidia,aspm-cmrt = <0x3C>;
>> +		nvidia,aspm-pwr-on-t = <0x14>;
>> +		nvidia,aspm-l0s-entrance-latency = <0x3>;
>> +
>> +		bus-range = <0x0 0xff>;
>> +		ranges = <0x81000000 0x0 0x34100000 0x0 0x34100000 0x0 0x00100000    /* downstream I/O (1MB) */
>> +			  0xc2000000 0x12 0x80000000 0x12 0x80000000 0x0 0x30000000  /* prefetchable memory (768MB) */
>> +			  0x82000000 0x0 0x40000000 0x12 0xB0000000 0x0 0x10000000>; /* non-prefetchable memory (256MB) */
>> +
>> +		nvidia,event-cntr-ctrl = <0x1a8>;
>> +		nvidia,event-cntr-data = <0x1ac>;
>> +	};
>> +
>> +	pcie@14160000 {
>> +		compatible = "nvidia,tegra194-pcie", "snps,dw-pcie";
>> +		power-domains = <&bpmp TEGRA194_POWER_DOMAIN_PCIEX4A>;
>> +		reg = <0x00 0x14160000 0x0 0x00020000   /* appl registers (128K)      */
>> +		       0x00 0x36000000 0x0 0x02000000   /* window1 (32M)              */
>> +		       0x00 0x36000000 0x0 0x00040000   /* configuration space (256K) */
>> +		       0x00 0x36040000 0x0 0x00040000   /* iATU_DMA reg space (256K)  */
>> +		       0x00 0x36080000 0x0 0x00040000   /* DBI reg space (256K)       */
>> +		       0x14 0x00000000 0x4 0x00000000>; /* window2 (16G)              */
>> +		reg-names = "appl", "window1", "config", "atu_dma", "dbi", "window2";
>> +
>> +		status = "disabled";
>> +
>> +		#address-cells = <3>;
>> +		#size-cells = <2>;
>> +		device_type = "pci";
>> +		num-lanes = <4>;
>> +		num-viewport = <8>;
>> +		linux,pci-domain = <4>;
>> +
>> +		clocks = <&bpmp TEGRA194_CLK_PEX0_CORE_4>;
>> +		clock-names = "core_clk";
>> +
>> +		resets = <&bpmp TEGRA194_RESET_PEX0_CORE_4_APB>,
>> +			 <&bpmp TEGRA194_RESET_PEX0_CORE_4>;
>> +		reset-names = "core_apb_rst", "core_rst";
>> +
>> +		interrupts = <GIC_SPI 51 IRQ_TYPE_LEVEL_HIGH>,	/* controller interrupt */
>> +			     <GIC_SPI 52 IRQ_TYPE_LEVEL_HIGH>;	/* MSI interrupt */
>> +		interrupt-names = "intr", "msi";
>> +
>> +		#interrupt-cells = <1>;
>> +		interrupt-map-mask = <0 0 0 0>;
>> +		interrupt-map = <0 0 0 0 &gic 0 51 0x04>;
>> +
>> +		nvidia,bpmp = <&bpmp>;
>> +
>> +		nvidia,max-speed = <4>;
>> +		nvidia,disable-aspm-states = <0xf>;
>> +		nvidia,controller-id = <&bpmp 0x4>;
>> +		nvidia,aspm-cmrt = <0x3C>;
>> +		nvidia,aspm-pwr-on-t = <0x14>;
>> +		nvidia,aspm-l0s-entrance-latency = <0x3>;
>> +
>> +		bus-range = <0x0 0xff>;
>> +		ranges = <0x81000000 0x0 0x36100000 0x0 0x36100000 0x0 0x00100000    /* downstream I/O (1MB) */
>> +			  0xc2000000 0x14 0x00000000 0x14 0x00000000 0x3 0x40000000  /* prefetchable memory (13GB) */
>> +			  0x82000000 0x0 0x40000000 0x17 0x40000000 0x0 0xC0000000>; /* non-prefetchable memory (3GB) */
>> +
>> +		nvidia,event-cntr-ctrl = <0x1c4>;
>> +		nvidia,event-cntr-data = <0x1c8>;
>> +	};
>> +
>> +	pcie@141a0000 {
>> +		compatible = "nvidia,tegra194-pcie", "snps,dw-pcie";
>> +		power-domains = <&bpmp TEGRA194_POWER_DOMAIN_PCIEX8A>;
>> +		reg = <0x00 0x141a0000 0x0 0x00020000   /* appl registers (128K)      */
>> +		       0x00 0x3a000000 0x0 0x02000000   /* window1 (32M)              */
>> +		       0x00 0x3a000000 0x0 0x00040000   /* configuration space (256K) */
>> +		       0x00 0x3a040000 0x0 0x00040000   /* iATU_DMA reg space (256K)  */
>> +		       0x00 0x3a080000 0x0 0x00040000   /* DBI reg space (256K)       */
>> +		       0x1c 0x00000000 0x4 0x00000000>; /* window2 (16G)              */
>> +		reg-names = "appl", "window1", "config", "atu_dma", "dbi", "window2";
>> +
>> +		status = "disabled";
>> +
>> +		#address-cells = <3>;
>> +		#size-cells = <2>;
>> +		device_type = "pci";
>> +		num-lanes = <8>;
>> +		num-viewport = <8>;
>> +		linux,pci-domain = <5>;
>> +
>> +		clocks = <&bpmp TEGRA194_CLK_PEX1_CORE_5>,
>> +			<&bpmp TEGRA194_CLK_PEX1_CORE_5M>;
>> +		clock-names = "core_clk", "core_clk_m";
>> +
>> +		resets = <&bpmp TEGRA194_RESET_PEX1_CORE_5_APB>,
>> +			 <&bpmp TEGRA194_RESET_PEX1_CORE_5>;
>> +		reset-names = "core_apb_rst", "core_rst";
>> +
>> +		interrupts = <GIC_SPI 53 IRQ_TYPE_LEVEL_HIGH>,	/* controller interrupt */
>> +			     <GIC_SPI 54 IRQ_TYPE_LEVEL_HIGH>;	/* MSI interrupt */
>> +		interrupt-names = "intr", "msi";
>> +
>> +		nvidia,bpmp = <&bpmp>;
>> +
>> +		#interrupt-cells = <1>;
>> +		interrupt-map-mask = <0 0 0 0>;
>> +		interrupt-map = <0 0 0 0 &gic 0 53 0x04>;
>> +
>> +		nvidia,max-speed = <4>;
>> +		nvidia,disable-aspm-states = <0xf>;
>> +		nvidia,controller-id = <&bpmp 0x5>;
>> +		nvidia,aspm-cmrt = <0x3C>;
>> +		nvidia,aspm-pwr-on-t = <0x14>;
>> +		nvidia,aspm-l0s-entrance-latency = <0x3>;
>> +
>> +		bus-range = <0x0 0xff>;
>> +		ranges = <0x81000000 0x0 0x3a100000 0x0 0x3a100000 0x0 0x00100000    /* downstream I/O (1MB) */
>> +			  0xc2000000 0x1c 0x00000000 0x1c 0x00000000 0x3 0x40000000  /* prefetchable memory (13GB) */
>> +			  0x82000000 0x0 0x40000000 0x1f 0x40000000 0x0 0xC0000000>; /* non-prefetchable memory (3GB) */
>> +
>> +		nvidia,event-cntr-ctrl = <0x1d8>;
>> +		nvidia,event-cntr-data = <0x1dc>;
>> +	};
>>   };
>> -- 
>> 2.7.4
>>
Vidya Sagar April 2, 2019, 7:17 a.m. UTC | #10
On 3/30/2019 2:22 AM, Bjorn Helgaas wrote:
> Hi Vidya,
> 
> Wow, there's a lot of nice work here!  Thanks for that!
> 
> On Tue, Mar 26, 2019 at 08:43:26PM +0530, Vidya Sagar wrote:
>> Add support for Synopsys DesignWare core IP based PCIe host controller
>> present in Tegra194 SoC.
> 
> General comments:
> 
>    - There are a few multi-line comments that don't match the
>      prevailing style:
> 
>          /*
> 	 * Text...
> 	 */
> 
>    - Comments and dev_info()/dev_err() messages are inconsistent about
>      starting with upper-case or lower-case letters.
> 
>    - Use "MSI", "IRQ", "PCIe", "CPU", etc in comments and messages.
> 
>    - There are a few functions that use "&pdev->dev" many times; can
>      you add a "struct device *dev = &pdev->dev" to reduce the
>      redundancy?
Done.

> 
>> +#include "../../pcie/portdrv.h"
> 
> What's this for?  I didn't see any obvious uses of things from
> portdrv.h, but I didn't actually try to build without it.
This is for pcie_pme_disable_msi() API. Since this is defined in portdrv.h
file, I'm including it here.

> 
>> +struct tegra_pcie_dw {
>> +	struct device		*dev;
>> +	struct resource		*appl_res;
>> +	struct resource		*dbi_res;
>> +	struct resource		*atu_dma_res;
>> +	void __iomem		*appl_base;
>> +	struct clk		*core_clk;
>> +	struct reset_control	*core_apb_rst;
>> +	struct reset_control	*core_rst;
>> +	struct dw_pcie		pci;
>> +	enum dw_pcie_device_mode mode;
>> +
>> +	bool disable_clock_request;
>> +	bool power_down_en;
>> +	u8 init_link_width;
>> +	bool link_state;
>> +	u32 msi_ctrl_int;
>> +	u32 num_lanes;
>> +	u32 max_speed;
>> +	u32 init_speed;
>> +	bool cdm_check;
>> +	u32 cid;
>> +	int pex_wake;
>> +	bool update_fc_fixup;
>> +	int n_gpios;
>> +	int *gpios;
>> +#if defined(CONFIG_PCIEASPM)
>> +	u32 cfg_link_cap_l1sub;
>> +	u32 event_cntr_ctrl;
>> +	u32 event_cntr_data;
>> +	u32 aspm_cmrt;
>> +	u32 aspm_pwr_on_t;
>> +	u32 aspm_l0s_enter_lat;
>> +	u32 disabled_aspm_states;
>> +#endif
> 
> The above could be indented the same as the rest of the struct?
Done.

> 
>> +	struct regulator	*pex_ctl_reg;
>> +
>> +	int			phy_count;
>> +	struct phy		**phy;
>> +
>> +	struct dentry		*debugfs;
>> +};
> 
>> +static void apply_bad_link_workaround(struct pcie_port *pp)
>> +{
>> +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
>> +	struct tegra_pcie_dw *pcie = dw_pcie_to_tegra_pcie(pci);
>> +	u16 val;
>> +
>> +	/*
>> +	 * NOTE:- Since this scenario is uncommon and link as
>> +	 * such is not stable anyway, not waiting to confirm
>> +	 * if link is really transiting to Gen-2 speed
> 
> s/transiting/transitioning/
> 
> I think there are other uses of "transit" when you mean "transition".
Done.

> 
>> +static int tegra_pcie_dw_rd_own_conf(struct pcie_port *pp, int where, int size,
>> +				     u32 *val)
>> +{
>> +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
>> +
>> +	/*
>> +	 * This is an endpoint mode specific register happen to appear even
>> +	 * when controller is operating in root port mode and system hangs
>> +	 * when it is accessed with link being in ASPM-L1 state.
>> +	 * So skip accessing it altogether
>> +	 */
>> +	if (where == PORT_LOGIC_MSIX_DOORBELL) {
>> +		*val = 0x00000000;
>> +		return PCIBIOS_SUCCESSFUL;
>> +	} else {
>> +		return dw_pcie_read(pci->dbi_base + where, size, val);
>> +	}
>> +}
>> +
>> +static int tegra_pcie_dw_wr_own_conf(struct pcie_port *pp, int where, int size,
>> +				     u32 val)
>> +{
>> +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
>> +
>> +	/* This is EP specific register and system hangs when it is
>> +	 * accessed with link being in ASPM-L1 state.
>> +	 * So skip accessing it altogether
>> +	 */
>> +	if (where == PORT_LOGIC_MSIX_DOORBELL)
>> +		return PCIBIOS_SUCCESSFUL;
>> +	else
>> +		return dw_pcie_write(pci->dbi_base + where, size, val);
> 
> These two functions are almost identical and they could look more
> similar.  This one has the wrong multi-line comment style, uses "EP"
> instead of "endpoint", etc.  Use this style for the "if" since the
> first case is really an error case:
> 
>    if (where == PORT_LOGIC_MSIX_DOORBELL) {
>      ...
>      return ...;
>    }
> 
>    return dw_pcie_...();
Done.
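
For reference, a rough sketch of how the read accessor looks after that
restructuring (the write side follows the same pattern, minus the faked
read value):

static int tegra_pcie_dw_rd_own_conf(struct pcie_port *pp, int where, int size,
				     u32 *val)
{
	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);

	/*
	 * This is an endpoint-mode-specific register that happens to appear
	 * even when the controller operates in root port mode; accessing it
	 * with the link in ASPM-L1 hangs the system, so fake a read of all
	 * zeroes instead.
	 */
	if (where == PORT_LOGIC_MSIX_DOORBELL) {
		*val = 0x00000000;
		return PCIBIOS_SUCCESSFUL;
	}

	return dw_pcie_read(pci->dbi_base + where, size, val);
}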

> 
>> +static int tegra_pcie_dw_host_init(struct pcie_port *pp)
>> +{
>> +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
>> +	struct tegra_pcie_dw *pcie = dw_pcie_to_tegra_pcie(pci);
>> +	int count = 200;
>> +	u32 val, tmp, offset;
>> +	u16 val_w;
>> +
>> +#if defined(CONFIG_PCIEASPM)
>> +	pcie->cfg_link_cap_l1sub =
>> +		dw_pcie_find_ext_capability(pci, PCI_EXT_CAP_ID_L1SS) +
>> +		PCI_L1SS_CAP;
>> +#endif
>> +	val = dw_pcie_readl_dbi(pci, PCI_IO_BASE);
>> +	val &= ~(IO_BASE_IO_DECODE | IO_BASE_IO_DECODE_BIT8);
>> +	dw_pcie_writel_dbi(pci, PCI_IO_BASE, val);
>> +
>> +	val = dw_pcie_readl_dbi(pci, PCI_PREF_MEMORY_BASE);
>> +	val |= CFG_PREF_MEM_LIMIT_BASE_MEM_DECODE;
>> +	val |= CFG_PREF_MEM_LIMIT_BASE_MEM_LIMIT_DECODE;
>> +	dw_pcie_writel_dbi(pci, PCI_PREF_MEMORY_BASE, val);
>> +
>> +	dw_pcie_writel_dbi(pci, PCI_BASE_ADDRESS_0, 0);
>> +
>> +	/* Configure FTS */
>> +	val = dw_pcie_readl_dbi(pci, PORT_LOGIC_ACK_F_ASPM_CTRL);
>> +	val &= ~(N_FTS_MASK << N_FTS_SHIFT);
>> +	val |= N_FTS_VAL << N_FTS_SHIFT;
>> +	dw_pcie_writel_dbi(pci, PORT_LOGIC_ACK_F_ASPM_CTRL, val);
>> +
>> +	val = dw_pcie_readl_dbi(pci, PORT_LOGIC_GEN2_CTRL);
>> +	val &= ~FTS_MASK;
>> +	val |= FTS_VAL;
>> +	dw_pcie_writel_dbi(pci, PORT_LOGIC_GEN2_CTRL, val);
>> +
>> +	/* Enable as 0xFFFF0001 response for CRS */
>> +	val = dw_pcie_readl_dbi(pci, PORT_LOGIC_AMBA_ERROR_RESPONSE_DEFAULT);
>> +	val &= ~(AMBA_ERROR_RESPONSE_CRS_MASK << AMBA_ERROR_RESPONSE_CRS_SHIFT);
>> +	val |= (AMBA_ERROR_RESPONSE_CRS_OKAY_FFFF0001 <<
>> +		AMBA_ERROR_RESPONSE_CRS_SHIFT);
>> +	dw_pcie_writel_dbi(pci, PORT_LOGIC_AMBA_ERROR_RESPONSE_DEFAULT, val);
>> +
>> +	/* Set MPS to 256 in DEV_CTL */
>> +	val = dw_pcie_readl_dbi(pci, CFG_DEV_STATUS_CONTROL);
>> +	val &= ~PCI_EXP_DEVCTL_PAYLOAD;
>> +	val |= (1 << CFG_DEV_STATUS_CONTROL_MPS_SHIFT);
>> +	dw_pcie_writel_dbi(pci, CFG_DEV_STATUS_CONTROL, val);
>> +
>> +	/* Configure Max Speed from DT */
>> +	val = dw_pcie_readl_dbi(pci, CFG_LINK_CAP);
>> +	val &= ~PCI_EXP_LNKCAP_SLS;
>> +	val |= pcie->max_speed;
>> +	dw_pcie_writel_dbi(pci, CFG_LINK_CAP, val);
>> +
>> +	val = dw_pcie_readw_dbi(pci, CFG_LINK_CONTROL_2);
>> +	val &= ~PCI_EXP_LNKCTL2_TLS;
>> +	val |= pcie->init_speed;
>> +	dw_pcie_writew_dbi(pci, CFG_LINK_CONTROL_2, val);
>> +
>> +	/* Configure Max lane width from DT */
>> +	val = dw_pcie_readl_dbi(pci, CFG_LINK_CAP);
>> +	val &= ~PCI_EXP_LNKCAP_MLW;
>> +	val |= (pcie->num_lanes << PCI_EXP_LNKSTA_NLW_SHIFT);
>> +	dw_pcie_writel_dbi(pci, CFG_LINK_CAP, val);
>> +
>> +	config_gen3_gen4_eq_presets(pcie);
>> +
>> +#if defined(CONFIG_PCIEASPM)
>> +	/* Enable ASPM counters */
>> +	val = EVENT_COUNTER_ENABLE_ALL << EVENT_COUNTER_ENABLE_SHIFT;
>> +	val |= EVENT_COUNTER_GROUP_5 << EVENT_COUNTER_GROUP_SEL_SHIFT;
>> +	dw_pcie_writel_dbi(pci, pcie->event_cntr_ctrl, val);
>> +
>> +	/* Program T_cmrt and T_pwr_on values */
>> +	val = dw_pcie_readl_dbi(pci, pcie->cfg_link_cap_l1sub);
>> +	val &= ~(PCI_L1SS_CAP_CM_RESTORE_TIME | PCI_L1SS_CAP_P_PWR_ON_VALUE);
>> +	val |= (pcie->aspm_cmrt << PCI_L1SS_CAP_CM_RTM_SHIFT);
>> +	val |= (pcie->aspm_pwr_on_t << PCI_L1SS_CAP_PWRN_VAL_SHIFT);
>> +	dw_pcie_writel_dbi(pci, pcie->cfg_link_cap_l1sub, val);
>> +
>> +	/* Program L0s and L1 entrance latencies */
>> +	val = dw_pcie_readl_dbi(pci, PORT_LOGIC_ACK_F_ASPM_CTRL);
>> +	val &= ~L0S_ENTRANCE_LAT_MASK;
>> +	val |= (pcie->aspm_l0s_enter_lat << L0S_ENTRANCE_LAT_SHIFT);
>> +	val |= ENTER_ASPM;
>> +	dw_pcie_writel_dbi(pci, PORT_LOGIC_ACK_F_ASPM_CTRL, val);
>> +
>> +	/* Program what ASPM states sould get advertised */
> 
> s/sould/should/
Done.

> 
>> +	if (pcie->disabled_aspm_states & 0x1)
>> +		disable_aspm_l0s(pcie); /* Disable L0s */
>> +	if (pcie->disabled_aspm_states & 0x2) {
>> +		disable_aspm_l10(pcie); /* Disable L1 */
>> +		disable_aspm_l11(pcie); /* Disable L1.1 */
>> +		disable_aspm_l12(pcie); /* Disable L1.2 */
>> +	}
>> +	if (pcie->disabled_aspm_states & 0x4)
>> +		disable_aspm_l11(pcie); /* Disable L1.1 */
>> +	if (pcie->disabled_aspm_states & 0x8)
>> +		disable_aspm_l12(pcie); /* Disable L1.2 */
>> +#endif
>> +	val = dw_pcie_readl_dbi(pci, GEN3_RELATED_OFF);
>> +	val &= ~GEN3_RELATED_OFF_GEN3_ZRXDC_NONCOMPL;
>> +	dw_pcie_writel_dbi(pci, GEN3_RELATED_OFF, val);
>> +
>> +	if (pcie->update_fc_fixup) {
>> +		val = dw_pcie_readl_dbi(pci, CFG_TIMER_CTRL_MAX_FUNC_NUM_OFF);
>> +		val |= 0x1 << CFG_TIMER_CTRL_ACK_NAK_SHIFT;
>> +		dw_pcie_writel_dbi(pci, CFG_TIMER_CTRL_MAX_FUNC_NUM_OFF, val);
>> +	}
>> +
>> +	/* CDM check enable */
>> +	if (pcie->cdm_check) {
>> +		val = dw_pcie_readl_dbi(pci,
>> +					PORT_LOGIC_PL_CHK_REG_CONTROL_STATUS);
>> +		val |= PORT_LOGIC_PL_CHK_REG_CHK_REG_CONTINUOUS;
>> +		val |= PORT_LOGIC_PL_CHK_REG_CHK_REG_START;
>> +		dw_pcie_writel_dbi(pci, PORT_LOGIC_PL_CHK_REG_CONTROL_STATUS,
>> +				   val);
>> +	}
>> +
>> +	dw_pcie_setup_rc(pp);
>> +
>> +	clk_set_rate(pcie->core_clk, GEN4_CORE_CLK_FREQ);
>> +
>> +	/* assert RST */
>> +	val = readl(pcie->appl_base + APPL_PINMUX);
>> +	val &= ~APPL_PINMUX_PEX_RST;
>> +	writel(val, pcie->appl_base + APPL_PINMUX);
>> +
>> +	usleep_range(100, 200);
>> +
>> +	/* enable LTSSM */
>> +	val = readl(pcie->appl_base + APPL_CTRL);
>> +	val |= APPL_CTRL_LTSSM_EN;
>> +	writel(val, pcie->appl_base + APPL_CTRL);
>> +
>> +	/* de-assert RST */
>> +	val = readl(pcie->appl_base + APPL_PINMUX);
>> +	val |= APPL_PINMUX_PEX_RST;
>> +	writel(val, pcie->appl_base + APPL_PINMUX);
>> +
>> +	msleep(100);
>> +
>> +	val_w = dw_pcie_readw_dbi(pci, CFG_LINK_STATUS);
>> +	while (!(val_w & PCI_EXP_LNKSTA_DLLLA)) {
>> +		if (!count) {
>> +			val = readl(pcie->appl_base + APPL_DEBUG);
>> +			val &= APPL_DEBUG_LTSSM_STATE_MASK;
>> +			val >>= APPL_DEBUG_LTSSM_STATE_SHIFT;
>> +			tmp = readl(pcie->appl_base + APPL_LINK_STATUS);
>> +			tmp &= APPL_LINK_STATUS_RDLH_LINK_UP;
>> +			if (val == 0x11 && !tmp) {
>> +				dev_info(pci->dev, "link is down in DLL");
>> +				dev_info(pci->dev,
>> +					 "trying again with DLFE disabled\n");
>> +				/* disable LTSSM */
>> +				val = readl(pcie->appl_base + APPL_CTRL);
>> +				val &= ~APPL_CTRL_LTSSM_EN;
>> +				writel(val, pcie->appl_base + APPL_CTRL);
>> +
>> +				reset_control_assert(pcie->core_rst);
>> +				reset_control_deassert(pcie->core_rst);
>> +
>> +				offset =
>> +				dw_pcie_find_ext_capability(pci,
>> +							    PCI_EXT_CAP_ID_DLF)
>> +				+ PCI_DLF_CAP;
> 
> This capability offset doesn't change, does it?  Could it be computed
> outside the loop?
This is the only place where the DLF offset is needed, and the scenario is
very rare: so far, only a legacy ASMedia USB 3.0 card requires DLF to be
disabled to get the PCIe link up. So I calculate the offset here instead of
keeping it in a separate variable.

> 
>> +				val = dw_pcie_readl_dbi(pci, offset);
>> +				val &= ~DL_FEATURE_EXCHANGE_EN;
>> +				dw_pcie_writel_dbi(pci, offset, val);
>> +
>> +				tegra_pcie_dw_host_init(&pcie->pci.pp);
> 
> This looks like some sort of "wait for link up" retry loop, but a
> recursive call seems a little unusual.  My 5 second analysis is that
> the loop could run this 200 times, and you sure don't want the
> possibility of a 200-deep call chain.  Is there way to split out the
> host init from the link-up polling?
Again, this recursive call comes into the picture only for a legacy ASMedia
USB 3.0 card, and the call chain is at most one level deep because the
recursion happens only once, based on that condition. Of the large number of
cards we have tested so far, the legacy ASMedia card is the only one that
triggers this path.

> 
>> +				return 0;
>> +			}
>> +			dev_info(pci->dev, "link is down\n");
>> +			return 0;
>> +		}
>> +		dev_dbg(pci->dev, "polling for link up\n");
>> +		usleep_range(1000, 2000);
>> +		val_w = dw_pcie_readw_dbi(pci, CFG_LINK_STATUS);
>> +		count--;
>> +	}
>> +	dev_info(pci->dev, "link is up\n");
>> +
>> +	tegra_pcie_enable_interrupts(pp);
>> +
>> +	return 0;
>> +}
> 
>> +static void tegra_pcie_dw_scan_bus(struct pcie_port *pp)
>> +{
>> +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
>> +	struct tegra_pcie_dw *pcie = dw_pcie_to_tegra_pcie(pci);
>> +	u32 speed;
>> +
>> +	if (!tegra_pcie_dw_link_up(pci))
>> +		return;
>> +
>> +	speed = (dw_pcie_readw_dbi(pci, CFG_LINK_STATUS) & PCI_EXP_LNKSTA_CLS);
>> +	clk_set_rate(pcie->core_clk, pcie_gen_freq[speed - 1]);
> 
> I don't understand what's happening here.  This is named
> tegra_pcie_dw_scan_bus(), but it doesn't actually scan anything.
> Maybe it's just a bad name for the dw_pcie_host_ops hook
> (ks_pcie_v3_65_scan_bus() is the only other .scan_bus()
> implementation, and it doesn't scan anything either).
> 
> dw_pcie_host_init() calls pci_scan_root_bus_bridge(), which actually
> *does* scan the bus, but it does it before calling
> pp->ops->scan_bus().  I'd say by the time we get to
> pci_scan_root_bus_bridge(), the device-specific init should be all
> done and we should be using only generic PCI core interfaces.
> 
> Maybe this stuff could/should be done in the ->host_init() hook?  The
> code between ->host_init() and ->scan_bus() is all generic code with
> no device-specific stuff, so I don't know why we need both hooks.
Agree. At least whatever I'm doing here as part of scan_bus() can be moved to
host_init() itself. I'm not sure about the original intention of scan_bus(),
but my understanding is that it exists for anything that needs to be done
after the PCIe subsystem has enumerated all devices. I initially had other
code here that accessed downstream devices, which is why I implemented
scan_bus(); now that all of that is gone, I can move this code to
host_init() itself.
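
As a rough sketch of that move (reusing the existing pcie_gen_freq[] table and
the CFG_LINK_STATUS accessor; only the tail of host_init() is shown), the
.scan_bus() hook then goes away entirely:

	/* tail of tegra_pcie_dw_host_init(), once the link has come up */
	dev_info(pci->dev, "link is up\n");

	/* scale the core clock to the negotiated link speed */
	speed = dw_pcie_readw_dbi(pci, CFG_LINK_STATUS) & PCI_EXP_LNKSTA_CLS;
	clk_set_rate(pcie->core_clk, pcie_gen_freq[speed - 1]);

	tegra_pcie_enable_interrupts(pp);

	return 0;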

> 
>> +static int tegra_pcie_enable_phy(struct tegra_pcie_dw *pcie)
>> +{
>> +	int phy_count = pcie->phy_count;
>> +	int ret;
>> +	int i;
>> +
>> +	for (i = 0; i < phy_count; i++) {
>> +		ret = phy_init(pcie->phy[i]);
>> +		if (ret < 0)
>> +			goto err_phy_init;
>> +
>> +		ret = phy_power_on(pcie->phy[i]);
>> +		if (ret < 0) {
>> +			phy_exit(pcie->phy[i]);
>> +			goto err_phy_power_on;
>> +		}
>> +	}
>> +
>> +	return 0;
>> +
>> +	while (i >= 0) {
>> +		phy_power_off(pcie->phy[i]);
>> +err_phy_power_on:
>> +		phy_exit(pcie->phy[i]);
>> +err_phy_init:
>> +		i--;
>> +	}
> 
> Wow, jumping into the middle of that loop is clever ;)  Can't decide
> what I think of it, but it's certainly clever!
> 
>> +	return ret;
>> +}
>> +
>> +static int tegra_pcie_dw_parse_dt(struct tegra_pcie_dw *pcie)
>> +{
>> +	struct device_node *np = pcie->dev->of_node;
>> +	int ret;
>> +
>> +#if defined(CONFIG_PCIEASPM)
>> +	ret = of_property_read_u32(np, "nvidia,event-cntr-ctrl",
>> +				   &pcie->event_cntr_ctrl);
>> +	if (ret < 0) {
>> +		dev_err(pcie->dev, "fail to read event-cntr-ctrl: %d\n", ret);
>> +		return ret;
>> +	}
> 
> The fact that you return error here if DT lacks the
> "nvidia,event-cntr-ctrl" property, but only if CONFIG_PCIEASPM=y,
> means that you have a revlock between the DT and the kernel: if you
> update the kernel to enable CONFIG_PCIEASPM, you may also have to
> update your DT.
> 
> Maybe that's OK, but I think it'd be nicer if you always required the
> presence of these properties, even if you only actually *use* them
> when CONFIG_PCIEASPM=y.
Done

> 
>> +static int tegra_pcie_dw_probe(struct platform_device *pdev)
>> +{
>> +	struct tegra_pcie_dw *pcie;
>> +	struct pcie_port *pp;
>> +	struct dw_pcie *pci;
>> +	struct phy **phy;
>> +	struct resource	*dbi_res;
>> +	struct resource	*atu_dma_res;
>> +	const struct of_device_id *match;
>> +	const struct tegra_pcie_of_data *data;
>> +	char *name;
>> +	int ret, i;
>> +
>> +	pcie = devm_kzalloc(&pdev->dev, sizeof(*pcie), GFP_KERNEL);
>> +	if (!pcie)
>> +		return -ENOMEM;
>> +
>> +	pci = &pcie->pci;
>> +	pci->dev = &pdev->dev;
>> +	pci->ops = &tegra_dw_pcie_ops;
>> +	pp = &pci->pp;
>> +	pcie->dev = &pdev->dev;
>> +
>> +	match = of_match_device(of_match_ptr(tegra_pcie_dw_of_match),
>> +				&pdev->dev);
>> +	if (!match)
>> +		return -EINVAL;
> 
> Logically could be the first thing in the function since it doesn't
> depend on anything.
Done

> 
>> +	data = (struct tegra_pcie_of_data *)match->data;
>> +	pcie->mode = (enum dw_pcie_device_mode)data->mode;
>> +
>> +	ret = tegra_pcie_dw_parse_dt(pcie);
>> +	if (ret < 0) {
>> +		dev_err(pcie->dev, "device tree parsing failed: %d\n", ret);
>> +		return ret;
>> +	}
>> +
>> +	if (gpio_is_valid(pcie->pex_wake)) {
>> +		ret = devm_gpio_request(pcie->dev, pcie->pex_wake,
>> +					"pcie_wake");
>> +		if (ret < 0) {
>> +			if (ret == -EBUSY) {
>> +				dev_err(pcie->dev,
>> +					"pex_wake already in use\n");
>> +				pcie->pex_wake = -EINVAL;
> 
> This looks strange.  "pex_wake == -EINVAL" doesn't look right, and
> you're about to pass it to gpio_direction_input(), which looks wrong.
Done.

> 
>> +			} else {
>> +				dev_err(pcie->dev,
>> +					"pcie_wake gpio_request failed %d\n",
>> +					ret);
>> +				return ret;
>> +			}
>> +		}
>> +
>> +		ret = gpio_direction_input(pcie->pex_wake);
>> +		if (ret < 0) {
>> +			dev_err(pcie->dev,
>> +				"setting pcie_wake input direction failed %d\n",
>> +				ret);
>> +			return ret;
>> +		}
>> +		device_init_wakeup(pcie->dev, true);
>> +	}
>> +
>> +	pcie->pex_ctl_reg = devm_regulator_get(&pdev->dev, "vddio-pex-ctl");
>> +	if (IS_ERR(pcie->pex_ctl_reg)) {
>> +		dev_err(&pdev->dev, "fail to get regulator: %ld\n",
>> +			PTR_ERR(pcie->pex_ctl_reg));
>> +		return PTR_ERR(pcie->pex_ctl_reg);
>> +	}
>> +
>> +	pcie->core_clk = devm_clk_get(&pdev->dev, "core_clk");
>> +	if (IS_ERR(pcie->core_clk)) {
>> +		dev_err(&pdev->dev, "Failed to get core clock\n");
>> +		return PTR_ERR(pcie->core_clk);
>> +	}
>> +
>> +	pcie->appl_res = platform_get_resource_byname(pdev, IORESOURCE_MEM,
>> +						      "appl");
>> +	if (!pcie->appl_res) {
>> +		dev_err(&pdev->dev, "missing appl space\n");
>> +		return PTR_ERR(pcie->appl_res);
>> +	}
>> +	pcie->appl_base = devm_ioremap_resource(&pdev->dev, pcie->appl_res);
>> +	if (IS_ERR(pcie->appl_base)) {
>> +		dev_err(&pdev->dev, "mapping appl space failed\n");
>> +		return PTR_ERR(pcie->appl_base);
>> +	}
>> +
>> +	pcie->core_apb_rst = devm_reset_control_get(pcie->dev, "core_apb_rst");
>> +	if (IS_ERR(pcie->core_apb_rst)) {
>> +		dev_err(pcie->dev, "PCIE : core_apb_rst reset is missing\n");
> 
> This error message looks different from the others ("PCIE :" prefix).
Done.

> 
>> +		return PTR_ERR(pcie->core_apb_rst);
>> +	}
>> +
>> +	phy = devm_kcalloc(pcie->dev, pcie->phy_count, sizeof(*phy),
>> +			   GFP_KERNEL);
>> +	if (!phy)
>> +		return PTR_ERR(phy);
>> +
>> +	for (i = 0; i < pcie->phy_count; i++) {
>> +		name = kasprintf(GFP_KERNEL, "pcie-p2u-%u", i);
>> +		if (!name) {
>> +			dev_err(pcie->dev, "failed to create p2u string\n");
>> +			return -ENOMEM;
>> +		}
>> +		phy[i] = devm_phy_get(pcie->dev, name);
>> +		kfree(name);
>> +		if (IS_ERR(phy[i])) {
>> +			ret = PTR_ERR(phy[i]);
>> +			dev_err(pcie->dev, "phy_get error: %d\n", ret);
>> +			return ret;
>> +		}
>> +	}
>> +
>> +	pcie->phy = phy;
>> +
>> +	dbi_res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "dbi");
>> +	if (!dbi_res) {
>> +		dev_err(&pdev->dev, "missing config space\n");
>> +		return PTR_ERR(dbi_res);
>> +	}
>> +	pcie->dbi_res = dbi_res;
>> +
>> +	pci->dbi_base = devm_ioremap_resource(&pdev->dev, dbi_res);
>> +	if (IS_ERR(pci->dbi_base)) {
>> +		dev_err(&pdev->dev, "mapping dbi space failed\n");
>> +		return PTR_ERR(pci->dbi_base);
>> +	}
>> +
>> +	/* Tegra HW locates DBI2 at a fixed offset from DBI */
>> +	pci->dbi_base2 = pci->dbi_base + 0x1000;
>> +
>> +	atu_dma_res = platform_get_resource_byname(pdev, IORESOURCE_MEM,
>> +						   "atu_dma");
>> +	if (!atu_dma_res) {
>> +		dev_err(&pdev->dev, "missing atu_dma space\n");
>> +		return PTR_ERR(atu_dma_res);
>> +	}
>> +	pcie->atu_dma_res = atu_dma_res;
>> +	pci->atu_base = devm_ioremap_resource(&pdev->dev, atu_dma_res);
>> +	if (IS_ERR(pci->atu_base)) {
>> +		dev_err(&pdev->dev, "mapping atu space failed\n");
>> +		return PTR_ERR(pci->atu_base);
>> +	}
>> +
>> +	pcie->core_rst = devm_reset_control_get(pcie->dev, "core_rst");
>> +	if (IS_ERR(pcie->core_rst)) {
>> +		dev_err(pcie->dev, "PCIE : core_rst reset is missing\n");
> 
> Different message format again.
Done.

> 
>> +		return PTR_ERR(pcie->core_rst);
>> +	}
>> +
>> +	pp->irq = platform_get_irq_byname(pdev, "intr");
>> +	if (!pp->irq) {
>> +		dev_err(pcie->dev, "failed to get intr interrupt\n");
>> +		return -ENODEV;
>> +	}
>> +
>> +	ret = devm_request_irq(&pdev->dev, pp->irq, tegra_pcie_irq_handler,
>> +			       IRQF_SHARED, "tegra-pcie-intr", pcie);
>> +	if (ret) {
>> +		dev_err(pcie->dev, "failed to request \"intr\" irq\n");
> 
> It'd be nice to include the actual IRQ, i.e., "IRQ %d", pp->irq.
Done.

> 
>> +		return ret;
>> +	}
>> +
>> +	platform_set_drvdata(pdev, pcie);
>> +
>> +	if (pcie->mode == DW_PCIE_RC_TYPE) {
>> +		ret = tegra_pcie_config_rp(pcie);
>> +		if (ret == -ENOMEDIUM)
>> +			ret = 0;
>> +	}
>> +
>> +	return ret;
>> +}
> 
>> +static void tegra_pcie_downstream_dev_to_D0(struct tegra_pcie_dw *pcie)
>> +{
>> +	struct pci_dev *pdev = NULL;
> 
> Unnecessary initialization.
Done.

> 
>> +	struct pci_bus *child;
>> +	struct pcie_port *pp = &pcie->pci.pp;
>> +
>> +	list_for_each_entry(child, &pp->bus->children, node) {
>> +		/* Bring downstream devices to D0 if they are not already in */
>> +		if (child->parent == pp->bus) {
>> +			pdev = pci_get_slot(child, PCI_DEVFN(0, 0));
>> +			pci_dev_put(pdev);
>> +			if (!pdev)
>> +				break;
> 
> I don't really like this dance with iterating over the bus children,
> comparing parent to pp->bus, pci_get_slot(), pci_dev_put(), etc.
> 
> I guess the idea is to bring only the directly-downstream devices to
> D0, not to do it for things deeper in the hierarchy?
Yes.

> 
> Is this some Tegra-specific wrinkle?  I don't think other drivers do
> this.
With the Tegra PCIe controller, if a downstream device is in a non-D0 state,
the link doesn't go into the L2 state. We observed this behavior with some
devices, and the solution is to bring them to D0 and then send the
PME_TurnOff message to put the link into L2. Since the spec also supports
this mechanism (Rev. 4.0 Ver. 1.0, page 428), we chose to implement it.

> 
> I see that an earlier patch added "bus" to struct pcie_port.  I think
> it would be better to somehow connect to the pci_host_bridge struct.
> Several other drivers already do this; see uses of
> pci_host_bridge_from_priv().
All non-DesignWare based implementations save their private data structure
in 'private' pointer of struct pci_host_bridge and use pci_host_bridge_from_priv()
to get it back. But, DesignWare based implementations save pcie_port in 'sysdata'
and nothing in 'private' pointer. So,  I'm not sure if pci_host_bridge_from_priv()
can be used in this case. Please do let me know if you think otherwise.

> 
> That would give you the bus, as well as flags like no_ext_tags,
> native_aer, etc, which this driver, being a host bridge driver that's
> responsible for this part of the firmware/OS interface, may
> conceivably need.
> 
> Rather than pci_get_slot(), couldn't you iterate over bus->devices and
> just skip the non-PCI_DEVFN(0, 0) devices?
> 
>> +
>> +			if (pci_set_power_state(pdev, PCI_D0))
>> +				dev_err(pcie->dev, "D0 transition failed\n");
>> +		}
>> +	}
>> +}
Yeah. This seems to be a better method. I'll implement this.
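
Something like the following is what I have in mind (a sketch only; it keeps
the loop over the child buses but walks child->devices instead of using
pci_get_slot()):

static void tegra_pcie_downstream_dev_to_D0(struct tegra_pcie_dw *pcie)
{
	struct pcie_port *pp = &pcie->pci.pp;
	struct pci_bus *child;
	struct pci_dev *pdev;

	/*
	 * Only the devices directly downstream of the root port (devfn 00.0
	 * on each child bus) need to be brought back to D0.
	 */
	list_for_each_entry(child, &pp->bus->children, node) {
		if (child->parent != pp->bus)
			continue;

		list_for_each_entry(pdev, &child->devices, bus_list) {
			if (pdev->devfn != PCI_DEVFN(0, 0))
				continue;

			if (pci_set_power_state(pdev, PCI_D0))
				dev_err(pcie->dev,
					"D0 transition failed\n");
		}
	}
}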

> 
>> +static int tegra_pcie_dw_remove(struct platform_device *pdev)
>> +{
>> +	struct tegra_pcie_dw *pcie = platform_get_drvdata(pdev);
>> +
>> +	if (pcie->mode == DW_PCIE_RC_TYPE) {
> 
> Return early if it's not RC and unindent the rest of the function.
Done.

> 
>> +		if (!pcie->link_state && pcie->power_down_en)
>> +			return 0;
>> +
>> +		debugfs_remove_recursive(pcie->debugfs);
>> +		pm_runtime_put_sync(pcie->dev);
>> +		pm_runtime_disable(pcie->dev);
>> +	}
>> +
>> +	return 0;
>> +}
> 
>> +static int tegra_pcie_dw_runtime_suspend(struct device *dev)
>> +{
>> +	struct tegra_pcie_dw *pcie = dev_get_drvdata(dev);
>> +
>> +	tegra_pcie_downstream_dev_to_D0(pcie);
>> +
>> +	pci_stop_root_bus(pcie->pci.pp.bus);
>> +	pci_remove_root_bus(pcie->pci.pp.bus);
> 
> Why are you calling these?  No other drivers do this except in their
> .remove() methods.  Is there something special about Tegra, or is this
> something the other drivers *should* be doing?
Since this API gets called from the remove path, I'm removing the hierarchy
here to safely bring down all the devices. I'll have to revisit this part, as
Jisheng Zhang's patches https://patchwork.kernel.org/project/linux-pci/list/?series=98559
are now approved, and I need to verify it after cherry-picking Jisheng's
changes.

> 
>> +	tegra_pcie_dw_pme_turnoff(pcie);
>> +
>> +	reset_control_assert(pcie->core_rst);
>> +	tegra_pcie_disable_phy(pcie);
>> +	reset_control_assert(pcie->core_apb_rst);
>> +	clk_disable_unprepare(pcie->core_clk);
>> +	regulator_disable(pcie->pex_ctl_reg);
>> +	config_plat_gpio(pcie, 0);
>> +
>> +	if (pcie->cid != CTRL_5)
>> +		tegra_pcie_bpmp_set_ctrl_state(pcie, false);
>> +
>> +	return 0;
>> +}
>> +
>> +static int tegra_pcie_dw_runtime_resume(struct device *dev)
>> +{
>> +	struct tegra_pcie_dw *pcie = dev_get_drvdata(dev);
>> +	struct dw_pcie *pci = &pcie->pci;
>> +	struct pcie_port *pp = &pci->pp;
>> +	int ret = 0;
>> +
>> +	ret = tegra_pcie_config_controller(pcie, false);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	/* program to use MPS of 256 whereever possible */
> 
> s/whereever/wherever/
Done.

> 
>> +	pcie_bus_config = PCIE_BUS_SAFE;
>> +
>> +	pp->root_bus_nr = -1;
>> +	pp->ops = &tegra_pcie_dw_host_ops;
>> +
>> +	/* Disable MSI interrupts for PME messages */
> 
> Superfluous comment; it repeats the function name.
Removed it.

> 
>> +static int tegra_pcie_dw_suspend_noirq(struct device *dev)
>> +{
>> +	struct tegra_pcie_dw *pcie = dev_get_drvdata(dev);
>> +	int ret = 0;
>> +
>> +	if (!pcie->link_state)
>> +		return 0;
>> +
>> +	/* save MSI interrutp vector*/
> 
> s/interrutp/interrupt/
> s/vector/vector /
Done.

> 
>> +static int tegra_pcie_dw_resume_noirq(struct device *dev)
>> +{
>> +	struct tegra_pcie_dw *pcie = dev_get_drvdata(dev);
>> +	int ret;
>> +
>> +	if (!pcie->link_state)
>> +		return 0;
>> +
>> +	if (gpio_is_valid(pcie->pex_wake) && device_may_wakeup(dev)) {
>> +		ret = disable_irq_wake(gpio_to_irq(pcie->pex_wake));
>> +		if (ret < 0)
>> +			dev_err(dev, "disable wake irq failed: %d\n", ret);
>> +	}
>> +
>> +	ret = tegra_pcie_config_controller(pcie, true);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	ret = tegra_pcie_dw_host_init(&pcie->pci.pp);
>> +	if (ret < 0) {
>> +		dev_err(dev, "failed to init host: %d\n", ret);
>> +		goto fail_host_init;
>> +	}
>> +
>> +	/* restore MSI interrutp vector*/
> 
> s/interrutp/interrupt/
> s/vector/vector /
Done.

> 
>> +static void tegra_pcie_dw_shutdown(struct platform_device *pdev)
>> +{
>> +	struct tegra_pcie_dw *pcie = platform_get_drvdata(pdev);
>> +
>> +	if (pcie->mode == DW_PCIE_RC_TYPE) {
> 
>    if (pcie->mode != DW_PCIE_RC_TYPE)
>      return;
> 
> Then you can unindent the whole function.
Done.
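
For what it's worth, the shutdown path then reads roughly like this (sketch;
the remainder of the sequence is unchanged):

static void tegra_pcie_dw_shutdown(struct platform_device *pdev)
{
	struct tegra_pcie_dw *pcie = platform_get_drvdata(pdev);

	if (pcie->mode != DW_PCIE_RC_TYPE)
		return;

	if (!pcie->link_state && pcie->power_down_en)
		return;

	debugfs_remove_recursive(pcie->debugfs);
	tegra_pcie_downstream_dev_to_D0(pcie);

	disable_irq(pcie->pci.pp.irq);

	/* rest of the shutdown sequence stays as before */
}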

> 
>> +		if (!pcie->link_state && pcie->power_down_en)
>> +			return;
>> +
>> +		debugfs_remove_recursive(pcie->debugfs);
>> +		tegra_pcie_downstream_dev_to_D0(pcie);
>> +
>> +		/* Disable interrupts */
> 
> Superfluous comment.
Removed it.

> 
>> +		disable_irq(pcie->pci.pp.irq);
Thierry Reding April 2, 2019, 2:14 p.m. UTC | #11
On Tue, Apr 02, 2019 at 12:47:48PM +0530, Vidya Sagar wrote:
> On 3/30/2019 2:22 AM, Bjorn Helgaas wrote:
[...]
> > > +static int tegra_pcie_dw_host_init(struct pcie_port *pp)
> > > +{
[...]
> > > +	val_w = dw_pcie_readw_dbi(pci, CFG_LINK_STATUS);
> > > +	while (!(val_w & PCI_EXP_LNKSTA_DLLLA)) {
> > > +		if (!count) {
> > > +			val = readl(pcie->appl_base + APPL_DEBUG);
> > > +			val &= APPL_DEBUG_LTSSM_STATE_MASK;
> > > +			val >>= APPL_DEBUG_LTSSM_STATE_SHIFT;
> > > +			tmp = readl(pcie->appl_base + APPL_LINK_STATUS);
> > > +			tmp &= APPL_LINK_STATUS_RDLH_LINK_UP;
> > > +			if (val == 0x11 && !tmp) {
> > > +				dev_info(pci->dev, "link is down in DLL");
> > > +				dev_info(pci->dev,
> > > +					 "trying again with DLFE disabled\n");
> > > +				/* disable LTSSM */
> > > +				val = readl(pcie->appl_base + APPL_CTRL);
> > > +				val &= ~APPL_CTRL_LTSSM_EN;
> > > +				writel(val, pcie->appl_base + APPL_CTRL);
> > > +
> > > +				reset_control_assert(pcie->core_rst);
> > > +				reset_control_deassert(pcie->core_rst);
> > > +
> > > +				offset =
> > > +				dw_pcie_find_ext_capability(pci,
> > > +							    PCI_EXT_CAP_ID_DLF)
> > > +				+ PCI_DLF_CAP;
> > 
> > This capability offset doesn't change, does it?  Could it be computed
> > outside the loop?
> This is the only place where the DLF offset is needed, and the scenario is
> very rare: so far, only a legacy ASMedia USB 3.0 card requires DLF to be
> disabled to get the PCIe link up. So I calculate the offset here instead of
> keeping it in a separate variable.
> 
> > 
> > > +				val = dw_pcie_readl_dbi(pci, offset);
> > > +				val &= ~DL_FEATURE_EXCHANGE_EN;
> > > +				dw_pcie_writel_dbi(pci, offset, val);
> > > +
> > > +				tegra_pcie_dw_host_init(&pcie->pci.pp);
> > 
> > This looks like some sort of "wait for link up" retry loop, but a
> > recursive call seems a little unusual.  My 5 second analysis is that
> > the loop could run this 200 times, and you sure don't want the
> > possibility of a 200-deep call chain.  Is there way to split out the
> > host init from the link-up polling?
> Again, this recursive call comes into the picture only for a legacy ASMedia
> USB 3.0 card, and the call chain is at most one level deep because the
> recursion happens only once, based on that condition. Of the large number of
> cards we have tested so far, the legacy ASMedia card is the only one that
> triggers this path.

A more idiomatic way would be to add a "retry:" label somewhere and goto
that after disabling DLFE. That way you achieve the same effect, but you
can avoid the recursion, even if it is harmless in practice.
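
Roughly like this (just a sketch; the "retry" flag and label name are made up,
and the link-up polling count would need to be reset before jumping back):

static int tegra_pcie_dw_host_init(struct pcie_port *pp)
{
	/* ... existing declarations ... */
	bool retry = true;

retry_link:
	/* existing register programming, LTSSM enable, link-up polling */

	if (val == 0x11 && !tmp && retry) {
		dev_info(pci->dev,
			 "link is down in DLL, retrying with DLFE disabled\n");
		retry = false;

		/* disable LTSSM and reset the core before retrying */
		val = readl(pcie->appl_base + APPL_CTRL);
		val &= ~APPL_CTRL_LTSSM_EN;
		writel(val, pcie->appl_base + APPL_CTRL);

		reset_control_assert(pcie->core_rst);
		reset_control_deassert(pcie->core_rst);

		offset = dw_pcie_find_ext_capability(pci, PCI_EXT_CAP_ID_DLF) +
			 PCI_DLF_CAP;
		val = dw_pcie_readl_dbi(pci, offset);
		val &= ~DL_FEATURE_EXCHANGE_EN;
		dw_pcie_writel_dbi(pci, offset, val);

		count = 200;
		goto retry_link;
	}

	/* ... rest of the link-up handling ... */
}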

> > > +static int tegra_pcie_dw_probe(struct platform_device *pdev)
> > > +{
> > > +	struct tegra_pcie_dw *pcie;
> > > +	struct pcie_port *pp;
> > > +	struct dw_pcie *pci;
> > > +	struct phy **phy;
> > > +	struct resource	*dbi_res;
> > > +	struct resource	*atu_dma_res;
> > > +	const struct of_device_id *match;
> > > +	const struct tegra_pcie_of_data *data;
> > > +	char *name;
> > > +	int ret, i;
> > > +
> > > +	pcie = devm_kzalloc(&pdev->dev, sizeof(*pcie), GFP_KERNEL);
> > > +	if (!pcie)
> > > +		return -ENOMEM;
> > > +
> > > +	pci = &pcie->pci;
> > > +	pci->dev = &pdev->dev;
> > > +	pci->ops = &tegra_dw_pcie_ops;
> > > +	pp = &pci->pp;
> > > +	pcie->dev = &pdev->dev;
> > > +
> > > +	match = of_match_device(of_match_ptr(tegra_pcie_dw_of_match),
> > > +				&pdev->dev);
> > > +	if (!match)
> > > +		return -EINVAL;
> > 
> > Logically could be the first thing in the function since it doesn't
> > depend on anything.
> Done
> 
> > 
> > > +	data = (struct tegra_pcie_of_data *)match->data;

of_device_get_match_data() can help remove some of the above
boilerplate. Also, there's no reason to check for a failure with these
functions. The driver is OF-only and can only ever be probed if the
device exists, in which case match (or data for that matter) will never
be NULL.
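
i.e. something like this, reusing the existing tegra_pcie_of_data type from
the patch:

	const struct tegra_pcie_of_data *data;

	data = of_device_get_match_data(&pdev->dev);
	pcie->mode = (enum dw_pcie_device_mode)data->mode;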

> > I see that an earlier patch added "bus" to struct pcie_port.  I think
> > it would be better to somehow connect to the pci_host_bridge struct.
> > Several other drivers already do this; see uses of
> > pci_host_bridge_from_priv().
> All non-DesignWare based implementations save their private data structure
> in 'private' pointer of struct pci_host_bridge and use pci_host_bridge_from_priv()
> to get it back. But, DesignWare based implementations save pcie_port in 'sysdata'
> and nothing in 'private' pointer. So,  I'm not sure if pci_host_bridge_from_priv()
> can be used in this case. Please do let me know if you think otherwise.

If nothing is currently stored in the private pointer, why not do like
the other drivers and store the struct pci_host_bridge pointer there?
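
As a sketch (the pp->bridge member here is hypothetical, analogous to the
pp->bus pointer added earlier in this series), the DesignWare core could
record the bridge once and the driver could then reach the generic state
through it:

	/* in dw_pcie_host_init(), once the host bridge has been registered */
	pp->bridge = bridge;

	/* and in the Tegra driver, e.g. in the suspend/remove paths */
	struct pci_host_bridge *bridge = pcie->pci.pp.bridge;

	pci_stop_root_bus(bridge->bus);
	pci_remove_root_bus(bridge->bus);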

Thierry
Bjorn Helgaas April 2, 2019, 6:31 p.m. UTC | #12
On Tue, Apr 02, 2019 at 12:47:48PM +0530, Vidya Sagar wrote:
> On 3/30/2019 2:22 AM, Bjorn Helgaas wrote:
> > On Tue, Mar 26, 2019 at 08:43:26PM +0530, Vidya Sagar wrote:
> > > Add support for Synopsys DesignWare core IP based PCIe host controller
> > > present in Tegra194 SoC.

> > > +#include "../../pcie/portdrv.h"
> > 
> > What's this for?  I didn't see any obvious uses of things from
> > portdrv.h, but I didn't actually try to build without it.
> This is for pcie_pme_disable_msi() API. Since this is defined in portdrv.h
> file, I'm including it here.

Hmm, OK, I missed that.  If you need pcie_pme_disable_msi(), you
definitely need portdrv.h.  But you're still a singleton in terms of
including it, so it leads to follow-up questions:

  - Why does this chip require pcie_pme_disable_msi()?  The only other
    use is a DMI quirk for "MSI Wind U-100", added by c39fae1416d5
    ("PCI PM: Make it possible to force using INTx for PCIe PME
    signaling").

  - Is this a workaround for a Tegra194 defect?  If so, it should be
    separated out and identified as such.  Otherwise it will get
    copied to other places where it doesn't belong.

  - You only call it from the .runtime_resume() hook.  That doesn't
    make sense to me: if you never suspend, you never disable MSI for
    PME signaling.

> > > +	val_w = dw_pcie_readw_dbi(pci, CFG_LINK_STATUS);
> > > +	while (!(val_w & PCI_EXP_LNKSTA_DLLLA)) {
> > > +		if (!count) {
> > > +			val = readl(pcie->appl_base + APPL_DEBUG);
> > > +			val &= APPL_DEBUG_LTSSM_STATE_MASK;
> > > +			val >>= APPL_DEBUG_LTSSM_STATE_SHIFT;
> > > +			tmp = readl(pcie->appl_base + APPL_LINK_STATUS);
> > > +			tmp &= APPL_LINK_STATUS_RDLH_LINK_UP;
> > > +			if (val == 0x11 && !tmp) {
> > > +				dev_info(pci->dev, "link is down in DLL");
> > > +				dev_info(pci->dev,
> > > +					 "trying again with DLFE disabled\n");
> > > +				/* disable LTSSM */
> > > +				val = readl(pcie->appl_base + APPL_CTRL);
> > > +				val &= ~APPL_CTRL_LTSSM_EN;
> > > +				writel(val, pcie->appl_base + APPL_CTRL);
> > > +
> > > +				reset_control_assert(pcie->core_rst);
> > > +				reset_control_deassert(pcie->core_rst);
> > > +
> > > +				offset =
> > > +				dw_pcie_find_ext_capability(pci,
> > > +							    PCI_EXT_CAP_ID_DLF)
> > > +				+ PCI_DLF_CAP;
> > 
> > This capability offset doesn't change, does it?  Could it be computed
> > outside the loop?
> This is the only place where the DLF offset is needed, and the scenario is
> very rare: so far, only a legacy ASMedia USB 3.0 card requires DLF to be
> disabled to get the PCIe link up. So I calculate the offset here instead of
> keeping it in a separate variable.

Hmm.  Sounds like this is essentially a quirk to work around some
hardware issue in Tegra194.

Is there some way you can pull this out into a separate function so it
doesn't clutter the normal path and it's more obvious that it's a
workaround?

> > > +				val = dw_pcie_readl_dbi(pci, offset);
> > > +				val &= ~DL_FEATURE_EXCHANGE_EN;
> > > +				dw_pcie_writel_dbi(pci, offset, val);
> > > +
> > > +				tegra_pcie_dw_host_init(&pcie->pci.pp);
> > 
> > This looks like some sort of "wait for link up" retry loop, but a
> > recursive call seems a little unusual.  My 5 second analysis is that
> > the loop could run this 200 times, and you sure don't want the
> > possibility of a 200-deep call chain.  Is there way to split out the
> > host init from the link-up polling?
> Again, this recursive call comes into the picture only for a legacy ASMedia
> USB 3.0 card, and the call chain is at most one level deep because the
> recursion happens only once, based on that condition. Of the large number of
> cards we have tested so far, the legacy ASMedia card is the only one that
> triggers this path.

We need to be able to analyze the code without spending time working
out what arcane PCIe spec details might limit this to fewer than 200
iterations, or relying on assumptions about how many cards have been
tested.

If you can restructure this so the "wait for link up" loop looks like
all the other drivers, and the DLF issue is separated out and just
causes a retry of the "wait for link up", I think that will help a
lot.

> > > +static void tegra_pcie_dw_scan_bus(struct pcie_port *pp)
> > > +{
> > > +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > > +	struct tegra_pcie_dw *pcie = dw_pcie_to_tegra_pcie(pci);
> > > +	u32 speed;
> > > +
> > > +	if (!tegra_pcie_dw_link_up(pci))
> > > +		return;
> > > +
> > > +	speed = (dw_pcie_readw_dbi(pci, CFG_LINK_STATUS) & PCI_EXP_LNKSTA_CLS);
> > > +	clk_set_rate(pcie->core_clk, pcie_gen_freq[speed - 1]);
> > 
> > I don't understand what's happening here.  This is named
> > tegra_pcie_dw_scan_bus(), but it doesn't actually scan anything.
> > Maybe it's just a bad name for the dw_pcie_host_ops hook
> > (ks_pcie_v3_65_scan_bus() is the only other .scan_bus()
> > implementation, and it doesn't scan anything either).
> > 
> > dw_pcie_host_init() calls pci_scan_root_bus_bridge(), which actually
> > *does* scan the bus, but it does it before calling
> > pp->ops->scan_bus().  I'd say by the time we get to
> > pci_scan_root_bus_bridge(), the device-specific init should be all
> > done and we should be using only generic PCI core interfaces.
> > 
> > Maybe this stuff could/should be done in the ->host_init() hook?  The
> > code between ->host_init() and ->scan_bus() is all generic code with
> > no device-specific stuff, so I don't know why we need both hooks.
> Agree. At least whatever I'm doing here as part of scan_bus() can be moved to
> host_init() itself. I'm not sure about the original intention of scan_bus(),
> but my understanding is that it exists for anything that needs to be done
> after the PCIe subsystem has enumerated all devices. I initially had other
> code here that accessed downstream devices, which is why I implemented
> scan_bus(); now that all of that is gone, I can move this code to
> host_init() itself.

That'd be perfect.  I suspect ks_pcie_v3_65_scan_bus() is left over
from before the generic PCI core scan interface, and it could probably
be moved to host_init() as well.

> > > +static void tegra_pcie_downstream_dev_to_D0(struct tegra_pcie_dw *pcie)
> > > +{
> > > +	struct pci_dev *pdev = NULL;
> > 
> > Unnecessary initialization.
> Done.
> 
> > > +	struct pci_bus *child;
> > > +	struct pcie_port *pp = &pcie->pci.pp;
> > > +
> > > +	list_for_each_entry(child, &pp->bus->children, node) {
> > > +		/* Bring downstream devices to D0 if they are not already in */
> > > +		if (child->parent == pp->bus) {
> > > +			pdev = pci_get_slot(child, PCI_DEVFN(0, 0));
> > > +			pci_dev_put(pdev);
> > > +			if (!pdev)
> > > +				break;
> > 
> > I don't really like this dance with iterating over the bus children,
> > comparing parent to pp->bus, pci_get_slot(), pci_dev_put(), etc.
> > 
> > I guess the idea is to bring only the directly-downstream devices to
> > D0, not to do it for things deeper in the hierarchy?
> Yes.
> 
> > Is this some Tegra-specific wrinkle?  I don't think other drivers do
> > this.
> With the Tegra PCIe controller, if a downstream device is in a non-D0 state,
> the link doesn't go into the L2 state. We observed this behavior with some
> devices, and the solution is to bring them to D0 and then send the
> PME_TurnOff message to put the link into L2. Since the spec also supports
> this mechanism (Rev. 4.0 Ver. 1.0, page 428), we chose to implement it.

Sounds like a Tegra oddity that should be documented as such so it
doesn't get copied elsewhere.

> > I see that an earlier patch added "bus" to struct pcie_port.  I think
> > it would be better to somehow connect to the pci_host_bridge struct.
> > Several other drivers already do this; see uses of
> > pci_host_bridge_from_priv().
> All non-DesignWare based implementations save their private data structure
> in 'private' pointer of struct pci_host_bridge and use pci_host_bridge_from_priv()
> to get it back. But, DesignWare based implementations save pcie_port in 'sysdata'
> and nothing in 'private' pointer. So,  I'm not sure if pci_host_bridge_from_priv()
> can be used in this case. Please do let me know if you think otherwise.

DesignWare-based drivers should have a way to retrieve the
pci_host_bridge pointer.  It doesn't have to be *exactly* the
same as non-DesignWare drivers, but it should be similar.

> > That would give you the bus, as well as flags like no_ext_tags,
> > native_aer, etc, which this driver, being a host bridge driver that's
> > responsible for this part of the firmware/OS interface, may
> > conceivably need.

> > > +static int tegra_pcie_dw_runtime_suspend(struct device *dev)
> > > +{
> > > +	struct tegra_pcie_dw *pcie = dev_get_drvdata(dev);
> > > +
> > > +	tegra_pcie_downstream_dev_to_D0(pcie);
> > > +
> > > +	pci_stop_root_bus(pcie->pci.pp.bus);
> > > +	pci_remove_root_bus(pcie->pci.pp.bus);
> > 
> > Why are you calling these?  No other drivers do this except in their
> > .remove() methods.  Is there something special about Tegra, or is this
> > something the other drivers *should* be doing?
> Since this API is called by remove, I'm removing the hierarchy to safely
> bring down all the devices. I'll have to re-visit this part as
> Jisheng Zhang's patches https://patchwork.kernel.org/project/linux-pci/list/?series=98559
> are now approved and I need to verify this part after cherry-picking
> Jisheng's changes.

Tegra194 should do this the same way as other drivers, independent of
Jisheng's changes.

Bjorn
Kishon Vijay Abraham I April 3, 2019, 8:05 a.m. UTC | #13
Hi,

On 26/03/19 8:43 PM, Vidya Sagar wrote:
> Synopsys DesignWare core based PCIe controllers in Tegra 194 SoC interface
> with Universal PHY (UPHY) module through a PIPE2UPHY (P2U) module.
> For each PCIe lane of a controller, there is a P2U unit instantiated at
> hardware level. This driver provides support for the programming required
> for each P2U that is going to be used for a PCIe controller.
> 
> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> ---
>  drivers/phy/tegra/Kconfig             |   7 ++
>  drivers/phy/tegra/Makefile            |   1 +
>  drivers/phy/tegra/pcie-p2u-tegra194.c | 138 ++++++++++++++++++++++++++++++++++
>  3 files changed, 146 insertions(+)
>  create mode 100644 drivers/phy/tegra/pcie-p2u-tegra194.c
> 
> diff --git a/drivers/phy/tegra/Kconfig b/drivers/phy/tegra/Kconfig
> index a3b1de953fb7..1460c060fa70 100644
> --- a/drivers/phy/tegra/Kconfig
> +++ b/drivers/phy/tegra/Kconfig
> @@ -6,3 +6,10 @@ config PHY_TEGRA_XUSB
>  
>  	  To compile this driver as a module, choose M here: the module will
>  	  be called phy-tegra-xusb.
> +
> +config PHY_TEGRA194_PCIE_P2U
> +        tristate "NVIDIA Tegra P2U PHY Driver"
> +        depends on ARCH_TEGRA

COMPILE_TEST
> +        select GENERIC_PHY
> +        help
> +          Enable this to support the P2U (PIPE to UPHY) that is part of Tegra 19x SOCs.
> diff --git a/drivers/phy/tegra/Makefile b/drivers/phy/tegra/Makefile
> index 898589238fd9..f85b2c86643d 100644
> --- a/drivers/phy/tegra/Makefile
> +++ b/drivers/phy/tegra/Makefile
> @@ -4,3 +4,4 @@ phy-tegra-xusb-y += xusb.o
>  phy-tegra-xusb-$(CONFIG_ARCH_TEGRA_124_SOC) += xusb-tegra124.o
>  phy-tegra-xusb-$(CONFIG_ARCH_TEGRA_132_SOC) += xusb-tegra124.o
>  phy-tegra-xusb-$(CONFIG_ARCH_TEGRA_210_SOC) += xusb-tegra210.o
> +obj-$(CONFIG_PHY_TEGRA194_PCIE_P2U) += pcie-p2u-tegra194.o
> diff --git a/drivers/phy/tegra/pcie-p2u-tegra194.c b/drivers/phy/tegra/pcie-p2u-tegra194.c
> new file mode 100644
> index 000000000000..bb2412ec4765
> --- /dev/null
> +++ b/drivers/phy/tegra/pcie-p2u-tegra194.c
> @@ -0,0 +1,138 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * P2U (PIPE to UPHY) driver for Tegra T194 SoC
> + *
> + * Copyright (C) 2018 NVIDIA Corporation.

2019
> + *
> + * Author: Vidya Sagar <vidyas@nvidia.com>
> + */
> +
> +#include <linux/module.h>
> +#include <linux/platform_device.h>
> +#include <linux/phy/phy.h>
> +#include <linux/of.h>
> +#include <linux/err.h>
> +#include <linux/io.h>
> +#include <linux/interrupt.h>
> +#include <linux/delay.h>
> +#include <linux/of_platform.h>
> +#include <soc/tegra/bpmp-abi.h>
> +
> +#define P2U_PERIODIC_EQ_CTRL_GEN3	0xc0
> +#define P2U_PERIODIC_EQ_CTRL_GEN3_PERIODIC_EQ_EN		BIT(0)
> +#define P2U_PERIODIC_EQ_CTRL_GEN3_INIT_PRESET_EQ_TRAIN_EN	BIT(1)
> +#define P2U_PERIODIC_EQ_CTRL_GEN4	0xc4
> +#define P2U_PERIODIC_EQ_CTRL_GEN4_INIT_PRESET_EQ_TRAIN_EN	BIT(1)
> +
> +#define P2U_RX_DEBOUNCE_TIME				0xa4
> +#define P2U_RX_DEBOUNCE_TIME_DEBOUNCE_TIMER_MASK	0xFFFF
> +#define P2U_RX_DEBOUNCE_TIME_DEBOUNCE_TIMER_VAL		160
> +
> +struct tegra_p2u {
> +	void __iomem		*base;
> +};
> +
> +static int tegra_p2u_power_off(struct phy *x)
> +{
> +	return 0;

Empty phy_ops are not required.
> +}
> +
> +static int tegra_p2u_power_on(struct phy *x)
> +{
> +	u32 val;
> +	struct tegra_p2u *phy = phy_get_drvdata(x);
> +
> +	val = readl(phy->base + P2U_PERIODIC_EQ_CTRL_GEN3);
> +	val &= ~P2U_PERIODIC_EQ_CTRL_GEN3_PERIODIC_EQ_EN;
> +	val |= P2U_PERIODIC_EQ_CTRL_GEN3_INIT_PRESET_EQ_TRAIN_EN;
> +	writel(val, phy->base + P2U_PERIODIC_EQ_CTRL_GEN3);
> +
> +	val = readl(phy->base + P2U_PERIODIC_EQ_CTRL_GEN4);
> +	val |= P2U_PERIODIC_EQ_CTRL_GEN4_INIT_PRESET_EQ_TRAIN_EN;
> +	writel(val, phy->base + P2U_PERIODIC_EQ_CTRL_GEN4);
> +
> +	val = readl(phy->base + P2U_RX_DEBOUNCE_TIME);
> +	val &= ~P2U_RX_DEBOUNCE_TIME_DEBOUNCE_TIMER_MASK;
> +	val |= P2U_RX_DEBOUNCE_TIME_DEBOUNCE_TIMER_VAL;
> +	writel(val, phy->base + P2U_RX_DEBOUNCE_TIME);

This looks more like a init configuration rather than power on.
> +
> +	return 0;
> +}
> +
> +static int tegra_p2u_init(struct phy *x)
> +{
> +	return 0;
> +}
> +
> +static int tegra_p2u_exit(struct phy *x)
> +{
> +	return 0;
> +}

Empty functions are not required.
> +
> +static const struct phy_ops ops = {
> +	.init		= tegra_p2u_init,
> +	.exit		= tegra_p2u_exit,
> +	.power_on	= tegra_p2u_power_on,
> +	.power_off	= tegra_p2u_power_off,
> +	.owner		= THIS_MODULE,
> +};
> +
> +static int tegra_p2u_probe(struct platform_device *pdev)
> +{
> +	struct tegra_p2u *phy;
> +	struct phy *generic_phy;
> +	struct phy_provider *phy_provider;
> +	struct device *dev = &pdev->dev;
> +	struct resource *res;
> +
> +	phy = devm_kzalloc(dev, sizeof(*phy), GFP_KERNEL);
> +	if (!phy)
> +		return -ENOMEM;
> +
> +	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "base");
> +	phy->base = devm_ioremap_resource(dev, res);
> +	if (IS_ERR(phy->base))
> +		return PTR_ERR(phy->base);
> +
> +	platform_set_drvdata(pdev, phy);
> +
> +	generic_phy = devm_phy_create(dev, NULL, &ops);
> +	if (IS_ERR(generic_phy))
> +		return PTR_ERR(generic_phy);
> +
> +	phy_set_drvdata(generic_phy, phy);
> +
> +	phy_provider = devm_of_phy_provider_register(dev, of_phy_simple_xlate);
> +	if (IS_ERR(phy_provider))
> +		return PTR_ERR(phy_provider);
> +

return PTR_ERR_OR_ZERO(phy_provider);
> +	return 0;
> +}
> +
> +static int tegra_p2u_remove(struct platform_device *pdev)
> +{
> +	return 0;
> +}

not required.

Thanks
Kishon
Vidya Sagar April 3, 2019, 9:15 a.m. UTC | #14
On 4/2/2019 7:44 PM, Thierry Reding wrote:
> On Tue, Apr 02, 2019 at 12:47:48PM +0530, Vidya Sagar wrote:
>> On 3/30/2019 2:22 AM, Bjorn Helgaas wrote:
> [...]
>>>> +static int tegra_pcie_dw_host_init(struct pcie_port *pp)
>>>> +{
> [...]
>>>> +	val_w = dw_pcie_readw_dbi(pci, CFG_LINK_STATUS);
>>>> +	while (!(val_w & PCI_EXP_LNKSTA_DLLLA)) {
>>>> +		if (!count) {
>>>> +			val = readl(pcie->appl_base + APPL_DEBUG);
>>>> +			val &= APPL_DEBUG_LTSSM_STATE_MASK;
>>>> +			val >>= APPL_DEBUG_LTSSM_STATE_SHIFT;
>>>> +			tmp = readl(pcie->appl_base + APPL_LINK_STATUS);
>>>> +			tmp &= APPL_LINK_STATUS_RDLH_LINK_UP;
>>>> +			if (val == 0x11 && !tmp) {
>>>> +				dev_info(pci->dev, "link is down in DLL");
>>>> +				dev_info(pci->dev,
>>>> +					 "trying again with DLFE disabled\n");
>>>> +				/* disable LTSSM */
>>>> +				val = readl(pcie->appl_base + APPL_CTRL);
>>>> +				val &= ~APPL_CTRL_LTSSM_EN;
>>>> +				writel(val, pcie->appl_base + APPL_CTRL);
>>>> +
>>>> +				reset_control_assert(pcie->core_rst);
>>>> +				reset_control_deassert(pcie->core_rst);
>>>> +
>>>> +				offset =
>>>> +				dw_pcie_find_ext_capability(pci,
>>>> +							    PCI_EXT_CAP_ID_DLF)
>>>> +				+ PCI_DLF_CAP;
>>>
>>> This capability offset doesn't change, does it?  Could it be computed
>>> outside the loop?
>> This is the only place where DLF offset is needed and gets calculated and this
>> scenario is very rare as so far only a legacy ASMedia USB3.0 card requires DLF
>> to be disabled to get PCIe link up. So, I thought of calculating the offset
>> here itself instead of using a separate variable.
>>
>>>
>>>> +				val = dw_pcie_readl_dbi(pci, offset);
>>>> +				val &= ~DL_FEATURE_EXCHANGE_EN;
>>>> +				dw_pcie_writel_dbi(pci, offset, val);
>>>> +
>>>> +				tegra_pcie_dw_host_init(&pcie->pci.pp);
>>>
>>> This looks like some sort of "wait for link up" retry loop, but a
>>> recursive call seems a little unusual.  My 5 second analysis is that
>>> the loop could run this 200 times, and you sure don't want the
>>> possibility of a 200-deep call chain.  Is there way to split out the
>>> host init from the link-up polling?
>> Again, this recursive calling comes into picture only for a legacy ASMedia
>> USB3.0 card and it is going to be a 1-deep call chain as the recursion takes
>> place only once depending on the condition. Apart from the legacy ASMedia card,
>> there is no other card at this point in time out of a huge number of cards that we have
>> tested.
> 
> A more idiomatic way would be to add a "retry:" label somewhere and goto
> that after disabling DLFE. That way you achieve the same effect, but you
> can avoid the recursion, even if it is harmless in practice.
Initially I thought of using a goto to keep it simple, but I assumed it would be
discouraged and hence used recursion. But yes, I agree that a goto keeps it
simpler, and I'll switch to a goto now.
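Roughly, I have something like the below in mind (an untested sketch reusing the
names from the posted patch; "link_down_in_dll" stands in for the
val == 0x11 && !tmp check on APPL_DEBUG/APPL_LINK_STATUS, and the existing
PCI_EXP_LNKSTA_DLLLA polling is elided):

	bool retried = false;
	u32 offset, val;

retry_link:
	/* ... existing core bring-up and PCI_EXP_LNKSTA_DLLLA polling ... */

	if (link_down_in_dll && !retried) {
		dev_info(pci->dev, "link is down in DLL, retrying with DLFE disabled\n");

		/* Disable LTSSM and reset the core, as in the posted patch */
		val = readl(pcie->appl_base + APPL_CTRL);
		val &= ~APPL_CTRL_LTSSM_EN;
		writel(val, pcie->appl_base + APPL_CTRL);
		reset_control_assert(pcie->core_rst);
		reset_control_deassert(pcie->core_rst);

		/* Turn off Data Link Feature exchange and try once more */
		offset = dw_pcie_find_ext_capability(pci, PCI_EXT_CAP_ID_DLF);
		val = dw_pcie_readl_dbi(pci, offset + PCI_DLF_CAP);
		val &= ~DL_FEATURE_EXCHANGE_EN;
		dw_pcie_writel_dbi(pci, offset + PCI_DLF_CAP, val);

		retried = true;
		goto retry_link;
	}

That way at most one retry can happen and there is no recursion.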

> 
>>>> +static int tegra_pcie_dw_probe(struct platform_device *pdev)
>>>> +{
>>>> +	struct tegra_pcie_dw *pcie;
>>>> +	struct pcie_port *pp;
>>>> +	struct dw_pcie *pci;
>>>> +	struct phy **phy;
>>>> +	struct resource	*dbi_res;
>>>> +	struct resource	*atu_dma_res;
>>>> +	const struct of_device_id *match;
>>>> +	const struct tegra_pcie_of_data *data;
>>>> +	char *name;
>>>> +	int ret, i;
>>>> +
>>>> +	pcie = devm_kzalloc(&pdev->dev, sizeof(*pcie), GFP_KERNEL);
>>>> +	if (!pcie)
>>>> +		return -ENOMEM;
>>>> +
>>>> +	pci = &pcie->pci;
>>>> +	pci->dev = &pdev->dev;
>>>> +	pci->ops = &tegra_dw_pcie_ops;
>>>> +	pp = &pci->pp;
>>>> +	pcie->dev = &pdev->dev;
>>>> +
>>>> +	match = of_match_device(of_match_ptr(tegra_pcie_dw_of_match),
>>>> +				&pdev->dev);
>>>> +	if (!match)
>>>> +		return -EINVAL;
>>>
>>> Logically could be the first thing in the function since it doesn't
>>> depend on anything.
>> Done
>>
>>>
>>>> +	data = (struct tegra_pcie_of_data *)match->data;
> 
> of_device_get_match_data() can help remove some of the above
> boilerplate. Also, there's no reason to check for a failure with these
> functions. The driver is OF-only and can only ever be probed if the
> device exists, in which case match (or data for that matter) will never
> be NULL.
Done.

> 
>>> I see that an earlier patch added "bus" to struct pcie_port.  I think
>>> it would be better to somehow connect to the pci_host_bridge struct.
>>> Several other drivers already do this; see uses of
>>> pci_host_bridge_from_priv().
>> All non-DesignWare based implementations save their private data structure
>> in 'private' pointer of struct pci_host_bridge and use pci_host_bridge_from_priv()
>> to get it back. But, DesignWare based implementations save pcie_port in 'sysdata'
>> and nothing in 'private' pointer. So,  I'm not sure if pci_host_bridge_from_priv()
>> can be used in this case. Please do let me know if you think otherwise.
> 
> If nothing is currently stored in the private pointer, why not do like
> the other drivers and store the struct pci_host_bridge pointer there?
Non-DesignWare drivers get their private data allocated as part of pci_alloc_host_bridge()
by passing the size of their private structure, and they use pci_host_bridge_from_priv() to
get a pointer back to that private structure (which lives within struct pci_host_bridge).
In the DesignWare core, by contrast, the memory for struct pcie_port exists well before
pci_alloc_host_bridge() is called; in fact, size '0' is passed as the argument to the alloc
API. This is the reason why the struct pcie_port pointer is saved in 'sysdata'.
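For reference, the two patterns as I read them today (a rough sketch):

	/* Non-DWC drivers: the private struct lives inside pci_host_bridge */
	bridge = devm_pci_alloc_host_bridge(dev, sizeof(*pcie));
	pcie = pci_host_bridge_priv(bridge);
	/* ... and later container_of() works in the other direction: */
	bridge = pci_host_bridge_from_priv(pcie);

	/* DWC core (dw_pcie_host_init()): pcie_port already exists, so the
	 * bridge is allocated with no private area and pp goes in sysdata */
	bridge = pci_alloc_host_bridge(0);
	bridge->sysdata = pp;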

> 
> Thierry
>
Vidya Sagar April 3, 2019, 9:43 a.m. UTC | #15
On 4/3/2019 12:01 AM, Bjorn Helgaas wrote:
> On Tue, Apr 02, 2019 at 12:47:48PM +0530, Vidya Sagar wrote:
>> On 3/30/2019 2:22 AM, Bjorn Helgaas wrote:
>>> On Tue, Mar 26, 2019 at 08:43:26PM +0530, Vidya Sagar wrote:
>>>> Add support for Synopsys DesignWare core IP based PCIe host controller
>>>> present in Tegra194 SoC.
> 
>>>> +#include "../../pcie/portdrv.h"
>>>
>>> What's this for?  I didn't see any obvious uses of things from
>>> portdrv.h, but I didn't actually try to build without it.
>> This is for pcie_pme_disable_msi() API. Since this is defined in portdrv.h
>> file, I'm including it here.
> 
> Hmm, OK, I missed that.  If you need pcie_pme_disable_msi(), you
> definitely need portdrv.h.  But you're still a singleton in terms of
> including it, so it leads to follow-up questions:
> 
>    - Why does this chip require pcie_pme_disable_msi()?  The only other
>      use is a DMI quirk for "MSI Wind U-100", added by c39fae1416d5
>      ("PCI PM: Make it possible to force using INTx for PCIe PME
>      signaling").
Because Tegra194 doesn't support raising PME interrupts through the MSI line.
> 
>    - Is this a workaround for a Tegra194 defect?  If so, it should be
>      separated out and identified as such.  Otherwise it will get
>      copied to other places where it doesn't belong.
The guideline from the hardware team is that MSI should not be used for PME signaling.
I'll add an explicit comment in the code that this is specific to Tegra194.
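Something along these lines (a sketch; the exact wording is still open):

	/*
	 * Tegra194-specific: PME events from the root port are delivered
	 * only over the legacy interrupt line and never over MSI, so tell
	 * the portdrv PME service not to use MSI for PME signaling.
	 */
	pcie_pme_disable_msi();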
> 
>    - You only call it from the .runtime_resume() hook.  That doesn't
>      make sense to me: if you never suspend, you never disable MSI for
>      PME signaling.
.runtime_resume() is not only called during resume but also in the normal path:
since we are using the runtime PM framework, pm_runtime_get_sync() is called at
the end of probe, and that is what executes the .runtime_resume() hook.
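i.e., roughly (a sketch; error handling and the rest of probe are trimmed, and
the label name is only illustrative):

	pm_runtime_enable(&pdev->dev);

	/*
	 * Not just a resume: the first pm_runtime_get_sync() in probe ends
	 * up invoking .runtime_resume(), which is where host_init() and the
	 * rest of the controller bring-up run in this driver.
	 */
	ret = pm_runtime_get_sync(&pdev->dev);
	if (ret < 0) {
		dev_err(&pdev->dev, "failed to enable device: %d\n", ret);
		goto fail_pm_get_sync;
	}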

> 
>>>> +	val_w = dw_pcie_readw_dbi(pci, CFG_LINK_STATUS);
>>>> +	while (!(val_w & PCI_EXP_LNKSTA_DLLLA)) {
>>>> +		if (!count) {
>>>> +			val = readl(pcie->appl_base + APPL_DEBUG);
>>>> +			val &= APPL_DEBUG_LTSSM_STATE_MASK;
>>>> +			val >>= APPL_DEBUG_LTSSM_STATE_SHIFT;
>>>> +			tmp = readl(pcie->appl_base + APPL_LINK_STATUS);
>>>> +			tmp &= APPL_LINK_STATUS_RDLH_LINK_UP;
>>>> +			if (val == 0x11 && !tmp) {
>>>> +				dev_info(pci->dev, "link is down in DLL");
>>>> +				dev_info(pci->dev,
>>>> +					 "trying again with DLFE disabled\n");
>>>> +				/* disable LTSSM */
>>>> +				val = readl(pcie->appl_base + APPL_CTRL);
>>>> +				val &= ~APPL_CTRL_LTSSM_EN;
>>>> +				writel(val, pcie->appl_base + APPL_CTRL);
>>>> +
>>>> +				reset_control_assert(pcie->core_rst);
>>>> +				reset_control_deassert(pcie->core_rst);
>>>> +
>>>> +				offset =
>>>> +				dw_pcie_find_ext_capability(pci,
>>>> +							    PCI_EXT_CAP_ID_DLF)
>>>> +				+ PCI_DLF_CAP;
>>>
>>> This capability offset doesn't change, does it?  Could it be computed
>>> outside the loop?
>> This is the only place where DLF offset is needed and gets calculated and this
>> scenario is very rare as so far only a legacy ASMedia USB3.0 card requires DLF
>> to be disabled to get PCIe link up. So, I thought of calculating the offset
>> here itself instead of using a separate variable.
> 
> Hmm.  Sounds like this is essentially a quirk to work around some
> hardware issue in Tegra194.
> 
> Is there some way you can pull this out into a separate function so it
> doesn't clutter the normal path and it's more obvious that it's a
> workaround?
> 
>>>> +				val = dw_pcie_readl_dbi(pci, offset);
>>>> +				val &= ~DL_FEATURE_EXCHANGE_EN;
>>>> +				dw_pcie_writel_dbi(pci, offset, val);
>>>> +
>>>> +				tegra_pcie_dw_host_init(&pcie->pci.pp);
>>>
>>> This looks like some sort of "wait for link up" retry loop, but a
>>> recursive call seems a little unusual.  My 5 second analysis is that
>>> the loop could run this 200 times, and you sure don't want the
>>> possibility of a 200-deep call chain.  Is there way to split out the
>>> host init from the link-up polling?
>> Again, this recursive calling comes into picture only for a legacy ASMedia
>> USB3.0 card and it is going to be a 1-deep call chain as the recursion takes
>> place only once depending on the condition. Apart from the legacy ASMedia card,
>> there is no other card at this point in time out of a huge number of cards that we have
>> tested.
> 
> We need to be able to analyze the code without spending time working
> out what arcane PCIe spec details might limit this to fewer than 200
> iterations, or relying on assumptions about how many cards have been
> tested.
> 
> If you can restructure this so the "wait for link up" loop looks like
> all the other drivers, and the DLF issue is separated out and just
> causes a retry of the "wait for link up", I think that will help a
> lot.
As per Thierry Reding's suggestion, I'll be using a goto statement to avoid
recursion and that should simplify things here.

> 
>>>> +static void tegra_pcie_dw_scan_bus(struct pcie_port *pp)
>>>> +{
>>>> +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
>>>> +	struct tegra_pcie_dw *pcie = dw_pcie_to_tegra_pcie(pci);
>>>> +	u32 speed;
>>>> +
>>>> +	if (!tegra_pcie_dw_link_up(pci))
>>>> +		return;
>>>> +
>>>> +	speed = (dw_pcie_readw_dbi(pci, CFG_LINK_STATUS) & PCI_EXP_LNKSTA_CLS);
>>>> +	clk_set_rate(pcie->core_clk, pcie_gen_freq[speed - 1]);
>>>
>>> I don't understand what's happening here.  This is named
>>> tegra_pcie_dw_scan_bus(), but it doesn't actually scan anything.
>>> Maybe it's just a bad name for the dw_pcie_host_ops hook
>>> (ks_pcie_v3_65_scan_bus() is the only other .scan_bus()
>>> implementation, and it doesn't scan anything either).
>>>
>>> dw_pcie_host_init() calls pci_scan_root_bus_bridge(), which actually
>>> *does* scan the bus, but it does it before calling
>>> pp->ops->scan_bus().  I'd say by the time we get to
>>> pci_scan_root_bus_bridge(), the device-specific init should be all
>>> done and we should be using only generic PCI core interfaces.
>>>
>>> Maybe this stuff could/should be done in the ->host_init() hook?  The
>>> code between ->host_init() and ->scan_bus() is all generic code with
>>> no device-specific stuff, so I don't know why we need both hooks.
>> Agree. At least whatever I'm going here as part of scan_bus can be moved to
>> host_init() itslef. I'm not sure about the original intention of the scan_bus
>> but my understanding is that, after PCIe sub-system enumerates all devices, if
>> something needs to be done, then, probably scan_bus() can be implemented for that.
>> I had some other code initially which was accessing downstream devices, hence I
>> implemented scan_bus(), but, now, given all that is gone, I can move this code to
>> host_init() itself.
> 
> That'd be perfect.  I suspect ks_pcie_v3_65_scan_bus() is left over
> from before the generic PCI core scan interface, and it could probably
> be moved to host_init() as well.
I think so.

> 
>>>> +static void tegra_pcie_downstream_dev_to_D0(struct tegra_pcie_dw *pcie)
>>>> +{
>>>> +	struct pci_dev *pdev = NULL;
>>>
>>> Unnecessary initialization.
>> Done.
>>
>>>> +	struct pci_bus *child;
>>>> +	struct pcie_port *pp = &pcie->pci.pp;
>>>> +
>>>> +	list_for_each_entry(child, &pp->bus->children, node) {
>>>> +		/* Bring downstream devices to D0 if they are not already in */
>>>> +		if (child->parent == pp->bus) {
>>>> +			pdev = pci_get_slot(child, PCI_DEVFN(0, 0));
>>>> +			pci_dev_put(pdev);
>>>> +			if (!pdev)
>>>> +				break;
>>>
>>> I don't really like this dance with iterating over the bus children,
>>> comparing parent to pp->bus, pci_get_slot(), pci_dev_put(), etc.
>>>
>>> I guess the idea is to bring only the directly-downstream devices to
>>> D0, not to do it for things deeper in the hierarchy?
>> Yes.
>>
>>> Is this some Tegra-specific wrinkle?  I don't think other drivers do
>>> this.
>> With Tegra PCIe controller, if the downstream device is in non-D0 states,
>> link doesn't go into L2 state. We observed this behavior with some of the
>> devices and the solution would be to bring them to D0 state and then attempt
>> sending PME_TurnOff message to put the link to L2 state.
>> Since spec also supports this mechanism (Rev.4.0 Ver.1.0 Page #428), we chose
>> to implement this.
> 
> Sounds like a Tegra oddity that should be documented as such so it
> doesn't get copied elsewhere.
I'll add a comment explicitly to state the same.
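Something like this at the top of tegra_pcie_downstream_dev_to_D0() (wording
still open):

	/*
	 * Tegra194-specific: the controller cannot put the link into L2 via
	 * PME_Turn_Off while a directly-downstream device sits in a non-D0
	 * state, so bring such devices back to D0 first (the PCIe r4.0 v1.0
	 * spec, page 428, permits this). Do not copy this into drivers for
	 * hardware that does not have the same limitation.
	 */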

> 
>>> I see that an earlier patch added "bus" to struct pcie_port.  I think
>>> it would be better to somehow connect to the pci_host_bridge struct.
>>> Several other drivers already do this; see uses of
>>> pci_host_bridge_from_priv().
>> All non-DesignWare based implementations save their private data structure
>> in 'private' pointer of struct pci_host_bridge and use pci_host_bridge_from_priv()
>> to get it back. But, DesignWare based implementations save pcie_port in 'sysdata'
>> and nothing in 'private' pointer. So,  I'm not sure if pci_host_bridge_from_priv()
>> can be used in this case. Please do let me know if you think otherwise.
> 
> DesignWare-based drivers should have a way to retrieve the
> pci_host_bridge pointer.  It doesn't have to be *exactly* the
> same as non-DesignWare drivers, but it should be similar.
I gave my reasoning, in another thread as a reply to Thierry Reding's comments, as to why
it is not possible with the current code to get the pci_host_bridge structure pointer from
the struct pcie_port pointer. Since Jisheng's changes to support remove functionality
are accepted, I think using the bus pointer saved in struct pcie_port shouldn't be
an issue now. Please do let me know if that is not acceptable.

> 
>>> That would give you the bus, as well as flags like no_ext_tags,
>>> native_aer, etc, which this driver, being a host bridge driver that's
>>> responsible for this part of the firmware/OS interface, may
>>> conceivably need.
> 
>>>> +static int tegra_pcie_dw_runtime_suspend(struct device *dev)
>>>> +{
>>>> +	struct tegra_pcie_dw *pcie = dev_get_drvdata(dev);
>>>> +
>>>> +	tegra_pcie_downstream_dev_to_D0(pcie);
>>>> +
>>>> +	pci_stop_root_bus(pcie->pci.pp.bus);
>>>> +	pci_remove_root_bus(pcie->pci.pp.bus);
>>>
>>> Why are you calling these?  No other drivers do this except in their
>>> .remove() methods.  Is there something special about Tegra, or is this
>>> something the other drivers *should* be doing?
>> Since this API is called by remove, I'm removing the hierarchy to safely
>> bring down all the devices. I'll have to re-visit this part as
>> Jisheng Zhang's patches https://patchwork.kernel.org/project/linux-pci/list/?series=98559
>> are now approved and I need to verify this part after cherry-picking
>> Jisheng's changes.
> 
> Tegra194 should do this the same way as other drivers, independent of
> Jisheng's changes.
When other DesignWare implementations add remove functionality, they should also
be calling these APIs (Jisheng mentioned the same in his commit message).

> 
> Bjorn
>
Vidya Sagar April 3, 2019, 10:45 a.m. UTC | #16
On 4/3/2019 1:35 PM, Kishon Vijay Abraham I wrote:
> Hi,
> 
> On 26/03/19 8:43 PM, Vidya Sagar wrote:
>> Synopsys DesignWare core based PCIe controllers in Tegra 194 SoC interface
>> with Universal PHY (UPHY) module through a PIPE2UPHY (P2U) module.
>> For each PCIe lane of a controller, there is a P2U unit instantiated at
>> hardware level. This driver provides support for the programming required
>> for each P2U that is going to be used for a PCIe controller.
>>
>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>> ---
>>   drivers/phy/tegra/Kconfig             |   7 ++
>>   drivers/phy/tegra/Makefile            |   1 +
>>   drivers/phy/tegra/pcie-p2u-tegra194.c | 138 ++++++++++++++++++++++++++++++++++
>>   3 files changed, 146 insertions(+)
>>   create mode 100644 drivers/phy/tegra/pcie-p2u-tegra194.c
>>
>> diff --git a/drivers/phy/tegra/Kconfig b/drivers/phy/tegra/Kconfig
>> index a3b1de953fb7..1460c060fa70 100644
>> --- a/drivers/phy/tegra/Kconfig
>> +++ b/drivers/phy/tegra/Kconfig
>> @@ -6,3 +6,10 @@ config PHY_TEGRA_XUSB
>>   
>>   	  To compile this driver as a module, choose M here: the module will
>>   	  be called phy-tegra-xusb.
>> +
>> +config PHY_TEGRA194_PCIE_P2U
>> +        tristate "NVIDIA Tegra P2U PHY Driver"
>> +        depends on ARCH_TEGRA
> 
> COMPILE_TEST
Done.

>> +        select GENERIC_PHY
>> +        help
>> +          Enable this to support the P2U (PIPE to UPHY) that is part of Tegra 19x SOCs.
>> diff --git a/drivers/phy/tegra/Makefile b/drivers/phy/tegra/Makefile
>> index 898589238fd9..f85b2c86643d 100644
>> --- a/drivers/phy/tegra/Makefile
>> +++ b/drivers/phy/tegra/Makefile
>> @@ -4,3 +4,4 @@ phy-tegra-xusb-y += xusb.o
>>   phy-tegra-xusb-$(CONFIG_ARCH_TEGRA_124_SOC) += xusb-tegra124.o
>>   phy-tegra-xusb-$(CONFIG_ARCH_TEGRA_132_SOC) += xusb-tegra124.o
>>   phy-tegra-xusb-$(CONFIG_ARCH_TEGRA_210_SOC) += xusb-tegra210.o
>> +obj-$(CONFIG_PHY_TEGRA194_PCIE_P2U) += pcie-p2u-tegra194.o
>> diff --git a/drivers/phy/tegra/pcie-p2u-tegra194.c b/drivers/phy/tegra/pcie-p2u-tegra194.c
>> new file mode 100644
>> index 000000000000..bb2412ec4765
>> --- /dev/null
>> +++ b/drivers/phy/tegra/pcie-p2u-tegra194.c
>> @@ -0,0 +1,138 @@
>> +// SPDX-License-Identifier: GPL-2.0+
>> +/*
>> + * P2U (PIPE to UPHY) driver for Tegra T194 SoC
>> + *
>> + * Copyright (C) 2018 NVIDIA Corporation.
> 
> 2019
Done.

>> + *
>> + * Author: Vidya Sagar <vidyas@nvidia.com>
>> + */
>> +
>> +#include <linux/module.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/phy/phy.h>
>> +#include <linux/of.h>
>> +#include <linux/err.h>
>> +#include <linux/io.h>
>> +#include <linux/interrupt.h>
>> +#include <linux/delay.h>
>> +#include <linux/of_platform.h>
>> +#include <soc/tegra/bpmp-abi.h>
>> +
>> +#define P2U_PERIODIC_EQ_CTRL_GEN3	0xc0
>> +#define P2U_PERIODIC_EQ_CTRL_GEN3_PERIODIC_EQ_EN		BIT(0)
>> +#define P2U_PERIODIC_EQ_CTRL_GEN3_INIT_PRESET_EQ_TRAIN_EN	BIT(1)
>> +#define P2U_PERIODIC_EQ_CTRL_GEN4	0xc4
>> +#define P2U_PERIODIC_EQ_CTRL_GEN4_INIT_PRESET_EQ_TRAIN_EN	BIT(1)
>> +
>> +#define P2U_RX_DEBOUNCE_TIME				0xa4
>> +#define P2U_RX_DEBOUNCE_TIME_DEBOUNCE_TIMER_MASK	0xFFFF
>> +#define P2U_RX_DEBOUNCE_TIME_DEBOUNCE_TIMER_VAL		160
>> +
>> +struct tegra_p2u {
>> +	void __iomem		*base;
>> +};
>> +
>> +static int tegra_p2u_power_off(struct phy *x)
>> +{
>> +	return 0;
> 
> Empty phy_ops are not required.
Done.

>> +}
>> +
>> +static int tegra_p2u_power_on(struct phy *x)
>> +{
>> +	u32 val;
>> +	struct tegra_p2u *phy = phy_get_drvdata(x);
>> +
>> +	val = readl(phy->base + P2U_PERIODIC_EQ_CTRL_GEN3);
>> +	val &= ~P2U_PERIODIC_EQ_CTRL_GEN3_PERIODIC_EQ_EN;
>> +	val |= P2U_PERIODIC_EQ_CTRL_GEN3_INIT_PRESET_EQ_TRAIN_EN;
>> +	writel(val, phy->base + P2U_PERIODIC_EQ_CTRL_GEN3);
>> +
>> +	val = readl(phy->base + P2U_PERIODIC_EQ_CTRL_GEN4);
>> +	val |= P2U_PERIODIC_EQ_CTRL_GEN4_INIT_PRESET_EQ_TRAIN_EN;
>> +	writel(val, phy->base + P2U_PERIODIC_EQ_CTRL_GEN4);
>> +
>> +	val = readl(phy->base + P2U_RX_DEBOUNCE_TIME);
>> +	val &= ~P2U_RX_DEBOUNCE_TIME_DEBOUNCE_TIMER_MASK;
>> +	val |= P2U_RX_DEBOUNCE_TIME_DEBOUNCE_TIMER_VAL;
>> +	writel(val, phy->base + P2U_RX_DEBOUNCE_TIME);
> 
> This looks more like a init configuration rather than power on.
>> +
>> +	return 0;
>> +}
>> +
>> +static int tegra_p2u_init(struct phy *x)
>> +{
>> +	return 0;
>> +}
>> +
>> +static int tegra_p2u_exit(struct phy *x)
>> +{
>> +	return 0;
>> +}
> 
> Empty functions are not required.
Done.

>> +
>> +static const struct phy_ops ops = {
>> +	.init		= tegra_p2u_init,
>> +	.exit		= tegra_p2u_exit,
>> +	.power_on	= tegra_p2u_power_on,
>> +	.power_off	= tegra_p2u_power_off,
>> +	.owner		= THIS_MODULE,
>> +};
>> +
>> +static int tegra_p2u_probe(struct platform_device *pdev)
>> +{
>> +	struct tegra_p2u *phy;
>> +	struct phy *generic_phy;
>> +	struct phy_provider *phy_provider;
>> +	struct device *dev = &pdev->dev;
>> +	struct resource *res;
>> +
>> +	phy = devm_kzalloc(dev, sizeof(*phy), GFP_KERNEL);
>> +	if (!phy)
>> +		return -ENOMEM;
>> +
>> +	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "base");
>> +	phy->base = devm_ioremap_resource(dev, res);
>> +	if (IS_ERR(phy->base))
>> +		return PTR_ERR(phy->base);
>> +
>> +	platform_set_drvdata(pdev, phy);
>> +
>> +	generic_phy = devm_phy_create(dev, NULL, &ops);
>> +	if (IS_ERR(generic_phy))
>> +		return PTR_ERR(generic_phy);
>> +
>> +	phy_set_drvdata(generic_phy, phy);
>> +
>> +	phy_provider = devm_of_phy_provider_register(dev, of_phy_simple_xlate);
>> +	if (IS_ERR(phy_provider))
>> +		return PTR_ERR(phy_provider);
>> +
> 
> return PTR_ERR_OR_ZERO(phy_provider);
Done.

>> +	return 0;
>> +}
>> +
>> +static int tegra_p2u_remove(struct platform_device *pdev)
>> +{
>> +	return 0;
>> +}
> 
> not required.
I think this is still required, as this driver can be built as a module, right?

> 
> Thanks
> Kishon
>
Bjorn Helgaas April 3, 2019, 5:36 p.m. UTC | #17
On Wed, Apr 03, 2019 at 03:13:09PM +0530, Vidya Sagar wrote:
> On 4/3/2019 12:01 AM, Bjorn Helgaas wrote:
> > On Tue, Apr 02, 2019 at 12:47:48PM +0530, Vidya Sagar wrote:
> > > On 3/30/2019 2:22 AM, Bjorn Helgaas wrote:
> > > > On Tue, Mar 26, 2019 at 08:43:26PM +0530, Vidya Sagar wrote:
> > > > > Add support for Synopsys DesignWare core IP based PCIe host controller
> > > > > present in Tegra194 SoC.
> > 
> >    - Why does this chip require pcie_pme_disable_msi()?  The only other
> >      use is a DMI quirk for "MSI Wind U-100", added by c39fae1416d5
> >      ("PCI PM: Make it possible to force using INTx for PCIe PME
> >      signaling").
>
> Because Tegra194 doesn't support raising PME interrupts through MSI line.

What does the spec say about this?  Is hardware supposed to support
MSI for PME?  Given that MSI Wind U-100 and Tegra194 are the only two
cases we know about where PME via MSI isn't supported, it seems like
there must be either a requirement for that or some mechanism for the
OS to figure this out, e.g., a capability bit.

> > > > I see that an earlier patch added "bus" to struct pcie_port.
> > > > I think it would be better to somehow connect to the
> > > > pci_host_bridge struct.  Several other drivers already do
> > > > this; see uses of pci_host_bridge_from_priv().
> > >
> > > All non-DesignWare based implementations save their private data
> > > structure in 'private' pointer of struct pci_host_bridge and use
> > > pci_host_bridge_from_priv() to get it back. But, DesignWare
> > > based implementations save pcie_port in 'sysdata' and nothing in
> > > 'private' pointer. So,  I'm not sure if
> > > pci_host_bridge_from_priv() can be used in this case. Please do
> > > let me know if you think otherwise.
> > 
> > DesignWare-based drivers should have a way to retrieve the
> > pci_host_bridge pointer.  It doesn't have to be *exactly* the same
> > as non-DesignWare drivers, but it should be similar.
>
> I gave my reasoning as to why with the current code, it is not
> possible to get the pci_host_bridge structure pointer from struct
> pcie_port pointer in another thread as a reply to Thierry Reding's
> comments. Since Jishen'g changes to support remove functionality are
> accepted, I think using bus pointer saved in struct pcie_port
> pointer shouldn't be any issue now. Please do let me know if that is
> something not acceptable.
>
> > > > That would give you the bus, as well as flags like
> > > > no_ext_tags, native_aer, etc, which this driver, being a host
> > > > bridge driver that's responsible for this part of the
> > > > firmware/OS interface, may conceivably need.

I think saving the pp->root_bus pointer as Jisheng's patch does is a
sub-optimal solution.  If we figure out how to save the
pci_host_bridge pointer, we automatically get the root bus pointer as
well.

It may require some restructuring to save the pci_host_bridge pointer,
but I doubt it's really *impossible*.

> > > > > +static int tegra_pcie_dw_runtime_suspend(struct device *dev)
> > > > > +{
> > > > > +	struct tegra_pcie_dw *pcie = dev_get_drvdata(dev);
> > > > > +
> > > > > +	tegra_pcie_downstream_dev_to_D0(pcie);
> > > > > +
> > > > > +	pci_stop_root_bus(pcie->pci.pp.bus);
> > > > > +	pci_remove_root_bus(pcie->pci.pp.bus);
> > > > 
> > > > Why are you calling these?  No other drivers do this except in
> > > > their .remove() methods.  Is there something special about
> > > > Tegra, or is this something the other drivers *should* be
> > > > doing?
> > >
> > > Since this API is called by remove, I'm removing the hierarchy
> > > to safely bring down all the devices. I'll have to re-visit this
> > > part as Jisheng Zhang's patches
> > > https://patchwork.kernel.org/project/linux-pci/list/?series=98559
> > > are now approved and I need to verify this part after
> > > cherry-picking Jisheng's changes.
> > 
> > Tegra194 should do this the same way as other drivers, independent
> > of Jisheng's changes.
>
> When other Designware implementations add remove functionality, even
> they should be calling these APIs (Jisheng also mentioned the same
> in his commit message)

My point is that these APIs should be called from driver .remove()
methods, not from .runtime_suspend() methods.

Bjorn
Vidya Sagar April 4, 2019, 7:53 p.m. UTC | #18
On 4/3/2019 11:06 PM, Bjorn Helgaas wrote:
> On Wed, Apr 03, 2019 at 03:13:09PM +0530, Vidya Sagar wrote:
>> On 4/3/2019 12:01 AM, Bjorn Helgaas wrote:
>>> On Tue, Apr 02, 2019 at 12:47:48PM +0530, Vidya Sagar wrote:
>>>> On 3/30/2019 2:22 AM, Bjorn Helgaas wrote:
>>>>> On Tue, Mar 26, 2019 at 08:43:26PM +0530, Vidya Sagar wrote:
>>>>>> Add support for Synopsys DesignWare core IP based PCIe host controller
>>>>>> present in Tegra194 SoC.
>>>
>>>     - Why does this chip require pcie_pme_disable_msi()?  The only other
>>>       use is a DMI quirk for "MSI Wind U-100", added by c39fae1416d5
>>>       ("PCI PM: Make it possible to force using INTx for PCIe PME
>>>       signaling").
>>
>> Because Tegra194 doesn't support raising PME interrupts through MSI line.
> 
> What does the spec say about this?  Is hardware supposed to support
> MSI for PME?  Given that MSI Wind U-100 and Tegra194 are the only two
> cases we know about where PME via MSI isn't supported, it seems like
> there must be either a requirement for that or some mechanism for the
> OS to figure this out, e.g., a capability bit.
AFAIU, the spec doesn't say anything about whether the PME interrupt should be delivered
through MSI or by other means. As far as Tegra194 is concerned, there are only two
interrupt lines that go from the PCIe IP to the GIC: one is the legacy interrupt (which
represents legacy interrupts coming over the PCIe bus from downstream devices and
also events happening internal to the root port), and the other is the MSI
interrupt (which represents MSI interrupts coming over the PCIe bus from downstream
devices). The hardware folks had a choice of routing PME interrupts either through the
legacy interrupt line or through the MSI interrupt line, and the legacy interrupt line
was chosen as a design choice. That being the case at the hardware level, I used
pcie_pme_disable_msi() to inform the PCIe sub-system that PME interrupts are not
expected over MSI.

> 
>>>>> I see that an earlier patch added "bus" to struct pcie_port.
>>>>> I think it would be better to somehow connect to the
>>>>> pci_host_bridge struct.  Several other drivers already do
>>>>> this; see uses of pci_host_bridge_from_priv().
>>>>
>>>> All non-DesignWare based implementations save their private data
>>>> structure in 'private' pointer of struct pci_host_bridge and use
>>>> pci_host_bridge_from_priv() to get it back. But, DesignWare
>>>> based implementations save pcie_port in 'sysdata' and nothing in
>>>> 'private' pointer. So,  I'm not sure if
>>>> pci_host_bridge_from_priv() can be used in this case. Please do
>>>> let me know if you think otherwise.
>>>
>>> DesignWare-based drivers should have a way to retrieve the
>>> pci_host_bridge pointer.  It doesn't have to be *exactly* the same
>>> as non-DesignWare drivers, but it should be similar.
>>
>> I gave my reasoning as to why with the current code, it is not
>> possible to get the pci_host_bridge structure pointer from struct
>> pcie_port pointer in another thread as a reply to Thierry Reding's
>> comments. Since Jishen'g changes to support remove functionality are
>> accepted, I think using bus pointer saved in struct pcie_port
>> pointer shouldn't be any issue now. Please do let me know if that is
>> something not acceptable.
>>
>>>>> That would give you the bus, as well as flags like
>>>>> no_ext_tags, native_aer, etc, which this driver, being a host
>>>>> bridge driver that's responsible for this part of the
>>>>> firmware/OS interface, may conceivably need.
> 
> I think saving the pp->root_bus pointer as Jisheng's patch does is a
> sub-optimal solution.  If we figure out how to save the
> pci_host_bridge pointer, we automatically get the root bus pointer as
> well.
> 
> It may require some restructuring to save the pci_host_bridge pointer,
> but I doubt it's really *impossible*.
Is it OK to save the pci_host_bridge pointer in the pcie_port structure directly? I see
that as another way to get the pci_host_bridge pointer from the pcie_port
structure. My understanding is that, to get the pci_host_bridge pointer, either the
pcie_port structure should be embedded in the pci_host_bridge structure (which is the
case with all non-DW implementations), or pcie_port should have a pointer to
some structure that is directly (and not by means of a pointer) embedded in the
pci_host_bridge structure, so that container_of() can be used to get pci_host_bridge.
As far as I can see, there is no member of pci_host_bridge whose pointer is saved in
the pcie_port structure. In fact, it is the reverse: pcie_port's struct dev pointer is
saved as the parent of pci_host_bridge's struct dev. So another way would be to iterate
over the children of pcie_port's struct dev to find pci_host_bridge's dev pointer
and from there get pci_host_bridge through container_of(). But I think that complicates
things more than using the bus pointer from pcie_port. I'm not sure I'm conveying the
difficulty of getting pci_host_bridge from pcie_port well, but any other approach seems
to complicate it even more.
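To make it concrete, saving it directly would look roughly like this (the
'bridge' member is new; it does not exist in the current code):

	struct pcie_port {
		/* ... existing members ... */
		struct pci_host_bridge	*bridge;
	};

	/* in dw_pcie_host_init(), right after the allocation: */
	bridge = pci_alloc_host_bridge(0);
	if (!bridge)
		return -ENOMEM;
	pp->bridge = bridge;

	/* a DWC-based driver could then reach both the host bridge and the
	 * root bus without a separate 'bus' field: */
	struct pci_host_bridge *bridge = pcie->pci.pp.bridge;
	struct pci_bus *root_bus = bridge->bus;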

> 
>>>>>> +static int tegra_pcie_dw_runtime_suspend(struct device *dev)
>>>>>> +{
>>>>>> +	struct tegra_pcie_dw *pcie = dev_get_drvdata(dev);
>>>>>> +
>>>>>> +	tegra_pcie_downstream_dev_to_D0(pcie);
>>>>>> +
>>>>>> +	pci_stop_root_bus(pcie->pci.pp.bus);
>>>>>> +	pci_remove_root_bus(pcie->pci.pp.bus);
>>>>>
>>>>> Why are you calling these?  No other drivers do this except in
>>>>> their .remove() methods.  Is there something special about
>>>>> Tegra, or is this something the other drivers *should* be
>>>>> doing?
>>>>
>>>> Since this API is called by remove, I'm removing the hierarchy
>>>> to safely bring down all the devices. I'll have to re-visit this
>>>> part as Jisheng Zhang's patches
>>>> https://patchwork.kernel.org/project/linux-pci/list/?series=98559
>>>> are now approved and I need to verify this part after
>>>> cherry-picking Jisheng's changes.
>>>
>>> Tegra194 should do this the same way as other drivers, independent
>>> of Jisheng's changes.
>>
>> When other Designware implementations add remove functionality, even
>> they should be calling these APIs (Jisheng also mentioned the same
>> in his commit message)
> 
> My point is that these APIs should be called from driver .remove()
> methods, not from .runtime_suspend() methods.
.remove() internally calls the pm_runtime_put_sync() API, which calls
.runtime_suspend(). I made a new patch to add a host_deinit() call
which makes all these calls. Since host_init() is called from inside
.runtime_resume() of this driver, to keep things symmetric, I'm now calling
host_deinit() from inside the .runtime_suspend() hook.
  
> 
> Bjorn
>
Bjorn Helgaas April 5, 2019, 6:58 p.m. UTC | #19
On Fri, Apr 05, 2019 at 01:23:51AM +0530, Vidya Sagar wrote:
> On 4/3/2019 11:06 PM, Bjorn Helgaas wrote:
> > On Wed, Apr 03, 2019 at 03:13:09PM +0530, Vidya Sagar wrote:
> > > On 4/3/2019 12:01 AM, Bjorn Helgaas wrote:
> > > > On Tue, Apr 02, 2019 at 12:47:48PM +0530, Vidya Sagar wrote:
> > > > > On 3/30/2019 2:22 AM, Bjorn Helgaas wrote:
> > > > > > On Tue, Mar 26, 2019 at 08:43:26PM +0530, Vidya Sagar wrote:
> > > > > > > Add support for Synopsys DesignWare core IP based PCIe host controller
> > > > > > > present in Tegra194 SoC.
> > > > 
> > > >     - Why does this chip require pcie_pme_disable_msi()?  The only other
> > > >       use is a DMI quirk for "MSI Wind U-100", added by c39fae1416d5
> > > >       ("PCI PM: Make it possible to force using INTx for PCIe PME
> > > >       signaling").
> > > 
> > > Because Tegra194 doesn't support raising PME interrupts through MSI line.
> > 
> > What does the spec say about this?  Is hardware supposed to
> > support MSI for PME?  Given that MSI Wind U-100 and Tegra194 are
> > the only two cases we know about where PME via MSI isn't
> > supported, it seems like there must be either a requirement for
> > that or some mechanism for the OS to figure this out, e.g., a
> > capability bit.
>
> AFAIU, Spec doesn't say anything about whether PME interrupt should
> be through MSI or by other means. As far as Tegra194 is concerned,
> there are only two interrupt lanes that go from PCIe IP to GIC, one
> being legacy interrupt (that represents legacy interrupts coming
> over PCIe bus from downstream devices and also the events happening
> internal to root port) and the other being MSI interrupt (that
> represents MSI interrupts coming over PCIe bus from downstream
> devices). Since hardware folks had a choice to route PME interrupts
> either through legacy interrupt line or through MSI interrupt line
> and legacy interrupt line was chosen as a design choice. That being
> the case at hardware level, I tried to inform the same through
> pcie_pme_disable_msi() to PCIe sub-system that PME interrupts are
> not expected over MSI.

There's something wrong here.  Either the question of how PME is
signaled is generic and the spec provides a way for the OS to discover
that method, or it's part of the device-specific architecture that
each host bridge driver has to know about its device.  If the former,
we need to make the PCI core smart enough to figure it out.  If the
latter, we need a better interface than this ad hoc
pcie_pme_disable_msi() thing.  But if it is truly the latter, your
current code is sufficient and we can refine it over time.

> > > > > > I see that an earlier patch added "bus" to struct pcie_port.
> > > > > > I think it would be better to somehow connect to the
> > > > > > pci_host_bridge struct.  Several other drivers already do
> > > > > > this; see uses of pci_host_bridge_from_priv().
> > > > > 
> > > > > All non-DesignWare based implementations save their private data
> > > > > structure in 'private' pointer of struct pci_host_bridge and use
> > > > > pci_host_bridge_from_priv() to get it back. But, DesignWare
> > > > > based implementations save pcie_port in 'sysdata' and nothing in
> > > > > 'private' pointer. So,  I'm not sure if
> > > > > pci_host_bridge_from_priv() can be used in this case. Please do
> > > > > let me know if you think otherwise.
> > > > 
> > > > DesignWare-based drivers should have a way to retrieve the
> > > > pci_host_bridge pointer.  It doesn't have to be *exactly* the same
> > > > as non-DesignWare drivers, but it should be similar.
> > > 
> > > I gave my reasoning as to why with the current code, it is not
> > > possible to get the pci_host_bridge structure pointer from struct
> > > pcie_port pointer in another thread as a reply to Thierry Reding's
> > > comments. Since Jishen'g changes to support remove functionality are
> > > accepted, I think using bus pointer saved in struct pcie_port
> > > pointer shouldn't be any issue now. Please do let me know if that is
> > > something not acceptable.
> > > 
> > > > > > That would give you the bus, as well as flags like
> > > > > > no_ext_tags, native_aer, etc, which this driver, being a host
> > > > > > bridge driver that's responsible for this part of the
> > > > > > firmware/OS interface, may conceivably need.
> > 
> > I think saving the pp->root_bus pointer as Jisheng's patch does is a
> > sub-optimal solution.  If we figure out how to save the
> > pci_host_bridge pointer, we automatically get the root bus pointer as
> > well.
> > 
> > It may require some restructuring to save the pci_host_bridge pointer,
> > but I doubt it's really *impossible*.
>
> Is it OK to save pci_host_bridge pointer in pcie_port structure
> directly? I see that as another way to get pci_host_bridge pointer
> from pcie_port structure. My understanding is that, to get
> pci_host_bridge pointer, either pcie_port structure should be part
> of pci_host_bridge structure (which is the case with all non-DW
> implementations) or pcie_port should have a pointer to some
> structure which is directly (and not by means of a pointer) part of
> pci_host_bridge structure so that container_of() can be used to get
> pci_host_bridge.  As I see, there is no data object of
> pci_host_bridge whose pointer is saved in pcie_port structure. In
> fact, in reverse, pcie_port's struct dev pointer is saved as parent
> to pci_host_bridge's struct dev. So, another way would be to iterate
> over the children of pcie_port's struct dev pointer to find
> pci_host_bridge's dev pointer and from there get pci_host_bridge
> through container_of. But, I think is complicating it more than
> using bus pointer from pcie_port. I'm not sure if I'm able to convey
> the issue I'm facing here to get pci_host_bridge from pcie_port, but
> doing any other thing seems complicating it even more.

What I suspect should happen eventually is the DWC driver should call
devm_pci_alloc_host_bridge() directly, as all the non-DWC drivers do.
That would require a little reorganization of the DWC data structures,
but it would be good to be more consistent.

For now, I think stashing the pointer in pcie_port or dw_pcie would be
OK.  I'm not 100% clear on the difference, but it looks like either
should be common across all the DWC drivers, which is what we want.

(Tangent, dw_pcie_host_init() currently uses pci_alloc_host_bridge(),
not devm_pci_alloc_host_bridge(), even though it uses other devm
interfaces.  This looks like a probable buglet.)

(Tangent 2, devm_pci_alloc_host_bridge() doesn't initialize
bridge->native_aer, etc, as pci_alloc_host_bridge() does.  This looks
like my fault from 02bfeb484230 ("PCI/portdrv: Simplify PCIe feature
permission checking"), and it probably means none of those PCIe
services work correctly for these native host bridge drivers.)

> > > > > > > +static int tegra_pcie_dw_runtime_suspend(struct device *dev)
> > > > > > > +{
> > > > > > > +	struct tegra_pcie_dw *pcie = dev_get_drvdata(dev);
> > > > > > > +
> > > > > > > +	tegra_pcie_downstream_dev_to_D0(pcie);
> > > > > > > +
> > > > > > > +	pci_stop_root_bus(pcie->pci.pp.bus);
> > > > > > > +	pci_remove_root_bus(pcie->pci.pp.bus);
> > > > > > 
> > > > > > Why are you calling these?  No other drivers do this except in
> > > > > > their .remove() methods.  Is there something special about
> > > > > > Tegra, or is this something the other drivers *should* be
> > > > > > doing?
> > > > > 
> > > > > Since this API is called by remove, I'm removing the hierarchy
> > > > > to safely bring down all the devices. I'll have to re-visit this
> > > > > part as Jisheng Zhang's patches
> > > > > https://patchwork.kernel.org/project/linux-pci/list/?series=98559
> > > > > are now approved and I need to verify this part after
> > > > > cherry-picking Jisheng's changes.
> > > > 
> > > > Tegra194 should do this the same way as other drivers, independent
> > > > of Jisheng's changes.
> > > 
> > > When other Designware implementations add remove functionality, even
> > > they should be calling these APIs (Jisheng also mentioned the same
> > > in his commit message)
> > 
> > My point is that these APIs should be called from driver .remove()
> > methods, not from .runtime_suspend() methods.
>
> .remove() internally calls pm_runtime_put_sync() API which calls
> .runtime_suspend(). I made a new patch to add a host_deinit() call
> which make all these calls. Since host_init() is called from inside
> .runtime_resume() of this driver, to be in sync, I'm now calling
> host_deinit() from inside .runtime_suspend() API.

I think this is wrong.  pci_stop_root_bus() will detach all the
drivers from all the devices.  We don't want to do that if we're
merely runtime suspending the host bridge, do we?

Bjorn
Vidya Sagar April 9, 2019, 11:30 a.m. UTC | #20
On 4/6/2019 12:28 AM, Bjorn Helgaas wrote:
> On Fri, Apr 05, 2019 at 01:23:51AM +0530, Vidya Sagar wrote:
>> On 4/3/2019 11:06 PM, Bjorn Helgaas wrote:
>>> On Wed, Apr 03, 2019 at 03:13:09PM +0530, Vidya Sagar wrote:
>>>> On 4/3/2019 12:01 AM, Bjorn Helgaas wrote:
>>>>> On Tue, Apr 02, 2019 at 12:47:48PM +0530, Vidya Sagar wrote:
>>>>>> On 3/30/2019 2:22 AM, Bjorn Helgaas wrote:
>>>>>>> On Tue, Mar 26, 2019 at 08:43:26PM +0530, Vidya Sagar wrote:
>>>>>>>> Add support for Synopsys DesignWare core IP based PCIe host controller
>>>>>>>> present in Tegra194 SoC.
>>>>>
>>>>>      - Why does this chip require pcie_pme_disable_msi()?  The only other
>>>>>        use is a DMI quirk for "MSI Wind U-100", added by c39fae1416d5
>>>>>        ("PCI PM: Make it possible to force using INTx for PCIe PME
>>>>>        signaling").
>>>>
>>>> Because Tegra194 doesn't support raising PME interrupts through MSI line.
>>>
>>> What does the spec say about this?  Is hardware supposed to
>>> support MSI for PME?  Given that MSI Wind U-100 and Tegra194 are
>>> the only two cases we know about where PME via MSI isn't
>>> supported, it seems like there must be either a requirement for
>>> that or some mechanism for the OS to figure this out, e.g., a
>>> capability bit.
>>
>> AFAIU, Spec doesn't say anything about whether PME interrupt should
>> be through MSI or by other means. As far as Tegra194 is concerned,
>> there are only two interrupt lanes that go from PCIe IP to GIC, one
>> being legacy interrupt (that represents legacy interrupts coming
>> over PCIe bus from downstream devices and also the events happening
>> internal to root port) and the other being MSI interrupt (that
>> represents MSI interrupts coming over PCIe bus from downstream
>> devices). Since hardware folks had a choice to route PME interrupts
>> either through legacy interrupt line or through MSI interrupt line
>> and legacy interrupt line was chosen as a design choice. That being
>> the case at hardware level, I tried to inform the same through
>> pcie_pme_disable_msi() to PCIe sub-system that PME interrupts are
>> not expected over MSI.
> 
> There's something wrong here.  Either the question of how PME is
> signaled is generic and the spec provides a way for the OS to discover
> that method, or it's part of the device-specific architecture that
> each host bridge driver has to know about its device.  If the former,
> we need to make the PCI core smart enough to figure it out.  If the
> latter, we need a better interface than this ad hoc
> pcie_pme_disable_msi() thing.  But if it is truly the latter, your
> current code is sufficient and we can refine it over time.
In the case of Tegra194, it is the latter.

> 
>>>>>>> I see that an earlier patch added "bus" to struct pcie_port.
>>>>>>> I think it would be better to somehow connect to the
>>>>>>> pci_host_bridge struct.  Several other drivers already do
>>>>>>> this; see uses of pci_host_bridge_from_priv().
>>>>>>
>>>>>> All non-DesignWare based implementations save their private data
>>>>>> structure in 'private' pointer of struct pci_host_bridge and use
>>>>>> pci_host_bridge_from_priv() to get it back. But, DesignWare
>>>>>> based implementations save pcie_port in 'sysdata' and nothing in
>>>>>> 'private' pointer. So,  I'm not sure if
>>>>>> pci_host_bridge_from_priv() can be used in this case. Please do
>>>>>> let me know if you think otherwise.
>>>>>
>>>>> DesignWare-based drivers should have a way to retrieve the
>>>>> pci_host_bridge pointer.  It doesn't have to be *exactly* the same
>>>>> as non-DesignWare drivers, but it should be similar.
>>>>
>>>> I gave my reasoning as to why with the current code, it is not
>>>> possible to get the pci_host_bridge structure pointer from struct
>>>> pcie_port pointer in another thread as a reply to Thierry Reding's
>>>> comments. Since Jishen'g changes to support remove functionality are
>>>> accepted, I think using bus pointer saved in struct pcie_port
>>>> pointer shouldn't be any issue now. Please do let me know if that is
>>>> something not acceptable.
>>>>
>>>>>>> That would give you the bus, as well as flags like
>>>>>>> no_ext_tags, native_aer, etc, which this driver, being a host
>>>>>>> bridge driver that's responsible for this part of the
>>>>>>> firmware/OS interface, may conceivably need.
>>>
>>> I think saving the pp->root_bus pointer as Jisheng's patch does is a
>>> sub-optimal solution.  If we figure out how to save the
>>> pci_host_bridge pointer, we automatically get the root bus pointer as
>>> well.
>>>
>>> It may require some restructuring to save the pci_host_bridge pointer,
>>> but I doubt it's really *impossible*.
>>
>> Is it OK to save pci_host_bridge pointer in pcie_port structure
>> directly? I see that as another way to get pci_host_bridge pointer
>> from pcie_port structure. My understanding is that, to get
>> pci_host_bridge pointer, either pcie_port structure should be part
>> of pci_host_bridge structure (which is the case with all non-DW
>> implementations) or pcie_port should have a pointer to some
>> structure which is directly (and not by means of a pointer) part of
>> pci_host_bridge structure so that container_of() can be used to get
>> pci_host_bridge.  As I see, there is no data object of
>> pci_host_bridge whose pointer is saved in pcie_port structure. In
>> fact, in reverse, pcie_port's struct dev pointer is saved as parent
>> to pci_host_bridge's struct dev. So, another way would be to iterate
>> over the children of pcie_port's struct dev pointer to find
>> pci_host_bridge's dev pointer and from there get pci_host_bridge
>> through container_of. But, I think is complicating it more than
>> using bus pointer from pcie_port. I'm not sure if I'm able to convey
>> the issue I'm facing here to get pci_host_bridge from pcie_port, but
>> doing any other thing seems complicating it even more.
> 
> What I suspect should happen eventually is the DWC driver should call
> devm_pci_alloc_host_bridge() directly, as all the non-DWC drivers do.
> That would require a little reorganization of the DWC data structures,
> but it would be good to be more consistent.
> 
> For now, I think stashing the pointer in pcie_port or dw_pcie would be
> OK.  I'm not 100% clear on the difference, but it looks like either
> should be common across all the DWC drivers, which is what we want.
Since dw_pcie is common to both the root port and endpoint mode structures,
I think it makes sense to keep the pointer in pcie_port, as that structure
is specific to root port mode of operation.
I'll make a note to reorganize the code to use devm_pci_alloc_host_bridge()
from the beginning, to be in line with non-DWC implementations,
but I'll take that up later (after I'm done upstreaming the current series).

> 
> (Tangent, dw_pcie_host_init() currently uses pci_alloc_host_bridge(),
> not devm_pci_alloc_host_bridge(), even though it uses other devm
> interfaces.  This looks like a probable buglet.)
> 
> (Tangent 2, devm_pci_alloc_host_bridge() doesn't initialize
> bridge->native_aer, etc, as pci_alloc_host_bridge() does.  This looks
> like my fault from 02bfeb484230 ("PCI/portdrv: Simplify PCIe feature
> permission checking"), and it probably means none of those PCIe
> services work correctly for these native host bridge drivers.)
> 
>>>>>>>> +static int tegra_pcie_dw_runtime_suspend(struct device *dev)
>>>>>>>> +{
>>>>>>>> +	struct tegra_pcie_dw *pcie = dev_get_drvdata(dev);
>>>>>>>> +
>>>>>>>> +	tegra_pcie_downstream_dev_to_D0(pcie);
>>>>>>>> +
>>>>>>>> +	pci_stop_root_bus(pcie->pci.pp.bus);
>>>>>>>> +	pci_remove_root_bus(pcie->pci.pp.bus);
>>>>>>>
>>>>>>> Why are you calling these?  No other drivers do this except in
>>>>>>> their .remove() methods.  Is there something special about
>>>>>>> Tegra, or is this something the other drivers *should* be
>>>>>>> doing?
>>>>>>
>>>>>> Since this API is called by remove, I'm removing the hierarchy
>>>>>> to safely bring down all the devices. I'll have to re-visit this
>>>>>> part as Jisheng Zhang's patches
>>>>>> https://patchwork.kernel.org/project/linux-pci/list/?series=98559
>>>>>> are now approved and I need to verify this part after
>>>>>> cherry-picking Jisheng's changes.
>>>>>
>>>>> Tegra194 should do this the same way as other drivers, independent
>>>>> of Jisheng's changes.
>>>>
>>>> When other Designware implementations add remove functionality, even
>>>> they should be calling these APIs (Jisheng also mentioned the same
>>>> in his commit message)
>>>
>>> My point is that these APIs should be called from driver .remove()
>>> methods, not from .runtime_suspend() methods.
>>
>> .remove() internally calls pm_runtime_put_sync() API which calls
>> .runtime_suspend(). I made a new patch to add a host_deinit() call
>> which make all these calls. Since host_init() is called from inside
>> .runtime_resume() of this driver, to be in sync, I'm now calling
>> host_deinit() from inside .runtime_suspend() API.
> 
> I think this is wrong.  pci_stop_root_bus() will detach all the
> drivers from all the devices.  We don't want to do that if we're
> merely runtime suspending the host bridge, do we?
In the current driver, the scenarios in which .runtime_suspend() is called
are
a) during the .remove() call and
b) when no endpoint is found and the controller is to be shut down.
In both cases it is required to stop the root bus and remove all devices,
so, instead of having the same calls in both paths, I kept them
in .runtime_suspend() itself to avoid code duplication.

> 
> Bjorn
>
Bjorn Helgaas April 9, 2019, 1:26 p.m. UTC | #21
On Tue, Apr 09, 2019 at 05:00:53PM +0530, Vidya Sagar wrote:
> On 4/6/2019 12:28 AM, Bjorn Helgaas wrote:
> > On Fri, Apr 05, 2019 at 01:23:51AM +0530, Vidya Sagar wrote:
> > > On 4/3/2019 11:06 PM, Bjorn Helgaas wrote:
> > > > On Wed, Apr 03, 2019 at 03:13:09PM +0530, Vidya Sagar wrote:
> > > > > On 4/3/2019 12:01 AM, Bjorn Helgaas wrote:
> > > > > > On Tue, Apr 02, 2019 at 12:47:48PM +0530, Vidya Sagar wrote:
> > > > > > > On 3/30/2019 2:22 AM, Bjorn Helgaas wrote:
> > > > > > > > On Tue, Mar 26, 2019 at 08:43:26PM +0530, Vidya Sagar wrote:
> > > > > > > > > Add support for Synopsys DesignWare core IP based PCIe host controller
> > > > > > > > > present in Tegra194 SoC.
> > > > > > 
> > > > > >      - Why does this chip require pcie_pme_disable_msi()?  The only other
> > > > > >        use is a DMI quirk for "MSI Wind U-100", added by c39fae1416d5
> > > > > >        ("PCI PM: Make it possible to force using INTx for PCIe PME
> > > > > >        signaling").
> > > > > 
> > > > > Because Tegra194 doesn't support raising PME interrupts through MSI line.

> > There's something wrong here.  Either the question of how PME is
> > signaled is generic and the spec provides a way for the OS to discover
> > that method, or it's part of the device-specific architecture that
> > each host bridge driver has to know about its device.  If the former,
> > we need to make the PCI core smart enough to figure it out.  If the
> > latter, we need a better interface than this ad hoc
> > pcie_pme_disable_msi() thing.  But if it is truly the latter, your
> > current code is sufficient and we can refine it over time.
>
> In case of Tegra194, it is the latter case.

This isn't a Tegra194 question; it's a question of whether this
behavior is covered by the PCIe spec.

> > What I suspect should happen eventually is the DWC driver should call
> > devm_pci_alloc_host_bridge() directly, as all the non-DWC drivers do.
> > That would require a little reorganization of the DWC data structures,
> > but it would be good to be more consistent.
> > 
> > For now, I think stashing the pointer in pcie_port or dw_pcie would be
> > OK.  I'm not 100% clear on the difference, but it looks like either
> > should be common across all the DWC drivers, which is what we want.
>
> Since dw_pcie is common for both root port and end point mode structures,
> I think it makes sense to keep the pointer in pcie_port as this structure
> is specific to root port mode of operation.
> I'll make a note to reorganize code to have devm_pci_alloc_host_bridge()
> used in the beginning itself to be inline with non-DWC implementations.
> But, I'll take it up later (after I'm done with upstreaming current series)

Fair enough.

> > > .remove() internally calls pm_runtime_put_sync() API which calls
> > > .runtime_suspend(). I made a new patch to add a host_deinit() call
> > > which make all these calls. Since host_init() is called from inside
> > > .runtime_resume() of this driver, to be in sync, I'm now calling
> > > host_deinit() from inside .runtime_suspend() API.
> > 
> > I think this is wrong.  pci_stop_root_bus() will detach all the
> > drivers from all the devices.  We don't want to do that if we're
> > merely runtime suspending the host bridge, do we?
>
> In the current driver, the scenarios in which .runtime_suspend() is called
> are
> a) during .remove() call and

It makes sense that you should call pci_stop_root_bus() during
.remove(), i.e., when the host controller driver is being removed,
because the PCI bus will no longer be accessible.  I think you should
call it *directly* from tegra_pcie_dw_remove() because that will match
what other drivers do.
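Roughly like this (a sketch only, reusing the field names from your patch;
not tested):

static int tegra_pcie_dw_remove(struct platform_device *pdev)
{
	struct tegra_pcie_dw *pcie = platform_get_drvdata(pdev);

	/* Tear down the PCI side while the bus is still accessible */
	pci_stop_root_bus(pcie->pci.pp.bus);
	pci_remove_root_bus(pcie->pci.pp.bus);

	/* Only then let runtime PM power-gate the controller */
	pm_runtime_put_sync(&pdev->dev);
	pm_runtime_disable(&pdev->dev);

	return 0;
}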

> b) when there is no endpoint found and controller would be shutdown
> In both cases, it is required to stop the root bus and remove all devices,
> so, instead of having same call present in respective paths, I kept them
> in .runtime_suspend() itself to avoid code duplication.

I don't understand this part.  We should be able to runtime suspend
the host controller without detaching drivers for child devices.

If you shutdown the controller completely and detach the *host
controller driver*, sure, it makes sense to detach drivers from child
devices.  But that would be handled by the host controller .remove()
method, not by the runtime suspend method.

Bjorn
Vidya Sagar April 10, 2019, 6:10 a.m. UTC | #22
On 4/9/2019 6:56 PM, Bjorn Helgaas wrote:
> On Tue, Apr 09, 2019 at 05:00:53PM +0530, Vidya Sagar wrote:
>> On 4/6/2019 12:28 AM, Bjorn Helgaas wrote:
>>> On Fri, Apr 05, 2019 at 01:23:51AM +0530, Vidya Sagar wrote:
>>>> On 4/3/2019 11:06 PM, Bjorn Helgaas wrote:
>>>>> On Wed, Apr 03, 2019 at 03:13:09PM +0530, Vidya Sagar wrote:
>>>>>> On 4/3/2019 12:01 AM, Bjorn Helgaas wrote:
>>>>>>> On Tue, Apr 02, 2019 at 12:47:48PM +0530, Vidya Sagar wrote:
>>>>>>>> On 3/30/2019 2:22 AM, Bjorn Helgaas wrote:
>>>>>>>>> On Tue, Mar 26, 2019 at 08:43:26PM +0530, Vidya Sagar wrote:
>>>>>>>>>> Add support for Synopsys DesignWare core IP based PCIe host controller
>>>>>>>>>> present in Tegra194 SoC.
>>>>>>>
>>>>>>>       - Why does this chip require pcie_pme_disable_msi()?  The only other
>>>>>>>         use is a DMI quirk for "MSI Wind U-100", added by c39fae1416d5
>>>>>>>         ("PCI PM: Make it possible to force using INTx for PCIe PME
>>>>>>>         signaling").
>>>>>>
>>>>>> Because Tegra194 doesn't support raising PME interrupts through MSI line.
> 
>>> There's something wrong here.  Either the question of how PME is
>>> signaled is generic and the spec provides a way for the OS to discover
>>> that method, or it's part of the device-specific architecture that
>>> each host bridge driver has to know about its device.  If the former,
>>> we need to make the PCI core smart enough to figure it out.  If the
>>> latter, we need a better interface than this ad hoc
>>> pcie_pme_disable_msi() thing.  But if it is truly the latter, your
>>> current code is sufficient and we can refine it over time.
>>
>> In case of Tegra194, it is the latter case.
> 
> This isn't a Tegra194 question; it's a question of whether this
> behavior is covered by the PCIe spec.
AFAIU the spec, and from what I heard from Nvidia hardware folks, the spec
doesn't explicitly talk about this; routing these interrupts through the
legacy line instead of the MSI line was a design choice made by the Nvidia
hardware folks.

> 
>>> What I suspect should happen eventually is the DWC driver should call
>>> devm_pci_alloc_host_bridge() directly, as all the non-DWC drivers do.
>>> That would require a little reorganization of the DWC data structures,
>>> but it would be good to be more consistent.
>>>
>>> For now, I think stashing the pointer in pcie_port or dw_pcie would be
>>> OK.  I'm not 100% clear on the difference, but it looks like either
>>> should be common across all the DWC drivers, which is what we want.
>>
>> Since dw_pcie is common for both root port and end point mode structures,
>> I think it makes sense to keep the pointer in pcie_port as this structure
>> is specific to root port mode of operation.
>> I'll make a note to reorganize code to have devm_pci_alloc_host_bridge()
>> used in the beginning itself to be inline with non-DWC implementations.
>> But, I'll take it up later (after I'm done with upstreaming current series)
> 
> Fair enough.
> 
>>>> .remove() internally calls pm_runtime_put_sync() API which calls
>>>> .runtime_suspend(). I made a new patch to add a host_deinit() call
>>>> which make all these calls. Since host_init() is called from inside
>>>> .runtime_resume() of this driver, to be in sync, I'm now calling
>>>> host_deinit() from inside .runtime_suspend() API.
>>>
>>> I think this is wrong.  pci_stop_root_bus() will detach all the
>>> drivers from all the devices.  We don't want to do that if we're
>>> merely runtime suspending the host bridge, do we?
>>
>> In the current driver, the scenarios in which .runtime_suspend() is called
>> are
>> a) during .remove() call and
> 
> It makes sense that you should call pci_stop_root_bus() during
> .remove(), i.e., when the host controller driver is being removed,
> because the PCI bus will no longer be accessible.  I think you should
> call it *directly* from tegra_pcie_dw_remove() because that will match
> what other drivers do.
> 
>> b) when there is no endpoint found and controller would be shutdown
>> In both cases, it is required to stop the root bus and remove all devices,
>> so, instead of having same call present in respective paths, I kept them
>> in .runtime_suspend() itself to avoid code duplication.
> 
> I don't understand this part.  We should be able to runtime suspend
> the host controller without detaching drivers for child devices.
> 
> If you shutdown the controller completely and detach the *host
> controller driver*, sure, it makes sense to detach drivers from child
> devices.  But that would be handled by the host controller .remove()
> method, not by the runtime suspend method.
I think it is time I gave some background on why I chose to implement the
.runtime_suspend() and .runtime_resume() callbacks in the first place. We wanted
to power down the PCIe controller if there is no link up (i.e. the slot is open
and no endpoint devices are connected), and we wanted to achieve this without
returning a failure from .probe(). Since PCIe IP power partitioning is handled by
the generic power domain framework, the power partition gets un-powergated before
.probe() is called and gets powergated either when .probe() returns a failure or
when pm_runtime_put_sync() is called. So, I chose to call pm_runtime_put_sync() in
the no-link-up scenario and to implement .runtime_suspend() to handle all the
cleanup work before the PCIe partition gets powergated. To match this, I'm doing
all of the PCIe IP bring-up in the .runtime_resume() implementation, which is
invoked by pm_runtime_get_sync(), which in turn is called in the .probe() path;
even dw_pcie_host_init() itself is called from .runtime_resume(). It is for these
reasons that I call pci_stop_root_bus() and pci_remove_root_bus() as part of the
.runtime_suspend() implementation, since pm_runtime_put_sync() is called both from
.remove() and in the no-link-up scenario. Please do let me know if there is a
better way to do this.
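To make that flow concrete, a simplified sketch of what the driver currently does
(error handling and most details omitted):

static int tegra_pcie_dw_probe(struct platform_device *pdev)
{
	int ret;

	/* ... allocate and initialize 'pcie', parse DT, map resources ... */

	pm_runtime_enable(&pdev->dev);
	ret = pm_runtime_get_sync(&pdev->dev);	/* invokes .runtime_resume() */
	if (ret < 0)
		return ret;

	return 0;
}

static int tegra_pcie_dw_runtime_resume(struct device *dev)
{
	struct tegra_pcie_dw *pcie = dev_get_drvdata(dev);

	/* clock/reset/PHY bring-up ... then host init and bus enumeration: */
	return dw_pcie_host_init(&pcie->pci.pp);
}

/*
 * Both .remove() and the no-link-up path end in pm_runtime_put_sync(), which
 * invokes .runtime_suspend(); that is why the bus teardown currently lives there.
 */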

> 
> Bjorn
>
Liviu Dudau April 10, 2019, 8:14 a.m. UTC | #23
On Wed, Apr 10, 2019 at 11:40:40AM +0530, Vidya Sagar wrote:
> On 4/9/2019 6:56 PM, Bjorn Helgaas wrote:
> > On Tue, Apr 09, 2019 at 05:00:53PM +0530, Vidya Sagar wrote:
> > > On 4/6/2019 12:28 AM, Bjorn Helgaas wrote:
> > > > On Fri, Apr 05, 2019 at 01:23:51AM +0530, Vidya Sagar wrote:
> > > > > On 4/3/2019 11:06 PM, Bjorn Helgaas wrote:
> > > > > > On Wed, Apr 03, 2019 at 03:13:09PM +0530, Vidya Sagar wrote:
> > > > > > > On 4/3/2019 12:01 AM, Bjorn Helgaas wrote:
> > > > > > > > On Tue, Apr 02, 2019 at 12:47:48PM +0530, Vidya Sagar wrote:
> > > > > > > > > On 3/30/2019 2:22 AM, Bjorn Helgaas wrote:
> > > > > > > > > > On Tue, Mar 26, 2019 at 08:43:26PM +0530, Vidya Sagar wrote:
> > > > > > > > > > > Add support for Synopsys DesignWare core IP based PCIe host controller
> > > > > > > > > > > present in Tegra194 SoC.
> > > > > > > > 
> > > > > > > >       - Why does this chip require pcie_pme_disable_msi()?  The only other
> > > > > > > >         use is a DMI quirk for "MSI Wind U-100", added by c39fae1416d5
> > > > > > > >         ("PCI PM: Make it possible to force using INTx for PCIe PME
> > > > > > > >         signaling").
> > > > > > > 
> > > > > > > Because Tegra194 doesn't support raising PME interrupts through MSI line.
> > 
> > > > There's something wrong here.  Either the question of how PME is
> > > > signaled is generic and the spec provides a way for the OS to discover
> > > > that method, or it's part of the device-specific architecture that
> > > > each host bridge driver has to know about its device.  If the former,
> > > > we need to make the PCI core smart enough to figure it out.  If the
> > > > latter, we need a better interface than this ad hoc
> > > > pcie_pme_disable_msi() thing.  But if it is truly the latter, your
> > > > current code is sufficient and we can refine it over time.
> > > 
> > > In case of Tegra194, it is the latter case.
> > 
> > This isn't a Tegra194 question; it's a question of whether this
> > behavior is covered by the PCIe spec.
> AFAIU the spec and what I heard from Nvidia hardware folks is that spec doesn't
> explicitly talk about this and it was a design choice made by Nvidia hardware
> folks to route these interrupts through legacy line instead of MSI line.
> 
> > 
> > > > What I suspect should happen eventually is the DWC driver should call
> > > > devm_pci_alloc_host_bridge() directly, as all the non-DWC drivers do.
> > > > That would require a little reorganization of the DWC data structures,
> > > > but it would be good to be more consistent.
> > > > 
> > > > For now, I think stashing the pointer in pcie_port or dw_pcie would be
> > > > OK.  I'm not 100% clear on the difference, but it looks like either
> > > > should be common across all the DWC drivers, which is what we want.
> > > 
> > > Since dw_pcie is common for both root port and end point mode structures,
> > > I think it makes sense to keep the pointer in pcie_port as this structure
> > > is specific to root port mode of operation.
> > > I'll make a note to reorganize code to have devm_pci_alloc_host_bridge()
> > > used in the beginning itself to be inline with non-DWC implementations.
> > > But, I'll take it up later (after I'm done with upstreaming current series)
> > 
> > Fair enough.
> > 
> > > > > .remove() internally calls pm_runtime_put_sync() API which calls
> > > > > .runtime_suspend(). I made a new patch to add a host_deinit() call
> > > > > which make all these calls. Since host_init() is called from inside
> > > > > .runtime_resume() of this driver, to be in sync, I'm now calling
> > > > > host_deinit() from inside .runtime_suspend() API.
> > > > 
> > > > I think this is wrong.  pci_stop_root_bus() will detach all the
> > > > drivers from all the devices.  We don't want to do that if we're
> > > > merely runtime suspending the host bridge, do we?
> > > 
> > > In the current driver, the scenarios in which .runtime_suspend() is called
> > > are
> > > a) during .remove() call and
> > 
> > It makes sense that you should call pci_stop_root_bus() during
> > .remove(), i.e., when the host controller driver is being removed,
> > because the PCI bus will no longer be accessible.  I think you should
> > call it *directly* from tegra_pcie_dw_remove() because that will match
> > what other drivers do.
> > 
> > > b) when there is no endpoint found and controller would be shutdown
> > > In both cases, it is required to stop the root bus and remove all devices,
> > > so, instead of having same call present in respective paths, I kept them
> > > in .runtime_suspend() itself to avoid code duplication.
> > 
> > I don't understand this part.  We should be able to runtime suspend
> > the host controller without detaching drivers for child devices.
> > 
> > If you shutdown the controller completely and detach the *host
> > controller driver*, sure, it makes sense to detach drivers from child
> > devices.  But that would be handled by the host controller .remove()
> > method, not by the runtime suspend method.
> I think it is time I give some background about why I chose to implement
> .runtime_suspend() and .runtime_resume() APIs in the first place. We wanted to
> powerdown PCIe controller if there is no link up (i.e. slot is open and no endpoint
> devices are connected). We want to achieve this without returning a failure in
> .probe() call. Given PCIe IP power partitioning is handled by generic power domain
> framework, power partition gets unpowergated before .probe() gets called and gets
> powergated either when a failure is returned in .probe() or when pm_runtime_put_sync()
> is called. So, I chose to call pm_runtime_put_sync() in no-link-up scenario and chose
> to implement .runtime_suspend() to handle all the cleanup work before PCIe partition
> getting powergated. In fact, to match this, I'm doing all the PCIe IP bring up
> activity in .runtime_resume() implementation which gets invoked by pm_runtime_get_sync()
> which in turn is called in .probe() path. In fact the very dw_pcie_host_init() itself
> is called from .runtime_resume() implementation. So, it is because of these reasons that
> I called pci_stop_root_bus() and pci_remove_root_bus() as part of .runtime_suspend()
> implementation as pm_runtime_put_sync() is called from both .remove() and also during
> no-link-up scenario. Please do let me know if there is a better way to do this.

I think you're missing the case where .runtime_suspend() is called when
there are no *active* devices on the bus, i.e. everyone is dormant. It
doesn't mean that you need to remove them from the bus and re-probe them
back on .runtime_resume(). Most of the drivers for PCI devices don't
expect to be removed during idle, as they will configure the hardware to
be in a "quick wake" state (see PCIe Dx power states).

You should probe and configure the bus during .probe() and remove and
detach all drivers during .remove(). For .runtime_suspend() all you need
to do is put the host controller in low power mode if it has one, or
stop all clocks that are not required for responding to devices waking
up from PCIe Dx state. For .runtime_resume() you then restore the
clocks, without re-scanning the bus.
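In other words, something like this (the clock name is a placeholder; your
controller may need more than one, plus whatever the PHY needs):

static int tegra_pcie_dw_runtime_suspend(struct device *dev)
{
	struct tegra_pcie_dw *pcie = dev_get_drvdata(dev);

	/* Keep the bus topology and bound drivers; drop only what wake doesn't need */
	clk_disable_unprepare(pcie->core_clk);

	return 0;
}

static int tegra_pcie_dw_runtime_resume(struct device *dev)
{
	struct tegra_pcie_dw *pcie = dev_get_drvdata(dev);

	/* Restore the clocks; no re-scan of the bus, no re-probing of drivers */
	return clk_prepare_enable(pcie->core_clk);
}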

Best regards,
Liviu


> 
> > 
> > Bjorn
> > 
>
Vidya Sagar April 10, 2019, 9:53 a.m. UTC | #24
On 4/10/2019 1:44 PM, Liviu Dudau wrote:
> On Wed, Apr 10, 2019 at 11:40:40AM +0530, Vidya Sagar wrote:
>> On 4/9/2019 6:56 PM, Bjorn Helgaas wrote:
>>> On Tue, Apr 09, 2019 at 05:00:53PM +0530, Vidya Sagar wrote:
>>>> On 4/6/2019 12:28 AM, Bjorn Helgaas wrote:
>>>>> On Fri, Apr 05, 2019 at 01:23:51AM +0530, Vidya Sagar wrote:
>>>>>> On 4/3/2019 11:06 PM, Bjorn Helgaas wrote:
>>>>>>> On Wed, Apr 03, 2019 at 03:13:09PM +0530, Vidya Sagar wrote:
>>>>>>>> On 4/3/2019 12:01 AM, Bjorn Helgaas wrote:
>>>>>>>>> On Tue, Apr 02, 2019 at 12:47:48PM +0530, Vidya Sagar wrote:
>>>>>>>>>> On 3/30/2019 2:22 AM, Bjorn Helgaas wrote:
>>>>>>>>>>> On Tue, Mar 26, 2019 at 08:43:26PM +0530, Vidya Sagar wrote:
>>>>>>>>>>>> Add support for Synopsys DesignWare core IP based PCIe host controller
>>>>>>>>>>>> present in Tegra194 SoC.
>>>>>>>>>
>>>>>>>>>        - Why does this chip require pcie_pme_disable_msi()?  The only other
>>>>>>>>>          use is a DMI quirk for "MSI Wind U-100", added by c39fae1416d5
>>>>>>>>>          ("PCI PM: Make it possible to force using INTx for PCIe PME
>>>>>>>>>          signaling").
>>>>>>>>
>>>>>>>> Because Tegra194 doesn't support raising PME interrupts through MSI line.
>>>
>>>>> There's something wrong here.  Either the question of how PME is
>>>>> signaled is generic and the spec provides a way for the OS to discover
>>>>> that method, or it's part of the device-specific architecture that
>>>>> each host bridge driver has to know about its device.  If the former,
>>>>> we need to make the PCI core smart enough to figure it out.  If the
>>>>> latter, we need a better interface than this ad hoc
>>>>> pcie_pme_disable_msi() thing.  But if it is truly the latter, your
>>>>> current code is sufficient and we can refine it over time.
>>>>
>>>> In case of Tegra194, it is the latter case.
>>>
>>> This isn't a Tegra194 question; it's a question of whether this
>>> behavior is covered by the PCIe spec.
>> AFAIU the spec and what I heard from Nvidia hardware folks is that spec doesn't
>> explicitly talk about this and it was a design choice made by Nvidia hardware
>> folks to route these interrupts through legacy line instead of MSI line.
>>
>>>
>>>>> What I suspect should happen eventually is the DWC driver should call
>>>>> devm_pci_alloc_host_bridge() directly, as all the non-DWC drivers do.
>>>>> That would require a little reorganization of the DWC data structures,
>>>>> but it would be good to be more consistent.
>>>>>
>>>>> For now, I think stashing the pointer in pcie_port or dw_pcie would be
>>>>> OK.  I'm not 100% clear on the difference, but it looks like either
>>>>> should be common across all the DWC drivers, which is what we want.
>>>>
>>>> Since dw_pcie is common for both root port and end point mode structures,
>>>> I think it makes sense to keep the pointer in pcie_port as this structure
>>>> is specific to root port mode of operation.
>>>> I'll make a note to reorganize code to have devm_pci_alloc_host_bridge()
>>>> used in the beginning itself to be inline with non-DWC implementations.
>>>> But, I'll take it up later (after I'm done with upstreaming current series)
>>>
>>> Fair enough.
>>>
>>>>>> .remove() internally calls pm_runtime_put_sync() API which calls
>>>>>> .runtime_suspend(). I made a new patch to add a host_deinit() call
>>>>>> which make all these calls. Since host_init() is called from inside
>>>>>> .runtime_resume() of this driver, to be in sync, I'm now calling
>>>>>> host_deinit() from inside .runtime_suspend() API.
>>>>>
>>>>> I think this is wrong.  pci_stop_root_bus() will detach all the
>>>>> drivers from all the devices.  We don't want to do that if we're
>>>>> merely runtime suspending the host bridge, do we?
>>>>
>>>> In the current driver, the scenarios in which .runtime_suspend() is called
>>>> are
>>>> a) during .remove() call and
>>>
>>> It makes sense that you should call pci_stop_root_bus() during
>>> .remove(), i.e., when the host controller driver is being removed,
>>> because the PCI bus will no longer be accessible.  I think you should
>>> call it *directly* from tegra_pcie_dw_remove() because that will match
>>> what other drivers do.
>>>
>>>> b) when there is no endpoint found and controller would be shutdown
>>>> In both cases, it is required to stop the root bus and remove all devices,
>>>> so, instead of having same call present in respective paths, I kept them
>>>> in .runtime_suspend() itself to avoid code duplication.
>>>
>>> I don't understand this part.  We should be able to runtime suspend
>>> the host controller without detaching drivers for child devices.
>>>
>>> If you shutdown the controller completely and detach the *host
>>> controller driver*, sure, it makes sense to detach drivers from child
>>> devices.  But that would be handled by the host controller .remove()
>>> method, not by the runtime suspend method.
>> I think it is time I give some background about why I chose to implement
>> .runtime_suspend() and .runtime_resume() APIs in the first place. We wanted to
>> powerdown PCIe controller if there is no link up (i.e. slot is open and no endpoint
>> devices are connected). We want to achieve this without returning a failure in
>> .probe() call. Given PCIe IP power partitioning is handled by generic power domain
>> framework, power partition gets unpowergated before .probe() gets called and gets
>> powergated either when a failure is returned in .probe() or when pm_runtime_put_sync()
>> is called. So, I chose to call pm_runtime_put_sync() in no-link-up scenario and chose
>> to implement .runtime_suspend() to handle all the cleanup work before PCIe partition
>> getting powergated. In fact, to match this, I'm doing all the PCIe IP bring up
>> activity in .runtime_resume() implementation which gets invoked by pm_runtime_get_sync()
>> which in turn is called in .probe() path. In fact the very dw_pcie_host_init() itself
>> is called from .runtime_resume() implementation. So, it is because of these reasons that
>> I called pci_stop_root_bus() and pci_remove_root_bus() as part of .runtime_suspend()
>> implementation as pm_runtime_put_sync() is called from both .remove() and also during
>> no-link-up scenario. Please do let me know if there is a better way to do this.
> 
> I think you're missing the case where .runtime_suspend() is called when
> there are no *active* devices on the bus, i.e. everyone is dormant. It
> doesn't mean that you need to remove them from the bus and re-probe them
> back on .runtime_resume(). Most of the drivers for PCI devices don't
> expect to be removed during idle, as they will configure the hardware to
> be in a "quick wake" state (see PCIe Dx power states).
> 
> You should probe and configure the bus during .probe() and remove and
> detach all drivers during .remove(). For .runtime_suspend() all you need
> to do is put the host controller in low power mode if it has one, or
> stop all clocks that are not required for responding to devices waking
> up from PCIe Dx state. For .runtime_resume() you then restore the
> clocks, without re-scanning the bus.
Since this is a host controller driver and the device as such sits on the platform
bus rather than the PCIe bus, is it still the case that the .runtime_suspend() and
.runtime_resume() callbacks of this driver get called when the devices on the PCIe
bus are idle?

Having asked that, I'm starting to feel that what I'm doing in .runtime_suspend()
and .runtime_resume() doesn't really justify those callback names. Since I know
where I'm calling the pm_runtime_get/put_sync() APIs (which eventually call
.runtime_resume/suspend()), I should probably move the contents of those callbacks
to the places where pm_runtime_get/put_sync() are called. Do you agree with that?
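Concretely, something like this (a rough sketch; the helper names are made up):

static int tegra_pcie_dw_probe(struct platform_device *pdev)
{
	struct device *dev = &pdev->dev;
	struct tegra_pcie_dw *pcie;
	int ret;

	/* ... allocate 'pcie', parse DT, map resources ... */

	pm_runtime_enable(dev);
	ret = pm_runtime_get_sync(dev);		/* power only; no bus work here */
	if (ret < 0)
		return ret;

	tegra_pcie_config_controller(pcie);	/* what .runtime_resume() does today */

	ret = dw_pcie_host_init(&pcie->pci.pp);
	if (ret) {				/* e.g. no link up */
		tegra_pcie_unconfig_controller(pcie);	/* today's .runtime_suspend() body */
		pm_runtime_put_sync(dev);	/* power-gate the partition */
	}

	return 0;	/* no failure returned even in the no-link-up case */
}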

> 
> Best regards,
> Liviu
> 
> 
>>
>>>
>>> Bjorn
>>>
>>
>
Liviu Dudau April 10, 2019, 11:35 a.m. UTC | #25
On Wed, Apr 10, 2019 at 03:23:39PM +0530, Vidya Sagar wrote:
> On 4/10/2019 1:44 PM, Liviu Dudau wrote:
> > On Wed, Apr 10, 2019 at 11:40:40AM +0530, Vidya Sagar wrote:
> > > On 4/9/2019 6:56 PM, Bjorn Helgaas wrote:
> > > > On Tue, Apr 09, 2019 at 05:00:53PM +0530, Vidya Sagar wrote:
> > > > > On 4/6/2019 12:28 AM, Bjorn Helgaas wrote:
> > > > > > On Fri, Apr 05, 2019 at 01:23:51AM +0530, Vidya Sagar wrote:
> > > > > > > On 4/3/2019 11:06 PM, Bjorn Helgaas wrote:
> > > > > > > > On Wed, Apr 03, 2019 at 03:13:09PM +0530, Vidya Sagar wrote:
> > > > > > > > > On 4/3/2019 12:01 AM, Bjorn Helgaas wrote:
> > > > > > > > > > On Tue, Apr 02, 2019 at 12:47:48PM +0530, Vidya Sagar wrote:
> > > > > > > > > > > On 3/30/2019 2:22 AM, Bjorn Helgaas wrote:
> > > > > > > > > > > > On Tue, Mar 26, 2019 at 08:43:26PM +0530, Vidya Sagar wrote:
> > > > > > > > > > > > > Add support for Synopsys DesignWare core IP based PCIe host controller
> > > > > > > > > > > > > present in Tegra194 SoC.
> > > > > > > > > > 
> > > > > > > > > >        - Why does this chip require pcie_pme_disable_msi()?  The only other
> > > > > > > > > >          use is a DMI quirk for "MSI Wind U-100", added by c39fae1416d5
> > > > > > > > > >          ("PCI PM: Make it possible to force using INTx for PCIe PME
> > > > > > > > > >          signaling").
> > > > > > > > > 
> > > > > > > > > Because Tegra194 doesn't support raising PME interrupts through MSI line.
> > > > 
> > > > > > There's something wrong here.  Either the question of how PME is
> > > > > > signaled is generic and the spec provides a way for the OS to discover
> > > > > > that method, or it's part of the device-specific architecture that
> > > > > > each host bridge driver has to know about its device.  If the former,
> > > > > > we need to make the PCI core smart enough to figure it out.  If the
> > > > > > latter, we need a better interface than this ad hoc
> > > > > > pcie_pme_disable_msi() thing.  But if it is truly the latter, your
> > > > > > current code is sufficient and we can refine it over time.
> > > > > 
> > > > > In case of Tegra194, it is the latter case.
> > > > 
> > > > This isn't a Tegra194 question; it's a question of whether this
> > > > behavior is covered by the PCIe spec.
> > > AFAIU the spec and what I heard from Nvidia hardware folks is that spec doesn't
> > > explicitly talk about this and it was a design choice made by Nvidia hardware
> > > folks to route these interrupts through legacy line instead of MSI line.
> > > 
> > > > 
> > > > > > What I suspect should happen eventually is the DWC driver should call
> > > > > > devm_pci_alloc_host_bridge() directly, as all the non-DWC drivers do.
> > > > > > That would require a little reorganization of the DWC data structures,
> > > > > > but it would be good to be more consistent.
> > > > > > 
> > > > > > For now, I think stashing the pointer in pcie_port or dw_pcie would be
> > > > > > OK.  I'm not 100% clear on the difference, but it looks like either
> > > > > > should be common across all the DWC drivers, which is what we want.
> > > > > 
> > > > > Since dw_pcie is common for both root port and end point mode structures,
> > > > > I think it makes sense to keep the pointer in pcie_port as this structure
> > > > > is specific to root port mode of operation.
> > > > > I'll make a note to reorganize code to have devm_pci_alloc_host_bridge()
> > > > > used in the beginning itself to be inline with non-DWC implementations.
> > > > > But, I'll take it up later (after I'm done with upstreaming current series)
> > > > 
> > > > Fair enough.
> > > > 
> > > > > > > .remove() internally calls pm_runtime_put_sync() API which calls
> > > > > > > .runtime_suspend(). I made a new patch to add a host_deinit() call
> > > > > > > which make all these calls. Since host_init() is called from inside
> > > > > > > .runtime_resume() of this driver, to be in sync, I'm now calling
> > > > > > > host_deinit() from inside .runtime_suspend() API.
> > > > > > 
> > > > > > I think this is wrong.  pci_stop_root_bus() will detach all the
> > > > > > drivers from all the devices.  We don't want to do that if we're
> > > > > > merely runtime suspending the host bridge, do we?
> > > > > 
> > > > > In the current driver, the scenarios in which .runtime_suspend() is called
> > > > > are
> > > > > a) during .remove() call and
> > > > 
> > > > It makes sense that you should call pci_stop_root_bus() during
> > > > .remove(), i.e., when the host controller driver is being removed,
> > > > because the PCI bus will no longer be accessible.  I think you should
> > > > call it *directly* from tegra_pcie_dw_remove() because that will match
> > > > what other drivers do.
> > > > 
> > > > > b) when there is no endpoint found and controller would be shutdown
> > > > > In both cases, it is required to stop the root bus and remove all devices,
> > > > > so, instead of having same call present in respective paths, I kept them
> > > > > in .runtime_suspend() itself to avoid code duplication.
> > > > 
> > > > I don't understand this part.  We should be able to runtime suspend
> > > > the host controller without detaching drivers for child devices.
> > > > 
> > > > If you shutdown the controller completely and detach the *host
> > > > controller driver*, sure, it makes sense to detach drivers from child
> > > > devices.  But that would be handled by the host controller .remove()
> > > > method, not by the runtime suspend method.
> > > I think it is time I give some background about why I chose to implement
> > > .runtime_suspend() and .runtime_resume() APIs in the first place. We wanted to
> > > powerdown PCIe controller if there is no link up (i.e. slot is open and no endpoint
> > > devices are connected). We want to achieve this without returning a failure in
> > > .probe() call. Given PCIe IP power partitioning is handled by generic power domain
> > > framework, power partition gets unpowergated before .probe() gets called and gets
> > > powergated either when a failure is returned in .probe() or when pm_runtime_put_sync()
> > > is called. So, I chose to call pm_runtime_put_sync() in no-link-up scenario and chose
> > > to implement .runtime_suspend() to handle all the cleanup work before PCIe partition
> > > getting powergated. In fact, to match this, I'm doing all the PCIe IP bring up
> > > activity in .runtime_resume() implementation which gets invoked by pm_runtime_get_sync()
> > > which in turn is called in .probe() path. In fact the very dw_pcie_host_init() itself
> > > is called from .runtime_resume() implementation. So, it is because of these reasons that
> > > I called pci_stop_root_bus() and pci_remove_root_bus() as part of .runtime_suspend()
> > > implementation as pm_runtime_put_sync() is called from both .remove() and also during
> > > no-link-up scenario. Please do let me know if there is a better way to do this.
> > 
> > I think you're missing the case where .runtime_suspend() is called when
> > there are no *active* devices on the bus, i.e. everyone is dormant. It
> > doesn't mean that you need to remove them from the bus and re-probe them
> > back on .runtime_resume(). Most of the drivers for PCI devices don't
> > expect to be removed during idle, as they will configure the hardware to
> > be in a "quick wake" state (see PCIe Dx power states).
> > 
> > You should probe and configure the bus during .probe() and remove and
> > detach all drivers during .remove(). For .runtime_suspend() all you need
> > to do is put the host controller in low power mode if it has one, or
> > stop all clocks that are not required for responding to devices waking
> > up from PCIe Dx state. For .runtime_resume() you then restore the
> > clocks, without re-scanning the bus.
> Since this is a host controller driver and the device as such is sitting on platform
> bus instead of PCIe bus, is it still the case that .runtime_suspend() and
> .runtime_resume() of this driver get called when devices on PCIe bus are idle?

The functions will be called when the device is determined to be idle,
i.e. when there are no PM references being held by other drivers.

Think of it the other way: even if the device is sitting on the platform
bus for configuration reasons, you don't want to turn it off when the
PCIe bus needs to be active, right?


> 
> Having asked that, I start to feel what I'm doing as part of .runtime_suspend() and
> .runtime_resume() doesn't really justify these API names. Since I know where I'm
> calling pm_runtime_get/put_sync() APIs (which eventually call .runtime_resume/suspend())
> I should probably move the content of these APIs before calling pm_runtime_get/put_sync().
> Do you agree with that?

Yeah, I think you are right. Also, I believe there are pm_runtime_get()
calls in the PCI framework as well; you will need to audit the code, as I
haven't looked at it in a while.

Best regards,
Liviu

> 
> > 
> > Best regards,
> > Liviu
> > 
> > 
> > > 
> > > > 
> > > > Bjorn
> > > > 
> > > 
> > 
>