diff mbox

ARM: mx6: Fix suspend/resume with PCI

Message ID 1407371998-11437-1-git-send-email-festevam@gmail.com
State New
Headers show

Commit Message

Fabio Estevam Aug. 7, 2014, 12:39 a.m. UTC
From: Fabio Estevam <fabio.estevam@freescale.com>

When PCI is used and a suspend/resume sequence is done we see the following
kernel hang:

root@freescale /$ echo mem > /sys/power/state 
[   16.099018] PM: Syncing filesystems ... done.
[   16.141010] Freezing user space processes ... (elapsed 0.002 seconds) done.
[   16.150840] Freezing remaining freezable tasks ... (elapsed 0.003 seconds) done.
[   16.199438] random: nonblocking pool is initialized
[   16.229639] PM: suspend of devices complete after 64.793 msecs
[   16.235488] PM: suspend devices took 0.070 seconds
[   16.245301] PM: late suspend of devices complete after 4.968 msecs
[   16.257063] PM: noirq suspend of devices complete after 5.538 msecs
[   16.263425] Disabling non-boot CPUs ...
[   16.274666] CPU1: shutdown
[   16.286351] CPU2: shutdown
[   16.294169] CPU3: shutdown
[   16.299551] Enabling non-boot CPUs ...
[   16.304155] CPU1: Booted secondary processor
[   16.305717] CPU1 is up
[   16.313078] CPU2: Booted secondary processor
[   16.313456] CPU2 is up
[   16.320778] CPU3: Booted secondary processor
[   16.321174] CPU3 is up
	(hangs here)

Implement a workaround for the erratum ERR005723: "PCIe does not support L2
Power Down", which consists in toggling bit 18 (TEST_POWERDOWN) of GPR1 register.

Tested on a mx6qsabresd TO1.2 revC2.

Reported-by: Shawn Guo <shawn.guo@freescale.com>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
---
Richard,

Is there any better method to fix this issue? Or should we do the same
as in the FSL kernel?

I also tried to add pm ops into the imx6 pci driver and did the toggle there,
but I still got the kernel hang.

 arch/arm/mach-imx/pm-imx6.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

Comments

Shawn Guo Aug. 7, 2014, 2:18 a.m. UTC | #1
On Wed, Aug 06, 2014 at 09:39:58PM -0300, Fabio Estevam wrote:
> From: Fabio Estevam <fabio.estevam@freescale.com>
> 
> When PCI is used and a suspend/resume sequence is done we see the following
> kernel hang:
> 
> root@freescale /$ echo mem > /sys/power/state 
> [   16.099018] PM: Syncing filesystems ... done.
> [   16.141010] Freezing user space processes ... (elapsed 0.002 seconds) done.
> [   16.150840] Freezing remaining freezable tasks ... (elapsed 0.003 seconds) done.
> [   16.199438] random: nonblocking pool is initialized
> [   16.229639] PM: suspend of devices complete after 64.793 msecs
> [   16.235488] PM: suspend devices took 0.070 seconds
> [   16.245301] PM: late suspend of devices complete after 4.968 msecs
> [   16.257063] PM: noirq suspend of devices complete after 5.538 msecs
> [   16.263425] Disabling non-boot CPUs ...
> [   16.274666] CPU1: shutdown
> [   16.286351] CPU2: shutdown
> [   16.294169] CPU3: shutdown
> [   16.299551] Enabling non-boot CPUs ...
> [   16.304155] CPU1: Booted secondary processor
> [   16.305717] CPU1 is up
> [   16.313078] CPU2: Booted secondary processor
> [   16.313456] CPU2 is up
> [   16.320778] CPU3: Booted secondary processor
> [   16.321174] CPU3 is up
> 	(hangs here)
> 
> Implement a workaround for the erratum ERR005723: "PCIe does not support L2
> Power Down", which consists in toggling bit 18 (TEST_POWERDOWN) of GPR1 register.
> 
> Tested on a mx6qsabresd TO1.2 revC2.

I intend to agree with Lucas that we should align our testing and
understanding on the issue before trying to fix it.

Again, I do not see this issue on my TO1.2 chip, but only TO1.5,
aka Rev 1.3, one.

Shawn

> 
> Reported-by: Shawn Guo <shawn.guo@freescale.com>
> Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Fabio Estevam Aug. 7, 2014, 2:52 a.m. UTC | #2
On Wed, Aug 6, 2014 at 11:18 PM, Shawn Guo <shawn.guo@freescale.com> wrote:

> I intend to agree with Lucas that we should align our testing and
> understanding on the issue before trying to fix it.
>
> Again, I do not see this issue on my TO1.2 chip, but only TO1.5,
> aka Rev 1.3, one.

Which version is your sabresd with TO1.2? Mine is rev C2 and the
suspend/resume issue happens 100%.
Shawn Guo Aug. 7, 2014, 3:08 a.m. UTC | #3
On Wed, Aug 06, 2014 at 11:52:29PM -0300, Fabio Estevam wrote:
> On Wed, Aug 6, 2014 at 11:18 PM, Shawn Guo <shawn.guo@freescale.com> wrote:
> 
> > I intend to agree with Lucas that we should align our testing and
> > understanding on the issue before trying to fix it.
> >
> > Again, I do not see this issue on my TO1.2 chip, but only TO1.5,
> > aka Rev 1.3, one.
> 
> Which version is your sabresd with TO1.2? Mine is rev C2 and the
> suspend/resume issue happens 100%.

The label on the board says it's Rev B.

Shawn
Fabio Estevam Aug. 7, 2014, 3:24 a.m. UTC | #4
On Thu, Aug 7, 2014 at 12:08 AM, Shawn Guo <shawn.guo@freescale.com> wrote:
> On Wed, Aug 06, 2014 at 11:52:29PM -0300, Fabio Estevam wrote:
>> On Wed, Aug 6, 2014 at 11:18 PM, Shawn Guo <shawn.guo@freescale.com> wrote:
>>
>> > I intend to agree with Lucas that we should align our testing and
>> > understanding on the issue before trying to fix it.
>> >
>> > Again, I do not see this issue on my TO1.2 chip, but only TO1.5,
>> > aka Rev 1.3, one.
>>
>> Which version is your sabresd with TO1.2? Mine is rev C2 and the
>> suspend/resume issue happens 100%.
>
> The label on the board says it's Rev B.

If I remember correctly you are testing suspend/resume with no PCI
card connected to the TO1.2 board, right?

I have just tried suspend/resume without any PCI card connect and it
does work fine.

Only when I have the PCI card connected I get the hang.
Shawn Guo Aug. 7, 2014, 3:30 a.m. UTC | #5
On Thu, Aug 07, 2014 at 12:24:37AM -0300, Fabio Estevam wrote:
> On Thu, Aug 7, 2014 at 12:08 AM, Shawn Guo <shawn.guo@freescale.com> wrote:
> > On Wed, Aug 06, 2014 at 11:52:29PM -0300, Fabio Estevam wrote:
> >> On Wed, Aug 6, 2014 at 11:18 PM, Shawn Guo <shawn.guo@freescale.com> wrote:
> >>
> >> > I intend to agree with Lucas that we should align our testing and
> >> > understanding on the issue before trying to fix it.
> >> >
> >> > Again, I do not see this issue on my TO1.2 chip, but only TO1.5,
> >> > aka Rev 1.3, one.
> >>
> >> Which version is your sabresd with TO1.2? Mine is rev C2 and the
> >> suspend/resume issue happens 100%.
> >
> > The label on the board says it's Rev B.
> 
> If I remember correctly you are testing suspend/resume with no PCI
> card connected to the TO1.2 board, right?

Right.  I do not have any PCIe device on hands, so all my testing are
without PCIe device connected, and TO1.2 works while TO1.5 doesn't.

Shawn

> 
> I have just tried suspend/resume without any PCI card connect and it
> does work fine.
> 
> Only when I have the PCI card connected I get the hang.
Lucas Stach Aug. 7, 2014, 10:04 a.m. UTC | #6
Am Mittwoch, den 06.08.2014, 21:39 -0300 schrieb Fabio Estevam:
> From: Fabio Estevam <fabio.estevam@freescale.com>
> 
> When PCI is used and a suspend/resume sequence is done we see the following
> kernel hang:
> 
> root@freescale /$ echo mem > /sys/power/state 
> [   16.099018] PM: Syncing filesystems ... done.
> [   16.141010] Freezing user space processes ... (elapsed 0.002 seconds) done.
> [   16.150840] Freezing remaining freezable tasks ... (elapsed 0.003 seconds) done.
> [   16.199438] random: nonblocking pool is initialized
> [   16.229639] PM: suspend of devices complete after 64.793 msecs
> [   16.235488] PM: suspend devices took 0.070 seconds
> [   16.245301] PM: late suspend of devices complete after 4.968 msecs
> [   16.257063] PM: noirq suspend of devices complete after 5.538 msecs
> [   16.263425] Disabling non-boot CPUs ...
> [   16.274666] CPU1: shutdown
> [   16.286351] CPU2: shutdown
> [   16.294169] CPU3: shutdown
> [   16.299551] Enabling non-boot CPUs ...
> [   16.304155] CPU1: Booted secondary processor
> [   16.305717] CPU1 is up
> [   16.313078] CPU2: Booted secondary processor
> [   16.313456] CPU2 is up
> [   16.320778] CPU3: Booted secondary processor
> [   16.321174] CPU3 is up
> 	(hangs here)
> 
> Implement a workaround for the erratum ERR005723: "PCIe does not support L2
> Power Down", which consists in toggling bit 18 (TEST_POWERDOWN) of GPR1 register.
> 
> Tested on a mx6qsabresd TO1.2 revC2.
> 
> Reported-by: Shawn Guo <shawn.guo@freescale.com>
> Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>

NACK

To my understanding the DW PCIe core needs a specific sequence to
enter/exit L2 at suspend that we currently ignore. Contrary to the
errata description I think it can be done without a dedicated core reset
signal. Hitting the PHY PD is a sledgehammer without proper
understanding of the problem.

Also hiding such things in the platform code seems really bad. Please
keep in mind that this is the mainline kernel, not a vendor downstream
tree where you are free to smash things all over the place.

Regards,
Lucas
Fabio Estevam Aug. 7, 2014, 11:55 a.m. UTC | #7
On Thu, Aug 7, 2014 at 7:04 AM, Lucas Stach <l.stach@pengutronix.de> wrote:

> Also hiding such things in the platform code seems really bad. Please
> keep in mind that this is the mainline kernel, not a vendor downstream
> tree where you are free to smash things all over the place.

Sure, I should have marked it as RFC. I also asked for a better way to
handle this issue below the --- line.
Bjørn Erik Nilsen Sept. 30, 2014, 10:12 a.m. UTC | #8
Hi Fabio,

On Wed, 2014-08-06 at 21:39 -0300, an unknown sender wrote:
> From: Fabio Estevam <fabio.estevam at freescale.com>

> 

> When PCI is used and a suspend/resume sequence is done we see the following

> kernel hang:

> 

> root at freescale /$ echo mem > /sys/power/state 

> [   16.099018] PM: Syncing filesystems ... done.

> [   16.141010] Freezing user space processes ... (elapsed 0.002 seconds) done.

> [   16.150840] Freezing remaining freezable tasks ... (elapsed 0.003 seconds) done.

> [   16.199438] random: nonblocking pool is initialized

> [   16.229639] PM: suspend of devices complete after 64.793 msecs

> [   16.235488] PM: suspend devices took 0.070 seconds

> [   16.245301] PM: late suspend of devices complete after 4.968 msecs

> [   16.257063] PM: noirq suspend of devices complete after 5.538 msecs

> [   16.263425] Disabling non-boot CPUs ...

> [   16.274666] CPU1: shutdown

> [   16.286351] CPU2: shutdown

> [   16.294169] CPU3: shutdown

> [   16.299551] Enabling non-boot CPUs ...

> [   16.304155] CPU1: Booted secondary processor

> [   16.305717] CPU1 is up

> [   16.313078] CPU2: Booted secondary processor

> [   16.313456] CPU2 is up

> [   16.320778] CPU3: Booted secondary processor

> [   16.321174] CPU3 is up

> 	(hangs here)

> 

> Implement a workaround for the erratum ERR005723: "PCIe does not support L2

> Power Down", which consists in toggling bit 18 (TEST_POWERDOWN) of GPR1 register.

> 

> Tested on a mx6qsabresd TO1.2 revC2.

> 

> Reported-by: Shawn Guo <shawn.guo at freescale.com>

> Signed-off-by: Fabio Estevam <fabio.estevam at freescale.com>


Tested-by: Bjørn Erik Nilsen <ben@datarespons.no>


I encountered exactly the same problem and can confirm your patch works.

Thank you!

I hope this patch (or one that is considered a proper solution) will
make it to the mainline kernel any time soon.


Best regards,
Bjørn Erik Nilsen

> ---

> Richard,

> 

> Is there any better method to fix this issue? Or should we do the same

> as in the FSL kernel?

> 

> I also tried to add pm ops into the imx6 pci driver and did the toggle there,

> but I still got the kernel hang.

> 

>  arch/arm/mach-imx/pm-imx6.c | 27 +++++++++++++++++++++++++++

>  1 file changed, 27 insertions(+)

> 

> diff --git a/arch/arm/mach-imx/pm-imx6.c b/arch/arm/mach-imx/pm-imx6.c

> index 5c3af8f..92511cf 100644

> --- a/arch/arm/mach-imx/pm-imx6.c

> +++ b/arch/arm/mach-imx/pm-imx6.c

> @@ -341,6 +341,23 @@ static int imx6q_suspend_finish(unsigned long val)

>  

>  static int imx6q_pm_enter(suspend_state_t state)

>  {

> +	struct regmap *gpr;

> +	gpr = syscon_regmap_lookup_by_compatible("fsl,imx6q-iomuxc-gpr");

> +	if (IS_ERR(gpr)) {

> +		pr_err("failed to find fsl,imx6q-iomux-gpr regmap\n");

> +		return PTR_ERR(gpr);

> +	}

> +

> +	/*

> +	 * L2 can exit by 'reset' or Inband beacon (from remote EP)

> +	 * toggling phy_powerdown has same effect as 'inband beacon'

> +	 * So, toggle bit18 of GPR1 as a workaround of erratum

> +	 * ERR005723 - "PCIe does not support L2 Power Down"

> +	 */

> +	if (IS_ENABLED(CONFIG_PCI_IMX6) && !cpu_is_imx6sx())

> +		regmap_update_bits(gpr, IOMUXC_GPR1, IMX6Q_GPR1_PCIE_TEST_PD,

> +				   IMX6Q_GPR1_PCIE_TEST_PD);

> +

>  	switch (state) {

>  	case PM_SUSPEND_STANDBY:

>  		imx6q_set_lpm(STOP_POWER_ON);

> @@ -383,6 +400,16 @@ static int imx6q_pm_enter(suspend_state_t state)

>  		return -EINVAL;

>  	}

>  

> +	/*

> +	 * L2 can exit by 'reset' or Inband beacon (from remote EP)

> +	 * toggling phy_powerdown has same effect as 'inband beacon'

> +	 * So, toggle bit18 of GPR1 as a workaround of erratum

> +	 * ERR005723 - "PCIe does not support L2 Power Down"

> +	 */

> +	if (IS_ENABLED(CONFIG_PCI_IMX6) && !cpu_is_imx6sx())

> +		regmap_update_bits(gpr, IOMUXC_GPR1, IMX6Q_GPR1_PCIE_TEST_PD,

> +				   !IMX6Q_GPR1_PCIE_TEST_PD);

> +

>  	return 0;

>  }

>
Fabio Estevam Sept. 30, 2014, 11:42 a.m. UTC | #9
Hi Bjørn,

On Tue, Sep 30, 2014 at 7:12 AM, Bjørn Erik Nilsen <ben@datarespons.no> wrote:
> Hi Fabio,
>
> On Wed, 2014-08-06 at 21:39 -0300, an unknown sender wrote:
>> From: Fabio Estevam <fabio.estevam at freescale.com>
>>
>> When PCI is used and a suspend/resume sequence is done we see the following
>> kernel hang:
>>
>> root at freescale /$ echo mem > /sys/power/state
>> [   16.099018] PM: Syncing filesystems ... done.
>> [   16.141010] Freezing user space processes ... (elapsed 0.002 seconds) done.
>> [   16.150840] Freezing remaining freezable tasks ... (elapsed 0.003 seconds) done.
>> [   16.199438] random: nonblocking pool is initialized
>> [   16.229639] PM: suspend of devices complete after 64.793 msecs
>> [   16.235488] PM: suspend devices took 0.070 seconds
>> [   16.245301] PM: late suspend of devices complete after 4.968 msecs
>> [   16.257063] PM: noirq suspend of devices complete after 5.538 msecs
>> [   16.263425] Disabling non-boot CPUs ...
>> [   16.274666] CPU1: shutdown
>> [   16.286351] CPU2: shutdown
>> [   16.294169] CPU3: shutdown
>> [   16.299551] Enabling non-boot CPUs ...
>> [   16.304155] CPU1: Booted secondary processor
>> [   16.305717] CPU1 is up
>> [   16.313078] CPU2: Booted secondary processor
>> [   16.313456] CPU2 is up
>> [   16.320778] CPU3: Booted secondary processor
>> [   16.321174] CPU3 is up
>>       (hangs here)
>>
>> Implement a workaround for the erratum ERR005723: "PCIe does not support L2
>> Power Down", which consists in toggling bit 18 (TEST_POWERDOWN) of GPR1 register.
>>
>> Tested on a mx6qsabresd TO1.2 revC2.
>>
>> Reported-by: Shawn Guo <shawn.guo at freescale.com>
>> Signed-off-by: Fabio Estevam <fabio.estevam at freescale.com>
>
> Tested-by: Bjørn Erik Nilsen <ben@datarespons.no>
>
> I encountered exactly the same problem and can confirm your patch works.
>
> Thank you!
>
> I hope this patch (or one that is considered a proper solution) will
> make it to the mainline kernel any time soon.

Thanks for testing.

This is a workaround found on FSL kernel. We still need to come up
with a proper fix for this PCI suspend/resume issue.
Fabio Estevam Oct. 22, 2014, 5:13 p.m. UTC | #10
Hi Bjørn,

On Tue, Sep 30, 2014 at 8:42 AM, Fabio Estevam <festevam@gmail.com> wrote:

> Thanks for testing.
>
> This is a workaround found on FSL kernel. We still need to come up
> with a proper fix for this PCI suspend/resume issue.

Just tried 3.18.0-rc1 and I don't see this PCI suspend/resume hang
anymore on a mx6qsabresd revC2 with silicon TO1.2.

It does hang on an old mx6sabresd revB with TO1.1 silicon though.

Can you try with 3.18-rc1?
diff mbox

Patch

diff --git a/arch/arm/mach-imx/pm-imx6.c b/arch/arm/mach-imx/pm-imx6.c
index 5c3af8f..92511cf 100644
--- a/arch/arm/mach-imx/pm-imx6.c
+++ b/arch/arm/mach-imx/pm-imx6.c
@@ -341,6 +341,23 @@  static int imx6q_suspend_finish(unsigned long val)
 
 static int imx6q_pm_enter(suspend_state_t state)
 {
+	struct regmap *gpr;
+	gpr = syscon_regmap_lookup_by_compatible("fsl,imx6q-iomuxc-gpr");
+	if (IS_ERR(gpr)) {
+		pr_err("failed to find fsl,imx6q-iomux-gpr regmap\n");
+		return PTR_ERR(gpr);
+	}
+
+	/*
+	 * L2 can exit by 'reset' or Inband beacon (from remote EP)
+	 * toggling phy_powerdown has same effect as 'inband beacon'
+	 * So, toggle bit18 of GPR1 as a workaround of erratum
+	 * ERR005723 - "PCIe does not support L2 Power Down"
+	 */
+	if (IS_ENABLED(CONFIG_PCI_IMX6) && !cpu_is_imx6sx())
+		regmap_update_bits(gpr, IOMUXC_GPR1, IMX6Q_GPR1_PCIE_TEST_PD,
+				   IMX6Q_GPR1_PCIE_TEST_PD);
+
 	switch (state) {
 	case PM_SUSPEND_STANDBY:
 		imx6q_set_lpm(STOP_POWER_ON);
@@ -383,6 +400,16 @@  static int imx6q_pm_enter(suspend_state_t state)
 		return -EINVAL;
 	}
 
+	/*
+	 * L2 can exit by 'reset' or Inband beacon (from remote EP)
+	 * toggling phy_powerdown has same effect as 'inband beacon'
+	 * So, toggle bit18 of GPR1 as a workaround of erratum
+	 * ERR005723 - "PCIe does not support L2 Power Down"
+	 */
+	if (IS_ENABLED(CONFIG_PCI_IMX6) && !cpu_is_imx6sx())
+		regmap_update_bits(gpr, IOMUXC_GPR1, IMX6Q_GPR1_PCIE_TEST_PD,
+				   !IMX6Q_GPR1_PCIE_TEST_PD);
+
 	return 0;
 }