diff mbox

PCI: Revert "PCI: Add runtime PM support for PCIe ports"

Message ID 20170103095158.GH3353@lahna.fi.intel.com
State Not Applicable
Headers show

Commit Message

Mika Westerberg Jan. 3, 2017, 9:51 a.m. UTC
On Mon, Jan 02, 2017 at 10:31:07PM +0100, Rafael J. Wysocki wrote:
> On Monday, January 02, 2017 04:48:52 PM Mika Westerberg wrote:
> > On Mon, Jan 02, 2017 at 02:10:19PM +0200, Mika Westerberg wrote:
> > > I've checked the acpidump of this machine and it does not seem to be a
> > > traditional Optimus machine. At least this one is missing the magic _DSM
> > > which is used to gather capabilities of the graphics device.
> > > 
> > > However, it does have _PR3 and it is attached to the device
> > > (_SB.PCI0.PEG) itself, not the root port.
> > 
> > Nah, actually PEG is the root port. So it certainly looks like
> > a traditional Optimus machine.
> 
> So can we quirk that thing somehow and see if that helps (for debugging
> purposes at least)?

I was kind of hoping disabling D3cold would do that (prevent it from
turning off power resources). But we can also just force it to use _DSM
instead and see if it makes a difference.

I guess the reason why keyboard and mouse become unresponsive is because
the driver tries to resume the device and hogs the CPU. At least it
looks like so from the dmesg in comment 27 (of the bugzilla bug) where
NMI watchdog is triggered.

Since this might be related to nouveau, adding Peter Wu to the loop.
Peter the bug in question is https://bugzilla.kernel.org/show_bug.cgi?id=190861.

Kilian, can you try the following hack as well?

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Peter Wu Jan. 3, 2017, 3:15 p.m. UTC | #1
(replying to earlier comments in the thread:)

Changing (lowering?) the cut-off date would not help as the laptop has
DMI year 2016. (For the long-term, it would probably be desirable to
lower the date or otherwise add detection of _PR3, see
https://bugs.freedesktop.org/show_bug.cgi?id=98505#c23).

Reverting the patch is not a good idea either, it would reintroduce the
memory corruption that have plagued some Lenovo models
(https://bugs.freedesktop.org/show_bug.cgi?id=78530).

On Tue, Jan 03, 2017 at 11:51:58AM +0200, Mika Westerberg wrote:
> On Mon, Jan 02, 2017 at 10:31:07PM +0100, Rafael J. Wysocki wrote:
> > On Monday, January 02, 2017 04:48:52 PM Mika Westerberg wrote:
> > > On Mon, Jan 02, 2017 at 02:10:19PM +0200, Mika Westerberg wrote:
> > > > I've checked the acpidump of this machine and it does not seem to be a
> > > > traditional Optimus machine. At least this one is missing the magic _DSM
> > > > which is used to gather capabilities of the graphics device.
> > > > 
> > > > However, it does have _PR3 and it is attached to the device
> > > > (_SB.PCI0.PEG) itself, not the root port.
> > > 
> > > Nah, actually PEG is the root port. So it certainly looks like
> > > a traditional Optimus machine.
> > 
> > So can we quirk that thing somehow and see if that helps (for debugging
> > purposes at least)?
> 
> I was kind of hoping disabling D3cold would do that (prevent it from
> turning off power resources). But we can also just force it to use _DSM
> instead and see if it makes a difference.

Disabling d3cold that way might be too late due to the short RPM suspend
delay. You would need a udev rule to activate this ASAP. E.g., create
/etc/udev/rules.d/42-nvidia-rpm.rules with:

    SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", ATTR{power/d3cold_allowed}="0"

This disables D3cold on the child device (which should also prevent the
parent PCIe port from using D3cold).

Alternatively, can you try to boot with nouveau.runpm=0 and see if it
makes any difference? When runpm is disabled, then the PCIe port and
Nvidia device should not be suspended and therefore prevent the issue
from being triggered.

> I guess the reason why keyboard and mouse become unresponsive is because
> the driver tries to resume the device and hogs the CPU. At least it
> looks like so from the dmesg in comment 27 (of the bugzilla bug) where
> NMI watchdog is triggered.
> 
> Since this might be related to nouveau, adding Peter Wu to the loop.
> Peter the bug in question is https://bugzilla.kernel.org/show_bug.cgi?id=190861.

Kilian, in the bug you had the issue with Firefox. The trace suggests
that runtime resume was triggered, so you should have this problem too
when using lspci. Can you try:

 1. Switch to a text console (e.g. Ctrl-Alt-F2).
 2. sleep 5; lspci

If that command does not return immediately, you likely have triggered
the same issue.

The acpidump from the bug does not show known issues, it *looks* fine.
There have been other issues related to resuming power on newer Nvidia
hardware (https://bugs.freedesktop.org/show_bug.cgi?id=94725,
https://bugzilla.kernel.org/show_bug.cgi?id=156341) but there is not
much progress here.  (The last time I traced the PCIe register accesses
(via kprobes) and tried to disable some of those, it still did not help
with preventing the power issue.)

> Kilian, can you try the following hack as well?
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_acpi.c b/drivers/gpu/drm/nouveau/nouveau_acpi.c
> index 193573d191e5..50482d5c8072 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_acpi.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_acpi.c
> @@ -282,7 +282,7 @@ static void nouveau_dsm_pci_probe(struct pci_dev *pdev, acpi_handle *dhandle_out
>  			 (result & OPTIMUS_DYNAMIC_PWR_CAP) ? "dynamic power, " : "",
>  			 (result & OPTIMUS_HDA_CODEC_MASK) ? "hda bios codec supported" : "");
>  
> -		*has_pr3 = nouveau_pr3_present(pdev);
> +//		*has_pr3 = nouveau_pr3_present(pdev);
>  	}
>  }
>  

This would not disable D3cold support and as a result both PR3 and DSM
would be active. Try the above with this line added to force DSM:

    pci_d3cold_disable(pdev);

(This should have the same effect as setting d3cold_allowed=0.)
Lukas Wunner Jan. 3, 2017, 4:11 p.m. UTC | #2
[cc += Dave Airlie:

Dave, we're about to lose support for newer Optimus laptops which use
_PR3 to cut power to the discrete GPU because Bjorn Helgaas has queued
a commit on his for-linus branch to remove runtime PM for PCIe ports.
This fixes a regression on Kilian Singer's laptop on which locking the
screen breaks USB and PS/2 input devices:  Mouse movements are still
visible, but button or key presses no longer have any effect.  The GPU
is powered down upon locking the screen and the current theory is that
this causes the issues.]

On Tue, Jan 03, 2017 at 04:15:47PM +0100, Peter Wu wrote:
> The acpidump from the bug does not show known issues, it *looks* fine.
> There have been other issues related to resuming power on newer Nvidia
> hardware (https://bugs.freedesktop.org/show_bug.cgi?id=94725,
> https://bugzilla.kernel.org/show_bug.cgi?id=156341) but there is not
> much progress here.  (The last time I traced the PCIe register accesses
> (via kprobes) and tried to disable some of those, it still did not help
> with preventing the power issue.)

It seems that the _DSM method works on Kilian's laptop.  Would it be
viable to default to _DSM if it's available, and only use _PR3 if not?

Thanks,

Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Wu Jan. 3, 2017, 4:31 p.m. UTC | #3
On Tue, Jan 03, 2017 at 05:11:23PM +0100, Lukas Wunner wrote:
> [cc += Dave Airlie:
> 
> Dave, we're about to lose support for newer Optimus laptops which use
> _PR3 to cut power to the discrete GPU because Bjorn Helgaas has queued
> a commit on his for-linus branch to remove runtime PM for PCIe ports.
> This fixes a regression on Kilian Singer's laptop on which locking the
> screen breaks USB and PS/2 input devices:  Mouse movements are still
> visible, but button or key presses no longer have any effect.  The GPU
> is powered down upon locking the screen and the current theory is that
> this causes the issues.]

(+cc Alex: this might affect amdgpu/radeon too.]

Bjorn, please reconsider the rpm patch. Reverting support would
introduce other regressions (see issues below) and make future
Thunderbolt work harder (according to Lukas). If Kilian's laptop has
issues, what about a "temporary" quirk?

> On Tue, Jan 03, 2017 at 04:15:47PM +0100, Peter Wu wrote:
> > The acpidump from the bug does not show known issues, it *looks* fine.
> > There have been other issues related to resuming power on newer Nvidia
> > hardware (https://bugs.freedesktop.org/show_bug.cgi?id=94725,
> > https://bugzilla.kernel.org/show_bug.cgi?id=156341) but there is not
> > much progress here.  (The last time I traced the PCIe register accesses
> > (via kprobes) and tried to disable some of those, it still did not help
> > with preventing the power issue.)
> 
> It seems that the _DSM method works on Kilian's laptop.  Would it be
> viable to default to _DSM if it's available, and only use _PR3 if not?

DSM should not be preferred when PR3 is available:

 - After MS introduced D3cold (PR3) support to Win8+, vendors are
   unlikely to test legacy DSM and the likelihood of breakage increases.
 - On one Lenovo laptop, the DSM method causes memory corruption while
   PR3 fixes this problem.
 - On some laptops, DSM keeps the fan on while PR3 stopped the noise.
 - On some laptops, DSM does not really power off the GPU and results in
   increased power consumption during runtime/system sleep. PR3 fully
   removes the power, as desired.
Deucher, Alexander Jan. 3, 2017, 4:44 p.m. UTC | #4
> -----Original Message-----
> From: Peter Wu [mailto:peter@lekensteyn.nl]
> Sent: Tuesday, January 03, 2017 11:32 AM
> To: Lukas Wunner
> Cc: Mika Westerberg; Rafael J. Wysocki; Kilian Singer; Bjorn Helgaas; linux-pci;
> Deucher, Alexander; Dave Airlie
> Subject: Re: PCI: Revert "PCI: Add runtime PM support for PCIe ports"
> 
> On Tue, Jan 03, 2017 at 05:11:23PM +0100, Lukas Wunner wrote:
> > [cc += Dave Airlie:
> >
> > Dave, we're about to lose support for newer Optimus laptops which use
> > _PR3 to cut power to the discrete GPU because Bjorn Helgaas has queued
> > a commit on his for-linus branch to remove runtime PM for PCIe ports.
> > This fixes a regression on Kilian Singer's laptop on which locking the
> > screen breaks USB and PS/2 input devices:  Mouse movements are still
> > visible, but button or key presses no longer have any effect.  The GPU
> > is powered down upon locking the screen and the current theory is that
> > this causes the issues.]
> 
> (+cc Alex: this might affect amdgpu/radeon too.]
> 
> Bjorn, please reconsider the rpm patch. Reverting support would
> introduce other regressions (see issues below) and make future
> Thunderbolt work harder (according to Lukas). If Kilian's laptop has
> issues, what about a "temporary" quirk?
> 
> > On Tue, Jan 03, 2017 at 04:15:47PM +0100, Peter Wu wrote:
> > > The acpidump from the bug does not show known issues, it *looks* fine.
> > > There have been other issues related to resuming power on newer
> Nvidia
> > > hardware (https://bugs.freedesktop.org/show_bug.cgi?id=94725,
> > > https://bugzilla.kernel.org/show_bug.cgi?id=156341) but there is not
> > > much progress here.  (The last time I traced the PCIe register accesses
> > > (via kprobes) and tried to disable some of those, it still did not help
> > > with preventing the power issue.)
> >
> > It seems that the _DSM method works on Kilian's laptop.  Would it be
> > viable to default to _DSM if it's available, and only use _PR3 if not?
> 
> DSM should not be preferred when PR3 is available:
> 
>  - After MS introduced D3cold (PR3) support to Win8+, vendors are
>    unlikely to test legacy DSM and the likelihood of breakage increases.
>  - On one Lenovo laptop, the DSM method causes memory corruption while
>    PR3 fixes this problem.
>  - On some laptops, DSM keeps the fan on while PR3 stopped the noise.
>  - On some laptops, DSM does not really power off the GPU and results in
>    increased power consumption during runtime/system sleep. PR3 fully
>    removes the power, as desired.

Yes, this will affect just about all AMD multi-GPU laptops from late 2013 onward.  I'd much prefer a temporary quirk for this specific laptop than to disable PR3 for everything.

Alex

> --
> Kind regards,
> Peter Wu
> https://lekensteyn.nl
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kilian Singer Jan. 3, 2017, 5:37 p.m. UTC | #5
Iapplied the patch to noveau_acpi.c with the additional
pci_d3cold_disable(pdev);
right after 
//		*has_pr3 = nouveau_pr3_present(pdev);
and both firefox and screen lock issue are resolved.



----- Original Message -----
From: "Peter Wu" <peter@lekensteyn.nl>
To: "Mika Westerberg" <mika.westerberg@linux.intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>, "Lukas Wunner" <lukas@wunner.de>, "Kilian Singer" <kilian.singer@quantumtechnology.info>, "Bjorn Helgaas" <helgaas@kernel.org>, "linux-pci" <linux-pci@vger.kernel.org>
Sent: Tuesday, January 3, 2017 4:15:47 PM
Subject: Re: PCI: Revert "PCI: Add runtime PM support for PCIe ports"

(replying to earlier comments in the thread:)

Changing (lowering?) the cut-off date would not help as the laptop has
DMI year 2016. (For the long-term, it would probably be desirable to
lower the date or otherwise add detection of _PR3, see
https://bugs.freedesktop.org/show_bug.cgi?id=98505#c23).

Reverting the patch is not a good idea either, it would reintroduce the
memory corruption that have plagued some Lenovo models
(https://bugs.freedesktop.org/show_bug.cgi?id=78530).

On Tue, Jan 03, 2017 at 11:51:58AM +0200, Mika Westerberg wrote:
> On Mon, Jan 02, 2017 at 10:31:07PM +0100, Rafael J. Wysocki wrote:
> > On Monday, January 02, 2017 04:48:52 PM Mika Westerberg wrote:
> > > On Mon, Jan 02, 2017 at 02:10:19PM +0200, Mika Westerberg wrote:
> > > > I've checked the acpidump of this machine and it does not seem to be a
> > > > traditional Optimus machine. At least this one is missing the magic _DSM
> > > > which is used to gather capabilities of the graphics device.
> > > > 
> > > > However, it does have _PR3 and it is attached to the device
> > > > (_SB.PCI0.PEG) itself, not the root port.
> > > 
> > > Nah, actually PEG is the root port. So it certainly looks like
> > > a traditional Optimus machine.
> > 
> > So can we quirk that thing somehow and see if that helps (for debugging
> > purposes at least)?
> 
> I was kind of hoping disabling D3cold would do that (prevent it from
> turning off power resources). But we can also just force it to use _DSM
> instead and see if it makes a difference.

Disabling d3cold that way might be too late due to the short RPM suspend
delay. You would need a udev rule to activate this ASAP. E.g., create
/etc/udev/rules.d/42-nvidia-rpm.rules with:

    SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", ATTR{power/d3cold_allowed}="0"

This disables D3cold on the child device (which should also prevent the
parent PCIe port from using D3cold).

Alternatively, can you try to boot with nouveau.runpm=0 and see if it
makes any difference? When runpm is disabled, then the PCIe port and
Nvidia device should not be suspended and therefore prevent the issue
from being triggered.

> I guess the reason why keyboard and mouse become unresponsive is because
> the driver tries to resume the device and hogs the CPU. At least it
> looks like so from the dmesg in comment 27 (of the bugzilla bug) where
> NMI watchdog is triggered.
> 
> Since this might be related to nouveau, adding Peter Wu to the loop.
> Peter the bug in question is https://bugzilla.kernel.org/show_bug.cgi?id=190861.

Kilian, in the bug you had the issue with Firefox. The trace suggests
that runtime resume was triggered, so you should have this problem too
when using lspci. Can you try:

 1. Switch to a text console (e.g. Ctrl-Alt-F2).
 2. sleep 5; lspci

If that command does not return immediately, you likely have triggered
the same issue.

The acpidump from the bug does not show known issues, it *looks* fine.
There have been other issues related to resuming power on newer Nvidia
hardware (https://bugs.freedesktop.org/show_bug.cgi?id=94725,
https://bugzilla.kernel.org/show_bug.cgi?id=156341) but there is not
much progress here.  (The last time I traced the PCIe register accesses
(via kprobes) and tried to disable some of those, it still did not help
with preventing the power issue.)

> Kilian, can you try the following hack as well?
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_acpi.c b/drivers/gpu/drm/nouveau/nouveau_acpi.c
> index 193573d191e5..50482d5c8072 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_acpi.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_acpi.c
> @@ -282,7 +282,7 @@ static void nouveau_dsm_pci_probe(struct pci_dev *pdev, acpi_handle *dhandle_out
>  			 (result & OPTIMUS_DYNAMIC_PWR_CAP) ? "dynamic power, " : "",
>  			 (result & OPTIMUS_HDA_CODEC_MASK) ? "hda bios codec supported" : "");
>  
> -		*has_pr3 = nouveau_pr3_present(pdev);
> +//		*has_pr3 = nouveau_pr3_present(pdev);
>  	}
>  }
>  

This would not disable D3cold support and as a result both PR3 and DSM
would be active. Try the above with this line added to force DSM:

    pci_d3cold_disable(pdev);

(This should have the same effect as setting d3cold_allowed=0.)
Lukas Wunner Jan. 3, 2017, 6:09 p.m. UTC | #6
On Tue, Jan 03, 2017 at 05:31:30PM +0100, Peter Wu wrote:
> On Tue, Jan 03, 2017 at 05:11:23PM +0100, Lukas Wunner wrote:
> > On Tue, Jan 03, 2017 at 04:15:47PM +0100, Peter Wu wrote:
> > > The acpidump from the bug does not show known issues, it *looks* fine.
> > > There have been other issues related to resuming power on newer Nvidia
> > > hardware (https://bugs.freedesktop.org/show_bug.cgi?id=94725,
> > > https://bugzilla.kernel.org/show_bug.cgi?id=156341) but there is not
> > > much progress here.  (The last time I traced the PCIe register accesses
> > > (via kprobes) and tried to disable some of those, it still did not help
> > > with preventing the power issue.)
> > 
> > It seems that the _DSM method works on Kilian's laptop.  Would it be
> > viable to default to _DSM if it's available, and only use _PR3 if not?
> 
> DSM should not be preferred when PR3 is available:
> 
>  - After MS introduced D3cold (PR3) support to Win8+, vendors are
>    unlikely to test legacy DSM and the likelihood of breakage increases.
>  - On one Lenovo laptop, the DSM method causes memory corruption while
>    PR3 fixes this problem.
>  - On some laptops, DSM keeps the fan on while PR3 stopped the noise.
>  - On some laptops, DSM does not really power off the GPU and results in
>    increased power consumption during runtime/system sleep. PR3 fully
>    removes the power, as desired.

I see.  How about adding an "optimus" module_param to nouveau which
allows users to force either DSM or PR3 on machines where the method
selected by default doesn't work properly?  At least until we've
figured out how to always select the correct method, or have debugged
the remaining issues with PR3?

When selecting DSM, I guess pci_d3cold_disable() would have to be
called to prevent usage of both methods, right?

Thanks,

Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Jan. 3, 2017, 6:12 p.m. UTC | #7
On Tue, Jan 03, 2017 at 05:31:30PM +0100, Peter Wu wrote:
> On Tue, Jan 03, 2017 at 05:11:23PM +0100, Lukas Wunner wrote:
> > [cc += Dave Airlie:
> > 
> > Dave, we're about to lose support for newer Optimus laptops which use
> > _PR3 to cut power to the discrete GPU because Bjorn Helgaas has queued
> > a commit on his for-linus branch to remove runtime PM for PCIe ports.
> > This fixes a regression on Kilian Singer's laptop on which locking the
> > screen breaks USB and PS/2 input devices:  Mouse movements are still
> > visible, but button or key presses no longer have any effect.  The GPU
> > is powered down upon locking the screen and the current theory is that
> > this causes the issues.]
> 
> (+cc Alex: this might affect amdgpu/radeon too.]
> 
> Bjorn, please reconsider the rpm patch. Reverting support would
> introduce other regressions (see issues below) and make future
> Thunderbolt work harder (according to Lukas). If Kilian's laptop has
> issues, what about a "temporary" quirk?

As I mentioned at the beginning, the outcome I'm hoping for is a patch
that fixes Kilian's laptop while preserving the runtime PM support.

As I also mentioned at the beginning, preserving the runtime PM
support at the expense of breaking Kilian's laptop is not one of the
options.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki Jan. 3, 2017, 9:26 p.m. UTC | #8
On Tuesday, January 03, 2017 05:11:23 PM Lukas Wunner wrote:
> [cc += Dave Airlie:
> 
> Dave, we're about to lose support for newer Optimus laptops which use
> _PR3 to cut power to the discrete GPU because Bjorn Helgaas has queued
> a commit on his for-linus branch to remove runtime PM for PCIe ports.
> This fixes a regression on Kilian Singer's laptop on which locking the
> screen breaks USB and PS/2 input devices:

It doesn't fix the broken system suspend/resume on his system, though.

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki Jan. 3, 2017, 9:38 p.m. UTC | #9
On Tuesday, January 03, 2017 12:12:21 PM Bjorn Helgaas wrote:
> On Tue, Jan 03, 2017 at 05:31:30PM +0100, Peter Wu wrote:
> > On Tue, Jan 03, 2017 at 05:11:23PM +0100, Lukas Wunner wrote:
> > > [cc += Dave Airlie:
> > > 
> > > Dave, we're about to lose support for newer Optimus laptops which use
> > > _PR3 to cut power to the discrete GPU because Bjorn Helgaas has queued
> > > a commit on his for-linus branch to remove runtime PM for PCIe ports.
> > > This fixes a regression on Kilian Singer's laptop on which locking the
> > > screen breaks USB and PS/2 input devices:  Mouse movements are still
> > > visible, but button or key presses no longer have any effect.  The GPU
> > > is powered down upon locking the screen and the current theory is that
> > > this causes the issues.]
> > 
> > (+cc Alex: this might affect amdgpu/radeon too.]
> > 
> > Bjorn, please reconsider the rpm patch. Reverting support would
> > introduce other regressions (see issues below) and make future
> > Thunderbolt work harder (according to Lukas). If Kilian's laptop has
> > issues, what about a "temporary" quirk?
> 
> As I mentioned at the beginning, the outcome I'm hoping for is a patch
> that fixes Kilian's laptop while preserving the runtime PM support.
> 
> As I also mentioned at the beginning, preserving the runtime PM
> support at the expense of breaking Kilian's laptop is not one of the
> options.

But the revert doesn't really help.

It doesn't fix system suspend/resume on that laptop, which also breaks when
PCIe ports PM is enabled on it.

If you really want to use a sledgehammer approach here (which I don't recommend,
but that's your call), you can change the initial value of pci_bridge_d3_disable to
"true" (and update the pcie_ports_pm= command line to take "on" in case
someone wants to enable the feature).  That at least will take care of the
regression entirely and not just partly.

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kilian Singer Jan. 3, 2017, 9:52 p.m. UTC | #10
I have not checked if suspend/resume broken.
Which patch should I check suspend resume for.
When I wrote about firefox and lockscreen issue resolved
I forgot to check about the suspend/resume.


On 03-Jan-17 22:38, Rafael J. Wysocki wrote:
> On Tuesday, January 03, 2017 12:12:21 PM Bjorn Helgaas wrote:
>> On Tue, Jan 03, 2017 at 05:31:30PM +0100, Peter Wu wrote:
>>> On Tue, Jan 03, 2017 at 05:11:23PM +0100, Lukas Wunner wrote:
>>>> [cc += Dave Airlie:
>>>>
>>>> Dave, we're about to lose support for newer Optimus laptops which use
>>>> _PR3 to cut power to the discrete GPU because Bjorn Helgaas has queued
>>>> a commit on his for-linus branch to remove runtime PM for PCIe ports.
>>>> This fixes a regression on Kilian Singer's laptop on which locking the
>>>> screen breaks USB and PS/2 input devices:  Mouse movements are still
>>>> visible, but button or key presses no longer have any effect.  The GPU
>>>> is powered down upon locking the screen and the current theory is that
>>>> this causes the issues.]
>>> (+cc Alex: this might affect amdgpu/radeon too.]
>>>
>>> Bjorn, please reconsider the rpm patch. Reverting support would
>>> introduce other regressions (see issues below) and make future
>>> Thunderbolt work harder (according to Lukas). If Kilian's laptop has
>>> issues, what about a "temporary" quirk?
>> As I mentioned at the beginning, the outcome I'm hoping for is a patch
>> that fixes Kilian's laptop while preserving the runtime PM support.
>>
>> As I also mentioned at the beginning, preserving the runtime PM
>> support at the expense of breaking Kilian's laptop is not one of the
>> options.
> But the revert doesn't really help.
>
> It doesn't fix system suspend/resume on that laptop, which also breaks when
> PCIe ports PM is enabled on it.
>
> If you really want to use a sledgehammer approach here (which I don't recommend,
> but that's your call), you can change the initial value of pci_bridge_d3_disable to
> "true" (and update the pcie_ports_pm= command line to take "on" in case
> someone wants to enable the feature).  That at least will take care of the
> regression entirely and not just partly.
>
> Thanks,
> Rafael
>

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki Jan. 3, 2017, 10:07 p.m. UTC | #11
On Tuesday, January 03, 2017 10:52:48 PM Kilian Singer wrote:
> I have not checked if suspend/resume broken.
> Which patch should I check suspend resume for.
> When I wrote about firefox and lockscreen issue resolved
> I forgot to check about the suspend/resume.

Well, here:

https://bugzilla.kernel.org/show_bug.cgi?id=190861#c26

you said:

"Shutdown and suspend/resume work on 4.7.0-1 but fail on 4.8. and 4.9."

and here:

https://bugzilla.kernel.org/show_bug.cgi?id=190861#c29

you said:

"the pci_port_pm=off
fixes both the firefox issue and the lock screen issue. Also suspend/resume work.
Tested on 4.9"

which to me means that suspend/resume is also broken for you
due to the PCIe ports PM support enabled (the command line option
should be pcie_port_pm=off, BTW, but I'm assuming that this is what you
actually used).

Now, it may be broken due to the runtime PM breakage triggering after system
resume (which may be verified by checking if the revert actually fixes system
suspend-resume too on your machine), but even then I wouldn't revert that
particular commit.

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Jan. 3, 2017, 10:25 p.m. UTC | #12
On Tue, Jan 03, 2017 at 10:38:24PM +0100, Rafael J. Wysocki wrote:
> On Tuesday, January 03, 2017 12:12:21 PM Bjorn Helgaas wrote:
> > On Tue, Jan 03, 2017 at 05:31:30PM +0100, Peter Wu wrote:
> > > On Tue, Jan 03, 2017 at 05:11:23PM +0100, Lukas Wunner wrote:
> > > > [cc += Dave Airlie:
> > > > 
> > > > Dave, we're about to lose support for newer Optimus laptops which use
> > > > _PR3 to cut power to the discrete GPU because Bjorn Helgaas has queued
> > > > a commit on his for-linus branch to remove runtime PM for PCIe ports.
> > > > This fixes a regression on Kilian Singer's laptop on which locking the
> > > > screen breaks USB and PS/2 input devices:  Mouse movements are still
> > > > visible, but button or key presses no longer have any effect.  The GPU
> > > > is powered down upon locking the screen and the current theory is that
> > > > this causes the issues.]
> > > 
> > > (+cc Alex: this might affect amdgpu/radeon too.]
> > > 
> > > Bjorn, please reconsider the rpm patch. Reverting support would
> > > introduce other regressions (see issues below) and make future
> > > Thunderbolt work harder (according to Lukas). If Kilian's laptop has
> > > issues, what about a "temporary" quirk?
> > 
> > As I mentioned at the beginning, the outcome I'm hoping for is a patch
> > that fixes Kilian's laptop while preserving the runtime PM support.
> > 
> > As I also mentioned at the beginning, preserving the runtime PM
> > support at the expense of breaking Kilian's laptop is not one of the
> > options.
> 
> But the revert doesn't really help.
> 
> It doesn't fix system suspend/resume on that laptop, which also breaks when
> PCIe ports PM is enabled on it.
> 
> If you really want to use a sledgehammer approach here (which I don't recommend,
> but that's your call), you can change the initial value of pci_bridge_d3_disable to
> "true" (and update the pcie_ports_pm= command line to take "on" in case
> someone wants to enable the feature).  That at least will take care of the
> regression entirely and not just partly.

What the heck is the problem here?  I'm not trying to be difficult,
but I didn't write this code and I'm not really interested in figuring
out how to fix it, so my only real option is to solicit fixes and, if
none appear, revert changes that break things.

As I've said more than once, I hope and expect that there is a better
solution than reverting the patch.  But *I* am not going to write it.
As soon as somebody proposes a better patch, I'll use it instead of
the revert.

If you want to fix the regression by changing the
pci_bridge_d3_disable value, all you have to do is post a patch doing
that.

I really don't understand why people are so wrapped around the axle
about this.  This is just the way Linux works -- we try really hard
not to cause regressions on platforms that used to work.  I *SAID* in
the very first posting of the revert that I assume Mika will have a
better solution soon.

When a better patch appears, I'll take that and drop the revert.
What's the problem with that?

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kilian Singer Jan. 3, 2017, 10:25 p.m. UTC | #13
Yes that is true:

"Shutdown and suspend/resume work on 4.7.0-1 but fail on 4.8. and 4.9."


I just did not check if the proposed patches fix the suspend/resume.

On 03-Jan-17 23:07, Rafael J. Wysocki wrote:
> On Tuesday, January 03, 2017 10:52:48 PM Kilian Singer wrote:
>> I have not checked if suspend/resume broken.
>> Which patch should I check suspend resume for.
>> When I wrote about firefox and lockscreen issue resolved
>> I forgot to check about the suspend/resume.
> Well, here:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=190861#c26
>
> you said:
>
> "Shutdown and suspend/resume work on 4.7.0-1 but fail on 4.8. and 4.9."
>
> and here:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=190861#c29
>
> you said:
>
> "the pci_port_pm=off
> fixes both the firefox issue and the lock screen issue. Also suspend/resume work.
> Tested on 4.9"
>
> which to me means that suspend/resume is also broken for you
> due to the PCIe ports PM support enabled (the command line option
> should be pcie_port_pm=off, BTW, but I'm assuming that this is what you
> actually used).
>
> Now, it may be broken due to the runtime PM breakage triggering after system
> resume (which may be verified by checking if the revert actually fixes system
> suspend-resume too on your machine), but even then I wouldn't revert that
> particular commit.
>
> Thanks,
> Rafael
>

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/gpu/drm/nouveau/nouveau_acpi.c b/drivers/gpu/drm/nouveau/nouveau_acpi.c
index 193573d191e5..50482d5c8072 100644
--- a/drivers/gpu/drm/nouveau/nouveau_acpi.c
+++ b/drivers/gpu/drm/nouveau/nouveau_acpi.c
@@ -282,7 +282,7 @@  static void nouveau_dsm_pci_probe(struct pci_dev *pdev, acpi_handle *dhandle_out
 			 (result & OPTIMUS_DYNAMIC_PWR_CAP) ? "dynamic power, " : "",
 			 (result & OPTIMUS_HDA_CODEC_MASK) ? "hda bios codec supported" : "");
 
-		*has_pr3 = nouveau_pr3_present(pdev);
+//		*has_pr3 = nouveau_pr3_present(pdev);
 	}
 }