mbox series

[v2,0/7] Modernize vga_switcheroo by using device link for HDA

Message ID cover.1520068884.git.lukas@wunner.de
Headers show
Series Modernize vga_switcheroo by using device link for HDA | expand

Message

Lukas Wunner March 3, 2018, 9:53 a.m. UTC
Modernize vga_switcheroo by using a device link to enforce a runtime PM
dependency from an HDA controller to the GPU it's integrated into, v2.

Changes since v1:

- Replace patch [1/7] to use pci_save_state() / pci_restore_state()
  for consistency between runtime PM code path of bound and unbound
  devices. (Rafael, Bjorn)

- Patch [5/7]: Drop an unnecessary initialization. (Bjorn)
  Rephrase error message on failed link addition for clarity.

Link to v1:
https://www.spinics.net/lists/dri-devel/msg165889.html

Testing on more machines would be greatly appreciated, particularly
Nvidia Optimus or AMD PowerXpress.

The series is based on 4.16-rc3.  To test it on 4.15, you need to
cherry-pick 7506dc798993 and 2fa6d6cdaf28.  For your convenience
I've pushed a 4.15-based branch to:
https://github.com/l1k/linux/commits/switcheroo_devlink_v2

Minimal test procedure:

- Note well: Recent Optimus require that a Mini-DP or HDMI cable is
  plugged in on boot for the HDA device to be present.

- Check that HDA, GPU and root port autosuspend when not in use:
  cat /sys/bus/pci/devices/0000:01:00.1/power/runtime_status  # HDA
  cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status  # GPU
  cat /sys/bus/pci/devices/0000:00:01.0/power/runtime_status  # Root Port

- Check that all three autoresume when accessing the HDA:
  hdajacksensetest -c 1

- Unbind the HDA controller:
  echo 0000:01:00.1 > /sys/bus/pci/drivers/snd_hda_intel/unbind
  Wait for GPU to power off, then rebind the HDA controller:
  echo 0000:01:00.1 > /sys/bus/pci/drivers/snd_hda_intel/bind
  Check dmesg for errors, try accessing HDA with hdajacksensetest.

- If your laptop uses the root port's _PR3 to cut power to the GPU:
  Unbind the GPU:
  echo 0000:01:00.0 > /sys/bus/pci/drivers/{nouveau,amdgpu,radeon}/unbind
  Allow runtime PM on the GPU:
  echo auto > /sys/bus/pci/devices/0000:01:00.0/power/control
  Wait for GPU to power off, then rebind it:
  echo 0000:01:00.0 > /sys/bus/pci/drivers/{nouveau,amdgpu,radeon}/bind
  Check dmesg for errors.  If you see any then we may need to perform
  further actions in pci_pm_runtime_resume(), see patch [1/7].

Thanks,

Lukas


Lukas Wunner (6):
  PCI: Make pci_wakeup_bus() & pci_bus_set_current_state() public
  vga_switcheroo: Update PCI current_state on power change
  vga_switcheroo: Deduplicate power state tracking
  vga_switcheroo: Use device link for HDA controller
  vga_switcheroo: Let HDA autosuspend on mux change
  drm/nouveau: Runtime suspend despite HDA being unbound

Rafael J. Wysocki (1):
  PCI: Restore config space on runtime resume despite being unbound

 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |   2 -
 drivers/gpu/drm/nouveau/nouveau_drm.c   |  46 ----------
 drivers/gpu/drm/nouveau/nouveau_drv.h   |   1 -
 drivers/gpu/drm/radeon/radeon_drv.c     |   2 -
 drivers/gpu/vga/vga_switcheroo.c        | 152 ++++++++------------------------
 drivers/pci/pci-driver.c                |  17 ++--
 drivers/pci/pci.c                       |   8 +-
 drivers/pci/quirks.c                    |  39 ++++++++
 include/linux/pci.h                     |   2 +
 include/linux/pci_ids.h                 |   1 +
 include/linux/vga_switcheroo.h          |   6 --
 include/sound/hdaudio.h                 |   3 -
 sound/pci/hda/hda_intel.c               |  36 +++++---
 sound/pci/hda/hda_intel.h               |   3 -
 14 files changed, 117 insertions(+), 201 deletions(-)

Comments

Peter Wu March 5, 2018, 8:58 p.m. UTC | #1
Hi Lukas,

Sorry for the delay, I finally found some time to reviewd and test the
patches and found some issues (some of them might already be present in
v4.15 without your patches though, I did not try).

Test environment:
- Branch switcheroo_devlink_v2 (commit v4.15-20-gb33d50c5c6ad)
- Laptop: Clevo P651RA (nouveau uses PR3), lspci:
  00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 07)
  01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204M [GeForce GTX 965M] [10de:13d9] (rev a1)
  01:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)
- Distribution: Arch Linux x86_64
- Tested from (1) console and (2) Xorg 1.19.3-2 (Openbox)
- Booted with HDMI cable connected and nouveau/snd-hda-intel unloaded.

To check the runtime PM status, I used this "rpm-status" script:

    grep --color . /sys/bus/pci/devices/0000:00:01.0/{0000:01:00.0/,0000:01:00.1/,}power/control
    grep --color . /sys/bus/pci/devices/0000:00:01.0/{0000:01:00.0/,0000:01:00.1/,}power/runtime_{enabled,usage,active_kids,status}


To test audio output (via HDMI; DP port does not seem to support audio):

    speaker-test -Dhdmi:CARD=NVidia,DEV=0 -c2

Below I first list some issues, then some good news.

Issue 1 - GPU does not suspend on text console.
When present at the text console and an external monitor is connected
through HDMI or DP, the RPM counter is 1. Only when the cable is removed
(or when "echo off > /sys/class/drm/card1-HDMI-A-1/status"), the RPM
count drops to 0 and the GPU device suspends. When Xorg is started
(startx), the RPM counter also drops to 0 though.

Issue 2 - RPM counter for audio function drops below 0 on system sleep.
When both nouveau and snd-hda-intel are loaded and a HDMI (or DP?) cable
is connected, the RPM counter becomes one after suspend/resume. This
happens both with text console and Xorg.

Issue 3 - invalid PCI config reads to audio device if disconnected.
When no HDMI/DP cable is connected, the HDMI audio function will be
inaccessible after runtime/system resume. Assume nouveau loaded before
s/r, then loading snd-hda-intel will fail with:

    0]: pam_unix(sudo:session): session closed for user root
    hdaudio hdaudioC1D0: no AFG or MFG node found
    hdaudio hdaudioC1D1: no AFG or MFG node found
    hdaudio hdaudioC1D2: no AFG or MFG node found
    hdaudio hdaudioC1D3: no AFG or MFG node found
    hdaudio hdaudioC1D4: no AFG or MFG node found
    hdaudio hdaudioC1D5: no AFG or MFG node found
    hdaudio hdaudioC1D6: no AFG or MFG node found
    hdaudio hdaudioC1D7: no AFG or MFG node found
    snd_hda_intel 0000:01:00.1: no codecs initialized

After rmmod snd-hda-intel and system suspend/resume:

    pci 0000:01:00.1: restoring config space at offset 0x3c (was 0xffffffff, writing 0x200)
    pci 0000:01:00.1: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
    pci 0000:01:00.1: restoring config space at offset 0x34 (was 0xffffffff, writing 0x60)

This persists until removing both PCI devices and rescanning the root
port. (When no HDMI/DP cable is connected, the audio function will not
appear; remove+rescan is required to recover.)

Issue 4 - runtime_active_kids leak with audio function.
After the above issue, the audio device never entered the suspended
state even though the runtime_usage counter reached 0. It turned out
that runtime_active_kids was 4. Every time snd-hda-intel is loaded (and
fails to initialize due to the above issue), this counter is increased.

Issue 5 - audio breaks after system sleep or stopping Xorg.
When Xorg is stopped or the system sleep/resumes while speaker-test is
active (e.g. in GNU screen), audio stops playing and speaker-test exits.

Issue 6 - wrong pin status reported / no audio
(Possibly "working as expected" since audio is tied to GPU function.)
Scenario: HDMI cable is connected but GPU is unused
("echo off > /sys/class/drm/card1-HDMI-1-1/status" from console or
with "xrandr --output HDMI-A-1 --off"). hdajacksensetest reports no
HDMI pin presence even if connected, dmesg reports:

    nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for HDMI-A-1

Using "speaker-test", the program does not fail but no sound can be
heard either.

Issue 7 - nouveau: warning after unloading after stopping Xorg
(Issue in nouveau, likely not related to this patch set.)
After "xrandr --output HDMI-1-1 --mode 2560x1440" in Xorg, stopping Xorg
(and possibly "echo off > /sys/class/drm/card1-HDMI-A-1/status"),
removing nouveau triggered:

    WARNING: CPU: 7 PID: 5475 at drivers/gpu/drm/drm_mode_config.c:439 drm_mode_config_cleanup+0x1fa/0x260
    CPU: 7 PID: 5475 Comm: rmmod Not tainted 4.15.0testing-00020-gb33d50c5c6ad #55
    RIP: 0010:drm_mode_config_cleanup+0x1fa/0x260
    [..]
    Call Trace:
     nouveau_display_destroy+0x41/0x80 [nouveau]
     nouveau_drm_unload+0x6b/0xd0 [nouveau]
     drm_dev_unregister+0x3c/0xe0
     drm_put_dev+0x2e/0x60
     nouveau_drm_device_remove+0x37/0x50 [nouveau]
     pci_device_remove+0x36/0xb0
     device_release_driver_internal+0x160/0x230
     driver_detach+0x3a/0x70
     bus_remove_driver+0x58/0xd0
     pci_unregister_driver+0x3b/0x90
     nouveau_drm_exit+0x15/0x432 [nouveau]
     SyS_delete_module+0x16c/0x230

Issue 8 - acpi: sleeping function in atomic context.
(Issue is likely not related to this patch set.)
At some point I also got a BUG, nouveau was already unloaded and I ran:
"echo 1 | tee /sys/bus/pci/devices/0000:01:00.{0,1}/remove"

    BUG: sleeping function called from invalid context at /home/peter/linux/mm/slab.h:419
    in_atomic(): 1, irqs_disabled(): 0, pid: 4844, name: kworker/3:4
    INFO: lockdep is turned off.
    CPU: 3 PID: 4844 Comm: kworker/3:4 Tainted: G        W        4.15.0testing-00020-gb33d50c5c6ad #55
    Hardware name: Notebook                         P65_P67RGRERA/P65_P67RGRERA, BIOS 1.05.16 05/16/2016
    Workqueue: events_power_efficient srcu_invoke_callbacks
    Call Trace:
     dump_stack+0x5f/0x86
     ___might_sleep+0x20c/0x240
     kmem_cache_alloc_trace+0x4d/0x230
     acpi_ut_evaluate_object+0x68/0x23c
     ? srcu_invoke_callbacks+0xa2/0x150
     acpi_rs_get_prt_method_data+0x42/0xa2
     acpi_get_irq_routing_table+0x70/0x9f
     ? __slab_free+0x11c/0x380
     acpi_pci_irq_find_prt_entry+0x83/0x330
     ? srcu_invoke_callbacks+0xa2/0x150
     acpi_pci_irq_lookup+0x27/0x2e0
     acpi_pci_irq_disable+0x45/0xb0
     pci_release_dev+0x29/0x60
     device_release+0x2d/0x80
     kobject_put+0xb7/0x190
     __device_link_free_srcu+0x32/0x40
     srcu_invoke_callbacks+0xba/0x150
     process_one_work+0x273/0x670
     worker_thread+0x4a/0x400
     kthread+0x100/0x140
     ? process_one_work+0x670/0x670
     ? kthread_create_worker_on_cpu+0x50/0x50
     ? do_syscall_64+0x56/0x1a0
     ? SyS_exit_group+0x10/0x10

Issue 9 - potential memory corruption.
At some point (possibly after issue 7, but I am not fully sure), I saw
an artifact in the text console which would persist even when switching
between consoles. It was gone after system sleep/resume. If I remember
correctly, it looked like something from a Xorg session which I killed
before.

That was the bad news, the good news:
- Loading nouveau and snd-hda-intel (in any order) while RPM is enabled
  and the port was in D3cold works.
- RPM interaction between audio and GPU seems good, when audio resumes,
  the GPU RPM counter increments, when audio suspends it decrements.
- As the GPU enters D3cold, I can observe significant power savings
  through /sys/class/power_supply/BAT0/ (no regressions here).
- In a default configuration I have no audio function (see also nouveau
  bug https://bugs.freedesktop.org/show_bug.cgi?id=75985) so most of the
  above issues should not occur.

Hope it helps, and if desired you can add:
Tested-by: Peter Wu <peter@lekensteyn.nl>

For the following patches, you can also add my Reviewed-by:

    vga_switcheroo: Update PCI current_state on power change
    vga_switcheroo: Deduplicate power state tracking
    vga_switcheroo: Use device link for HDA controller
    vga_switcheroo: Let HDA autosuspend on mux change
    drm/nouveau: Runtime suspend despite HDA being unbound

The two other PCI patches look fine as well.

Kind regards,
Peter

On Sat, Mar 03, 2018 at 10:53:24AM +0100, Lukas Wunner wrote:
> Modernize vga_switcheroo by using a device link to enforce a runtime PM
> dependency from an HDA controller to the GPU it's integrated into, v2.
> 
> Changes since v1:
> 
> - Replace patch [1/7] to use pci_save_state() / pci_restore_state()
>   for consistency between runtime PM code path of bound and unbound
>   devices. (Rafael, Bjorn)
> 
> - Patch [5/7]: Drop an unnecessary initialization. (Bjorn)
>   Rephrase error message on failed link addition for clarity.
> 
> Link to v1:
> https://www.spinics.net/lists/dri-devel/msg165889.html
> 
> Testing on more machines would be greatly appreciated, particularly
> Nvidia Optimus or AMD PowerXpress.
> 
> The series is based on 4.16-rc3.  To test it on 4.15, you need to
> cherry-pick 7506dc798993 and 2fa6d6cdaf28.  For your convenience
> I've pushed a 4.15-based branch to:
> https://github.com/l1k/linux/commits/switcheroo_devlink_v2
> 
> Minimal test procedure:
> 
> - Note well: Recent Optimus require that a Mini-DP or HDMI cable is
>   plugged in on boot for the HDA device to be present.
> 
> - Check that HDA, GPU and root port autosuspend when not in use:
>   cat /sys/bus/pci/devices/0000:01:00.1/power/runtime_status  # HDA
>   cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status  # GPU
>   cat /sys/bus/pci/devices/0000:00:01.0/power/runtime_status  # Root Port
> 
> - Check that all three autoresume when accessing the HDA:
>   hdajacksensetest -c 1
> 
> - Unbind the HDA controller:
>   echo 0000:01:00.1 > /sys/bus/pci/drivers/snd_hda_intel/unbind
>   Wait for GPU to power off, then rebind the HDA controller:
>   echo 0000:01:00.1 > /sys/bus/pci/drivers/snd_hda_intel/bind
>   Check dmesg for errors, try accessing HDA with hdajacksensetest.
> 
> - If your laptop uses the root port's _PR3 to cut power to the GPU:
>   Unbind the GPU:
>   echo 0000:01:00.0 > /sys/bus/pci/drivers/{nouveau,amdgpu,radeon}/unbind
>   Allow runtime PM on the GPU:
>   echo auto > /sys/bus/pci/devices/0000:01:00.0/power/control
>   Wait for GPU to power off, then rebind it:
>   echo 0000:01:00.0 > /sys/bus/pci/drivers/{nouveau,amdgpu,radeon}/bind
>   Check dmesg for errors.  If you see any then we may need to perform
>   further actions in pci_pm_runtime_resume(), see patch [1/7].
> 
> Thanks,
> 
> Lukas
> 
> 
> Lukas Wunner (6):
>   PCI: Make pci_wakeup_bus() & pci_bus_set_current_state() public
>   vga_switcheroo: Update PCI current_state on power change
>   vga_switcheroo: Deduplicate power state tracking
>   vga_switcheroo: Use device link for HDA controller
>   vga_switcheroo: Let HDA autosuspend on mux change
>   drm/nouveau: Runtime suspend despite HDA being unbound
> 
> Rafael J. Wysocki (1):
>   PCI: Restore config space on runtime resume despite being unbound
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |   2 -
>  drivers/gpu/drm/nouveau/nouveau_drm.c   |  46 ----------
>  drivers/gpu/drm/nouveau/nouveau_drv.h   |   1 -
>  drivers/gpu/drm/radeon/radeon_drv.c     |   2 -
>  drivers/gpu/vga/vga_switcheroo.c        | 152 ++++++++------------------------
>  drivers/pci/pci-driver.c                |  17 ++--
>  drivers/pci/pci.c                       |   8 +-
>  drivers/pci/quirks.c                    |  39 ++++++++
>  include/linux/pci.h                     |   2 +
>  include/linux/pci_ids.h                 |   1 +
>  include/linux/vga_switcheroo.h          |   6 --
>  include/sound/hdaudio.h                 |   3 -
>  sound/pci/hda/hda_intel.c               |  36 +++++---
>  sound/pci/hda/hda_intel.h               |   3 -
>  14 files changed, 117 insertions(+), 201 deletions(-)
> 
> -- 
> 2.15.1
>
Daniel Vetter March 6, 2018, 10:29 a.m. UTC | #2
On Sat, Mar 03, 2018 at 10:53:24AM +0100, Lukas Wunner wrote:
> Modernize vga_switcheroo by using a device link to enforce a runtime PM
> dependency from an HDA controller to the GPU it's integrated into, v2.
> 
> Changes since v1:
> 
> - Replace patch [1/7] to use pci_save_state() / pci_restore_state()
>   for consistency between runtime PM code path of bound and unbound
>   devices. (Rafael, Bjorn)
> 
> - Patch [5/7]: Drop an unnecessary initialization. (Bjorn)
>   Rephrase error message on failed link addition for clarity.
> 
> Link to v1:
> https://www.spinics.net/lists/dri-devel/msg165889.html
> 
> Testing on more machines would be greatly appreciated, particularly
> Nvidia Optimus or AMD PowerXpress.
> 
> The series is based on 4.16-rc3.  To test it on 4.15, you need to
> cherry-pick 7506dc798993 and 2fa6d6cdaf28.  For your convenience
> I've pushed a 4.15-based branch to:
> https://github.com/l1k/linux/commits/switcheroo_devlink_v2
> 
> Minimal test procedure:
> 
> - Note well: Recent Optimus require that a Mini-DP or HDMI cable is
>   plugged in on boot for the HDA device to be present.
> 
> - Check that HDA, GPU and root port autosuspend when not in use:
>   cat /sys/bus/pci/devices/0000:01:00.1/power/runtime_status  # HDA
>   cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status  # GPU
>   cat /sys/bus/pci/devices/0000:00:01.0/power/runtime_status  # Root Port
> 
> - Check that all three autoresume when accessing the HDA:
>   hdajacksensetest -c 1
> 
> - Unbind the HDA controller:
>   echo 0000:01:00.1 > /sys/bus/pci/drivers/snd_hda_intel/unbind
>   Wait for GPU to power off, then rebind the HDA controller:
>   echo 0000:01:00.1 > /sys/bus/pci/drivers/snd_hda_intel/bind
>   Check dmesg for errors, try accessing HDA with hdajacksensetest.
> 
> - If your laptop uses the root port's _PR3 to cut power to the GPU:
>   Unbind the GPU:
>   echo 0000:01:00.0 > /sys/bus/pci/drivers/{nouveau,amdgpu,radeon}/unbind
>   Allow runtime PM on the GPU:
>   echo auto > /sys/bus/pci/devices/0000:01:00.0/power/control
>   Wait for GPU to power off, then rebind it:
>   echo 0000:01:00.0 > /sys/bus/pci/drivers/{nouveau,amdgpu,radeon}/bind
>   Check dmesg for errors.  If you see any then we may need to perform
>   further actions in pci_pm_runtime_resume(), see patch [1/7].

This all looks really reasonable and like a good cleanup, but it's a bit
too much detail so I'll punt review to someone else with more clue.
-Daniel

> 
> Thanks,
> 
> Lukas
> 
> 
> Lukas Wunner (6):
>   PCI: Make pci_wakeup_bus() & pci_bus_set_current_state() public
>   vga_switcheroo: Update PCI current_state on power change
>   vga_switcheroo: Deduplicate power state tracking
>   vga_switcheroo: Use device link for HDA controller
>   vga_switcheroo: Let HDA autosuspend on mux change
>   drm/nouveau: Runtime suspend despite HDA being unbound
> 
> Rafael J. Wysocki (1):
>   PCI: Restore config space on runtime resume despite being unbound
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |   2 -
>  drivers/gpu/drm/nouveau/nouveau_drm.c   |  46 ----------
>  drivers/gpu/drm/nouveau/nouveau_drv.h   |   1 -
>  drivers/gpu/drm/radeon/radeon_drv.c     |   2 -
>  drivers/gpu/vga/vga_switcheroo.c        | 152 ++++++++------------------------
>  drivers/pci/pci-driver.c                |  17 ++--
>  drivers/pci/pci.c                       |   8 +-
>  drivers/pci/quirks.c                    |  39 ++++++++
>  include/linux/pci.h                     |   2 +
>  include/linux/pci_ids.h                 |   1 +
>  include/linux/vga_switcheroo.h          |   6 --
>  include/sound/hdaudio.h                 |   3 -
>  sound/pci/hda/hda_intel.c               |  36 +++++---
>  sound/pci/hda/hda_intel.h               |   3 -
>  14 files changed, 117 insertions(+), 201 deletions(-)
> 
> -- 
> 2.15.1
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Lukas Wunner March 10, 2018, 6:09 a.m. UTC | #3
Hi Peter,

thanks a million for the extensive testing and reviewing.

I'll go through the issues you've found below, but it appears to me
that they're all either in the "works as intended" category or are
caused by something unrelated to this series:


On Mon, Mar 05, 2018 at 09:58:31PM +0100, Peter Wu wrote:
> Issue 1 - GPU does not suspend on text console.
> When present at the text console and an external monitor is connected
> through HDMI or DP, the RPM counter is 1. Only when the cable is removed
> (or when "echo off > /sys/class/drm/card1-HDMI-A-1/status"), the RPM
> count drops to 0 and the GPU device suspends. When Xorg is started
> (startx), the RPM counter also drops to 0 though.

Yes, that behavior is intended:  The GPU device is kept resumed as
long as it has an active crtc, and it has an active crtc if it's
used as text console.


> Issue 2 - RPM counter for audio function drops below 0 on system sleep.
> When both nouveau and snd-hda-intel are loaded and a HDMI (or DP?) cable
> is connected, the RPM counter becomes one after suspend/resume. This
> happens both with text console and Xorg.

Which RPM counter becomes one?  That of the GPU?  If so, it would be
caused by the GPU activating a crtc on hotplug to light up the display,
hence staying resumed.

I don't quite understand what you mean by "RPM counter for audio
function drops below 0 on system sleep".  I've gone through the HDA
code and don't see an unbalanced pm_runtime_get / _put.  However the
PM core changes the usage count a couple of times over a system sleep
cycle by acquiring/releasing a runtime PM ref or enabling/disabling
runtime PM (see drivers/base/power/main.c), maybe that's what you
observed.


> Issue 3 - invalid PCI config reads to audio device if disconnected.
> When no HDMI/DP cable is connected, the HDMI audio function will be
> inaccessible after runtime/system resume. Assume nouveau loaded before
> s/r, then loading snd-hda-intel will fail with:
> 
>     hdaudio hdaudioC1D0: no AFG or MFG node found
>     hdaudio hdaudioC1D1: no AFG or MFG node found
>     hdaudio hdaudioC1D2: no AFG or MFG node found
>     hdaudio hdaudioC1D3: no AFG or MFG node found
>     hdaudio hdaudioC1D4: no AFG or MFG node found
>     hdaudio hdaudioC1D5: no AFG or MFG node found
>     hdaudio hdaudioC1D6: no AFG or MFG node found
>     hdaudio hdaudioC1D7: no AFG or MFG node found
>     snd_hda_intel 0000:01:00.1: no codecs initialized

Okay, that's really bad, but it appears to be caused by the BIOS
hiding the audio function on resume if no cable is connected.
I've attached 3 patches to this bugzilla to fix this issue:
https://bugs.freedesktop.org/show_bug.cgi?id=75985

I've extensively tested system sleep in conjunction with the
switcheroo_devlink patches and the above never occurred on my
machine because the audio function is never hidden.

I don't consider this issue a blocker for the switcheroo_devlink
patches because I would expect it to occur regardless.  Quite to
the contrary, the switcheroo_devlink patches are a requirement to
fix the issue because the audio function needs to be exposed
either from the GPU driver or a PCI quirk applied to the GPU,
and the audio function needs to delay resume until that has
happened, which is achieved by the device link.


> Issue 4 - runtime_active_kids leak with audio function.
> After the above issue, the audio device never entered the suspended
> state even though the runtime_usage counter reached 0. It turned out
> that runtime_active_kids was 4. Every time snd-hda-intel is loaded (and
> fails to initialize due to the above issue), this counter is increased.

Yes, the codec devices are created as children of the HDA device
and keep it awake because initialization failed.  This won't occur
once issue 3 is fixed.


> Issue 5 - audio breaks after system sleep or stopping Xorg.
> When Xorg is stopped or the system sleep/resumes while speaker-test is
> active (e.g. in GNU screen), audio stops playing and speaker-test exits.

That is odd, I don't have an explanation for this but suspect a bug
in the sound code.


> Issue 6 - wrong pin status reported / no audio
> (Possibly "working as expected" since audio is tied to GPU function.)
> Scenario: HDMI cable is connected but GPU is unused
> ("echo off > /sys/class/drm/card1-HDMI-1-1/status" from console or
> with "xrandr --output HDMI-A-1 --off"). hdajacksensetest reports no
> HDMI pin presence even if connected, dmesg reports:
> 
>     nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for HDMI-A-1

Interesting, and yes, that would seem to be working as expected.


> Issue 7 - nouveau: warning after unloading after stopping Xorg
> (Issue in nouveau, likely not related to this patch set.)
> After "xrandr --output HDMI-1-1 --mode 2560x1440" in Xorg, stopping Xorg
> (and possibly "echo off > /sys/class/drm/card1-HDMI-A-1/status"),
> removing nouveau triggered:
> 
>     WARNING: CPU: 7 PID: 5475 at drivers/gpu/drm/drm_mode_config.c:439 drm_mode_config_cleanup+0x1fa/0x260

Okay, the driver was unloaded while the connector was still active.
You should also see a "connector HDMI-1-1 leaked!" message in dmesg.
I thought I had fixed this two years ago with commit 523872f6b072
("drm/nouveau: Turn off CRTCs on driver unload").  Either that commit
didn't work as it should or the issue reappeared.  :-(

So yes, it's a bug, but unrelated to the switcheroo_devlink patches.


> Issue 8 - acpi: sleeping function in atomic context.
> (Issue is likely not related to this patch set.)
> At some point I also got a BUG, nouveau was already unloaded and I ran:
> "echo 1 | tee /sys/bus/pci/devices/0000:01:00.{0,1}/remove"
> 
>     BUG: sleeping function called from invalid context at /home/peter/linux/mm/slab.h:419

Wow, that would seem to be a bug in ACPI code, sorry, I'm clueless
about that one. :-)

Once again thanks and let me know if unhiding the audio function
as per the patches I attached to the above-linked bugzilla fixes
issue 3.

Lukas
Lukas Wunner March 11, 2018, 3:55 p.m. UTC | #4
On Tue, Mar 06, 2018 at 11:29:40AM +0100, Daniel Vetter wrote:
> On Sat, Mar 03, 2018 at 10:53:24AM +0100, Lukas Wunner wrote:
> > Modernize vga_switcheroo by using a device link to enforce a runtime PM
> > dependency from an HDA controller to the GPU it's integrated into, v2.
> > 
> > https://github.com/l1k/linux/commits/switcheroo_devlink_v2
> 
> This all looks really reasonable and like a good cleanup, but it's a bit
> too much detail so I'll punt review to someone else with more clue.

Patches [3/7] to [7/7] were reviewed by Peter Wu, the HDA bits in
patch [5/7] additionally by Takashi.

Patch [2/7] was acked by Bjorn.  There was no ack for patch [1/7]
(authored by Rafael), but it adressed the objection Bjorn raised
against my original patch, so I'm assuming Bjorn is okay with it.
(Bjorn, please let me know if that isn't the case.)

The series has been tested on 5 systems, which raises the confidence:
2x AMD PowerXpress (Mike Lothian, Kai Heng Feng)
2x Nvidia Optimus (Denis Lisov, Peter Wu)
1x MacBook Pro

The issues found during Peter Wu's thorough testing appear to all
be unrelated to this series, as per my e-mail yesterday.

If there are no objections, I plan to push the series to
drm-misc-next by the middle of the coming week so that it
would still catch the last train to 4.17.

Thanks,

Lukas
Daniel Vetter March 12, 2018, 4:54 p.m. UTC | #5
On Sun, Mar 11, 2018 at 04:55:49PM +0100, Lukas Wunner wrote:
> On Tue, Mar 06, 2018 at 11:29:40AM +0100, Daniel Vetter wrote:
> > On Sat, Mar 03, 2018 at 10:53:24AM +0100, Lukas Wunner wrote:
> > > Modernize vga_switcheroo by using a device link to enforce a runtime PM
> > > dependency from an HDA controller to the GPU it's integrated into, v2.
> > > 
> > > https://github.com/l1k/linux/commits/switcheroo_devlink_v2
> > 
> > This all looks really reasonable and like a good cleanup, but it's a bit
> > too much detail so I'll punt review to someone else with more clue.
> 
> Patches [3/7] to [7/7] were reviewed by Peter Wu, the HDA bits in
> patch [5/7] additionally by Takashi.
> 
> Patch [2/7] was acked by Bjorn.  There was no ack for patch [1/7]
> (authored by Rafael), but it adressed the objection Bjorn raised
> against my original patch, so I'm assuming Bjorn is okay with it.
> (Bjorn, please let me know if that isn't the case.)

Since it's written by someone else your s-o-b counts as full review (wrt
drm-misc rules at least as implemented by dim). So sounds like you have
review for all the bits.

> The series has been tested on 5 systems, which raises the confidence:
> 2x AMD PowerXpress (Mike Lothian, Kai Heng Feng)
> 2x Nvidia Optimus (Denis Lisov, Peter Wu)
> 1x MacBook Pro
> 
> The issues found during Peter Wu's thorough testing appear to all
> be unrelated to this series, as per my e-mail yesterday.
> 
> If there are no objections, I plan to push the series to
> drm-misc-next by the middle of the coming week so that it
> would still catch the last train to 4.17.

Please make sure all maintainers of other bits are ok with that and have
given their formal ack for merging through drm-misc. With that you have
until end of this week (but don't cut it too short) to sneak it into 4.17.
Otherwise just push and it'll land in 4.18.

I think you've got all the other pieces.
-Daniel
Bjorn Helgaas March 13, 2018, 5:36 p.m. UTC | #6
On Sun, Mar 11, 2018 at 04:55:49PM +0100, Lukas Wunner wrote:
> On Tue, Mar 06, 2018 at 11:29:40AM +0100, Daniel Vetter wrote:
> > On Sat, Mar 03, 2018 at 10:53:24AM +0100, Lukas Wunner wrote:
> > > Modernize vga_switcheroo by using a device link to enforce a runtime PM
> > > dependency from an HDA controller to the GPU it's integrated into, v2.
> > > 
> > > https://github.com/l1k/linux/commits/switcheroo_devlink_v2
> > 
> > This all looks really reasonable and like a good cleanup, but it's a bit
> > too much detail so I'll punt review to someone else with more clue.
> 
> Patches [3/7] to [7/7] were reviewed by Peter Wu, the HDA bits in
> patch [5/7] additionally by Takashi.
> 
> Patch [2/7] was acked by Bjorn.  There was no ack for patch [1/7]
> (authored by Rafael), but it adressed the objection Bjorn raised
> against my original patch, so I'm assuming Bjorn is okay with it.
> (Bjorn, please let me know if that isn't the case.)

I am OK with it.  I sent an ack and possible minor changelog tweak.

I expect that you'll merge the whole series via drm-misc.

> The series has been tested on 5 systems, which raises the confidence:
> 2x AMD PowerXpress (Mike Lothian, Kai Heng Feng)
> 2x Nvidia Optimus (Denis Lisov, Peter Wu)
> 1x MacBook Pro
> 
> The issues found during Peter Wu's thorough testing appear to all
> be unrelated to this series, as per my e-mail yesterday.
> 
> If there are no objections, I plan to push the series to
> drm-misc-next by the middle of the coming week so that it
> would still catch the last train to 4.17.
> 
> Thanks,
> 
> Lukas
Lukas Wunner March 14, 2018, 6:07 a.m. UTC | #7
On Sun, Mar 11, 2018 at 04:55:49PM +0100, Lukas Wunner wrote:
> > On Sat, Mar 03, 2018 at 10:53:24AM +0100, Lukas Wunner wrote:
> > > Modernize vga_switcheroo by using a device link to enforce a runtime PM
> > > dependency from an HDA controller to the GPU it's integrated into, v2.
> > > 
> > > https://github.com/l1k/linux/commits/switcheroo_devlink_v2
> 
> If there are no objections, I plan to push the series to
> drm-misc-next by the middle of the coming week so that it
> would still catch the last train to 4.17.

Pushed to drm-misc-next now with Bjorn's changelog tweak and ack for
patch [1/1].

Thanks a lot everyone for the reviews, acks, testing & comments.

Lukas