[SRU,G/Unstable/OEM-5.10,0/7] Prevent thermal shutdown during boot process

Message ID 20210121084902.672855-1-kai.heng.feng@canonical.com

Message

Kai-Heng Feng Jan. 21, 2021, 8:48 a.m. UTC
BugLink: https://bugs.launchpad.net/bugs/1906168

[Impact]
Unexpected thermal shutdown at boot on Intel-based mobile workstations.

[Fix]
Since these thermal devices are not in an ACPI ThermalZone, the OS
shouldn't shut down the system.

These critical temperatures are for userspace to handle, so let the
kernel know it shouldn't handle them.
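
As an illustration of the idea only, here is a minimal userspace sketch in
C of a per-zone "critical" callback that overrides a default shutdown
action. It is not the kernel code from this series: struct zone,
struct zone_ops, handle_critical_trip() and the other names are
hypothetical stand-ins, and the "shutdown" is just a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel structures. */
struct zone;

struct zone_ops {
    /* Optional: invoked when the critical trip point is crossed. */
    void (*critical)(struct zone *z);
};

struct zone {
    const char *type;
    const struct zone_ops *ops;
    bool shutdown_requested;   /* models the core's default action */
    bool event_logged;         /* models a driver that only reports */
};

/* Default action used when a zone supplies no critical callback. */
static void default_critical(struct zone *z)
{
    z->shutdown_requested = true;
}

/* Driver override: record the event and leave the response to userspace. */
static void log_only_critical(struct zone *z)
{
    z->event_logged = true;
    fprintf(stderr, "%s: critical temperature reached\n", z->type);
}

/* Core dispatch: prefer the zone's own callback, else use the default. */
void handle_critical_trip(struct zone *z)
{
    assert(z != NULL);
    if (z->ops && z->ops->critical)
        z->ops->critical(z);
    else
        default_critical(z);
}

/* A zone without an override, and one that opts out of shutdown. */
const struct zone_ops default_ops = { .critical = NULL };
const struct zone_ops log_only_ops = { .critical = log_only_critical };
```

In this model, a zone using log_only_ops never sets shutdown_requested,
which mirrors the intent of the two Intel driver patches: the driver
tells the core that it handles the critical trip itself.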

[Test]
Use a reboot stress test as the reproducer. There is a 5% chance of
seeing an unexpected shutdown at boot.

With the fix applied, the thermal shutdown is no longer reproducible.

[Where problems could occur]
For ACPI-based platforms, we still have "acpitz" to protect systems from
overheating. If these acpitz sensors don't work, the system could
face a real overheating issue.
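
One way to check that a machine still has an "acpitz" zone to fall back
on is to scan the thermal sysfs directory. The following is a rough
sketch, assuming the usual /sys/class/thermal/thermal_zone*/type layout;
count_zones_of_type() is a hypothetical helper and not part of this
series.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Count thermal zones under `base` whose "type" file equals `wanted`.
 * Returns -1 if `base` cannot be opened.
 */
int count_zones_of_type(const char *base, const char *wanted)
{
    assert(base != NULL && wanted != NULL);

    DIR *dir = opendir(base);
    if (!dir)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        /* Only look at thermal_zone<N> entries, skip cooling devices. */
        if (strncmp(ent->d_name, "thermal_zone", 12) != 0)
            continue;

        char path[4096];
        int n = snprintf(path, sizeof(path), "%s/%s/type", base, ent->d_name);
        if (n < 0 || (size_t)n >= sizeof(path))
            continue;

        FILE *f = fopen(path, "r");
        if (!f)
            continue;

        char type[64] = "";
        if (fgets(type, sizeof(type), f)) {
            type[strcspn(type, "\n")] = '\0';   /* strip trailing newline */
            if (strcmp(type, wanted) == 0)
                count++;
        }
        fclose(f);
    }
    closedir(dir);
    return count;
}
```

Calling count_zones_of_type("/sys/class/thermal", "acpitz") on a test
machine and getting 0 would suggest there is no ACPI thermal zone
registered on it.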

Daniel Lezcano (5):
  thermal/core: Emit a warning if the thermal zone is updated without
    ops
  thermal/core: Add critical and hot ops
  thermal/drivers/acpi: Use hot and critical ops
  thermal/drivers/rcar: Remove notification usage
  thermal/core: Remove notify ops

Kai-Heng Feng (2):
  thermal: int340x: Fix unexpected shutdown at critical temperature
  thermal: intel: pch: Fix unexpected shutdown at critical temperature

 drivers/acpi/thermal.c                        | 30 ++++++------
 .../int340x_thermal/int340x_thermal_zone.c    |  6 +++
 drivers/thermal/intel/intel_pch_thermal.c     |  6 +++
 drivers/thermal/rcar_thermal.c                | 19 -------
 drivers/thermal/thermal_core.c                | 49 +++++++++++--------
 include/linux/thermal.h                       |  5 +-
 6 files changed, 58 insertions(+), 57 deletions(-)

Comments

Paolo Pisati Jan. 22, 2021, 9:19 a.m. UTC | #1
On Thu, Jan 21, 2021 at 04:48:54PM +0800, Kai-Heng Feng wrote:
> BugLink: https://bugs.launchpad.net/bugs/1906168
Andrea Righi Jan. 25, 2021, 8:16 a.m. UTC | #2
On Thu, Jan 21, 2021 at 04:48:54PM +0800, Kai-Heng Feng wrote:
> BugLink: https://bugs.launchpad.net/bugs/1906168
> 

Applied to 5.11 unstable (only the patches from linux-next that were
missing). Thanks!

-Andrea
Timo Aaltonen Jan. 29, 2021, 7:48 a.m. UTC | #3
On 21.1.2021 10.48, Kai-Heng Feng wrote:
> BugLink: https://bugs.launchpad.net/bugs/1906168
> 

applied to oem-5.10, thanks