[v5,1/3] thermal: qcom-spmi: Use PMIC thermal stage 2 for critical trip points

Message ID 20180724234636.57137-1-mka@chromium.org
State New

Commit Message

Matthias Kaehlcke July 24, 2018, 11:46 p.m.
There are three thermal stages defined in the PMIC:

stage 1: warning
stage 2: system should shut down
stage 3: emergency shut down

By default the PMIC assumes that the OS isn't doing anything and thus
at stage 2 it does a partial PMIC shutdown and at stage 3 it kills
all power. When switching between thermal stages the PMIC generates an
interrupt which is handled by the driver. The partial PMIC shutdown at
stage 2 can be disabled by software, which allows the OS to initiate a
shutdown at stage 2 with a thermal zone configured accordingly.

If a critical trip point is configured in the thermal zone the driver
adjusts the stage 1-3 temperature thresholds to (closely) match the
critical temperature with a stage 2 threshold (125/130/135/140 °C).
If a suitable match is found the partial shutdown at stage 2 is
disabled. If for some reason the system doesn't shut down at stage 2
the emergency shutdown at stage 3 kicks in.

The partial shutdown at stage 2 remains enabled in these cases:
- no critical trip point defined
- the temperature of the critical trip point is < 125°C
- the temperature of the critical trip point is > 140°C and no
  ADC channel is configured (thus the OS is not notified when the critical
  temperature is reached)
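
As a rough illustration of the mapping described above, the following
standalone C sketch mirrors the threshold selection done by
qpnp_tm_update_critical_trip_temp() in this patch. It is not the driver
code: the values of TEMP_THRESH_STEP (5000), THRESH_MIN (0) and
THRESH_MAX (3) are assumptions inferred from the 125/130/135/140 °C
stage 2 steps, TEMP_INVALID stands in for the kernel's
THERMAL_TEMP_INVALID, and struct s2_config and pick_s2_config() are
hypothetical names used only here.

```c
#include <stdbool.h>

/* Temperatures are in millidegrees Celsius, as in the thermal framework. */
#define TEMP_INVALID          (-274000) /* stand-in for THERMAL_TEMP_INVALID */
#define STAGE2_THRESHOLD_MIN  125000    /* from the patch */
#define STAGE2_THRESHOLD_MAX  140000    /* from the patch */
#define TEMP_THRESH_STEP      5000      /* assumed: 5 C between thresholds */
#define THRESH_MIN            0         /* assumed register field value */
#define THRESH_MAX            3         /* assumed register field value */

/* Hypothetical result type: threshold index plus the S2 override decision. */
struct s2_config {
	int thresh;
	bool disable_s2_shutdown;
};

static struct s2_config pick_s2_config(int temp, bool have_adc)
{
	/* Default: lowest thresholds, PMIC partial shutdown stays enabled. */
	struct s2_config cfg = { THRESH_MIN, false };

	/* No critical trip point, or one below the lowest stage 2 threshold. */
	if (temp == TEMP_INVALID || temp < STAGE2_THRESHOLD_MIN)
		return cfg;

	if (temp <= STAGE2_THRESHOLD_MAX) {
		/* Pick the threshold set whose stage 2 value matches closely. */
		cfg.thresh = THRESH_MAX -
			(STAGE2_THRESHOLD_MAX - temp) / TEMP_THRESH_STEP;
		cfg.disable_s2_shutdown = true;
	} else {
		/*
		 * Above 140 C the OS only learns about the critical
		 * temperature through the ADC, so override S2 only then.
		 */
		cfg.thresh = THRESH_MAX;
		cfg.disable_s2_shutdown = have_adc;
	}

	return cfg;
}
```

With these assumed constants, a critical trip of 130000 selects
threshold index 1 with the stage 2 shutdown overridden, while 150000
without an ADC selects index 3 and leaves the partial shutdown enabled.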

Suggested-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
---
Changes in v5:
- patch added to the series
---
 drivers/thermal/qcom-spmi-temp-alarm.c | 161 ++++++++++++++++++++++---
 1 file changed, 142 insertions(+), 19 deletions(-)

Comments

Matthias Kaehlcke July 25, 2018, 12:06 a.m. | #1
On Tue, Jul 24, 2018 at 04:46:34PM -0700, Matthias Kaehlcke wrote:
> There are three thermal stages defined in the PMIC:
> 
> stage 1: warning
> stage 2: system should shut down
> stage 3: emergency shut down
> 
> By default the PMIC assumes that the OS isn't doing anything and thus
> at stage 2 it does a partial PMIC shutdown and at stage 3 it kills
> all power. When switching between thermal stages the PMIC generates an
> interrupt which is handled by the driver. The partial PMIC shutdown at
> stage 2 can be disabled by software, which allows the OS to initiate a
> shutdown at stage 2 with a thermal zone configured accordingly.
> 
> If a critical trip point is configured in the thermal zone the driver
> adjusts the stage 1-3 temperature thresholds to (closely) match the
> critical temperature with a stage 2 threshold (125/130/135/140 °C).
> If a suitable match is found the partial shutdown at stage 2 is
> disabled. If for some reason the system doesn't shutdown at stage 2
> the emergency shutdown at stage 3 kicks in.
> 
> The partial shutdown at stage 2 remains enabled in these cases:
> - no critical trip point defined
> - the temperature of the critical trip point is < 125°C
> - the temperature of the critical trip point is > 140°C and no
>   ADC channel is configured (thus the OS is not notified when the critical
>   temperature is reached)
> 
> Suggested-by: Douglas Anderson <dianders@chromium.org>
> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> ---
> Changes in v5:
> - patch added to the series
> ---
>  drivers/thermal/qcom-spmi-temp-alarm.c | 161 ++++++++++++++++++++++---
>  1 file changed, 142 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/thermal/qcom-spmi-temp-alarm.c b/drivers/thermal/qcom-spmi-temp-alarm.c
> index ad4f3a8d6560..936e4dde4298 100644
> --- a/drivers/thermal/qcom-spmi-temp-alarm.c
> +++ b/drivers/thermal/qcom-spmi-temp-alarm.c
>
> ...
>
> +static int qpnp_tm_update_critical_trip_temp(struct qpnp_tm_chip *chip,
> +					     int temp)
> +{
> +	u8 reg;
> +	bool disable_s2_shutdown = false;
> +	int ret;
> +
> +	WARN_ON(!mutex_is_locked(&chip->lock));
> +
> +	/*
> +	 * Default: S2 and S3 shutdown enabled, thresholds at
> +	 * 105C/125C/145C, monitoring at 25Hz
> +	 */
> +	reg = SHUTDOWN_CTRL1_RATE_25HZ;
> +
> +	if ((temp == THERMAL_TEMP_INVALID) ||
> +	    (temp < STAGE2_THRESHOLD_MIN)) {
> +		chip->thresh = THRESH_MIN;
> +		goto skip;
> +	}
> +
> +	if (temp <= STAGE2_THRESHOLD_MAX) {
> +		chip->thresh = THRESH_MAX -
> +			((STAGE2_THRESHOLD_MAX - temp) /
> +			 TEMP_THRESH_STEP);
> +		disable_s2_shutdown = true;
> +	} else {
> +		chip->thresh = THRESH_MAX;
> +
> +		if (!IS_ERR(chip->adc))

Note to self: with commit 7a4ca51b7040 ("thermal/drivers/qcom-spmi:
Use devm_iio_channel_get") this should be 'if (chip->adc)'.
Doug Anderson July 25, 2018, 11:19 p.m. | #2
Hi,

On Tue, Jul 24, 2018 at 4:46 PM, Matthias Kaehlcke <mka@chromium.org> wrote:
> +static int qpnp_tm_update_critical_trip_temp(struct qpnp_tm_chip *chip,
> +                                            int temp)
> +{
> +       u8 reg;
> +       bool disable_s2_shutdown = false;
> +       int ret;
> +
> +       WARN_ON(!mutex_is_locked(&chip->lock));
> +
> +       /*
> +        * Default: S2 and S3 shutdown enabled, thresholds at
> +        * 105C/125C/145C, monitoring at 25Hz
> +        */
> +       reg = SHUTDOWN_CTRL1_RATE_25HZ;
> +
> +       if ((temp == THERMAL_TEMP_INVALID) ||
> +           (temp < STAGE2_THRESHOLD_MIN)) {
> +               chip->thresh = THRESH_MIN;
> +               goto skip;
> +       }
> +
> +       if (temp <= STAGE2_THRESHOLD_MAX) {
> +               chip->thresh = THRESH_MAX -
> +                       ((STAGE2_THRESHOLD_MAX - temp) /
> +                        TEMP_THRESH_STEP);
> +               disable_s2_shutdown = true;
> +       } else {
> +               chip->thresh = THRESH_MAX;
> +
> +               if (!IS_ERR(chip->adc))
> +                       disable_s2_shutdown = true;
> +               else
> +                       dev_warn(chip->dev,
> +                                "No ADC is configured and critical temperature is above the maximum stage 2 threshold of 140°C! Configuring stage 2 shutdown at 140°C.\n");

Putting a non-ASCII character (the degree symbol) in your commit
message is one thing, but are you sure it's wise to put it in the
kernel logs?


> +       }
> +
> +skip:
> +       reg |= chip->thresh;
> +       if (disable_s2_shutdown)
> +               reg |= SHUTDOWN_CTRL1_OVERRIDE_S2;
> +
> +       ret = qpnp_tm_write(chip, QPNP_TM_REG_SHUTDOWN_CTRL1, reg);
> +       if (ret < 0)
> +               return ret;
> +
> +       return ret;

Simplify the above lines to:

return qpnp_tm_write(chip, QPNP_TM_REG_SHUTDOWN_CTRL1, reg);


> @@ -313,12 +441,7 @@ static int qpnp_tm_probe(struct platform_device *pdev)
>         if (ret < 0)
>                 return ret;
>
> -       chip->tz_dev = devm_thermal_zone_of_sensor_register(&pdev->dev, 0, chip,
> -                                                       &qpnp_tm_sensor_ops);
> -       if (IS_ERR(chip->tz_dev)) {
> -               dev_err(&pdev->dev, "failed to register sensor\n");
> -               return PTR_ERR(chip->tz_dev);
> -       }
> +       chip->initialized = true;

Should we add "thermal_zone_device_update(chip->tz_dev,
THERMAL_EVENT_UNSPECIFIED);" here?

...also: do we care about any type of locking for chip->initialized?
Technically we can be running on weakly ordered memory so if
qpnp_tm_update_temp_no_adc() is running on a different processor then
possibly it could still keep returning the default temperature for a
little while.  We could try to analyze whether there's some sort of
implicit barrier or we could add manual memory barriers, but generally
I try to avoid that and just do the simple locking...  What about just
setting chip->initialized = true at the end of qpnp_tm_init() while the
mutex is still held?
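
To make the "simple locking" option concrete, here is a minimal
userspace sketch using pthreads instead of the kernel mutex API. It
assumes the reader path takes the same lock as the init path, which
this thread does not confirm for the driver. struct demo_chip,
demo_init(), demo_get_temp() and DEMO_DEFAULT_TEMP are hypothetical
names, not part of qcom-spmi-temp-alarm.c.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical placeholder reported before init is done (millidegrees C). */
#define DEMO_DEFAULT_TEMP 37000

struct demo_chip {
	pthread_mutex_t lock;
	bool initialized;
	int temp;
};

/* Init path: publish the flag while the lock is still held. */
static void demo_init(struct demo_chip *chip, int measured_temp)
{
	pthread_mutex_lock(&chip->lock);
	chip->temp = measured_temp;
	chip->initialized = true;	/* last step before unlocking */
	pthread_mutex_unlock(&chip->lock);
}

/*
 * Reader path: taking the same lock orders this read after the writes
 * made in demo_init(), so no explicit memory barriers are needed.
 */
static int demo_get_temp(struct demo_chip *chip)
{
	int temp;

	pthread_mutex_lock(&chip->lock);
	temp = chip->initialized ? chip->temp : DEMO_DEFAULT_TEMP;
	pthread_mutex_unlock(&chip->lock);

	return temp;
}
```

Because both sides go through the mutex, a reader either sees the flag
unset and returns the default, or sees it set together with the data
written before it.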


I'd also love to hear from someone with more thermal framework
experience to make sure it's legit to return a default value if
someone calls us while we're initting.  It seems sane to me but nice
to confirm it's OK.


Overall I like the idea of this patch so hopefully others do too.
Thanks for sending it out!


-Doug
Matthias Kaehlcke July 26, 2018, 1:12 a.m. | #3
Hi Doug,

On Wed, Jul 25, 2018 at 04:19:56PM -0700, Doug Anderson wrote:

> On Tue, Jul 24, 2018 at 4:46 PM, Matthias Kaehlcke <mka@chromium.org> wrote:
> > +static int qpnp_tm_update_critical_trip_temp(struct qpnp_tm_chip *chip,
> > +                                            int temp)
> > +{
> > +       u8 reg;
> > +       bool disable_s2_shutdown = false;
> > +       int ret;
> > +
> > +       WARN_ON(!mutex_is_locked(&chip->lock));
> > +
> > +       /*
> > +        * Default: S2 and S3 shutdown enabled, thresholds at
> > +        * 105C/125C/145C, monitoring at 25Hz
> > +        */
> > +       reg = SHUTDOWN_CTRL1_RATE_25HZ;
> > +
> > +       if ((temp == THERMAL_TEMP_INVALID) ||
> > +           (temp < STAGE2_THRESHOLD_MIN)) {
> > +               chip->thresh = THRESH_MIN;
> > +               goto skip;
> > +       }
> > +
> > +       if (temp <= STAGE2_THRESHOLD_MAX) {
> > +               chip->thresh = THRESH_MAX -
> > +                       ((STAGE2_THRESHOLD_MAX - temp) /
> > +                        TEMP_THRESH_STEP);
> > +               disable_s2_shutdown = true;
> > +       } else {
> > +               chip->thresh = THRESH_MAX;
> > +
> > +               if (!IS_ERR(chip->adc))
> > +                       disable_s2_shutdown = true;
> > +               else
> > +                       dev_warn(chip->dev,
> > +                                "No ADC is configured and critical temperature is above the maximum stage 2 threshold of 140°C! Configuring stage 2 shutdown at 140°C.\n");
> 
> Putting a non-ASCII character (the degree symbol) in your commit
> message is one thing, but are you sure it's wise to put it in the
> kernel logs?

A few other drivers also do this
(drivers/gpu/drm/nouveau/nvkm/subdev/clk/base.c,
drivers/macintosh/windfarm_pm121.c), however that doesn't mean it's a
good idea. Will change to degC or C.

> > +       }
> > +
> > +skip:
> > +       reg |= chip->thresh;
> > +       if (disable_s2_shutdown)
> > +               reg |= SHUTDOWN_CTRL1_OVERRIDE_S2;
> > +
> > +       ret = qpnp_tm_write(chip, QPNP_TM_REG_SHUTDOWN_CTRL1, reg);
> > +       if (ret < 0)
> > +               return ret;
> > +
> > +       return ret;
> 
> Simplify the above lines to:
> 
> return qpnp_tm_write(chip, QPNP_TM_REG_SHUTDOWN_CTRL1, reg);

Ouch, my code is indeed dumb ...

> > @@ -313,12 +441,7 @@ static int qpnp_tm_probe(struct platform_device *pdev)
> >         if (ret < 0)
> >                 return ret;
> >
> > -       chip->tz_dev = devm_thermal_zone_of_sensor_register(&pdev->dev, 0, chip,
> > -                                                       &qpnp_tm_sensor_ops);
> > -       if (IS_ERR(chip->tz_dev)) {
> > -               dev_err(&pdev->dev, "failed to register sensor\n");
> > -               return PTR_ERR(chip->tz_dev);
> > -       }
> > +       chip->initialized = true;
> 
> Should we add "thermal_zone_device_update(chip->tz_dev,
> THERMAL_EVENT_UNSPECIFIED);" here

Seems reasonable, will do.

> ...also: do we care about any type of locking for chip->initialized?
> Technically we can be running on weakly ordered memory so if
> qpnp_tm_update_temp_no_adc() is running on a different processor then
> possibly it could still keep returning the default temperature for a
> little while.  We could try to analyze whether there's some sort of
> implicit barrier or we could add manual memory barriers, but generally
> I try to avoid that and just do the simple locking...  What about just
> setting chip->initialized = true at the end of qpnp_tm_init() while the
> mutex is still held?

Thanks for pointing that out. I agree that we should keep things
simple; setting chip->initialized to true at the end of qpnp_tm_init()
sounds good to me.

> I'd also love to hear from someone with more thermal framework
> experience to make sure it's legit to return a default value if
> someone calls us while we're initting.  It seems sane to me but nice
> to confirm it's OK.

An alternative could be to return THERMAL_TEMP_INVALID. However, I
don't see this handled outside of thermal_core.c, so I'm not sure
whether it could throw some other code off.

Comments from thermal folks on either approach (or alternatives) are
definitely welcome :)

> Overall I like the idea of this patch so hopefully others do too.
> Thanks for sending it out!

Thanks for the review!

Matthias
Eduardo Valentin July 27, 2018, 10:40 p.m. | #4
On Wed, Jul 25, 2018 at 06:12:28PM -0700, Matthias Kaehlcke wrote:
> Hi Doug,
> 
> On Wed, Jul 25, 2018 at 04:19:56PM -0700, Doug Anderson wrote:
> 
> > On Tue, Jul 24, 2018 at 4:46 PM, Matthias Kaehlcke <mka@chromium.org> wrote:
> > > +static int qpnp_tm_update_critical_trip_temp(struct qpnp_tm_chip *chip,
> > > +                                            int temp)
> > > +{
> > > +       u8 reg;
> > > +       bool disable_s2_shutdown = false;
> > > +       int ret;
> > > +
> > > +       WARN_ON(!mutex_is_locked(&chip->lock));
> > > +
> > > +       /*
> > > +        * Default: S2 and S3 shutdown enabled, thresholds at
> > > +        * 105C/125C/145C, monitoring at 25Hz
> > > +        */
> > > +       reg = SHUTDOWN_CTRL1_RATE_25HZ;
> > > +
> > > +       if ((temp == THERMAL_TEMP_INVALID) ||
> > > +           (temp < STAGE2_THRESHOLD_MIN)) {
> > > +               chip->thresh = THRESH_MIN;
> > > +               goto skip;
> > > +       }
> > > +
> > > +       if (temp <= STAGE2_THRESHOLD_MAX) {
> > > +               chip->thresh = THRESH_MAX -
> > > +                       ((STAGE2_THRESHOLD_MAX - temp) /
> > > +                        TEMP_THRESH_STEP);
> > > +               disable_s2_shutdown = true;
> > > +       } else {
> > > +               chip->thresh = THRESH_MAX;
> > > +
> > > +               if (!IS_ERR(chip->adc))
> > > +                       disable_s2_shutdown = true;
> > > +               else
> > > +                       dev_warn(chip->dev,
> > > +                                "No ADC is configured and critical temperature is above the maximum stage 2 threshold of 140°C! Configuring stage 2 shutdown at 140°C.\n");
> > 
> > Putting a non-ASCII character (the degree symbol) in your commit
> > message is one thing, but are you sure it's wise to put it in the
> > kernel logs?
> 
> A few other drivers also do this
> (drivers/gpu/drm/nouveau/nvkm/subdev/clk/base.c,
> drivers/macintosh/windfarm_pm121.c), however that doesn't mean it's a
> good idea. Will change to degC or C.
> 
> > > +       }
> > > +
> > > +skip:
> > > +       reg |= chip->thresh;
> > > +       if (disable_s2_shutdown)
> > > +               reg |= SHUTDOWN_CTRL1_OVERRIDE_S2;
> > > +
> > > +       ret = qpnp_tm_write(chip, QPNP_TM_REG_SHUTDOWN_CTRL1, reg);
> > > +       if (ret < 0)
> > > +               return ret;
> > > +
> > > +       return ret;
> > 
> > Simplify the above lines to:
> > 
> > return qpnp_tm_write(chip, QPNP_TM_REG_SHUTDOWN_CTRL1, reg);
> 
> Ouch, my code is indeed dumb ...
> 
> > > @@ -313,12 +441,7 @@ static int qpnp_tm_probe(struct platform_device *pdev)
> > >         if (ret < 0)
> > >                 return ret;
> > >
> > > -       chip->tz_dev = devm_thermal_zone_of_sensor_register(&pdev->dev, 0, chip,
> > > -                                                       &qpnp_tm_sensor_ops);
> > > -       if (IS_ERR(chip->tz_dev)) {
> > > -               dev_err(&pdev->dev, "failed to register sensor\n");
> > > -               return PTR_ERR(chip->tz_dev);
> > > -       }
> > > +       chip->initialized = true;
> > 
> > Should we add "thermal_zone_device_update(chip->tz_dev,
> > THERMAL_EVENT_UNSPECIFIED);" here
> 
> Seems reasonable, will do.
> 
> > ...also: do we care about any type of locking for chip->initialized?
> > Technically we can be running on weakly ordered memory so if
> > qpnp_tm_update_temp_no_adc() is running on a different processor then
> > possibly it could still keep returning the default temperature for a
> > little while.  We could try to analyze whether there's some sort of
> > implicit barrier or we could add manual memory barriers, but generally
> > I try to avoid that and just do the simple locking...  What about just
> > > setting chip->initialized = true at the end of qpnp_tm_init() while the
> > mutex is still held?
> 
> Thanks for pointing that out. I agree that we should keep things
> simple, chip->initialized to true at the end of qpnp_tm_init() sounds
> good to me.
> 
> > I'd also love to hear from someone with more thermal framework
> > experience to make sure it's legit to return a default value if
> > someone calls us while we're initting.  It seems sane to me but nice
> > to confirm it's OK.
> 
> An alternative could be to return THERMAL_TEMP_INVALID, however I
> don't see this handled outside of thermal_core.c, not sure if it could
> throw some other code off.
> 
> Comments from thermal folks on either approach (or alternatives) are
> definitely welcome :)
> 
> > Overall I like the idea of this patch so hopefully others do too.
> > Thanks for sending it out!
> 

minor ask for next version


WARNING: line over 80 characters
#159: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:65:
+#define STAGE2_THRESHOLD_MIN		125000	/* Stage 2 Threshold Min: 125 C */

WARNING: line over 80 characters
#160: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:66:
+#define STAGE2_THRESHOLD_MAX		140000	/* Stage 2 Threshold Max: 140 C */

ERROR: trailing statements should be on next line
#201: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:186:
+	if (!chip->adc)) {

CHECK: Unnecessary parentheses around 'temp == THERMAL_TEMP_INVALID'
#227: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:220:
+	if ((temp == THERMAL_TEMP_INVALID) ||
+	    (temp < STAGE2_THRESHOLD_MIN)) {

CHECK: Unnecessary parentheses around 'temp < STAGE2_THRESHOLD_MIN'
#227: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:220:
+	if ((temp == THERMAL_TEMP_INVALID) ||
+	    (temp < STAGE2_THRESHOLD_MIN)) {

CHECK: Unnecessary parentheses around 'trips[i].type == THERMAL_TRIP_CRITICAL'
#305: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:302:
+		if (of_thermal_is_trip_valid(chip->tz_dev, i) &&
+		    (trips[i].type == THERMAL_TRIP_CRITICAL))

CHECK: Alignment should match open parenthesis
#386: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:427:
+	chip->tz_dev = devm_thermal_zone_of_sensor_register(&pdev->dev,
0, chip,
+
&qpnp_tm_sensor_ops);

> Thanks for the review!
> 
> Matthias
Eduardo Valentin July 27, 2018, 10:45 p.m. | #5
On Fri, Jul 27, 2018 at 03:40:52PM -0700, Eduardo Valentin wrote:
> On Wed, Jul 25, 2018 at 06:12:28PM -0700, Matthias Kaehlcke wrote:
> > Hi Doug,
> > 
> > On Wed, Jul 25, 2018 at 04:19:56PM -0700, Doug Anderson wrote:
> > 
> > > On Tue, Jul 24, 2018 at 4:46 PM, Matthias Kaehlcke <mka@chromium.org> wrote:
> > > > +static int qpnp_tm_update_critical_trip_temp(struct qpnp_tm_chip *chip,
> > > > +                                            int temp)
> > > > +{
> > > > +       u8 reg;
> > > > +       bool disable_s2_shutdown = false;
> > > > +       int ret;
> > > > +
> > > > +       WARN_ON(!mutex_is_locked(&chip->lock));
> > > > +
> > > > +       /*
> > > > +        * Default: S2 and S3 shutdown enabled, thresholds at
> > > > +        * 105C/125C/145C, monitoring at 25Hz
> > > > +        */
> > > > +       reg = SHUTDOWN_CTRL1_RATE_25HZ;
> > > > +
> > > > +       if ((temp == THERMAL_TEMP_INVALID) ||
> > > > +           (temp < STAGE2_THRESHOLD_MIN)) {
> > > > +               chip->thresh = THRESH_MIN;
> > > > +               goto skip;
> > > > +       }
> > > > +
> > > > +       if (temp <= STAGE2_THRESHOLD_MAX) {
> > > > +               chip->thresh = THRESH_MAX -
> > > > +                       ((STAGE2_THRESHOLD_MAX - temp) /
> > > > +                        TEMP_THRESH_STEP);
> > > > +               disable_s2_shutdown = true;
> > > > +       } else {
> > > > +               chip->thresh = THRESH_MAX;
> > > > +
> > > > +               if (!IS_ERR(chip->adc))
> > > > +                       disable_s2_shutdown = true;
> > > > +               else
> > > > +                       dev_warn(chip->dev,
> > > > +                                "No ADC is configured and critical temperature is above the maximum stage 2 threshold of 140°C! Configuring stage 2 shutdown at 140°C.\n");
> > > 
> > > Putting a non-ASCII character (the degree symbol) in your commit
> > > message is one thing, but are you sure it's wise to put it in the
> > > kernel logs?
> > 
> > A few other drivers also do this
> > (drivers/gpu/drm/nouveau/nvkm/subdev/clk/base.c,
> > drivers/macintosh/windfarm_pm121.c), however that doesn't mean it's a
> > good idea. Will change to degC or C.
> > 
> > > > +       }
> > > > +
> > > > +skip:
> > > > +       reg |= chip->thresh;
> > > > +       if (disable_s2_shutdown)
> > > > +               reg |= SHUTDOWN_CTRL1_OVERRIDE_S2;
> > > > +
> > > > +       ret = qpnp_tm_write(chip, QPNP_TM_REG_SHUTDOWN_CTRL1, reg);
> > > > +       if (ret < 0)
> > > > +               return ret;
> > > > +
> > > > +       return ret;
> > > 
> > > Simplify the above lines to:
> > > 
> > > return qpnp_tm_write(chip, QPNP_TM_REG_SHUTDOWN_CTRL1, reg);
> > 
> > Ouch, my code is indeed dumb ...
> > 
> > > > @@ -313,12 +441,7 @@ static int qpnp_tm_probe(struct platform_device *pdev)
> > > >         if (ret < 0)
> > > >                 return ret;
> > > >
> > > > -       chip->tz_dev = devm_thermal_zone_of_sensor_register(&pdev->dev, 0, chip,
> > > > -                                                       &qpnp_tm_sensor_ops);
> > > > -       if (IS_ERR(chip->tz_dev)) {
> > > > -               dev_err(&pdev->dev, "failed to register sensor\n");
> > > > -               return PTR_ERR(chip->tz_dev);
> > > > -       }
> > > > +       chip->initialized = true;
> > > 
> > > Should we add "thermal_zone_device_update(chip->tz_dev,
> > > THERMAL_EVENT_UNSPECIFIED);" here
> > 
> > Seems reasonable, will do.
> > 
> > > ...also: do we care about any type of locking for chip->initialized?
> > > Technically we can be running on weakly ordered memory so if
> > > qpnp_tm_update_temp_no_adc() is running on a different processor then
> > > possibly it could still keep returning the default temperature for a
> > > little while.  We could try to analyze whether there's some sort of
> > > implicit barrier or we could add manual memory barriers, but generally
> > > I try to avoid that and just do the simple locking...  What about just
> > > > setting chip->initialized = true at the end of qpnp_tm_init() while the
> > > mutex is still held?
> > 
> > Thanks for pointing that out. I agree that we should keep things
> > simple, chip->initialized to true at the end of qpnp_tm_init() sounds
> > good to me.
> > 
> > > I'd also love to hear from someone with more thermal framework
> > > experience to make sure it's legit to return a default value if
> > > someone calls us while we're initting.  It seems sane to me but nice
> > > to confirm it's OK.
> > 
> > An alternative could be to return THERMAL_TEMP_INVALID, however I
> > don't see this handled outside of thermal_core.c, not sure if it could
> > throw some other code off.
> > 
> > Comments from thermal folks on either approach (or alternatives) are
> > definitely welcome :)
> > 
> > > Overall I like the idea of this patch so hopefully others do too.
> > > Thanks for sending it out!
> > 
> 
> minor ask for next version
> 
> 
> WARNING: line over 80 characters
> #159: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:65:
> +#define STAGE2_THRESHOLD_MIN		125000	/* Stage 2 Threshold
> Min: 125 C */
> 
> WARNING: line over 80 characters
> #160: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:66:
> +#define STAGE2_THRESHOLD_MAX		140000	/* Stage 2 Threshold
> Max: 140 C */
> 
> ERROR: trailing statements should be on next line
> #201: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:186:
> +	if (!chip->adc)) {
> 
> CHECK: Unnecessary parentheses around 'temp == THERMAL_TEMP_INVALID'
> #227: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:220:
> +	if ((temp == THERMAL_TEMP_INVALID) ||
> +	    (temp < STAGE2_THRESHOLD_MIN)) {
> 
> CHECK: Unnecessary parentheses around 'temp < STAGE2_THRESHOLD_MIN'
> #227: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:220:
> +	if ((temp == THERMAL_TEMP_INVALID) ||
> +	    (temp < STAGE2_THRESHOLD_MIN)) {
> 
> CHECK: Unnecessary parentheses around 'trips[i].type ==
> THERMAL_TRIP_CRITICAL'
> #305: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:302:
> +		if (of_thermal_is_trip_valid(chip->tz_dev, i) &&
> +		    (trips[i].type == THERMAL_TRIP_CRITICAL))
> 
> CHECK: Alignment should match open parenthesis
> #386: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:427:
> +	chip->tz_dev = devm_thermal_zone_of_sensor_register(&pdev->dev,
> 0, chip,
> +
> &qpnp_tm_sensor_ops);


And it would be great if you could combine these two into a single
series: when you fix this patch and send a new version of this
series, please include these too:
https://patchwork.kernel.org/patch/10543335/
https://patchwork.kernel.org/patch/10543333/

> 
> > Thanks for the review!
> > 
> > Matthias
Matthias Kaehlcke July 31, 2018, 4:19 p.m. | #6
On Fri, Jul 27, 2018 at 03:40:52PM -0700, Eduardo Valentin wrote:
> On Wed, Jul 25, 2018 at 06:12:28PM -0700, Matthias Kaehlcke wrote:
> > Hi Doug,
> > 
> > On Wed, Jul 25, 2018 at 04:19:56PM -0700, Doug Anderson wrote:
> > 
> > > On Tue, Jul 24, 2018 at 4:46 PM, Matthias Kaehlcke <mka@chromium.org> wrote:
> > > > +static int qpnp_tm_update_critical_trip_temp(struct qpnp_tm_chip *chip,
> > > > +                                            int temp)
> > > > +{
> > > > +       u8 reg;
> > > > +       bool disable_s2_shutdown = false;
> > > > +       int ret;
> > > > +
> > > > +       WARN_ON(!mutex_is_locked(&chip->lock));
> > > > +
> > > > +       /*
> > > > +        * Default: S2 and S3 shutdown enabled, thresholds at
> > > > +        * 105C/125C/145C, monitoring at 25Hz
> > > > +        */
> > > > +       reg = SHUTDOWN_CTRL1_RATE_25HZ;
> > > > +
> > > > +       if ((temp == THERMAL_TEMP_INVALID) ||
> > > > +           (temp < STAGE2_THRESHOLD_MIN)) {
> > > > +               chip->thresh = THRESH_MIN;
> > > > +               goto skip;
> > > > +       }
> > > > +
> > > > +       if (temp <= STAGE2_THRESHOLD_MAX) {
> > > > +               chip->thresh = THRESH_MAX -
> > > > +                       ((STAGE2_THRESHOLD_MAX - temp) /
> > > > +                        TEMP_THRESH_STEP);
> > > > +               disable_s2_shutdown = true;
> > > > +       } else {
> > > > +               chip->thresh = THRESH_MAX;
> > > > +
> > > > +               if (!IS_ERR(chip->adc))
> > > > +                       disable_s2_shutdown = true;
> > > > +               else
> > > > +                       dev_warn(chip->dev,
> > > > +                                "No ADC is configured and critical temperature is above the maximum stage 2 threshold of 140°C! Configuring stage 2 shutdown at 140°C.\n");
> > > 
> > > Putting a non-ASCII character (the degree symbol) in your commit
> > > message is one thing, but are you sure it's wise to put it in the
> > > kernel logs?
> > 
> > A few other drivers also do this
> > (drivers/gpu/drm/nouveau/nvkm/subdev/clk/base.c,
> > drivers/macintosh/windfarm_pm121.c), however that doesn't mean it's a
> > good idea. Will change to degC or C.
> > 
> > > > +       }
> > > > +
> > > > +skip:
> > > > +       reg |= chip->thresh;
> > > > +       if (disable_s2_shutdown)
> > > > +               reg |= SHUTDOWN_CTRL1_OVERRIDE_S2;
> > > > +
> > > > +       ret = qpnp_tm_write(chip, QPNP_TM_REG_SHUTDOWN_CTRL1, reg);
> > > > +       if (ret < 0)
> > > > +               return ret;
> > > > +
> > > > +       return ret;
> > > 
> > > Simplify the above lines to:
> > > 
> > > return qpnp_tm_write(chip, QPNP_TM_REG_SHUTDOWN_CTRL1, reg);
> > 
> > Ouch, my code is indeed dumb ...
> > 
> > > > @@ -313,12 +441,7 @@ static int qpnp_tm_probe(struct platform_device *pdev)
> > > >         if (ret < 0)
> > > >                 return ret;
> > > >
> > > > -       chip->tz_dev = devm_thermal_zone_of_sensor_register(&pdev->dev, 0, chip,
> > > > -                                                       &qpnp_tm_sensor_ops);
> > > > -       if (IS_ERR(chip->tz_dev)) {
> > > > -               dev_err(&pdev->dev, "failed to register sensor\n");
> > > > -               return PTR_ERR(chip->tz_dev);
> > > > -       }
> > > > +       chip->initialized = true;
> > > 
> > > Should we add "thermal_zone_device_update(chip->tz_dev,
> > > THERMAL_EVENT_UNSPECIFIED);" here
> > 
> > Seems reasonable, will do.
> > 
> > > ...also: do we care about any type of locking for chip->initialized?
> > > Technically we can be running on weakly ordered memory so if
> > > qpnp_tm_update_temp_no_adc() is running on a different processor then
> > > possibly it could still keep returning the default temperature for a
> > > little while.  We could try to analyze whether there's some sort of
> > > implicit barrier or we could add manual memory barriers, but generally
> > > I try to avoid that and just do the simple locking...  What about just
> > > > setting chip->initialized = true at the end of qpnp_tm_init() while the
> > > mutex is still held?
> > 
> > Thanks for pointing that out. I agree that we should keep things
> > simple, chip->initialized to true at the end of qpnp_tm_init() sounds
> > good to me.
> > 
> > > I'd also love to hear from someone with more thermal framework
> > > experience to make sure it's legit to return a default value if
> > > someone calls us while we're initting.  It seems sane to me but nice
> > > to confirm it's OK.
> > 
> > An alternative could be to return THERMAL_TEMP_INVALID, however I
> > don't see this handled outside of thermal_core.c, not sure if it could
> > throw some other code off.
> > 
> > Comments from thermal folks on either approach (or alternatives) are
> > definitely welcome :)
> > 
> > > Overall I like the idea of this patch so hopefully others do too.
> > > Thanks for sending it out!
> > 
> 
> minor ask for next version
> 
> 
> WARNING: line over 80 characters
> #159: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:65:
> +#define STAGE2_THRESHOLD_MIN		125000	/* Stage 2 Threshold
> Min: 125 C */
> 
> WARNING: line over 80 characters
> #160: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:66:
> +#define STAGE2_THRESHOLD_MAX		140000	/* Stage 2 Threshold
> Max: 140 C */
> 
> ERROR: trailing statements should be on next line
> #201: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:186:
> +	if (!chip->adc)) {
> 
> CHECK: Unnecessary parentheses around 'temp == THERMAL_TEMP_INVALID'
> #227: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:220:
> +	if ((temp == THERMAL_TEMP_INVALID) ||
> +	    (temp < STAGE2_THRESHOLD_MIN)) {
> 
> CHECK: Unnecessary parentheses around 'temp < STAGE2_THRESHOLD_MIN'
> #227: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:220:
> +	if ((temp == THERMAL_TEMP_INVALID) ||
> +	    (temp < STAGE2_THRESHOLD_MIN)) {
> 
> CHECK: Unnecessary parentheses around 'trips[i].type ==
> THERMAL_TRIP_CRITICAL'
> #305: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:302:
> +		if (of_thermal_is_trip_valid(chip->tz_dev, i) &&
> +		    (trips[i].type == THERMAL_TRIP_CRITICAL))
> 
> CHECK: Alignment should match open parenthesis
> #386: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:427:
> +	chip->tz_dev = devm_thermal_zone_of_sensor_register(&pdev->dev,
> 0, chip,
> +
> &qpnp_tm_sensor_ops);

Thanks for the review, I'll fix these in the next version.

Right after sending the patches I realized that I forgot to run
checkpatch.pl :( Will try to do better in the future.
Matthias Kaehlcke July 31, 2018, 4:21 p.m. | #7
On Fri, Jul 27, 2018 at 03:45:06PM -0700, Eduardo Valentin wrote:
> On Fri, Jul 27, 2018 at 03:40:52PM -0700, Eduardo Valentin wrote:
> > On Wed, Jul 25, 2018 at 06:12:28PM -0700, Matthias Kaehlcke wrote:
> > > Hi Doug,
> > > 
> > > On Wed, Jul 25, 2018 at 04:19:56PM -0700, Doug Anderson wrote:
> > > 
> > > > On Tue, Jul 24, 2018 at 4:46 PM, Matthias Kaehlcke <mka@chromium.org> wrote:
> > > > > +static int qpnp_tm_update_critical_trip_temp(struct qpnp_tm_chip *chip,
> > > > > +                                            int temp)
> > > > > +{
> > > > > +       u8 reg;
> > > > > +       bool disable_s2_shutdown = false;
> > > > > +       int ret;
> > > > > +
> > > > > +       WARN_ON(!mutex_is_locked(&chip->lock));
> > > > > +
> > > > > +       /*
> > > > > +        * Default: S2 and S3 shutdown enabled, thresholds at
> > > > > +        * 105C/125C/145C, monitoring at 25Hz
> > > > > +        */
> > > > > +       reg = SHUTDOWN_CTRL1_RATE_25HZ;
> > > > > +
> > > > > +       if ((temp == THERMAL_TEMP_INVALID) ||
> > > > > +           (temp < STAGE2_THRESHOLD_MIN)) {
> > > > > +               chip->thresh = THRESH_MIN;
> > > > > +               goto skip;
> > > > > +       }
> > > > > +
> > > > > +       if (temp <= STAGE2_THRESHOLD_MAX) {
> > > > > +               chip->thresh = THRESH_MAX -
> > > > > +                       ((STAGE2_THRESHOLD_MAX - temp) /
> > > > > +                        TEMP_THRESH_STEP);
> > > > > +               disable_s2_shutdown = true;
> > > > > +       } else {
> > > > > +               chip->thresh = THRESH_MAX;
> > > > > +
> > > > > +               if (!IS_ERR(chip->adc))
> > > > > +                       disable_s2_shutdown = true;
> > > > > +               else
> > > > > +                       dev_warn(chip->dev,
> > > > > +                                "No ADC is configured and critical temperature is above the maximum stage 2 threshold of 140°C! Configuring stage 2 shutdown at 140°C.\n");
> > > > 
> > > > Putting a non-ASCII character (the degree symbol) in your commit
> > > > message is one thing, but are you sure it's wise to put it in the
> > > > kernel logs?
> > > 
> > > A few other drivers also do this
> > > (drivers/gpu/drm/nouveau/nvkm/subdev/clk/base.c,
> > > drivers/macintosh/windfarm_pm121.c), however that doesn't mean it's a
> > > good idea. Will change to degC or C.
> > > 
> > > > > +       }
> > > > > +
> > > > > +skip:
> > > > > +       reg |= chip->thresh;
> > > > > +       if (disable_s2_shutdown)
> > > > > +               reg |= SHUTDOWN_CTRL1_OVERRIDE_S2;
> > > > > +
> > > > > +       ret = qpnp_tm_write(chip, QPNP_TM_REG_SHUTDOWN_CTRL1, reg);
> > > > > +       if (ret < 0)
> > > > > +               return ret;
> > > > > +
> > > > > +       return ret;
> > > > 
> > > > Simplify the above lines to:
> > > > 
> > > > return qpnp_tm_write(chip, QPNP_TM_REG_SHUTDOWN_CTRL1, reg);
> > > 
> > > Ouch, my code is indeed dumb ...
> > > 
> > > > > @@ -313,12 +441,7 @@ static int qpnp_tm_probe(struct platform_device *pdev)
> > > > >         if (ret < 0)
> > > > >                 return ret;
> > > > >
> > > > > -       chip->tz_dev = devm_thermal_zone_of_sensor_register(&pdev->dev, 0, chip,
> > > > > -                                                       &qpnp_tm_sensor_ops);
> > > > > -       if (IS_ERR(chip->tz_dev)) {
> > > > > -               dev_err(&pdev->dev, "failed to register sensor\n");
> > > > > -               return PTR_ERR(chip->tz_dev);
> > > > > -       }
> > > > > +       chip->initialized = true;
> > > > 
> > > > > Should we add "thermal_zone_device_update(chip->tz_dev,
> > > > > THERMAL_EVENT_UNSPECIFIED);" here?
> > > 
> > > Seems reasonable, will do.
> > > 
> > > > ...also: do we care about any type of locking for chip->initialized?
> > > > Technically we can be running on weakly ordered memory so if
> > > > qpnp_tm_update_temp_no_adc() is running on a different processor then
> > > > possibly it could still keep returning the default temperature for a
> > > > little while.  We could try to analyze whether there's some sort of
> > > > implicit barrier or we could add manual memory barriers, but generally
> > > > I try to avoid that and just do the simple locking...  What about just
> > > > > setting chip->initialized = true at the end of qpnp_tm_init() while the
> > > > mutex is still held?
> > > 
> > > > Thanks for pointing that out. I agree that we should keep things
> > > > simple; setting chip->initialized to true at the end of qpnp_tm_init()
> > > > sounds good to me.
> > > 
> > > > I'd also love to hear from someone with more thermal framework
> > > > experience to make sure it's legit to return a default value if
> > > > someone calls us while we're initting.  It seems sane to me but nice
> > > > to confirm it's OK.
> > > 
> > > An alternative could be to return THERMAL_TEMP_INVALID, however I
> > > don't see this handled outside of thermal_core.c, not sure if it could
> > > throw some other code off.
> > > 
> > > Comments from thermal folks on either approach (or alternatives) are
> > > definitely welcome :)
> > > 
> > > > Overall I like the idea of this patch so hopefully others do too.
> > > > Thanks for sending it out!
> > > 
> > 
> > minor ask for next version
> > 
> > 
> > WARNING: line over 80 characters
> > #159: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:65:
> > +#define STAGE2_THRESHOLD_MIN		125000	/* Stage 2 Threshold Min: 125 C */
> > 
> > WARNING: line over 80 characters
> > #160: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:66:
> > +#define STAGE2_THRESHOLD_MAX		140000	/* Stage 2 Threshold Max: 140 C */
> > 
> > ERROR: trailing statements should be on next line
> > #201: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:186:
> > +	if (!chip->adc)) {
> > 
> > CHECK: Unnecessary parentheses around 'temp == THERMAL_TEMP_INVALID'
> > #227: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:220:
> > +	if ((temp == THERMAL_TEMP_INVALID) ||
> > +	    (temp < STAGE2_THRESHOLD_MIN)) {
> > 
> > CHECK: Unnecessary parentheses around 'temp < STAGE2_THRESHOLD_MIN'
> > #227: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:220:
> > +	if ((temp == THERMAL_TEMP_INVALID) ||
> > +	    (temp < STAGE2_THRESHOLD_MIN)) {
> > 
> > CHECK: Unnecessary parentheses around 'trips[i].type == THERMAL_TRIP_CRITICAL'
> > #305: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:302:
> > +		if (of_thermal_is_trip_valid(chip->tz_dev, i) &&
> > +		    (trips[i].type == THERMAL_TRIP_CRITICAL))
> > 
> > CHECK: Alignment should match open parenthesis
> > #386: FILE: drivers/thermal/qcom-spmi-temp-alarm.c:427:
> > +	chip->tz_dev = devm_thermal_zone_of_sensor_register(&pdev->dev, 0, chip,
> > +							&qpnp_tm_sensor_ops);
> 
> 
> And it would be great if you could combine these two into a single
> series: when you fix this patch and send a new version of this
> series, please include these too:
> https://patchwork.kernel.org/patch/10543335/
> https://patchwork.kernel.org/patch/10543333/

Ok, will do
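
The locking suggestion quoted above (set chip->initialized at the end of
qpnp_tm_init() while the mutex is still held) can be sketched in user space.
This is only an illustration under stated assumptions, not the driver code:
struct chip_sketch, chip_sketch_init() and chip_sketch_get_temp() are
hypothetical names, a pthread mutex stands in for the kernel mutex, and the
reader is assumed to take the same lock before testing the flag (the patch as
posted reads chip->initialized without the lock).

```c
#include <pthread.h>
#include <stdbool.h>

#define DEFAULT_TEMP 37000 /* millidegrees Celsius, as in the driver */

/* Hypothetical stand-in for struct qpnp_tm_chip. */
struct chip_sketch {
	pthread_mutex_t lock;	/* protects temp and initialized */
	bool initialized;
	int temp;
};

/* Hypothetical counterpart of qpnp_tm_init(). */
static void chip_sketch_init(struct chip_sketch *chip, int hw_temp)
{
	pthread_mutex_lock(&chip->lock);
	chip->temp = hw_temp;		/* hardware setup would happen here */
	chip->initialized = true;	/* last store before the unlock */
	pthread_mutex_unlock(&chip->lock);
}

/* Hypothetical counterpart of qpnp_tm_get_temp(). */
static int chip_sketch_get_temp(struct chip_sketch *chip)
{
	int temp;

	/* Assumption: the reader takes the lock, so no explicit barrier is needed. */
	pthread_mutex_lock(&chip->lock);
	temp = chip->initialized ? chip->temp : DEFAULT_TEMP;
	pthread_mutex_unlock(&chip->lock);

	return temp;
}
```

Because both the store and the load of the flag happen inside the same
critical section, the unlock/lock pair provides the ordering, which is the
"simple locking" alternative to manual memory barriers mentioned in the review.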

Patch

diff --git a/drivers/thermal/qcom-spmi-temp-alarm.c b/drivers/thermal/qcom-spmi-temp-alarm.c
index ad4f3a8d6560..936e4dde4298 100644
--- a/drivers/thermal/qcom-spmi-temp-alarm.c
+++ b/drivers/thermal/qcom-spmi-temp-alarm.c
@@ -23,6 +23,8 @@ 
 #include <linux/regmap.h>
 #include <linux/thermal.h>
 
+#include "thermal_core.h"
+
 #define QPNP_TM_REG_TYPE		0x04
 #define QPNP_TM_REG_SUBTYPE		0x05
 #define QPNP_TM_REG_STATUS		0x08
@@ -37,9 +39,11 @@ 
 #define STATUS_GEN2_STATE_MASK		GENMASK(6, 4)
 #define STATUS_GEN2_STATE_SHIFT		4
 
-#define SHUTDOWN_CTRL1_OVERRIDE_MASK	GENMASK(7, 6)
+#define SHUTDOWN_CTRL1_OVERRIDE_S2	BIT(6)
 #define SHUTDOWN_CTRL1_THRESHOLD_MASK	GENMASK(1, 0)
 
+#define SHUTDOWN_CTRL1_RATE_25HZ	BIT(3)
+
 #define ALARM_CTRL_FORCE_ENABLE		BIT(7)
 
 /*
@@ -56,12 +60,17 @@ 
 #define TEMP_THRESH_STEP		5000	/* Threshold step: 5 C */
 
 #define THRESH_MIN			0
+#define THRESH_MAX			3
+
+#define STAGE2_THRESHOLD_MIN		125000	/* Stage 2 Threshold Min: 125 C */
+#define STAGE2_THRESHOLD_MAX		140000	/* Stage 2 Threshold Max: 140 C */
 
 /* Temperature in Milli Celsius reported during stage 0 if no ADC is present */
 #define DEFAULT_TEMP			37000
 
 struct qpnp_tm_chip {
 	struct regmap			*map;
+	struct device			*dev;
 	struct thermal_zone_device	*tz_dev;
 	unsigned int			subtype;
 	long				temp;
@@ -69,6 +78,10 @@  struct qpnp_tm_chip {
 	unsigned int			stage;
 	unsigned int			prev_stage;
 	unsigned int			base;
+	/* protects .thresh, .stage and chip registers */
+	struct mutex			lock;
+	bool				initialized;
+
 	struct iio_channel		*adc;
 };
 
@@ -125,6 +138,8 @@  static int qpnp_tm_update_temp_no_adc(struct qpnp_tm_chip *chip)
 	unsigned int stage, stage_new, stage_old;
 	int ret;
 
+	WARN_ON(!mutex_is_locked(&chip->lock));
+
 	ret = qpnp_tm_get_temp_stage(chip);
 	if (ret < 0)
 		return ret;
@@ -163,8 +178,15 @@  static int qpnp_tm_get_temp(void *data, int *temp)
 	if (!temp)
 		return -EINVAL;
 
-	if (!chip->adc) {
+	if (!chip->initialized) {
+		*temp = DEFAULT_TEMP;
+		return 0;
+	}
+
+	if (!chip->adc)) {
+		mutex_lock(&chip->lock);
 		ret = qpnp_tm_update_temp_no_adc(chip);
+		mutex_unlock(&chip->lock);
 		if (ret < 0)
 			return ret;
 	} else {
@@ -180,8 +202,77 @@  static int qpnp_tm_get_temp(void *data, int *temp)
 	return 0;
 }
 
+static int qpnp_tm_update_critical_trip_temp(struct qpnp_tm_chip *chip,
+					     int temp)
+{
+	u8 reg;
+	bool disable_s2_shutdown = false;
+	int ret;
+
+	WARN_ON(!mutex_is_locked(&chip->lock));
+
+	/*
+	 * Default: S2 and S3 shutdown enabled, thresholds at
+	 * 105C/125C/145C, monitoring at 25Hz
+	 */
+	reg = SHUTDOWN_CTRL1_RATE_25HZ;
+
+	if ((temp == THERMAL_TEMP_INVALID) ||
+	    (temp < STAGE2_THRESHOLD_MIN)) {
+		chip->thresh = THRESH_MIN;
+		goto skip;
+	}
+
+	if (temp <= STAGE2_THRESHOLD_MAX) {
+		chip->thresh = THRESH_MAX -
+			((STAGE2_THRESHOLD_MAX - temp) /
+			 TEMP_THRESH_STEP);
+		disable_s2_shutdown = true;
+	} else {
+		chip->thresh = THRESH_MAX;
+
+		if (!IS_ERR(chip->adc))
+			disable_s2_shutdown = true;
+		else
+			dev_warn(chip->dev,
+				 "No ADC is configured and critical temperature is above the maximum stage 2 threshold of 140°C! Configuring stage 2 shutdown at 140°C.\n");
+	}
+
+skip:
+	reg |= chip->thresh;
+	if (disable_s2_shutdown)
+		reg |= SHUTDOWN_CTRL1_OVERRIDE_S2;
+
+	ret = qpnp_tm_write(chip, QPNP_TM_REG_SHUTDOWN_CTRL1, reg);
+	if (ret < 0)
+		return ret;
+
+	return ret;
+}
+
+static int qpnp_tm_set_trip_temp(void *data, int trip, int temp)
+{
+	struct qpnp_tm_chip *chip = data;
+	const struct thermal_trip *trip_points;
+	int ret;
+
+	trip_points = of_thermal_get_trip_points(chip->tz_dev);
+	if (!trip_points)
+		return -EINVAL;
+
+	if (trip_points[trip].type != THERMAL_TRIP_CRITICAL)
+		return 0;
+
+	mutex_lock(&chip->lock);
+	ret = qpnp_tm_update_critical_trip_temp(chip, temp);
+	mutex_unlock(&chip->lock);
+
+	return ret;
+}
+
 static const struct thermal_zone_of_device_ops qpnp_tm_sensor_ops = {
 	.get_temp = qpnp_tm_get_temp,
+	.set_trip_temp = qpnp_tm_set_trip_temp,
 };
 
 static irqreturn_t qpnp_tm_isr(int irq, void *data)
@@ -193,6 +284,29 @@  static irqreturn_t qpnp_tm_isr(int irq, void *data)
 	return IRQ_HANDLED;
 }
 
+static int qpnp_tm_get_critical_trip_temp(struct qpnp_tm_chip *chip)
+{
+	int ntrips;
+	const struct thermal_trip *trips;
+	int i;
+
+	ntrips = of_thermal_get_ntrips(chip->tz_dev);
+	if (ntrips <= 0)
+		return THERMAL_TEMP_INVALID;
+
+	trips = of_thermal_get_trip_points(chip->tz_dev);
+	if (!trips)
+		return THERMAL_TEMP_INVALID;
+
+	for (i = 0; i < ntrips; i++) {
+		if (of_thermal_is_trip_valid(chip->tz_dev, i) &&
+		    (trips[i].type == THERMAL_TRIP_CRITICAL))
+			return trips[i].temperature;
+	}
+
+	return THERMAL_TEMP_INVALID;
+}
+
 /*
  * This function initializes the internal temp value based on only the
  * current thermal stage and threshold. Setup threshold control and
@@ -203,17 +317,20 @@  static int qpnp_tm_init(struct qpnp_tm_chip *chip)
 	unsigned int stage;
 	int ret;
 	u8 reg = 0;
+	int crit_temp;
+
+	mutex_lock(&chip->lock);
 
 	ret = qpnp_tm_read(chip, QPNP_TM_REG_SHUTDOWN_CTRL1, &reg);
 	if (ret < 0)
-		return ret;
+		goto out;
 
 	chip->thresh = reg & SHUTDOWN_CTRL1_THRESHOLD_MASK;
 	chip->temp = DEFAULT_TEMP;
 
 	ret = qpnp_tm_get_temp_stage(chip);
 	if (ret < 0)
-		return ret;
+		goto out;
 	chip->stage = ret;
 
 	stage = chip->subtype == QPNP_TM_SUBTYPE_GEN1
@@ -224,21 +341,17 @@  static int qpnp_tm_init(struct qpnp_tm_chip *chip)
 			     (stage - 1) * TEMP_STAGE_STEP +
 			     TEMP_THRESH_MIN;
 
-	/*
-	 * Set threshold and disable software override of stage 2 and 3
-	 * shutdowns.
-	 */
-	chip->thresh = THRESH_MIN;
-	reg &= ~(SHUTDOWN_CTRL1_OVERRIDE_MASK | SHUTDOWN_CTRL1_THRESHOLD_MASK);
-	reg |= chip->thresh & SHUTDOWN_CTRL1_THRESHOLD_MASK;
-	ret = qpnp_tm_write(chip, QPNP_TM_REG_SHUTDOWN_CTRL1, reg);
+	crit_temp = qpnp_tm_get_critical_trip_temp(chip);
+	ret = qpnp_tm_update_critical_trip_temp(chip, crit_temp);
 	if (ret < 0)
-		return ret;
+		goto out;
 
 	/* Enable the thermal alarm PMIC module in always-on mode. */
 	reg = ALARM_CTRL_FORCE_ENABLE;
 	ret = qpnp_tm_write(chip, QPNP_TM_REG_ALARM_CTRL, reg);
 
+out:
+	mutex_unlock(&chip->lock);
 	return ret;
 }
 
@@ -257,6 +370,9 @@  static int qpnp_tm_probe(struct platform_device *pdev)
 		return -ENOMEM;
 
 	dev_set_drvdata(&pdev->dev, chip);
+	chip->dev = &pdev->dev;
+
+	mutex_init(&chip->lock);
 
 	chip->map = dev_get_regmap(pdev->dev.parent, NULL);
 	if (!chip->map)
@@ -302,6 +418,18 @@  static int qpnp_tm_probe(struct platform_device *pdev)
 
 	chip->subtype = subtype;
 
+	/*
+	 * Register the sensor before initializing the hardware to be able to
+	 * read the trip points. get_temp() returns the default temperature
+	 * before the hardware initialization is completed.
+	 */
+	chip->tz_dev = devm_thermal_zone_of_sensor_register(&pdev->dev, 0, chip,
+							&qpnp_tm_sensor_ops);
+	if (IS_ERR(chip->tz_dev)) {
+		dev_err(&pdev->dev, "failed to register sensor\n");
+		return PTR_ERR(chip->tz_dev);
+	}
+
 	ret = qpnp_tm_init(chip);
 	if (ret < 0) {
 		dev_err(&pdev->dev, "init failed\n");
@@ -313,12 +441,7 @@  static int qpnp_tm_probe(struct platform_device *pdev)
 	if (ret < 0)
 		return ret;
 
-	chip->tz_dev = devm_thermal_zone_of_sensor_register(&pdev->dev, 0, chip,
-							&qpnp_tm_sensor_ops);
-	if (IS_ERR(chip->tz_dev)) {
-		dev_err(&pdev->dev, "failed to register sensor\n");
-		return PTR_ERR(chip->tz_dev);
-	}
+	chip->initialized = true;
 
 	return 0;
 }
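
The threshold selection in qpnp_tm_update_critical_trip_temp() can be checked
outside the kernel. The following is a minimal sketch, assuming the macro
values shown in the patch; ctrl1_for_crit_temp() is a hypothetical helper name,
has_adc stands in for the driver's ADC-channel check, and the value of
THERMAL_TEMP_INVALID is assumed to match the kernel's <linux/thermal.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the patch. */
#define SHUTDOWN_CTRL1_OVERRIDE_S2	(1u << 6)
#define SHUTDOWN_CTRL1_RATE_25HZ	(1u << 3)
#define TEMP_THRESH_STEP		5000
#define THRESH_MIN			0
#define THRESH_MAX			3
#define STAGE2_THRESHOLD_MIN		125000
#define STAGE2_THRESHOLD_MAX		140000

/* Assumption: mirrors the kernel's definition in <linux/thermal.h>. */
#define THERMAL_TEMP_INVALID		(-274000)

/*
 * Hypothetical helper: returns the SHUTDOWN_CTRL1 value that would be
 * written for a critical trip temperature given in millidegrees Celsius.
 */
static uint8_t ctrl1_for_crit_temp(int temp, bool has_adc)
{
	uint8_t reg = SHUTDOWN_CTRL1_RATE_25HZ;
	unsigned int thresh;
	bool disable_s2_shutdown = false;

	if (temp == THERMAL_TEMP_INVALID || temp < STAGE2_THRESHOLD_MIN) {
		/* No usable critical trip: keep the default thresholds. */
		thresh = THRESH_MIN;
	} else if (temp <= STAGE2_THRESHOLD_MAX) {
		/* Pick the stage 2 threshold at or just above the trip. */
		thresh = THRESH_MAX -
			 (STAGE2_THRESHOLD_MAX - temp) / TEMP_THRESH_STEP;
		disable_s2_shutdown = true;
	} else {
		/* Above 140 C: only safe to override with an ADC present. */
		thresh = THRESH_MAX;
		disable_s2_shutdown = has_adc;
	}

	reg |= thresh;
	if (disable_s2_shutdown)
		reg |= SHUTDOWN_CTRL1_OVERRIDE_S2;

	return reg;
}
```

With these values a critical trip at 127 C selects threshold index 1, i.e.
stage 2 at 130 C, because the integer division truncates; trips that are not a
multiple of 5 C are therefore matched by the next higher stage 2 threshold.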