mbox series

[v4,00/12] Add throttler driver for non-thermal throttling

Message ID 20180621015237.100100-1-mka@chromium.org
Headers show
Series Add throttler driver for non-thermal throttling | expand

Message

Matthias Kaehlcke June 21, 2018, 1:52 a.m. UTC
This series adds the throttler driver, for non-thermal throttling of
CPUs and devfreq devices. A use case for non-thermal throttling could
be the detection of a high battery discharge voltage, close to the
over-current protection (OCP) limit of the battery.

To support throttling of devfreq devices the series introduces the
concept of a devfreq policy and the DEVFREQ_ADJUST notifier (similar
to CPUFREQ_ADJUST). Further it includes some related devfreq bugfixes
and improvements that change some of the code that is also touched
by the policy changes.

Matthias Kaehlcke (12):
  PM / devfreq: Init user limits from OPP limits, not viceversa
  PM / devfreq: Fix handling of min/max_freq == 0
  PM / devfreq: Don't adjust to user limits in governors
  PM / devfreq: Add struct devfreq_policy
  PM / devfreq: Add support for policy notifiers
  PM / devfreq: Make update_devfreq() public
  PM / devfreq: export devfreq_class
  cpufreq: Add stub for cpufreq_update_policy()
  dt-bindings: PM / OPP: add opp-throttlers property
  misc: throttler: Add core support for non-thermal throttling
  misc: throttler: Add Chrome OS EC throttler
  mfd: cros_ec: Add throttler sub-device

 Documentation/devicetree/bindings/opp/opp.txt |   3 +
 MAINTAINERS                                   |   7 +
 drivers/devfreq/devfreq.c                     | 222 ++++--
 drivers/devfreq/governor.h                    |   6 +-
 drivers/devfreq/governor_passive.c            |   4 +-
 drivers/devfreq/governor_performance.c        |   5 +-
 drivers/devfreq/governor_powersave.c          |   2 +-
 drivers/devfreq/governor_simpleondemand.c     |  12 +-
 drivers/devfreq/governor_userspace.c          |  16 +-
 drivers/mfd/cros_ec_dev.c                     |  19 +
 drivers/misc/Kconfig                          |   1 +
 drivers/misc/Makefile                         |   1 +
 drivers/misc/throttler/Kconfig                |  33 +
 drivers/misc/throttler/Makefile               |   2 +
 drivers/misc/throttler/core.c                 | 705 ++++++++++++++++++
 drivers/misc/throttler/cros_ec_throttler.c    | 111 +++
 include/linux/cpufreq.h                       |   1 +
 include/linux/devfreq.h                       | 113 ++-
 include/linux/throttler.h                     |  21 +
 19 files changed, 1159 insertions(+), 125 deletions(-)
 create mode 100644 drivers/misc/throttler/Kconfig
 create mode 100644 drivers/misc/throttler/Makefile
 create mode 100644 drivers/misc/throttler/core.c
 create mode 100644 drivers/misc/throttler/cros_ec_throttler.c
 create mode 100644 include/linux/throttler.h

Comments

Brian Norris June 22, 2018, 2:04 a.m. UTC | #1
Hi,

A few more things I noticed; probably my last thoughts on this
particular patch; and I think I reviewed the rest:

On Wed, Jun 20, 2018 at 06:52:35PM -0700, Matthias Kaehlcke wrote:
> The purpose of the throttler is to provide support for non-thermal
> throttling. Throttling is triggered by external event, e.g. the
> detection of a high battery discharge current, close to the OCP limit
> of the battery. The throttler is only in charge of the throttling, not
> the monitoring, which is done by another (possibly platform specific)
> driver.
> 
> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> ---
> Changes in v4:
> - removed OOM logs
> - "does have no" => "has no" in log message
> - changed 'level' to unsigned int
> - hold mutex in throttler_set_level() when checking if level has changed
> - removed debugfs dir in throttler_teardown()
> - consolidated update of all devfreq devices in thr_update_devfreq()
> - added field 'shutting_down' to struct throttler
> - refactored teardown to avoid deadlocks
> - minor change in introductory comment
> 
> Changes in v3:
> - Kconfig: don't select CPU_FREQ and PM_DEVFREQ
> - added CONFIG_THROTTLER_DEBUG option
> - changed 'level' sysfs attribute to debugfs
> - introduced thr_<level> macros for logging
> - added debug logs
> - added field clamp_freq to struct cpufreq_thrdev and devfreq_thrdev
>   to keep track of the current frequency limits and avoid spammy logs
> 
> Changes in v2:
> - completely reworked the driver to support configuration through OPPs, which
>   requires a more dynamic handling
> - added sysfs attribute to set the level for debugging and testing
> - Makefile: depend on Kconfig option to traverse throttler directory
> - Kconfig: removed 'default n'
> - added SPDX line instead of license boiler-plate
> - added entry to MAINTAINERS file
> ---
>  MAINTAINERS                     |   7 +
>  drivers/misc/Kconfig            |   1 +
>  drivers/misc/Makefile           |   1 +
>  drivers/misc/throttler/Kconfig  |  23 ++
>  drivers/misc/throttler/Makefile |   1 +
>  drivers/misc/throttler/core.c   | 705 ++++++++++++++++++++++++++++++++
>  include/linux/throttler.h       |  21 +
>  7 files changed, 759 insertions(+)
>  create mode 100644 drivers/misc/throttler/Kconfig
>  create mode 100644 drivers/misc/throttler/Makefile
>  create mode 100644 drivers/misc/throttler/core.c
>  create mode 100644 include/linux/throttler.h
> 

...

> diff --git a/drivers/misc/throttler/core.c b/drivers/misc/throttler/core.c
> new file mode 100644
> index 000000000000..305964cfb0b7
> --- /dev/null
> +++ b/drivers/misc/throttler/core.c
> @@ -0,0 +1,705 @@
> +// SPDX-License-Identifier: GPL-2.0

...

> +
> +static int thr_handle_devfreq_event(struct notifier_block *nb,
> +				    unsigned long event, void *data);
> +
> +static unsigned long thr_get_throttling_freq(struct thr_freq_table *ft,
> +					     unsigned int level)
> +{
> +	if (level == 0) {
> +		WARN(true, "level == 0");

It's possible to get here, if the level gets changed while you're
handling a devfreq event. I'd think you can drop the WARN() entirely and
just make sure to handle this case properly.

> +		return ULONG_MAX;
> +	}
> +
> +	if (level <= ft->n_entries)
> +		return ft->freqs[level - 1];
> +	else
> +		return ft->freqs[ft->n_entries - 1];
> +}
> +

...

> +static int thr_handle_cpufreq_event(struct notifier_block *nb,
> +				unsigned long event, void *data)
> +{
> +	struct throttler *thr =
> +		container_of(nb, struct throttler, cpufreq.nb);
> +	struct cpufreq_policy *policy = data;
> +	struct cpufreq_thrdev *cftd;
> +
> +	if ((event != CPUFREQ_ADJUST) || thr->shutting_down)
> +		return 0;
> +
> +	mutex_lock(&thr->lock);
> +
> +	if (cpumask_test_cpu(policy->cpu, &thr->cpufreq.cm_ignore))
> +		goto out;
> +
> +	if (!cpumask_test_cpu(policy->cpu, &thr->cpufreq.cm_initialized)) {
> +		thr_cpufreq_init(thr, policy->cpu);
> +
> +		if (cpumask_test_cpu(policy->cpu, &thr->cpufreq.cm_ignore))
> +			goto out;
> +
> +		thr_dbg(thr, "CPU%d is used for throttling\n", policy->cpu);
> +	}
> +
> +	/*
> +	 * Can't do this check earlier, otherwise we might miss CPU policies
> +	 * that are added after setup().
> +	 */
> +	if (thr->level == 0) {
> +		list_for_each_entry(cftd, &thr->cpufreq.list, node) {
> +			if (cftd->cpu != policy->cpu)
> +				continue;
> +
> +			if (cftd->clamp_freq != 0) {
> +				thr_dbg(thr, "unthrottling CPU%d\n", cftd->cpu);
> +				cftd->clamp_freq = 0;
> +			}

Take it or leave it, but this entire 'level == 0' loop looks like it
could be easily merged into the next (very similar) loop, and avoid the
'goto'.

> +		}
> +
> +		goto out;
> +	}
> +
> +	list_for_each_entry(cftd, &thr->cpufreq.list, node) {
> +		unsigned long clamp_freq;
> +
> +		if (cftd->cpu != policy->cpu)
> +			continue;
> +
> +		clamp_freq = thr_get_throttling_freq(&cftd->freq_table,
> +						     thr->level) / 1000;
> +		if (cftd->clamp_freq != clamp_freq) {
> +			thr_dbg(thr, "throttling CPU%d to %lu MHz\n", cftd->cpu,
> +				clamp_freq / 1000);
> +			cftd->clamp_freq = clamp_freq;
> +		}
> +
> +		if (clamp_freq < policy->max)
> +			cpufreq_verify_within_limits(policy, 0, clamp_freq);
> +	}
> +
> +out:
> +	mutex_unlock(&thr->lock);
> +
> +	return NOTIFY_DONE;
> +}
> +
> +/*
> + * Notifier called by devfreq. Can't acquire thr->lock since it might
> + * already be held by throttler_set_level(). It isn't necessary to
> + * acquire the lock for the following reasons:
> + *
> + * Only the devfreq_thrdev and thr->level are accessed in this function.
> + * The devfreq device won't go away (or change) during the execution of
> + * this function, since we are called from the devfreq core. Theoretically
> + * thr->level could change and we'd apply an outdated setting, however in
> + * this case the function would run again shortly after and apply the
> + * correct value.
> + */
> +static int thr_handle_devfreq_event(struct notifier_block *nb,
> +				    unsigned long event, void *data)
> +{
> +	struct devfreq_thrdev *dftd =
> +		container_of(nb, struct devfreq_thrdev, nb);
> +	struct throttler *thr = dftd->thr;
> +	struct devfreq_policy *policy = data;
> +	unsigned long clamp_freq;
> +
> +	if ((event != DEVFREQ_ADJUST) || thr->shutting_down)
> +		return NOTIFY_DONE;
> +
> +	if (thr->level == 0) {
> +		if (dftd->clamp_freq != 0) {
> +			thr_dbg(thr, "unthrottling '%s'\n",
> +				dev_name(&dftd->devfreq->dev));
> +			dftd->clamp_freq = 0;
> +		}
> +
> +		return NOTIFY_DONE;
> +	}
> +

Given that the level can change in between the last reading (thr->level
== 0) and here...it seems like it would be better to really only read
the level once, and ensure that the same logic can handle both zero and
non-zero levels. e.g, you could try READ_ONCE(thr->level) and stash the
value in a local? And you could probably eliminate the entire 'if'
above, and just have a special case for 'clamp_freq == UINT_MAX'
following here.

Brian

> +	clamp_freq = thr_get_throttling_freq(&dftd->freq_table, thr->level);
> +	if (clamp_freq != dftd->clamp_freq) {
> +		thr_dbg(thr, "throttling '%s' to %lu MHz\n",
> +			dev_name(&dftd->devfreq->dev), clamp_freq / 1000000);
> +		dftd->clamp_freq = clamp_freq;
> +	}
> +
> +	if (clamp_freq < policy->max)
> +		devfreq_verify_within_limits(policy, 0, clamp_freq);
> +
> +	return NOTIFY_DONE;
> +}
> +

...
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Brian Norris June 22, 2018, 2:06 a.m. UTC | #2
On Wed, Jun 20, 2018 at 06:52:37PM -0700, Matthias Kaehlcke wrote:
> Instantiate the CrOS EC throttler if it is enabled in the kernel
> configuration.
> 
> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>

Reviewed-by: Brian Norris <briannorris@chromium.org>

> ---
> Changes in v4:
> - register throttler in cros_ec_dev.c instead of cros_ec.c
> 
> Changes in v3:
> - patch added to series
> ---
>  drivers/mfd/cros_ec_dev.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/drivers/mfd/cros_ec_dev.c b/drivers/mfd/cros_ec_dev.c
> index eafd06f62a3a..9e56c2793075 100644
> --- a/drivers/mfd/cros_ec_dev.c
> +++ b/drivers/mfd/cros_ec_dev.c
> @@ -383,6 +383,22 @@ static void cros_ec_sensors_register(struct cros_ec_dev *ec)
>  	kfree(msg);
>  }
>  
> +static const struct mfd_cell ec_throttler_cells[] = {
> +	{ .name = "cros-ec-throttler" }
> +};
> +
> +static void cros_ec_throttler_register(struct cros_ec_dev *ec)
> +{
> +	int ret;
> +
> +	ret = mfd_add_devices(ec->dev, 0, ec_throttler_cells,
> +			      ARRAY_SIZE(ec_throttler_cells),
> +			      NULL, 0, NULL);
> +	if (ret)
> +		dev_err(ec->dev,
> +			"failed to add cros-ec-throttler device: %d\n", ret);
> +}
> +
>  static int ec_device_probe(struct platform_device *pdev)
>  {
>  	int retval = -ENOMEM;
> @@ -422,6 +438,9 @@ static int ec_device_probe(struct platform_device *pdev)
>  	if (cros_ec_check_features(ec, EC_FEATURE_MOTION_SENSE))
>  		cros_ec_sensors_register(ec);
>  
> +	if (IS_ENABLED(CONFIG_CROS_EC_THROTTLER))
> +		cros_ec_throttler_register(ec);
> +
>  	/* Take control of the lightbar from the EC. */
>  	lb_manual_suspend_ctrl(ec, 1);
>  
> -- 
> 2.18.0.rc2.346.g013aa6912e-goog
> 
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matthias Kaehlcke June 23, 2018, 1:31 a.m. UTC | #3
On Thu, Jun 21, 2018 at 07:04:33PM -0700, Brian Norris wrote:
> Hi,
> 
> A few more things I noticed; probably my last thoughts on this
> particular patch; and I think I reviewed the rest:
> 
> On Wed, Jun 20, 2018 at 06:52:35PM -0700, Matthias Kaehlcke wrote:
> > The purpose of the throttler is to provide support for non-thermal
> > throttling. Throttling is triggered by external event, e.g. the
> > detection of a high battery discharge current, close to the OCP limit
> > of the battery. The throttler is only in charge of the throttling, not
> > the monitoring, which is done by another (possibly platform specific)
> > driver.
> > 
> > Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> > ---
> > Changes in v4:
> > - removed OOM logs
> > - "does have no" => "has no" in log message
> > - changed 'level' to unsigned int
> > - hold mutex in throttler_set_level() when checking if level has changed
> > - removed debugfs dir in throttler_teardown()
> > - consolidated update of all devfreq devices in thr_update_devfreq()
> > - added field 'shutting_down' to struct throttler
> > - refactored teardown to avoid deadlocks
> > - minor change in introductory comment
> > 
> > Changes in v3:
> > - Kconfig: don't select CPU_FREQ and PM_DEVFREQ
> > - added CONFIG_THROTTLER_DEBUG option
> > - changed 'level' sysfs attribute to debugfs
> > - introduced thr_<level> macros for logging
> > - added debug logs
> > - added field clamp_freq to struct cpufreq_thrdev and devfreq_thrdev
> >   to keep track of the current frequency limits and avoid spammy logs
> > 
> > Changes in v2:
> > - completely reworked the driver to support configuration through OPPs, which
> >   requires a more dynamic handling
> > - added sysfs attribute to set the level for debugging and testing
> > - Makefile: depend on Kconfig option to traverse throttler directory
> > - Kconfig: removed 'default n'
> > - added SPDX line instead of license boiler-plate
> > - added entry to MAINTAINERS file
> > ---
> >  MAINTAINERS                     |   7 +
> >  drivers/misc/Kconfig            |   1 +
> >  drivers/misc/Makefile           |   1 +
> >  drivers/misc/throttler/Kconfig  |  23 ++
> >  drivers/misc/throttler/Makefile |   1 +
> >  drivers/misc/throttler/core.c   | 705 ++++++++++++++++++++++++++++++++
> >  include/linux/throttler.h       |  21 +
> >  7 files changed, 759 insertions(+)
> >  create mode 100644 drivers/misc/throttler/Kconfig
> >  create mode 100644 drivers/misc/throttler/Makefile
> >  create mode 100644 drivers/misc/throttler/core.c
> >  create mode 100644 include/linux/throttler.h
> > 
> 
> ...
> 
> > diff --git a/drivers/misc/throttler/core.c b/drivers/misc/throttler/core.c
> > new file mode 100644
> > index 000000000000..305964cfb0b7
> > --- /dev/null
> > +++ b/drivers/misc/throttler/core.c
> > @@ -0,0 +1,705 @@
> > +// SPDX-License-Identifier: GPL-2.0
> 
> ...
> 
> > +
> > +static int thr_handle_devfreq_event(struct notifier_block *nb,
> > +				    unsigned long event, void *data);
> > +
> > +static unsigned long thr_get_throttling_freq(struct thr_freq_table *ft,
> > +					     unsigned int level)
> > +{
> > +	if (level == 0) {
> > +		WARN(true, "level == 0");
> 
> It's possible to get here, if the level gets changed while you're
> handling a devfreq event. I'd think you can drop the WARN() entirely and
> just make sure to handle this case properly.

Right, I didn't take into account here that level could change. Will
adapt.

> > +		return ULONG_MAX;
> > +	}
> > +
> > +	if (level <= ft->n_entries)
> > +		return ft->freqs[level - 1];
> > +	else
> > +		return ft->freqs[ft->n_entries - 1];
> > +}
> > +
> 
> ...
> 
> > +static int thr_handle_cpufreq_event(struct notifier_block *nb,
> > +				unsigned long event, void *data)
> > +{
> > +	struct throttler *thr =
> > +		container_of(nb, struct throttler, cpufreq.nb);
> > +	struct cpufreq_policy *policy = data;
> > +	struct cpufreq_thrdev *cftd;
> > +
> > +	if ((event != CPUFREQ_ADJUST) || thr->shutting_down)
> > +		return 0;
> > +
> > +	mutex_lock(&thr->lock);
> > +
> > +	if (cpumask_test_cpu(policy->cpu, &thr->cpufreq.cm_ignore))
> > +		goto out;
> > +
> > +	if (!cpumask_test_cpu(policy->cpu, &thr->cpufreq.cm_initialized)) {
> > +		thr_cpufreq_init(thr, policy->cpu);
> > +
> > +		if (cpumask_test_cpu(policy->cpu, &thr->cpufreq.cm_ignore))
> > +			goto out;
> > +
> > +		thr_dbg(thr, "CPU%d is used for throttling\n", policy->cpu);
> > +	}
> > +
> > +	/*
> > +	 * Can't do this check earlier, otherwise we might miss CPU policies
> > +	 * that are added after setup().
> > +	 */
> > +	if (thr->level == 0) {
> > +		list_for_each_entry(cftd, &thr->cpufreq.list, node) {
> > +			if (cftd->cpu != policy->cpu)
> > +				continue;
> > +
> > +			if (cftd->clamp_freq != 0) {
> > +				thr_dbg(thr, "unthrottling CPU%d\n", cftd->cpu);
> > +				cftd->clamp_freq = 0;
> > +			}
> 
> Take it or leave it, but this entire 'level == 0' loop looks like it
> could be easily merged into the next (very similar) loop, and avoid the
> 'goto'.

Merging the two loops sounds good.

> > +		}
> > +
> > +		goto out;
> > +	}
> > +
> > +	list_for_each_entry(cftd, &thr->cpufreq.list, node) {
> > +		unsigned long clamp_freq;
> > +
> > +		if (cftd->cpu != policy->cpu)
> > +			continue;
> > +
> > +		clamp_freq = thr_get_throttling_freq(&cftd->freq_table,
> > +						     thr->level) / 1000;
> > +		if (cftd->clamp_freq != clamp_freq) {
> > +			thr_dbg(thr, "throttling CPU%d to %lu MHz\n", cftd->cpu,
> > +				clamp_freq / 1000);
> > +			cftd->clamp_freq = clamp_freq;
> > +		}
> > +
> > +		if (clamp_freq < policy->max)
> > +			cpufreq_verify_within_limits(policy, 0, clamp_freq);
> > +	}
> > +
> > +out:
> > +	mutex_unlock(&thr->lock);
> > +
> > +	return NOTIFY_DONE;
> > +}
> > +
> > +/*
> > + * Notifier called by devfreq. Can't acquire thr->lock since it might
> > + * already be held by throttler_set_level(). It isn't necessary to
> > + * acquire the lock for the following reasons:
> > + *
> > + * Only the devfreq_thrdev and thr->level are accessed in this function.
> > + * The devfreq device won't go away (or change) during the execution of
> > + * this function, since we are called from the devfreq core. Theoretically
> > + * thr->level could change and we'd apply an outdated setting, however in
> > + * this case the function would run again shortly after and apply the
> > + * correct value.
> > + */
> > +static int thr_handle_devfreq_event(struct notifier_block *nb,
> > +				    unsigned long event, void *data)
> > +{
> > +	struct devfreq_thrdev *dftd =
> > +		container_of(nb, struct devfreq_thrdev, nb);
> > +	struct throttler *thr = dftd->thr;
> > +	struct devfreq_policy *policy = data;
> > +	unsigned long clamp_freq;
> > +
> > +	if ((event != DEVFREQ_ADJUST) || thr->shutting_down)
> > +		return NOTIFY_DONE;
> > +
> > +	if (thr->level == 0) {
> > +		if (dftd->clamp_freq != 0) {
> > +			thr_dbg(thr, "unthrottling '%s'\n",
> > +				dev_name(&dftd->devfreq->dev));
> > +			dftd->clamp_freq = 0;
> > +		}
> > +
> > +		return NOTIFY_DONE;
> > +	}
> > +
> 
> Given that the level can change in between the last reading (thr->level
> == 0) and here...it seems like it would be better to really only read
> the level once, and ensure that the same logic can handle both zero and
> non-zero levels. e.g, you could try READ_ONCE(thr->level) and stash the
> value in a local?

Ack

> And you could probably eliminate the entire 'if'
> above, and just have a special case for 'clamp_freq == UINT_MAX'
> following here.

It might end up being a line shorter or so, but I'm not convinced it
would improve readability. I'd prefer to keep it as is.

Thanks

Matthias
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html