[RFC,0/9] Apple SoC CPU P-state switching

Message ID: 20211011165707.138157-1-marcan@marcan.st

Message
Hector Martin Oct. 11, 2021, 4:56 p.m. UTC
Hi folks, here's a first attempt at cpufreq support for the Apple M1.
I'd appreciate any feedback on this approach.

The hardware in these SoCs is very simple: you just poke a single
register to change the performance state of a CPU cluster. There's
some init required on older firmware versions, but we're going to
say that's the bootloader's responsibility. This keeps the driver
nice and simple and generic and likely to work on future SoCs too.
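As a rough illustration of how simple the described mechanism is, here is a toy model. This is not the real M1 register layout; the field position and names are invented, and the "register" is a plain variable standing in for what would be an ioremapped MMIO write in the driver.

```c
#include <assert.h>

/* Toy model: one per-cluster register selects the performance state.
 * The "register" is a plain variable here; the real driver would do an
 * MMIO write.  The layout (p-state in the low nibble) is invented. */

static unsigned int cluster_pstate_reg;

static void cluster_set_pstate(unsigned int pstate)
{
	/* hypothetical layout: desired p-state lives in bits [3:0] */
	cluster_pstate_reg = (cluster_pstate_reg & ~0xfu) | (pstate & 0xfu);
}
```

The point is only that a single register poke per cluster is the whole hardware interface; everything else in the series is plumbing around it.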

However, there's a catch: the memory controller config should also be
modified when higher clock states are engaged on the P-cores, to
get optimal performance.

This series implements this using two drivers, on top of the existing
cpufreq-dt infrastructure. The cpu clock driver represents the p-state
switching as if it were a standard clock driver, so it can work with
cpufreq-dt. To do this, it also needs access to the OPP table so it can
map incoming clock frequencies back to p-state index numbers; that is
why the binding includes an OPP table reference. This might be a bit
strange, since the same OPP table is referenced both by the CPUs
themselves and by the clocks driver that provides the actual switching
for them...
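The frequency-to-index mapping described above can be sketched as follows. The table values are invented for illustration; the real driver would walk dev_pm_opp entries rather than a static array.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the lookup the cluster clock driver needs: mapping a clock
 * frequency requested by cpufreq-dt back to a p-state index.  The
 * frequencies and indices below are illustrative, not the M1's. */

struct pstate_entry {
	unsigned long freq_hz;	/* OPP frequency */
	unsigned int pstate;	/* index poked into the p-state register */
};

static const struct pstate_entry opp_table[] = {
	{  600000000UL,  1 },
	{ 1500000000UL,  5 },
	{ 2064000000UL,  7 },
	{ 3204000000UL, 15 },
};

/* Return the p-state for an exact frequency match, or -1 if absent. */
static int freq_to_pstate(unsigned long freq_hz)
{
	for (size_t i = 0; i < sizeof(opp_table) / sizeof(opp_table[0]); i++)
		if (opp_table[i].freq_hz == freq_hz)
			return (int)opp_table[i].pstate;
	return -1;
}
```

This is why the clock driver needs the OPP table reference in its binding: without it, an incoming rate from cpufreq-dt cannot be translated back to an index.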

The memory controller stuff is implemented as a genpd provider that
exposes two performance states that the CPU OPP tables can depend on.
Unfortunately, the cpufreq path doesn't work properly for this, since
the CPUs aren't typical devices participating in runtime-pm. So instead
I opted to put that logic in the clock driver, which means it gets a
power-domains reference to the memory controller. This required a hack
to the OPP core so that it wouldn't complain about the missing parent
domain when evaluating the OPPs in the context of the CPUs...

The actual memory controller config is two magic numbers per performance
state. I'd love to find out what they do, but this seems unlikely
without docs or a deep memory performance analysis expedition... so I
think we're going to have to settle for this way, at least for now. If
things become better understood in the future, we can modify the binding
and keep the driver backwards-compatible with old DTs at least.

I did benchmark the CPU p-state switching, so the latency numbers there
have been experimentally measured. The CPU capacity numbers are also
based on real benchmarks (oddly enough, Dhrystone was a big outlier
here that was not representative of everything else, so we didn't use
it).

Patches:
 #1:   MAINTAINERS updates, split out so this can go through the SoC
       tree, sparing the subsystem maintainers merge conflicts with
       the other changes we have going on in parallel.
 #2-3: DT bindings
 #4:   The aforementioned hack for the OPP core
 #5:   Add of_genpd_add_provider_simple_noclk()
 #6:   Memory controller driver
 #7:   CPU p-state clock driver
 #8:   Add some deps for ARCH_APPLE
 #9:   DT updates (for asahi-soc tree)

Hector Martin (9):
  MAINTAINERS: apple: Add apple-mcc and clk-apple-cluster paths
  dt-bindings: memory-controller: Add apple,mcc binding
  dt-bindings: clock: Add apple,cluster-clk binding
  opp: core: Don't warn if required OPP device does not exist
  PM: domains: Add of_genpd_add_provider_simple_noclk()
  memory: apple: Add apple-mcc driver to manage MCC perf in Apple SoCs
  clk: apple: Add clk-apple-cluster driver to manage CPU p-states
  arm64: apple: Select MEMORY and APPLE_MCC
  arm64: apple: Add CPU frequency scaling support for t8103

 .../bindings/clock/apple,cluster-clk.yaml     | 115 ++++++++
 .../memory-controllers/apple,mcc.yaml         |  80 ++++++
 .../opp/apple,mcc-operating-points.yaml       |  62 +++++
 MAINTAINERS                                   |   5 +
 arch/arm64/Kconfig.platforms                  |   2 +
 arch/arm64/boot/dts/apple/t8103.dtsi          | 255 +++++++++++++++++-
 drivers/base/power/domain.c                   |  39 ++-
 drivers/clk/Kconfig                           |   9 +
 drivers/clk/Makefile                          |   1 +
 drivers/clk/clk-apple-cluster.c               | 184 +++++++++++++
 drivers/memory/Kconfig                        |   9 +
 drivers/memory/Makefile                       |   1 +
 drivers/memory/apple-mcc.c                    | 130 +++++++++
 drivers/opp/core.c                            |   5 +-
 include/linux/pm_domain.h                     |   8 +
 15 files changed, 887 insertions(+), 18 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/clock/apple,cluster-clk.yaml
 create mode 100644 Documentation/devicetree/bindings/memory-controllers/apple,mcc.yaml
 create mode 100644 Documentation/devicetree/bindings/opp/apple,mcc-operating-points.yaml
 create mode 100644 drivers/clk/clk-apple-cluster.c
 create mode 100644 drivers/memory/apple-mcc.c

Comments

Viresh Kumar Oct. 12, 2021, 3:21 a.m. UTC | #1
On 12-10-21, 01:57, Hector Martin wrote:
> When required-opps is used in CPU OPP tables, there is no parent power
> domain to drive it. Squelch this error, to allow a clock driver to
> handle this directly instead.
> 
> Signed-off-by: Hector Martin <marcan@marcan.st>
> ---
>  drivers/opp/core.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index 04b4691a8aac..89e616721f70 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -873,12 +873,13 @@ static int _set_required_opp(struct device *dev, struct device *pd_dev,
>  		return 0;
>  
>  	ret = dev_pm_genpd_set_performance_state(pd_dev, pstate);
> -	if (ret) {
> +	if (ret && ret != -ENODEV) {
>  		dev_err(dev, "Failed to set performance rate of %s: %d (%d)\n",
>  			dev_name(pd_dev), pstate, ret);
> +		return ret;
>  	}
>  
> -	return ret;
> +	return 0;
>  }
>  
>  /* This is only called for PM domain for now */

I am not sure why you need this, since _set_required_opps() has this check:

	if (unlikely(!required_opp_tables[0]->is_genpd)) {
		dev_err(dev, "required-opps don't belong to a genpd\n");
		return -ENOENT;
	}
Hector Martin Oct. 12, 2021, 5:34 a.m. UTC | #2
On 12/10/2021 12.21, Viresh Kumar wrote:
> I am not sure why you need this, since _set_required_opps() has this check:
> 
> 	if (unlikely(!required_opp_tables[0]->is_genpd)) {
> 		dev_err(dev, "required-opps don't belong to a genpd\n");
> 		return -ENOENT;
> 	}
> 

The table *is* assigned to a genpd (the memory controller), it's just 
that that genpd isn't actually a parent of the CPU device. Without the 
patch you end up with:

[    3.040060] cpu cpu4: Failed to set performance rate of cpu4: 0 (-19)
[    3.042881] cpu cpu4: Failed to set required opps: -19
[    3.045508] cpufreq: __target_index: Failed to change cpu frequency: -19
Viresh Kumar Oct. 12, 2021, 5:51 a.m. UTC | #3
On 12-10-21, 14:34, Hector Martin wrote:
> The table *is* assigned to a genpd (the memory controller), it's just that
> that genpd isn't actually a parent of the CPU device. Without the patch you
> end up with:
> 
> [    3.040060] cpu cpu4: Failed to set performance rate of cpu4: 0 (-19)
> [    3.042881] cpu cpu4: Failed to set required opps: -19
> [    3.045508] cpufreq: __target_index: Failed to change cpu frequency: -19

Hmm, Saravana and Sibi were working on a similar problem earlier and decided to
solve this using devfreq instead. Don't remember the exact series which got
merged for this, Sibi ?

If this part fails, how do you actually set the performance state of the memory
controller's genpd ?
Hector Martin Oct. 12, 2021, 5:57 a.m. UTC | #4
On 12/10/2021 14.51, Viresh Kumar wrote:
> On 12-10-21, 14:34, Hector Martin wrote:
>> The table *is* assigned to a genpd (the memory controller), it's just that
>> that genpd isn't actually a parent of the CPU device. Without the patch you
>> end up with:
>>
>> [    3.040060] cpu cpu4: Failed to set performance rate of cpu4: 0 (-19)
>> [    3.042881] cpu cpu4: Failed to set required opps: -19
>> [    3.045508] cpufreq: __target_index: Failed to change cpu frequency: -19
> 
> Hmm, Saravana and Sibi were working on a similar problem earlier and decided to
> solve this using devfreq instead. Don't remember the exact series which got
> merged for this, Sibi ?
> 
> If this part fails, how do you actually set the performance state of the memory
> controller's genpd ?

The clock controller has the genpd as an actual power-domain parent, so 
it does it instead. From patch #7:

> +	if (cluster->has_pd)
> +		dev_pm_genpd_set_performance_state(cluster->dev,
> +						   dev_pm_opp_get_required_pstate(opp, 0));
> +

This is arguably not entirely representative of how the hardware works,
since technically the cluster switching couldn't care less what the
memory controller is doing. It's a soft dependency: states that should
be switched together, but are not interdependent. In fact, the clock
code does this unconditionally after the CPU p-state change, regardless
of whether we're shifting up or down; this is, FWIW, the same order
macOS uses, and it clearly doesn't matter which way you do it.
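The sequence described above can be modeled as follows. All names are illustrative; the two setters stand in for the p-state register write and the genpd performance-state call, and the log exists only so the ordering can be checked.

```c
#include <assert.h>

/* Model of the described sequence: the cluster p-state is changed
 * first, then the memory controller's performance state is pushed
 * unconditionally, whether scaling up or down. */

enum { OP_CLUSTER = 1, OP_MEMCTL = 2 };

static int op_log[8];
static int op_count;

static void set_cluster_pstate(unsigned int ps) { op_log[op_count++] = OP_CLUSTER; (void)ps; }
static void set_memctl_pstate(unsigned int ps)  { op_log[op_count++] = OP_MEMCTL; (void)ps; }

static void switch_pstate(unsigned int cpu_ps, unsigned int mem_ps)
{
	set_cluster_pstate(cpu_ps);	/* CPU p-state change first */
	set_memctl_pstate(mem_ps);	/* memory controller follows, always */
}
```

Since the two states are not interdependent, the order is a convention rather than a requirement, which is the point being made in the message above.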
Krzysztof Kozlowski Oct. 12, 2021, 9:19 a.m. UTC | #5
On 11/10/2021 18:57, Hector Martin wrote:
> This driver binds to the memory controller hardware in Apple SoCs such
> as the Apple M1, and provides a power domain that downstream devices can
> use to change the performance state of the memory controller.
> 
> Signed-off-by: Hector Martin <marcan@marcan.st>
> ---
>  drivers/memory/Kconfig     |   9 +++
>  drivers/memory/Makefile    |   1 +
>  drivers/memory/apple-mcc.c | 130 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 140 insertions(+)
>  create mode 100644 drivers/memory/apple-mcc.c
> 
> diff --git a/drivers/memory/Kconfig b/drivers/memory/Kconfig
> index 72c0df129d5c..48ef3d563a1c 100644
> --- a/drivers/memory/Kconfig
> +++ b/drivers/memory/Kconfig
> @@ -30,6 +30,15 @@ config ARM_PL172_MPMC
>  	  If you have an embedded system with an AMBA bus and a PL172
>  	  controller, say Y or M here.
>  
> +config APPLE_MCC
> +	tristate "Apple SoC MCC driver"
> +	default y if ARCH_APPLE
> +	select PM_GENERIC_DOMAINS
> +	depends on ARCH_APPLE || COMPILE_TEST
> +	help
> +	  This driver manages performance tuning for the memory controller in
> +	  Apple SoCs, such as the Apple M1.
> +
>  config ATMEL_SDRAMC
>  	bool "Atmel (Multi-port DDR-)SDRAM Controller"
>  	default y if ARCH_AT91
> diff --git a/drivers/memory/Makefile b/drivers/memory/Makefile
> index bc7663ed1c25..947840cbd2d4 100644
> --- a/drivers/memory/Makefile
> +++ b/drivers/memory/Makefile
> @@ -8,6 +8,7 @@ ifeq ($(CONFIG_DDR),y)
>  obj-$(CONFIG_OF)		+= of_memory.o
>  endif
>  obj-$(CONFIG_ARM_PL172_MPMC)	+= pl172.o
> +obj-$(CONFIG_APPLE_MCC)		+= apple-mcc.o
>  obj-$(CONFIG_ATMEL_SDRAMC)	+= atmel-sdramc.o
>  obj-$(CONFIG_ATMEL_EBI)		+= atmel-ebi.o
>  obj-$(CONFIG_BRCMSTB_DPFE)	+= brcmstb_dpfe.o
> diff --git a/drivers/memory/apple-mcc.c b/drivers/memory/apple-mcc.c
> new file mode 100644
> index 000000000000..55959f034b9a
> --- /dev/null
> +++ b/drivers/memory/apple-mcc.c
> @@ -0,0 +1,130 @@
> +// SPDX-License-Identifier: GPL-2.0-only OR MIT
> +/*
> + * Apple SoC MCC memory controller performance control driver
> + *
> + * Copyright The Asahi Linux Contributors

Copyright date?

> + */
> +
> +#include <linux/delay.h>
> +#include <linux/err.h>
> +#include <linux/io.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/platform_device.h>
> +#include <linux/pm_domain.h>
> +#include <linux/pm_opp.h>
> +
> +#define APPLE_MCC_PERF_CONFIG1  0xdc4
> +#define APPLE_MCC_PERF_CONFIG2  0xdbc
> +#define APPLE_MCC_CHANNEL(x)	((x) * 0x40000)
> +
> +struct apple_mcc {
> +	struct device *dev;
> +	struct generic_pm_domain genpd;
> +	void __iomem *reg_base;
> +	u32 num_channels;
> +};
> +
> +#define to_apple_mcc(_genpd) container_of(_genpd, struct apple_mcc, genpd)
> +
> +static int apple_mcc_set_performance_state(struct generic_pm_domain *genpd, unsigned int state)
> +{
> +	struct apple_mcc *mcc = to_apple_mcc(genpd);
> +	struct dev_pm_opp *opp;
> +	struct device_node *np;
> +	u32 perf_config[2];
> +	unsigned int i;
> +
> +	dev_dbg(mcc->dev, "switching to perf state %d\n", state);
> +
> +	opp = dev_pm_opp_find_level_exact(&mcc->genpd.dev, state);
> +	if (IS_ERR(opp))
> +		return PTR_ERR(opp);
> +
> +	np = dev_pm_opp_get_of_node(opp);
> +	if (of_property_read_u32_array(np, "apple,memory-perf-config",
> +		perf_config, ARRAY_SIZE(perf_config))) {
> +		dev_err(mcc->dev, "missing apple,memory-perf-config property");
> +		of_node_put(np);
> +		return -EINVAL;
> +	}
> +	of_node_put(np);
> +
> +	for (i = 0; i < mcc->num_channels; i++) {
> +		writel_relaxed(perf_config[0],
> +			       mcc->reg_base + APPLE_MCC_CHANNEL(i) + APPLE_MCC_PERF_CONFIG1);
> +		writel_relaxed(perf_config[1],
> +			       mcc->reg_base + APPLE_MCC_CHANNEL(i) + APPLE_MCC_PERF_CONFIG2);
> +	}
> +
> +	return 0;
> +}
> +
> +static unsigned int apple_mcc_opp_to_performance_state(struct generic_pm_domain *genpd,
> +						       struct dev_pm_opp *opp)
> +{
> +	return dev_pm_opp_get_level(opp);
> +}
> +
> +static int apple_mcc_probe(struct platform_device *pdev)
> +{
> +	struct device *dev = &pdev->dev;
> +	struct device_node *node = dev->of_node;

By convention mostly we call the variable "np".

> +	struct apple_mcc *mcc;
> +	int ret;
> +
> +	mcc = devm_kzalloc(dev, sizeof(*mcc), GFP_KERNEL);
> +	if (!mcc)
> +		return -ENOMEM;
> +
> +	mcc->reg_base = devm_platform_ioremap_resource(pdev, 0);
> +	if (IS_ERR(mcc->reg_base))
> +		return PTR_ERR(mcc->reg_base);
> +
> +	if (of_property_read_u32(node, "apple,num-channels", &mcc->num_channels)) {

Don't you have a limit of supported channels? It cannot be any uint32...

> +		dev_err(dev, "missing apple,num-channels property\n");

Use almost everywhere dev_err_probe - less code and you get error msg
printed.

> +		return -ENOENT;
> +	}
> +
> +	mcc->dev = dev;
> +	mcc->genpd.name = "apple-mcc-perf";
> +	mcc->genpd.opp_to_performance_state = apple_mcc_opp_to_performance_state;
> +	mcc->genpd.set_performance_state = apple_mcc_set_performance_state;
> +
> +	ret = pm_genpd_init(&mcc->genpd, NULL, false);
> +	if (ret < 0) {
> +		dev_err(dev, "pm_genpd_init failed\n");
> +		return ret;
> +	}
> +
> +	ret = of_genpd_add_provider_simple_noclk(node, &mcc->genpd);
> +	if (ret < 0) {
> +		dev_err(dev, "of_genpd_add_provider_simple failed\n");
> +		return ret;
> +	}
> +
> +	dev_info(dev, "Apple MCC performance driver initialized\n");

Please skip it, or at least make it a dev_dbg, you don't print any
valuable information here.

> +
> +	return 0;
> +}
> +
> +static const struct of_device_id apple_mcc_of_match[] = {
> +	{ .compatible = "apple,mcc" },
> +	{}
> +};
> +
> +MODULE_DEVICE_TABLE(of, apple_mcc_of_match);
> +
> +static struct platform_driver apple_mcc_driver = {
> +	.probe = apple_mcc_probe,
> +	.driver = {
> +		.name = "apple-mcc",
> +		.of_match_table = apple_mcc_of_match,
> +	},
> +};

module_platform_driver() goes here.

> +
> +MODULE_AUTHOR("Hector Martin <marcan@marcan.st>");
> +MODULE_DESCRIPTION("MCC memory controller performance tuning driver for Apple SoCs");
> +MODULE_LICENSE("GPL v2");

I think this will be "Dual MIT/GPL", based on your SPDX.

> +
> +module_platform_driver(apple_mcc_driver);
> 


Best regards,
Krzysztof
Viresh Kumar Oct. 12, 2021, 9:26 a.m. UTC | #6
On 12-10-21, 14:57, Hector Martin wrote:
> On 12/10/2021 14.51, Viresh Kumar wrote:
> > On 12-10-21, 14:34, Hector Martin wrote:
> > > The table *is* assigned to a genpd (the memory controller), it's just that
> > > that genpd isn't actually a parent of the CPU device. Without the patch you
> > > end up with:
> > > 
> > > [    3.040060] cpu cpu4: Failed to set performance rate of cpu4: 0 (-19)
> > > [    3.042881] cpu cpu4: Failed to set required opps: -19
> > > [    3.045508] cpufreq: __target_index: Failed to change cpu frequency: -19
> > 
> > Hmm, Saravana and Sibi were working on a similar problem earlier and decided to
> > solve this using devfreq instead. Don't remember the exact series which got
> > merged for this, Sibi ?
> > 
> > If this part fails, how do you actually set the performance state of the memory
> > controller's genpd ?
> 
> The clock controller has the genpd as an actual power-domain parent, so it
> does it instead. From patch #7:
> 
> > +	if (cluster->has_pd)
> > +		dev_pm_genpd_set_performance_state(cluster->dev,
> > +						   dev_pm_opp_get_required_pstate(opp, 0));
> > +
> 
> This is arguably not entirely representative of how the hardware works,
> since technically the cluster switching couldn't care less what the memory
> controller is doing; it's a soft dependency, states that should be switched
> together but are not interdependent (in fact, the clock code does this
> unconditionally after the CPU p-state change, regardless of whether we're
> shifting up or down; this is, FWIW, the same order macOS uses, and it
> clearly doesn't matter which way you do it).

Yeah, I understand what you are doing. But the current patch is
incorrect in the sense that it can cause a bug on other platforms. To
make this work, you should rather set this genpd as a parent of the CPU
devices (which you are doing anyway, since you are updating them with
the CPUs' DVFS). With that, the clk driver won't be required to do the
magic behind the scenes.
Hector Martin Oct. 12, 2021, 9:31 a.m. UTC | #7
On 2021年10月12日 18:26:03 JST, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>On 12-10-21, 14:57, Hector Martin wrote:
>> 
>> This is arguably not entirely representative of how the hardware works,
>> since technically the cluster switching couldn't care less what the memory
>> controller is doing; it's a soft dependency, states that should be switched
>> together but are not interdependent (in fact, the clock code does this
>> unconditionally after the CPU p-state change, regardless of whether we're
>> shifting up or down; this is, FWIW, the same order macOS uses, and it
>> clearly doesn't matter which way you do it).
>
>Yeah, I understand what you are doing. But the current patch is
>incorrect in the sense that it can cause a bug on other platforms. To
>make this work, you should rather set this genpd as a parent of the CPU
>devices (which you are doing anyway, since you are updating them with
>the CPUs' DVFS). With that, the clk driver won't be required to do the
>magic behind the scenes.
>

That doesn't work, though, because the CPUs aren't normal devices with runtime-pm. That was the first thing I tried :).

If you think this *should* be made to work instead then I can try that.
Viresh Kumar Oct. 12, 2021, 9:32 a.m. UTC | #8
On 12-10-21, 18:31, Hector Martin "marcan" wrote:
> That doesn't work, though, because the CPUs aren't normal devices
> with runtime-pm. That was the first thing I tried :).

What's the exact problem with runtime PM here ?

> If you think this *should* be made to work instead then I can try that.
Viresh Kumar Oct. 14, 2021, 6:56 a.m. UTC | #9
On 14-10-21, 15:52, Hector Martin wrote:
> The CPU devices aren't attached to their genpd, so the required OPP
> transition fails with the same error.
> 
> However, this was easier to fix than I expected. With this patch to
> cpufreq-dt, it all works properly, and I can drop the parent genpd
> from the clock node and related handling. Thoughts?
> 
> commit c4f88743374c1f4678ee7f17fb6cae30ded9ed59
> Author: Hector Martin <marcan@marcan.st>
> Date:   Thu Oct 14 15:47:45 2021 +0900
> 
>     cpufreq: dt: Attach CPU devices to power domains
> 
>     This allows the required-opps mechanism to work for CPU OPP tables,
>     triggering specific OPP levels in a parent power domain.
> 
>     Signed-off-by: Hector Martin <marcan@marcan.st>
> 
> diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
> index 8fcaba541539..5b22846b557d 100644
> --- a/drivers/cpufreq/cpufreq-dt.c
> +++ b/drivers/cpufreq/cpufreq-dt.c
> @@ -16,6 +16,7 @@
>  #include <linux/list.h>
>  #include <linux/module.h>
>  #include <linux/of.h>
> +#include <linux/pm_domain.h>
>  #include <linux/pm_opp.h>
>  #include <linux/platform_device.h>
>  #include <linux/regulator/consumer.h>
> @@ -264,6 +265,16 @@ static int dt_cpufreq_early_init(struct device *dev, int cpu)
>  		goto out;
>  	}
> +	/*
> +	 * Attach the CPU device to its genpd domain (if any), to allow OPP
> +	 * dependencies to be satisfied.
> +	 */
> +	ret = genpd_dev_pm_attach(cpu_dev);
> +	if (ret <= 0) {
> +		dev_err(cpu_dev, "Failed to attach CPU device to genpd\n");
> +		goto out;
> +	}
> +

Other platforms do this from some other place, I think.

Ulf, where should this code be moved ? cpu-clk driver ?
Hector Martin Oct. 14, 2021, 6:59 a.m. UTC | #10
On 12/10/2021 18.19, Krzysztof Kozlowski wrote:
>> +// SPDX-License-Identifier: GPL-2.0-only OR MIT
>> +/*
>> + * Apple SoC MCC memory controller performance control driver
>> + *
>> + * Copyright The Asahi Linux Contributors
> 
> Copyright date?

We've gone over this one a few times already; most copyright dates 
quickly become outdated and meaningless :)

See: 
https://www.linuxfoundation.org/blog/copyright-notices-in-open-source-software-projects/

>> +static int apple_mcc_probe(struct platform_device *pdev)
>> +{
>> +	struct device *dev = &pdev->dev;
>> +	struct device_node *node = dev->of_node;
> 
> By convention mostly we call the variable "np".

Ack, I'll change it for v2.

>> +	mcc->reg_base = devm_platform_ioremap_resource(pdev, 0);
>> +	if (IS_ERR(mcc->reg_base))
>> +		return PTR_ERR(mcc->reg_base);
>> +
>> +	if (of_property_read_u32(node, "apple,num-channels", &mcc->num_channels)) {
> 
> Don't you have a limit of supported channels? It cannot be any uint32...

Today, it's max 8. But if, come Monday, we find out Apple's new chips
have 16 channels and otherwise the same register layout, I'd much rather
not have to change the driver...

>> +		dev_err(dev, "missing apple,num-channels property\n");
> 
> Use almost everywhere dev_err_probe - less code and you get error msg
> printed.

Heh, I didn't know about that one. Thanks!

>> +
>> +	dev_info(dev, "Apple MCC performance driver initialized\n");
> 
> Please skip it, or at least make it a dev_dbg, you don't print any
> valuable information here.

Ack, I'll remove this.

>> +static struct platform_driver apple_mcc_driver = {
>> +	.probe = apple_mcc_probe,
>> +	.driver = {
>> +		.name = "apple-mcc",
>> +		.of_match_table = apple_mcc_of_match,
>> +	},
>> +};
> 
> module_platform_driver() goes here.

Ack, will fix for v2.

> 
>> +
>> +MODULE_AUTHOR("Hector Martin <marcan@marcan.st>");
>> +MODULE_DESCRIPTION("MCC memory controller performance tuning driver for Apple SoCs");
>> +MODULE_LICENSE("GPL v2");
> 
> I think this will be "Dual MIT/GPL", based on your SPDX.

Ah, I didn't realize that was a valid option for MODULE_LICENSE. I guess 
anything containing "GPL" works with EXPORT_SYMBOL_GPL?

Thanks for the review!
Hector Martin Oct. 14, 2021, 7:03 a.m. UTC | #11
On 14/10/2021 15.56, Viresh Kumar wrote:
>> +	/*
>> +	 * Attach the CPU device to its genpd domain (if any), to allow OPP
>> +	 * dependencies to be satisfied.
>> +	 */
>> +	ret = genpd_dev_pm_attach(cpu_dev);
>> +	if (ret <= 0) {
>> +		dev_err(cpu_dev, "Failed to attach CPU device to genpd\n");
>> +		goto out;
>> +	}
>> +
> 
> Other platforms do this from some other place, I think.
> 
> Ulf, where should this code be moved ? cpu-clk driver ?
> 

I see one driver that does this is drivers/clk/qcom/apcs-sdx55.c (via 
dev_pm_domain_attach). Though it only does it for CPU#0; we need to do 
it for all CPUs.
Viresh Kumar Oct. 14, 2021, 7:22 a.m. UTC | #12
On 14-10-21, 16:03, Hector Martin wrote:
> On 14/10/2021 15.56, Viresh Kumar wrote:
> > > +	/*
> > > +	 * Attach the CPU device to its genpd domain (if any), to allow OPP
> > > +	 * dependencies to be satisfied.
> > > +	 */
> > > +	ret = genpd_dev_pm_attach(cpu_dev);
> > > +	if (ret <= 0) {
> > > +		dev_err(cpu_dev, "Failed to attach CPU device to genpd\n");
> > > +		goto out;
> > > +	}
> > > +
> > 
> > Other platforms do this from some other place, I think.
> > 
> > Ulf, where should this code be moved ? cpu-clk driver ?
> > 
> 
> I see one driver that does this is drivers/clk/qcom/apcs-sdx55.c (via
> dev_pm_domain_attach).

That may be a good place since you are already adding it and it is related to
CPU clk.

> Though it only does it for CPU#0; we need to do it
> for all CPUs.

Sure.
Hector Martin Oct. 14, 2021, 7:23 a.m. UTC | #13
On 14/10/2021 16.03, Hector Martin wrote:
> On 14/10/2021 15.56, Viresh Kumar wrote:
>>> +	/*
>>> +	 * Attach the CPU device to its genpd domain (if any), to allow OPP
>>> +	 * dependencies to be satisfied.
>>> +	 */
>>> +	ret = genpd_dev_pm_attach(cpu_dev);
>>> +	if (ret <= 0) {
>>> +		dev_err(cpu_dev, "Failed to attach CPU device to genpd\n");
>>> +		goto out;
>>> +	}
>>> +
>>
>> Other platforms do this from some other place, I think.
>>
>> Ulf, where should this code be moved ? cpu-clk driver ?
>>
> 
> I see one driver that does this is drivers/clk/qcom/apcs-sdx55.c (via
> dev_pm_domain_attach). Though it only does it for CPU#0; we need to do
> it for all CPUs.

Looking into this further, I'm not sure I like the idea of doing this in 
the clocks driver. There might be locking issues since it gets 
instantiated twice and yet doesn't really itself know what subset of 
CPUs it applies to.

There's another driver that does this: 
drivers/cpuidle/cpuidle-psci-domain.c. That one specifically looks for a 
power domain called "psci". Perhaps it would make sense to make this 
generic in cpufreq-dt as per my prior patch, but explicitly request a 
"cpufreq" domain? That way only devicetrees that opt in to having this 
handled by cpufreq by naming it that way would get this behavior.
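The opt-in idea described above can be sketched as follows. In the kernel this would presumably be genpd_dev_pm_attach_by_name() called once per CPU device from cpufreq-dt; here the power-domain-names of a CPU node are modeled as a plain string array so the opt-in decision is testable. The "cpufreq" domain name is the proposal under discussion, not an established convention.

```c
#include <assert.h>
#include <string.h>

/* Model of the proposed opt-in: a CPU is attached to a power domain
 * only if its DT names one of its domains "cpufreq", mirroring how
 * cpuidle-psci-domain looks for a domain named "psci". */

static int should_attach_cpufreq_domain(const char *const names[], int n)
{
	for (int i = 0; i < n; i++)
		if (names[i] && strcmp(names[i], "cpufreq") == 0)
			return 1;	/* DT opted in: attach this domain */
	return 0;			/* no "cpufreq" domain: leave CPU alone */
}
```

This keeps the behavior generic in cpufreq-dt while only affecting devicetrees that explicitly name the domain.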
Krzysztof Kozlowski Oct. 14, 2021, 7:36 a.m. UTC | #14
On 14/10/2021 08:59, Hector Martin wrote:
> On 12/10/2021 18.19, Krzysztof Kozlowski wrote:
>>> +// SPDX-License-Identifier: GPL-2.0-only OR MIT
>>> +/*
>>> + * Apple SoC MCC memory controller performance control driver
>>> + *
>>> + * Copyright The Asahi Linux Contributors
>>
>> Copyright date?
> 
> We've gone over this one a few times already; most copyright dates 
> quickly become outdated and meaningless :)
> 
> See: 
> https://www.linuxfoundation.org/blog/copyright-notices-in-open-source-software-projects/
> 
>>> +static int apple_mcc_probe(struct platform_device *pdev)
>>> +{
>>> +	struct device *dev = &pdev->dev;
>>> +	struct device_node *node = dev->of_node;
>>
>> By convention mostly we call the variable "np".
> 
> Ack, I'll change it for v2.
> 
>>> +	mcc->reg_base = devm_platform_ioremap_resource(pdev, 0);
>>> +	if (IS_ERR(mcc->reg_base))
>>> +		return PTR_ERR(mcc->reg_base);
>>> +
>>> +	if (of_property_read_u32(node, "apple,num-channels", &mcc->num_channels)) {
>>
>> Don't you have a limit of supported channels? It cannot be any uint32...
> 
> Today, it's max 8. But if, come Monday, we find out Apple's new chips
> have 16 channels and otherwise the same register layout, I'd much
> rather not have to change the driver...

OK, however if the driver ever receives a different DT with a different
value, it will accept it unconditionally and walk past the mapped
address space. I am just saying that being conservative about received
values is safer, but I am fine with skipping this problem. In the end we
trust the DT to always match the kernel, don't we? Oh wait, someone can
use a DT from another kernel with this one...

> 
>>> +		dev_err(dev, "missing apple,num-channels property\n");
>>
>> Use almost everywhere dev_err_probe - less code and you get error msg
>> printed.
> 
> Heh, I didn't know about that one. Thanks!
> 
>>> +
>>> +	dev_info(dev, "Apple MCC performance driver initialized\n");
>>
>> Please skip it, or at least make it a dev_dbg, you don't print any
>> valuable information here.
> 
> Ack, I'll remove this.
> 
>>> +static struct platform_driver apple_mcc_driver = {
>>> +	.probe = apple_mcc_probe,
>>> +	.driver = {
>>> +		.name = "apple-mcc",
>>> +		.of_match_table = apple_mcc_of_match,
>>> +	},
>>> +};
>>
>> module_platform_driver() goes here.
> 
> Ack, will fix for v2.
> 
>>
>>> +
>>> +MODULE_AUTHOR("Hector Martin <marcan@marcan.st>");
>>> +MODULE_DESCRIPTION("MCC memory controller performance tuning driver for Apple SoCs");
>>> +MODULE_LICENSE("GPL v2");
>>
>> I think this will be "Dual MIT/GPL", based on your SPDX.
> 
> Ah, I didn't realize that was a valid option for MODULE_LICENSE. I guess 
> anything containing "GPL" works with EXPORT_SYMBOL_GPL?

I don't think exporting symbols is related to how you license your code.


Best regards,
Krzysztof
Hector Martin Oct. 14, 2021, 7:52 a.m. UTC | #15
On 14/10/2021 16.36, Krzysztof Kozlowski wrote:
> On 14/10/2021 08:59, Hector Martin wrote:
>>> Don't you have a limit of supported channels? It cannot be any uint32...
>>
>> Today, it's max 8. But if come Monday we find out Apple's new chips have
>> 16 channels and otherwise the same register layout, I'd much rather not
>> have to change the driver...
> 
> OK, however if the driver ever receives different DT with a different
> value, it will accept it unconditionally and go via address space. I am
> just saying that being conservative on received values is safer, but I
> am fine with skipping this problem. At the end we trust DT that it will
> always match the kernel, don't we? Oh wait, someone can use DT from
> other kernel in this one...

DTs using these compatibles should have the same register layout, and 
should work with this driver; if a new chip comes out that has a 
different register layout we will change the compatibles (both) and 
therefore older kernels won't bind at all. If it has the same layout 
we'll keep the base compatible, `reg` will grow as needed to accomodate 
the extra channels, and e.g. num-channels=16 will then just work on 
older kernels with no changes.

Obviously a broken DT with an insane value here would crash the driver, 
but so would any other number of crazy DT things; however, I don't 
expect that to ever happen.

There's also the case where we end up with multiple memory controllers 
at discrete offsets (e.g. rumored multi-die configurations); in that 
case we'll end up with multiple genpd parents and have to add code to 
support that, and in the meantime older kernels will just have broken 
cpufreq on the p-cores. But I think that is ~acceptable as long as the 
system boots; we don't expect to be able to *fully* support newer SoCs 
on older kernels with no code changes. What I'm aiming for is just 
making the system work, hopefully with NVMe and USB and a dumb 
framebuffer, so that distro installers can run and then users can later 
install a proper, up-to-date kernel with full support for the new SoC.

>> Ah, I didn't realize that was a valid option for MODULE_LICENSE. I guess
>> anything containing "GPL" works with EXPORT_SYMBOL_GPL?
> 
> I don't think exporting symbols is related to how you license your code.

It is; only modules with a GPL-compatible MODULE_LICENSE get to use 
symbols exported via EXPORT_SYMBOL_GPL.

See kernel/module.c for the symbol lookup logic and 
include/linux/license.h for the logic to check the string (seems like 
"Dual MIT/GPL" is explicitly whitelisted there).

Of course, this is a futile effort, as ~every time I see a proprietary 
module in some embedded device, it either falsely declares itself to be 
GPL, or they have a shim module that re-exports GPL symbols as non-GPL.
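The include/linux/license.h check referenced above boils down to a string whitelist; the restatement below follows the kernel's list at the time of this thread (including the "Dual MIT/GPL" entry being discussed), but verify it against the tree you actually target before relying on it.

```c
#include <assert.h>
#include <string.h>

/* Restatement of the license_is_gpl_compatible() whitelist: a module's
 * MODULE_LICENSE string must match one of these to count as
 * GPL-compatible and thus be allowed to import EXPORT_SYMBOL_GPL
 * symbols. */

static const char *const gpl_compatible[] = {
	"GPL",
	"GPL v2",
	"GPL and additional rights",
	"Dual BSD/GPL",
	"Dual MIT/GPL",
	"Dual MPL/GPL",
};

static int license_is_gpl_compatible(const char *license)
{
	for (size_t i = 0; i < sizeof(gpl_compatible) / sizeof(gpl_compatible[0]); i++)
		if (strcmp(license, gpl_compatible[i]) == 0)
			return 1;
	return 0;
}
```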
Krzysztof Kozlowski Oct. 14, 2021, 8:04 a.m. UTC | #16
On 14/10/2021 09:52, Hector Martin wrote:
> On 14/10/2021 16.36, Krzysztof Kozlowski wrote:

(...)

> 
>>> Ah, I didn't realize that was a valid option for MODULE_LICENSE. I guess
>>> anything containing "GPL" works with EXPORT_SYMBOL_GPL?
>>
>> I don't think exporting symbols is related to how you license your code.
> 
> It is; only modules with a GPL-compatible MODULE_LICENSE get to use 
> symbols exported via EXPORT_SYMBOL_GPL.

Although there might be such a correlation, it's not a rule. You can
have a GPL module exporting symbols without the GPL requirement
(EXPORT_SYMBOL). You can have a GPL+MIT module exporting symbols as
GPL. Obviously you cannot have a non-GPL module, as we do not accept
these and there is no such choice.

So, answering your question whether "GPL" works with EXPORT_SYMBOL_GPL:
everything here is GPL, and it works with both EXPORT_SYMBOL and
EXPORT_SYMBOL_GPL.

> 
> See kernel/module.c for the symbol lookup logic and 
> include/linux/license.h for the logic to check the string (seems like 
> "Dual MIT/GPL" is explicitly whitelisted there).

Not related to symbol exports. It is used to determine whether the
kernel is tainted by other licenses.

> 
> Of course, this is a futile effort, as ~every time I see a proprietary 
> module in some embedded device, it either falsely declares itself to be 
> GPL, or they have a shim module that re-exports GPL symbols as non-GPL.
> 

This is being removed soon (or already).


Best regards,
Krzysztof
Hector Martin Oct. 14, 2021, 8:31 a.m. UTC | #17
On 14/10/2021 17.04, Krzysztof Kozlowski wrote:
>> It is; only modules with a GPL-compatible MODULE_LICENSE get to use
>> symbols exported via EXPORT_SYMBOL_GPL.
> 
> Although there might be such a correlation, it's not a rule. You can
> have a GPL module exporting symbols without the GPL requirement
> (EXPORT_SYMBOL). You can have a GPL+MIT module exporting symbols as
> GPL. Obviously you cannot have a non-GPL module, as we do not accept
> these and there is no such choice.

What I mean is that modules can only import GPL symbols if they 
themselves are GPL compatible. What I didn't know is that "Dual MIT/GPL" 
is a valid string for MODULE_LICENSE to qualify as such.

>> See kernel/module.c for the symbol lookup logic and
>> include/linux/license.h for the logic to check the string (seems like
>> "Dual MIT/GPL" is explicitly whitelisted there).
> 
> Not related to export symbol. It is used for determining the tainted
> kernel via other licenses.
> 

Not just that; that module taint is used as a filter so that 
non-GPL-compatible modules are technically prevented from resolving 
EXPORT_SYMBOL_GPL symbols.

>> Of course, this is a futile effort, as ~every time I see a proprietary
>> module in some embedded device, it either falsely declares itself to be
>> GPL, or they have a shim module that re-exports GPL symbols as non-GPL.
>>
> 
> This is being removed soon (or already).

? Good luck getting proprietary embedded vendors to start following 
licenses... :)
Ulf Hansson Oct. 14, 2021, 9:55 a.m. UTC | #18
On Tue, 12 Oct 2021 at 07:57, Hector Martin <marcan@marcan.st> wrote:
>
> On 12/10/2021 14.51, Viresh Kumar wrote:
> > On 12-10-21, 14:34, Hector Martin wrote:
> >> The table *is* assigned to a genpd (the memory controller), it's just that
> >> that genpd isn't actually a parent of the CPU device. Without the patch you
> >> end up with:
> >>
> >> [    3.040060] cpu cpu4: Failed to set performance rate of cpu4: 0 (-19)
> >> [    3.042881] cpu cpu4: Failed to set required opps: -19
> >> [    3.045508] cpufreq: __target_index: Failed to change cpu frequency: -19
> >
> > Hmm, Saravana and Sibi were working on a similar problem earlier and decided to
> > solve this using devfreq instead. Don't remember the exact series which got
> > merged for this, Sibi ?
> >
> > If this part fails, how do you actually set the performance state of the memory
> > controller's genpd ?
>
> The clock controller has the genpd as an actual power-domain parent, so
> it does it instead. From patch #7:
>
> > +     if (cluster->has_pd)
> > +             dev_pm_genpd_set_performance_state(cluster->dev,
> > +                                                dev_pm_opp_get_required_pstate(opp, 0));
> > +
>
> This is arguably not entirely representative of how the hardware works,
> since technically the cluster switching couldn't care less what the
> memory controller is doing; it's a soft dependency, states that should
> be switched together but are not interdependent (in fact, the clock code
> does this unconditionally after the CPU p-state change, regardless of
> whether we're shifting up or down; this is, FWIW, the same order macOS
> uses, and it clearly doesn't matter which way you do it).

Yes, this sounds like you should move away from modeling the memory
part as a parent genpd for the CPUs' genpd.

As Viresh pointed out, a devfreq driver seems like a better way to do
this. As a matter of fact, there are already devfreq drivers that do
this, unless I am mistaken.

It looks like devfreq providers listen to opp/cpufreq notifiers to get
an indication of when it could make sense to change a performance
state.

In some cases the devfreq provider is also modeled as an interconnect
provider, allowing consumers to specify memory bandwidth constraints,
which may trigger a new performance state to be set for the memory
controller.

In the tegra case, the memory controller is modelled as an
interconnect provider and the devfreq node is modelled as an
interconnect-consumer of the memory controller. Perhaps this can work
for Apple SoCs too?

That said, perhaps as an option to move forward, we can try to get the
cpufreq pieces solved first. Then as a step on top, add the
performance scaling for the memory controller?

>
> --
> Hector Martin (marcan@marcan.st)
> Public Key: https://mrcn.st/pub

Kind regards
Uffe
Ulf Hansson Oct. 14, 2021, 11:08 a.m. UTC | #19
On Thu, 14 Oct 2021 at 09:23, Hector Martin <marcan@marcan.st> wrote:
>
> On 14/10/2021 16.03, Hector Martin wrote:
> > On 14/10/2021 15.56, Viresh Kumar wrote:
> >>> +   /*
> >>> +    * Attach the CPU device to its genpd domain (if any), to allow OPP
> >>> +    * dependencies to be satisfied.
> >>> +    */
> >>> +   ret = genpd_dev_pm_attach(cpu_dev);
> >>> +   if (ret <= 0) {
> >>> +           dev_err(cpu_dev, "Failed to attach CPU device to genpd\n");
> >>> +           goto out;
> >>> +   }
> >>> +
> >>
> >> Other platform do this from some other place I think.
> >>
> >> Ulf, where should this code be moved ? cpu-clk driver ?
> >>
> >
> > I see one driver that does this is drivers/clk/qcom/apcs-sdx55.c (via
> > dev_pm_domain_attach). Though it only does it for CPU#0; we need to do
> > it for all CPUs.
>
> Looking into this further, I'm not sure I like the idea of doing this in
> the clocks driver. There might be locking issues since it gets
> instantiated twice and yet doesn't really itself know what subset of
> CPUs it applies to.

I agree. I suggest you look into using a genpd provider and hooking up
all the CPUs' devices to it. I think that is what Viresh also suggested
earlier - and this makes the most sense to me.

As a reference you may have a look at some Qcom platforms that already use this:

arch/arm64/boot/dts/qcom/qcs404.dtsi

drivers/cpufreq/qcom-cpufreq-nvmem.c:
To hook up CPU devices to their PM domains (genpds) - it calls
dev_pm_opp_attach_genpd(), which is a kind of wrapper for
dev_pm_domain_attach_by_name().

drivers/soc/qcom/cpr.c
Registers the genpd provider that is capable of dealing with
performance states/OPPs for CPUs.

>
> There's another driver that does this:
> drivers/cpuidle/cpuidle-psci-domain.c. That one specifically looks for a
> power domain called "psci". Perhaps it would make sense to make this
> generic in cpufreq-dt as per my prior patch, but explicitly request a
> "cpufreq" domain? That way only devicetrees that opt in to having this
> handled by cpufreq by naming it that way would get this behavior.

That sounds like an idea that is worth exploring. In this way, the
only thing that needs to be implemented for new cases would be the
genpd provider driver.

BTW, as you will figure out by looking at the above references, for
the qcom case we are using "cpr" as the domain name for cpufreq. Of
course, that doesn't mean we can't use "cpufreq" (or whatever name
makes sense) going forward for new cases.

>
> --
> Hector Martin (marcan@marcan.st)
> Public Key: https://mrcn.st/pub

Kind regards
Uffe
Hector Martin Oct. 14, 2021, 11:43 a.m. UTC | #20
On 14/10/2021 18.55, Ulf Hansson wrote:
> Yes, this sounds like you should move away from modeling the memory
> part as a parent genpd for the CPUs' genpd.
> 
> As Viresh pointed out, a devfreq driver seems like a better way to do
> this. As a matter of fact, there are already devfreq drivers that do
> this, unless I am mistaken.
> 
> It looks like devfreq providers listen to opp/cpufreq notifiers to get
> an indication of when it could make sense to change a performance
> state.
> 
> In some cases the devfreq provider is also modeled as an interconnect
> provider, allowing consumers to specify memory bandwidth constraints,
> which may trigger a new performance state to be set for the memory
> controller.
> 
> In the tegra case, the memory controller is modelled as an
> interconnect provider and the devfreq node is modelled as an
> interconnect-consumer of the memory controller. Perhaps this can work
> for apple SoCs too?

I was poking around and noticed the OPP core can already integrate with 
interconnect requirements, so perhaps the memory controller can be an 
interconnect provider, and the CPU nodes can directly reference it as a 
consumer? This seems like a more accurate model of what the hardware 
does, and I think I saw some devices doing this already.

(only problem is I have no idea of the actual bandwidth numbers involved 
here... I'll have to run some benchmarks to make sure this isn't just 
completely dummy data)
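A rough sketch of what that DT shape could look like, assuming the OPP core's bandwidth support (opp-peak-kBps) is used; every compatible, address, id, and bandwidth value below is a placeholder, not taken from real hardware:

```
/* memory controller as an interconnect provider (all values hypothetical) */
mcc: memory-controller@200000000 {
	compatible = "apple,t8103-mcc";
	reg = <0x2 0x0 0x0 0x4000>;
	#interconnect-cells = <1>;
};

cpu4: cpu@10100 {
	/* ... existing CPU properties ... */
	interconnects = <&mcc 0 &mcc 1>;
	interconnect-names = "cpu-mem";
};

/* an entry in the CPU OPP table carrying a bandwidth request */
opp15 {
	opp-hz = /bits/ 64 <3204000000>;
	opp-peak-kBps = <68000000>;	/* placeholder, needs benchmarking */
};
```

The OPP core would then translate each CPU OPP into a bandwidth vote on the memory controller, which maps it to one of the two register configs.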

> 
> That said, perhaps as an option to move forward, we can try to get the
> cpufreq pieces solved first. Then as a step on top, add the
> performance scaling for the memory controller?

Sure; that's a pretty much independent part of this patchset, though I'm 
thinking I might as well try some things out for v2 anyway; if it looks 
like it'll take longer we can split it out and do just the cpufreq side.
Ulf Hansson Oct. 14, 2021, 12:55 p.m. UTC | #21
On Thu, 14 Oct 2021 at 13:43, Hector Martin <marcan@marcan.st> wrote:
>
> On 14/10/2021 18.55, Ulf Hansson wrote:
> > Yes, this sounds like you should move away from modeling the memory
> > part as a parent genpd for the CPUs' genpd.
> >
> > As Viresh pointed out, a devfreq driver seems like a better way to do
> > this. As a matter of fact, there are already devfreq drivers that do
> > this, unless I am mistaken.
> >
> > It looks like devfreq providers listen to opp/cpufreq notifiers to get
> > an indication of when it could make sense to change a performance
> > state.
> >
> > In some cases the devfreq provider is also modeled as an interconnect
> > provider, allowing consumers to specify memory bandwidth constraints,
> > which may trigger a new performance state to be set for the memory
> > controller.
> >
> > In the tegra case, the memory controller is modelled as an
> > interconnect provider and the devfreq node is modelled as an
> > interconnect-consumer of the memory controller. Perhaps this can work
> > for apple SoCs too?
>
> I was poking around and noticed the OPP core can already integrate with
> interconnect requirements, so perhaps the memory controller can be an
> interconnect provider, and the CPU nodes can directly reference it as a
> consumer? This seems like a more accurate model of what the hardware
> does, and I think I saw some devices doing this already.

Yeah, that could work too. And, yes, I agree, it may be a better
description of the HW.

>
> (only problem is I have no idea of the actual bandwidth numbers involved
> here... I'll have to run some benchmarks to make sure this isn't just
> completely dummy data)
>
> >
> > That said, perhaps as an option to move forward, we can try to get the
> > cpufreq pieces solved first. Then as a step on top, add the
> > performance scaling for the memory controller?
>
> Sure; that's a pretty much independent part of this patchset, though I'm
> thinking I might as well try some things out for v2 anyway; if it looks
> like it'll take longer we can split it out and do just the cpufreq side.

In any case, I'll do my best to help with review.

Kind regards
Uffe
Hector Martin Oct. 14, 2021, 5:02 p.m. UTC | #22
On 14/10/2021 21.55, Ulf Hansson wrote:
> On Thu, 14 Oct 2021 at 13:43, Hector Martin <marcan@marcan.st> wrote:
>> I was poking around and noticed the OPP core can already integrate with
>> interconnect requirements, so perhaps the memory controller can be an
>> interconnect provider, and the CPU nodes can directly reference it as a
>> consumer? This seems like a more accurate model of what the hardware
>> does, and I think I saw some devices doing this already.
> 
> Yeah, that could work too. And, yes, I agree, it may be a better
> description of the HW.
> 
>>
>> (only problem is I have no idea of the actual bandwidth numbers involved
>> here... I'll have to run some benchmarks to make sure this isn't just
>> completely dummy data)
>>

So... I tried getting bandwidth numbers and failed. It seems these 
registers don't actually affect peak performance in any measurable way. 
I'm also getting almost the same GeekBench scores on macOS with and 
without this mechanism enabled, although there is one subtest that seems 
to show a measurable difference.

My current guess is this is something more subtle (latencies? idle 
timers and such?) than a performance state. If that is the case, do you 
have any ideas as to the best way to model it in Linux? Should we even 
bother if it mostly has a minimal performance gain for typical workloads?

I'll try to do some latency tests, see if I can make sense of what it's 
actually doing.
Ulf Hansson Oct. 15, 2021, 11:26 a.m. UTC | #23
On Thu, 14 Oct 2021 at 19:02, Hector Martin <marcan@marcan.st> wrote:
>
> On 14/10/2021 21.55, Ulf Hansson wrote:
> > On Thu, 14 Oct 2021 at 13:43, Hector Martin <marcan@marcan.st> wrote:
> >> I was poking around and noticed the OPP core can already integrate with
> >> interconnect requirements, so perhaps the memory controller can be an
> >> interconnect provider, and the CPU nodes can directly reference it as a
> >> consumer? This seems like a more accurate model of what the hardware
> >> does, and I think I saw some devices doing this already.
> >
> > Yeah, that could work too. And, yes, I agree, it may be a better
> > description of the HW.
> >
> >>
> >> (only problem is I have no idea of the actual bandwidth numbers involved
> >> here... I'll have to run some benchmarks to make sure this isn't just
> >> completely dummy data)
> >>
>
> So... I tried getting bandwidth numbers and failed. It seems these
> registers don't actually affect peak performance in any measurable way.
> I'm also getting almost the same GeekBench scores on macOS with and
> without this mechanism enabled, although there is one subtest that seems
> to show a measurable difference.
>
> My current guess is this is something more subtle (latencies? idle
> timers and such?) than a performance state. If that is the case, do you
> have any ideas as to the best way to model it in Linux? Should we even
> bother if it mostly has a minimal performance gain for typical workloads?

For latency constraints, we have dev_pm_qos. This will make the genpd
governor prevent deeper idle states for the device and its
corresponding PM domain (genpd). But that doesn't sound like a good
fit here.

If you are right, it rather sounds like there is some kind of
quiescence mode of the memory controller that can be prevented. But I
have no clue, of course. :-)

>
> I'll try to do some latency tests, see if I can make sense of what it's
> actually doing.
>

Kind regards
Uffe
Hector Martin Oct. 17, 2021, 9:16 a.m. UTC | #24
On 15/10/2021 07.07, Stephen Boyd wrote:
> This looks bad from a locking perspective. How is lockdep holding up
> with this driver? We're underneath the prepare lock here and we're
> setting a couple level registers which is all good but now we're calling
> into genpd code and who knows what's going to happen locking wise.

It seems this is all going away given the other discussion threads point 
towards handling this directly via OPP in the cpufreq-dt driver. I'll 
run whatever I end up with for v2 through lockdep though, good call!

> I don't actually see anything in here that indicates this is supposed to
> be a clk provider. Is it being modeled as a clk so that it can use
> cpufreq-dt? If it was a clk provider I'd expect it to be looking at
> parent clk rates, and reading hardware to calculate frequencies based on
> dividers and multipliers, etc. None of that is happening here.
> 
> Why not write a cpufreq driver, similar to qcom-cpufreq-hw.c that looks
> through the OPP table and then writes the value into the pstate
> registers? The registers in here look awfully similar to the qcom
> hardware. I don't know what the DESIRED1 and DESIRED2 registers are for
> though. Maybe they're so that one or the other frequency can be used if
> available? Like a min/max?
> 
> Either way, writing this as a cpufreq driver avoids the clk framework
> entirely which is super great for me :) It also avoids locking headaches
> from the clk prepare lock, and it also lets you support lockless cpufreq
> transitions by implementing the fast_switch function. I don't see any
> downsides to the cpufreq driver approach.

I wasn't too sure about this approach; I thought using a clk provider 
would end up simplifying things since I could use the cpufreq-dt 
machinery to take care of all the OPP stuff, and a lot of SoCs seemed to 
be going that way, but it seems cpufreq might be a better approach for 
this SoC?

There can only be one cpufreq driver instance, while I used two clock 
controllers to model the two clusters. So in the cpufreq case, the 
driver itself would have to deal with all potential CPU cluster 
instances/combinations itself. Not sure how much more code that will be, 
hopefully not too much...

I see qcom-cpufreq-hw uses a qcom,freq-domain prop to link CPUs to the 
cpufreq domains. cpufreq-dt and vexpress-spc-cpufreq instead use 
dev_pm_opp_get_sharing_cpus to look for shared OPP tables. Is there a 
reason not to do it that way and avoid the vendor prop? I guess the prop 
is more explicit while the sharing approach would have an implicit order 
dependency (i.e. CPUs are always grouped by cluster and clusters are 
listed in /cpus in the same order as in the cpufreq node)...

(Ack on the other comments, but if this becomes a cpufreq driver most of 
it is going to end up rewritten... :))

For the cpufreq case, do you have any suggestions as to how to relate it 
to the memory controller configuration tweaks? Ideally this would go 
through the OPP tables so it can be customized for future SoCs without 
stuff hardcoded in the driver... it seems the configuration affects 
power saving behavior / latencies, so it doesn't quite match the 
interconnect framework bandwidth request stuff. I'm also not sure how 
this would affect fast_switch, since going through those frameworks 
might imply locks... we might even find ourselves with a situation in 
the near future where multiple cpufreq policies can request memory 
controller latency reduction independently; I can come up with how to do 
this locklessly using atomics, but I can't imagine that being workable 
with higher-level frameworks, it would have to be a vendor-specific 
mechanism at that point...