mbox series

[v4,00/23] ARM: qcom: apq8064: support CPU frequency scaling

Message ID 20230827115033.935089-1-dmitry.baryshkov@linaro.org
Headers show
Series ARM: qcom: apq8064: support CPU frequency scaling | expand

Message

Dmitry Baryshkov Aug. 27, 2023, 11:50 a.m. UTC
Implement CPUFreq support for one of the oldest supported Qualcomm
platforms, APQ8064. Each core has independent power and frequency
control. Additionally the L2 cache is scaled to follow the CPU
frequencies (failure to do so results in strange semi-random crashes).

Core voltage is controlled through the SAW2 devices, one for each core.
The L2 has two regulators, vdd-mem and vdd-dig.

Dependencies: [1], [2].

[1] https://lore.kernel.org/linux-arm-msm/20230827005920.898719-1-dmitry.baryshkov@linaro.org/
[2] https://lore.kernel.org/linux-arm-msm/20230827032803.934819-1-dmitry.baryshkov@linaro.org/

Changes since v3:
- Split the opp and cpufreq patches to a separate series (Viresh)
- Added the cache's select clause (Rob, Krzysztof)
- Fixed $ref to include full path (Krzysztof)
- Renamed opp-table-l2 to just opp-table (Krzysztof)
- Unrolled loops in krait_l2_config_regulators() (Konrad)
- Dropped useless comments from krait-cc DT (Konrad)
- Fixed Votlage comment formatting (Konrad)
- Added empty lines between nvmem cells (Konrad)
- Replaced direct register masks with GENMASK (Konrad)
- Reordered headers in spm.c (Konrad)

Changes since v2:
- Merged basic cpufreq and voltage patches, to make sure that we don't
  undervolt the cores (Konrad)
- Reordered patches pushing voltage constraints early (Konrad)
- Removed KRAIT_NUM_CLOCKS from the kraitcc bindings header (Konrad)
- Rebased on top of PMIC cleanup
- Added missing regulator constraints to apq8064-cm-qs600 and
  apq8064-sony-xperia-lagan-yuga.

Changes since v1:
- Added separate Krait L2 cache device driver
- Moved vdd-mem and vdd-dig scaling to the L2 cache device (Christian,
  Stephen Gerhold)
- Fixed the 'INTERCONNECT' in the guarding define for krait-cc bindings
  (Stephen Boyd)
- Made SAW2's regulator property -> node handling clear (Krzysztof)
- Dropped the 'regulator' property from all SAW2 devices.


Dmitry Baryshkov (23):
  dt-bindings: soc: qcom: merge qcom,saw2.txt into qcom,spm.yaml
  dt-bindings: soc: qcom: qcom,saw2: define optional regulator node
  dt-bindings: clock: qcom,krait-cc: Krait core clock controller
  dt-bindings: cache: describe L2 cache on Qualcomm Krait platforms
  interconnect: icc-clk: add support for scaling using OPP
  clk: qcom: krait-cc: rewrite driver to use clk_hw instead of clk
  soc: qcom: spm: add support for voltage regulator
  soc: qcom: Add driver for Qualcomm Krait L2 cache scaling
  ARM: dts: qcom: apq8064-asus-nexus7-flo: constraint cpufreq regulators
  ARM: dts: qcom: apq8064-cm-qs600: constraint cpufreq regulators
  ARM: dts: qcom: apq8064-ifc6410: constraint cpufreq regulators
  ARM: dts: qcom: apq8064-sony-xperia-lagan-yuga: constraint cpufreq
    regulators
  ARM: dts: qcom: apq8064: rename SAW nodes to power-manager
  ARM: dts: qcom: apq8064: declare SAW2 regulators
  ARM: dts: qcom: apq8064: add Krait clock controller
  ARM: dts: qcom: apq8064: add L2 cache scaling
  ARM: dts: qcom: apq8064: add simple CPUFreq support
  ARM: dts: qcom: apq8064: enable passive CPU cooling
  ARM: dts: qcom: msm8960: declare SAW2 regulators
  ARM: dts: qcom: apq8084: drop 'regulator' property from SAW2 device
  ARM: dts: qcom: msm8974: drop 'regulator' property from SAW2 device
  ARM: dts: qcom: ipq4019: drop 'regulator' property from SAW2 devices
  ARM: dts: qcom: ipq8064: drop 'regulator' property from SAW2 devices

 .../devicetree/bindings/arm/msm/qcom,saw2.txt |  58 --
 .../bindings/cache/qcom,krait-l2-cache.yaml   |  86 +++
 .../qcom/{qcom,spm.yaml => qcom,saw2.yaml}    |  39 +-
 .../dts/qcom/qcom-apq8064-asus-nexus7-flo.dts |  19 +-
 .../boot/dts/qcom/qcom-apq8064-cm-qs600.dts   |  23 +-
 .../boot/dts/qcom/qcom-apq8064-ifc6410.dts    |  23 +-
 .../qcom-apq8064-sony-xperia-lagan-yuga.dts   |  13 +-
 arch/arm/boot/dts/qcom/qcom-apq8064.dtsi      | 670 +++++++++++++++++-
 arch/arm/boot/dts/qcom/qcom-apq8084.dtsi      |   1 -
 arch/arm/boot/dts/qcom/qcom-ipq4019.dtsi      |   5 -
 arch/arm/boot/dts/qcom/qcom-ipq8064.dtsi      |   2 -
 arch/arm/boot/dts/qcom/qcom-msm8960.dtsi      |  12 +-
 arch/arm/boot/dts/qcom/qcom-msm8974.dtsi      |   1 -
 drivers/clk/qcom/krait-cc.c                   | 141 ++--
 drivers/interconnect/icc-clk.c                |  13 +-
 drivers/soc/qcom/Kconfig                      |   9 +
 drivers/soc/qcom/Makefile                     |   1 +
 drivers/soc/qcom/krait-l2-cache.c             | 160 +++++
 drivers/soc/qcom/spm.c                        | 221 +++++-
 include/dt-bindings/clock/qcom,krait-cc.h     |  15 +
 include/dt-bindings/soc/qcom,krait-l2-cache.h |  12 +
 include/linux/interconnect-clk.h              |   1 +
 include/soc/qcom/spm.h                        |   9 +
 23 files changed, 1357 insertions(+), 177 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/arm/msm/qcom,saw2.txt
 create mode 100644 Documentation/devicetree/bindings/cache/qcom,krait-l2-cache.yaml
 rename Documentation/devicetree/bindings/soc/qcom/{qcom,spm.yaml => qcom,saw2.yaml} (57%)
 create mode 100644 drivers/soc/qcom/krait-l2-cache.c
 create mode 100644 include/dt-bindings/clock/qcom,krait-cc.h
 create mode 100644 include/dt-bindings/soc/qcom,krait-l2-cache.h

Comments

Konrad Dybcio Aug. 28, 2023, 10:49 a.m. UTC | #1
On 27.08.2023 13:50, Dmitry Baryshkov wrote:
> The SPM / SAW2 device also provides a voltage regulator functionality
> with optional AVS (Adaptive Voltage Scaling) support. The exact register
> sequence and voltage ranges differs from device to device.
> 
> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> ---
Acked-by: Konrad Dybcio <konrad.dybcio@linaro.org>

Konrad
Konrad Dybcio Aug. 28, 2023, 10:49 a.m. UTC | #2
On 27.08.2023 13:50, Dmitry Baryshkov wrote:
> Add a simple driver that handles scaling of L2 frequency and voltages.
> 
> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> ---
[...]

> +MODULE_DESCRIPTION("Qualcomm Krait L2 scaling driver");
> +MODULE_LICENSE("GPL v2");
Checkpatch?

Konrad
Konrad Dybcio Aug. 28, 2023, 10:54 a.m. UTC | #3
On 27.08.2023 13:50, Dmitry Baryshkov wrote:
> Add device node for the clock controller for the CPU cores and L2
> clocks. It will be further used by the L2 and by the CPUfreq nodes.
> 
> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> ---
>  arch/arm/boot/dts/qcom/qcom-apq8064.dtsi | 26 ++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/arch/arm/boot/dts/qcom/qcom-apq8064.dtsi b/arch/arm/boot/dts/qcom/qcom-apq8064.dtsi
> index ba7d5ef8de17..a05e64bff07f 100644
> --- a/arch/arm/boot/dts/qcom/qcom-apq8064.dtsi
> +++ b/arch/arm/boot/dts/qcom/qcom-apq8064.dtsi
> @@ -213,6 +213,32 @@ sleep_clk: sleep_clk {
>  		};
>  	};
>  
> +	kraitcc: clock-controller {
> +		compatible = "qcom,krait-cc-v1";
> +		clocks = <&gcc PLL9>,
> +			 <&gcc PLL10>,
> +			 <&gcc PLL16>,
> +			 <&gcc PLL17>,
> +			 <&gcc PLL12>,
> +			 <&acc0>,
> +			 <&acc1>,
> +			 <&acc2>,
> +			 <&acc3>,
> +			 <&l2cc>;
> +		clock-names = "hfpll0",
> +			      "hfpll1",
> +			      "hfpll2",
> +			      "hfpll3",
> +			      "hfpll_l2",
> +			      "acpu0_aux",
> +			      "acpu1_aux",
> +			      "acpu2_aux",
> +			      "acpu3_aux",
> +			      "acpu_l2_aux";
> +		#clock-cells = <1>;
> +		#interconnect-cells = <1>;
Doesn't only the L2 device register with icc?

Konrad
Konrad Dybcio Aug. 28, 2023, 10:55 a.m. UTC | #4
On 27.08.2023 13:50, Dmitry Baryshkov wrote:
> Populate L2 cache node with clock, supplies and OPP information to
> facilitate scaling L2 frequency.
> 
> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> ---
Acked-by: Konrad Dybcio <konrad.dybcio@linaro.org>

Konrad
Konrad Dybcio Aug. 28, 2023, 10:56 a.m. UTC | #5
On 27.08.2023 13:50, Dmitry Baryshkov wrote:
> Declare CPU frequency-scaling properties. Each CPU has its own clock,
> how all CPUs have the same OPP table. Voltage scaling is not (yet)
> enabled with this patch. It will be enabled later.
> 
> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> ---
Acked-by: Konrad Dybcio <konrad.dybcio@linaro.org>

Konrad
Dmitry Baryshkov Aug. 28, 2023, 11:38 a.m. UTC | #6
On Mon, 28 Aug 2023 at 13:54, Konrad Dybcio <konrad.dybcio@linaro.org> wrote:
>
> On 27.08.2023 13:50, Dmitry Baryshkov wrote:
> > Add device node for the clock controller for the CPU cores and L2
> > clocks. It will be further used by the L2 and by the CPUfreq nodes.
> >
> > Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> > ---
> >  arch/arm/boot/dts/qcom/qcom-apq8064.dtsi | 26 ++++++++++++++++++++++++
> >  1 file changed, 26 insertions(+)
> >
> > diff --git a/arch/arm/boot/dts/qcom/qcom-apq8064.dtsi b/arch/arm/boot/dts/qcom/qcom-apq8064.dtsi
> > index ba7d5ef8de17..a05e64bff07f 100644
> > --- a/arch/arm/boot/dts/qcom/qcom-apq8064.dtsi
> > +++ b/arch/arm/boot/dts/qcom/qcom-apq8064.dtsi
> > @@ -213,6 +213,32 @@ sleep_clk: sleep_clk {
> >               };
> >       };
> >
> > +     kraitcc: clock-controller {
> > +             compatible = "qcom,krait-cc-v1";
> > +             clocks = <&gcc PLL9>,
> > +                      <&gcc PLL10>,
> > +                      <&gcc PLL16>,
> > +                      <&gcc PLL17>,
> > +                      <&gcc PLL12>,
> > +                      <&acc0>,
> > +                      <&acc1>,
> > +                      <&acc2>,
> > +                      <&acc3>,
> > +                      <&l2cc>;
> > +             clock-names = "hfpll0",
> > +                           "hfpll1",
> > +                           "hfpll2",
> > +                           "hfpll3",
> > +                           "hfpll_l2",
> > +                           "acpu0_aux",
> > +                           "acpu1_aux",
> > +                           "acpu2_aux",
> > +                           "acpu3_aux",
> > +                           "acpu_l2_aux";
> > +             #clock-cells = <1>;
> > +             #interconnect-cells = <1>;
> Doesn't only the L2 device register with icc?

True. I'll drop this



--
With best wishes
Dmitry
Dmitry Baryshkov Aug. 28, 2023, 9:18 p.m. UTC | #7
On Mon, 28 Aug 2023 at 21:10, Stephen Boyd <sboyd@kernel.org> wrote:
>
> Quoting Dmitry Baryshkov (2023-08-27 04:50:15)
> > diff --git a/drivers/interconnect/icc-clk.c b/drivers/interconnect/icc-clk.c
> > index d787f2ea36d9..45ffb068979d 100644
> > --- a/drivers/interconnect/icc-clk.c
> > +++ b/drivers/interconnect/icc-clk.c
> > @@ -25,12 +28,16 @@ struct icc_clk_provider {
> >  static int icc_clk_set(struct icc_node *src, struct icc_node *dst)
> >  {
> >         struct icc_clk_node *qn = src->data;
> > +       unsigned long rate = icc_units_to_bps(src->peak_bw);
> >         int ret;
> >
> >         if (!qn || !qn->clk)
> >                 return 0;
> >
> > -       if (!src->peak_bw) {
> > +       if (qn->opp)
> > +               return dev_pm_opp_set_rate(qn->dev, rate);
>
> Just curious how does lockdep do with this? Doesn't OPP call into
> interconnect code, so lockdep will complain about ABBA?

Interesting question. I should check it.
Dmitry Baryshkov Oct. 3, 2023, 8:30 a.m. UTC | #8
On 28/08/2023 21:09, Stephen Boyd wrote:
> Quoting Dmitry Baryshkov (2023-08-27 04:50:15)
>> diff --git a/drivers/interconnect/icc-clk.c b/drivers/interconnect/icc-clk.c
>> index d787f2ea36d9..45ffb068979d 100644
>> --- a/drivers/interconnect/icc-clk.c
>> +++ b/drivers/interconnect/icc-clk.c
>> @@ -25,12 +28,16 @@ struct icc_clk_provider {
>>   static int icc_clk_set(struct icc_node *src, struct icc_node *dst)
>>   {
>>          struct icc_clk_node *qn = src->data;
>> +       unsigned long rate = icc_units_to_bps(src->peak_bw);
>>          int ret;
>>   
>>          if (!qn || !qn->clk)
>>                  return 0;
>>   
>> -       if (!src->peak_bw) {
>> +       if (qn->opp)
>> +               return dev_pm_opp_set_rate(qn->dev, rate);
> 
> Just curious how does lockdep do with this? Doesn't OPP call into
> interconnect code, so lockdep will complain about ABBA?

Unfortunately it does. It seems, the icc-clk is not a proper way to go 
here. I will take a look at reusing set_required_opps for this case.
Stephan Gerhold Oct. 3, 2023, 1:01 p.m. UTC | #9
On Tue, Oct 03, 2023 at 11:30:28AM +0300, Dmitry Baryshkov wrote:
> On 28/08/2023 21:09, Stephen Boyd wrote:
> > Quoting Dmitry Baryshkov (2023-08-27 04:50:15)
> > > diff --git a/drivers/interconnect/icc-clk.c b/drivers/interconnect/icc-clk.c
> > > index d787f2ea36d9..45ffb068979d 100644
> > > --- a/drivers/interconnect/icc-clk.c
> > > +++ b/drivers/interconnect/icc-clk.c
> > > @@ -25,12 +28,16 @@ struct icc_clk_provider {
> > >   static int icc_clk_set(struct icc_node *src, struct icc_node *dst)
> > >   {
> > >          struct icc_clk_node *qn = src->data;
> > > +       unsigned long rate = icc_units_to_bps(src->peak_bw);
> > >          int ret;
> > >          if (!qn || !qn->clk)
> > >                  return 0;
> > > -       if (!src->peak_bw) {
> > > +       if (qn->opp)
> > > +               return dev_pm_opp_set_rate(qn->dev, rate);
> > 
> > Just curious how does lockdep do with this? Doesn't OPP call into
> > interconnect code, so lockdep will complain about ABBA?
> 
> Unfortunately it does. It seems, the icc-clk is not a proper way to go here.
> I will take a look at reusing set_required_opps for this case.
> 

Could you elaborate a bit which locks exactly cause trouble here?
I'm probably missing something here.

From a quick look at the OPP code I don't see a global lock taken there
for the entire OPP switch sequence, so I'm not sure how this could cause
an ABBA deadlock.

Thanks,
Stephan
Dmitry Baryshkov Oct. 3, 2023, 1:36 p.m. UTC | #10
On Tue, 3 Oct 2023 at 16:02, Stephan Gerhold <stephan@gerhold.net> wrote:
>
> On Tue, Oct 03, 2023 at 11:30:28AM +0300, Dmitry Baryshkov wrote:
> > On 28/08/2023 21:09, Stephen Boyd wrote:
> > > Quoting Dmitry Baryshkov (2023-08-27 04:50:15)
> > > > diff --git a/drivers/interconnect/icc-clk.c b/drivers/interconnect/icc-clk.c
> > > > index d787f2ea36d9..45ffb068979d 100644
> > > > --- a/drivers/interconnect/icc-clk.c
> > > > +++ b/drivers/interconnect/icc-clk.c
> > > > @@ -25,12 +28,16 @@ struct icc_clk_provider {
> > > >   static int icc_clk_set(struct icc_node *src, struct icc_node *dst)
> > > >   {
> > > >          struct icc_clk_node *qn = src->data;
> > > > +       unsigned long rate = icc_units_to_bps(src->peak_bw);
> > > >          int ret;
> > > >          if (!qn || !qn->clk)
> > > >                  return 0;
> > > > -       if (!src->peak_bw) {
> > > > +       if (qn->opp)
> > > > +               return dev_pm_opp_set_rate(qn->dev, rate);
> > >
> > > Just curious how does lockdep do with this? Doesn't OPP call into
> > > interconnect code, so lockdep will complain about ABBA?
> >
> > Unfortunately it does. It seems, the icc-clk is not a proper way to go here.
> > I will take a look at reusing set_required_opps for this case.
> >
>
> Could you elaborate a bit which locks exactly cause trouble here?
> I'm probably missing something here.
>
> From a quick look at the OPP code I don't see a global lock taken there
> for the entire OPP switch sequence, so I'm not sure how this could cause
> an ABBA deadlock.

For example:

[    7.680041] Chain exists of:
[    7.680041]   icc_bw_lock --> regulator_ww_class_acquire --> fs_reclaim
[    7.680041]
[    7.687955]  Possible unsafe locking scenario:
[    7.687955]
[    7.699039]        CPU0                    CPU1
[    7.704752]        ----                    ----
[    7.709266]   lock(fs_reclaim);
[    7.713779]                                lock(regulator_ww_class_acquire);
[    7.716919]                                lock(fs_reclaim);
[    7.724204]   lock(icc_bw_lock);
Stephan Gerhold Oct. 3, 2023, 2:17 p.m. UTC | #11
On Tue, Oct 03, 2023 at 04:36:11PM +0300, Dmitry Baryshkov wrote:
> On Tue, 3 Oct 2023 at 16:02, Stephan Gerhold <stephan@gerhold.net> wrote:
> >
> > On Tue, Oct 03, 2023 at 11:30:28AM +0300, Dmitry Baryshkov wrote:
> > > On 28/08/2023 21:09, Stephen Boyd wrote:
> > > > Quoting Dmitry Baryshkov (2023-08-27 04:50:15)
> > > > > diff --git a/drivers/interconnect/icc-clk.c b/drivers/interconnect/icc-clk.c
> > > > > index d787f2ea36d9..45ffb068979d 100644
> > > > > --- a/drivers/interconnect/icc-clk.c
> > > > > +++ b/drivers/interconnect/icc-clk.c
> > > > > @@ -25,12 +28,16 @@ struct icc_clk_provider {
> > > > >   static int icc_clk_set(struct icc_node *src, struct icc_node *dst)
> > > > >   {
> > > > >          struct icc_clk_node *qn = src->data;
> > > > > +       unsigned long rate = icc_units_to_bps(src->peak_bw);
> > > > >          int ret;
> > > > >          if (!qn || !qn->clk)
> > > > >                  return 0;
> > > > > -       if (!src->peak_bw) {
> > > > > +       if (qn->opp)
> > > > > +               return dev_pm_opp_set_rate(qn->dev, rate);
> > > >
> > > > Just curious how does lockdep do with this? Doesn't OPP call into
> > > > interconnect code, so lockdep will complain about ABBA?
> > >
> > > Unfortunately it does. It seems, the icc-clk is not a proper way to go here.
> > > I will take a look at reusing set_required_opps for this case.
> > >
> >
> > Could you elaborate a bit which locks exactly cause trouble here?
> > I'm probably missing something here.
> >
> > From a quick look at the OPP code I don't see a global lock taken there
> > for the entire OPP switch sequence, so I'm not sure how this could cause
> > an ABBA deadlock.
> 
> For example:
> 
> [    7.680041] Chain exists of:
> [    7.680041]   icc_bw_lock --> regulator_ww_class_acquire --> fs_reclaim
> [    7.680041]
> [    7.687955]  Possible unsafe locking scenario:
> [    7.687955]
> [    7.699039]        CPU0                    CPU1
> [    7.704752]        ----                    ----
> [    7.709266]   lock(fs_reclaim);
> [    7.713779]                                lock(regulator_ww_class_acquire);
> [    7.716919]                                lock(fs_reclaim);
> [    7.724204]   lock(icc_bw_lock);
> 

Hm, I'm not entirely sure how to interpret this. Is there really a
connection between OPP and ICC here? It looks more like ICC <->
regulator and somehow related to memory allocations (the fs_reclaim).

Was there more output around this?

In general, I would expect that adjusting a regulator from an
interconnect driver should be made possible somehow. Just because the
RPM firmware or similar typically handles this internally on Qualcomm
platforms doesn't mean no one else will ever need to do this. If you
have a higher bandwidth requests, need to increase the clock, which
again depends on a higher voltage, then you need to be able to do the
regulator_set_voltage() from the ICC driver somehow.

Thanks,
Stephan
Dmitry Baryshkov Oct. 3, 2023, 3:31 p.m. UTC | #12
On Tue, 3 Oct 2023 at 17:17, Stephan Gerhold <stephan@gerhold.net> wrote:
>
> On Tue, Oct 03, 2023 at 04:36:11PM +0300, Dmitry Baryshkov wrote:
> > On Tue, 3 Oct 2023 at 16:02, Stephan Gerhold <stephan@gerhold.net> wrote:
> > >
> > > On Tue, Oct 03, 2023 at 11:30:28AM +0300, Dmitry Baryshkov wrote:
> > > > On 28/08/2023 21:09, Stephen Boyd wrote:
> > > > > Quoting Dmitry Baryshkov (2023-08-27 04:50:15)
> > > > > > diff --git a/drivers/interconnect/icc-clk.c b/drivers/interconnect/icc-clk.c
> > > > > > index d787f2ea36d9..45ffb068979d 100644
> > > > > > --- a/drivers/interconnect/icc-clk.c
> > > > > > +++ b/drivers/interconnect/icc-clk.c
> > > > > > @@ -25,12 +28,16 @@ struct icc_clk_provider {
> > > > > >   static int icc_clk_set(struct icc_node *src, struct icc_node *dst)
> > > > > >   {
> > > > > >          struct icc_clk_node *qn = src->data;
> > > > > > +       unsigned long rate = icc_units_to_bps(src->peak_bw);
> > > > > >          int ret;
> > > > > >          if (!qn || !qn->clk)
> > > > > >                  return 0;
> > > > > > -       if (!src->peak_bw) {
> > > > > > +       if (qn->opp)
> > > > > > +               return dev_pm_opp_set_rate(qn->dev, rate);
> > > > >
> > > > > Just curious how does lockdep do with this? Doesn't OPP call into
> > > > > interconnect code, so lockdep will complain about ABBA?
> > > >
> > > > Unfortunately it does. It seems, the icc-clk is not a proper way to go here.
> > > > I will take a look at reusing set_required_opps for this case.
> > > >
> > >
> > > Could you elaborate a bit which locks exactly cause trouble here?
> > > I'm probably missing something here.
> > >
> > > From a quick look at the OPP code I don't see a global lock taken there
> > > for the entire OPP switch sequence, so I'm not sure how this could cause
> > > an ABBA deadlock.
> >
> > For example:
> >
> > [    7.680041] Chain exists of:
> > [    7.680041]   icc_bw_lock --> regulator_ww_class_acquire --> fs_reclaim
> > [    7.680041]
> > [    7.687955]  Possible unsafe locking scenario:
> > [    7.687955]
> > [    7.699039]        CPU0                    CPU1
> > [    7.704752]        ----                    ----
> > [    7.709266]   lock(fs_reclaim);
> > [    7.713779]                                lock(regulator_ww_class_acquire);
> > [    7.716919]                                lock(fs_reclaim);
> > [    7.724204]   lock(icc_bw_lock);
> >
>
> Hm, I'm not entirely sure how to interpret this. Is there really a
> connection between OPP and ICC here? It looks more like ICC <->
> regulator and somehow related to memory allocations (the fs_reclaim).
>
> Was there more output around this?

[    7.362902] ======================================================
[    7.363447] WARNING: possible circular locking dependency detected
[    7.369434] 6.6.0-rc3-next-20230929-08870-g49503b4b9f55 #129 Not tainted
[    7.375604] ------------------------------------------------------
[    7.382453] swapper/0/1 is trying to acquire lock:
[    7.388437] c143445c (icc_bw_lock){+.+.}-{3:3}, at: icc_init+0x0/0x104
[    7.393225]
[    7.393225] but task is already holding lock:
[    7.399730] c1397444 (fs_reclaim){+.+.}-{0:0}, at: icc_init+0x18/0x104
[    7.405547]
[    7.405547] which lock already depends on the new lock.
[    7.405547]
[    7.412077]
[    7.412077] the existing dependency chain (in reverse order) is:
[    7.420310]
[    7.420310] -> #2 (fs_reclaim){+.+.}-{0:0}:
[    7.427767]        fs_reclaim_acquire+0x60/0x94
[    7.433492]        __kmem_cache_alloc_node+0x30/0x2b4
[    7.437742]        __kmalloc_node_track_caller+0x48/0x244
[    7.442954]        kstrdup+0x30/0x58
[    7.448325]        create_regulator+0x70/0x2b8
[    7.451981]        regulator_resolve_supply+0x3bc/0x7f4
[    7.456234]        regulator_register+0x9b0/0xb50
[    7.461352]        devm_regulator_register+0x50/0x8c
[    7.466216]        rpm_reg_probe+0xf4/0x1a0
[    7.471244]        platform_probe+0x5c/0xb0
[    7.475157]        really_probe+0xc8/0x2d8
[    7.479318]        __driver_probe_device+0x88/0x1a0
[    7.483488]        driver_probe_device+0x30/0x108
[    7.488262]        __driver_attach_async_helper+0x4c/0xa0
[    7.493133]        async_run_entry_fn+0x24/0xb0
[    7.498504]        process_one_work+0x208/0x5e4
[    7.502844]        worker_thread+0x1e0/0x4a4
[    7.507356]        kthread+0x100/0x120
[    7.511874]        ret_from_fork+0x14/0x28
[    7.515428]
[    7.515428] -> #1 (regulator_ww_class_acquire){+.+.}-{0:0}:
[    7.519528]        lock_release+0x164/0x340
[    7.526713]        __mutex_unlock_slowpath+0x3c/0x2fc
[    7.530626]        regulator_lock_dependent+0x1a4/0x298
[    7.535839]        regulator_set_voltage+0x30/0xfc
[    7.540866]        krait_l2_set_one_supply+0x28/0x84
[    7.545729]        krait_l2_config_regulators+0x90/0x1c4
[    7.550851]        _set_opp+0xf0/0x438
[    7.556144]        dev_pm_opp_set_rate+0x118/0x224
[    7.559703]        icc_node_add+0xe8/0x15c
[    7.564474]        icc_clk_register+0x150/0x208
[    7.568556]        krait_l2_probe+0x100/0x114
[    7.572988]        platform_probe+0x5c/0xb0
[    7.577495]        really_probe+0xc8/0x2d8
[    7.581487]        __driver_probe_device+0x88/0x1a0
[    7.585658]        driver_probe_device+0x30/0x108
[    7.590433]        __device_attach_driver+0x94/0x10c
[    7.595300]        bus_for_each_drv+0x84/0xd8
[    7.600326]        __device_attach+0xac/0x1a8
[    7.604580]        bus_probe_device+0x8c/0x90
[    7.608919]        device_add+0x5c8/0x7a4
[    7.613265]        of_platform_device_create_pdata+0x90/0xbc
[    7.617260]        qcom_cpufreq_init+0xa8/0x130
[    7.622984]        do_one_initcall+0x68/0x2d8
[    7.627235]        kernel_init_freeable+0x1ac/0x208
[    7.631752]        kernel_init+0x18/0x12c
[    7.636441]        ret_from_fork+0x14/0x28
[    7.640602]
[    7.640602] -> #0 (icc_bw_lock){+.+.}-{3:3}:
[    7.644607]        __lock_acquire+0x1530/0x28fc
[    7.650413]        lock_acquire+0x10c/0x33c
[    7.654757]        icc_init+0x40/0x104
[    7.658917]        do_one_initcall+0x68/0x2d8
[    7.662740]        kernel_init_freeable+0x1ac/0x208
[    7.667168]        kernel_init+0x18/0x12c
[    7.671855]        ret_from_fork+0x14/0x28
[    7.676018]
[    7.676018] other info that might help us debug this:
[    7.676018]
[    7.680041] Chain exists of:
[    7.680041]   icc_bw_lock --> regulator_ww_class_acquire --> fs_reclaim
[    7.680041]
[    7.687955]  Possible unsafe locking scenario:
[    7.687955]
[    7.699039]        CPU0                    CPU1
[    7.704752]        ----                    ----
[    7.709266]   lock(fs_reclaim);
[    7.713779]                                lock(regulator_ww_class_acquire);
[    7.716919]                                lock(fs_reclaim);
[    7.724204]   lock(icc_bw_lock);
[    7.729833]
[    7.729833]  *** DEADLOCK ***
[    7.729833]
[    7.733071] 1 lock held by swapper/0/1:
[    7.738691]  #0: c1397444 (fs_reclaim){+.+.}-{0:0}, at: icc_init+0x18/0x104
[    7.742528]
[    7.742528] stack backtrace:
[    7.749463] CPU: 3 PID: 1 Comm: swapper/0 Not tainted
6.6.0-rc3-next-20230929-08870-g49503b4b9f55 #129
[    7.754010] Hardware name: Generic DT based system
[    7.763183]  unwind_backtrace from show_stack+0x10/0x14
[    7.767957]  show_stack from dump_stack_lvl+0x60/0x90
[    7.773082]  dump_stack_lvl from check_noncircular+0x174/0x18c
[    7.778288]  check_noncircular from __lock_acquire+0x1530/0x28fc
[    7.784018]  __lock_acquire from lock_acquire+0x10c/0x33c
[    7.790178]  lock_acquire from icc_init+0x40/0x104
[    7.795475]  icc_init from do_one_initcall+0x68/0x2d8
[    7.800159]  do_one_initcall from kernel_init_freeable+0x1ac/0x208
[    7.805286]  kernel_init_freeable from kernel_init+0x18/0x12c
[    7.811361]  kernel_init from ret_from_fork+0x14/0x28
[    7.817177] Exception stack(0xf081dfb0 to 0xf081dff8)

> In general, I would expect that adjusting a regulator from an
> interconnect driver should be made possible somehow. Just because the
> RPM firmware or similar typically handles this internally on Qualcomm
> platforms doesn't mean no one else will ever need to do this. If you
> have a higher bandwidth requests, need to increase the clock, which
> again depends on a higher voltage, then you need to be able to do the
> regulator_set_voltage() from the ICC driver somehow.

This kind of dependency is handled by the consumer, not by the ICC
driver. Usually we explicitly put them to the opp tables, matching
rates and volage.

I think I'll still check the required-opps approach. It will require
manually aggregating the L2 rate requirements, but then we will be
safe from such kinds of interdependencies. And maybe then we can also
rewrite msm8996 to use required-opps too.
Stephan Gerhold Oct. 3, 2023, 4:04 p.m. UTC | #13
On Tue, Oct 03, 2023 at 06:31:27PM +0300, Dmitry Baryshkov wrote:
> On Tue, 3 Oct 2023 at 17:17, Stephan Gerhold <stephan@gerhold.net> wrote:
> > On Tue, Oct 03, 2023 at 04:36:11PM +0300, Dmitry Baryshkov wrote:
> > > On Tue, 3 Oct 2023 at 16:02, Stephan Gerhold <stephan@gerhold.net> wrote:
> > > > On Tue, Oct 03, 2023 at 11:30:28AM +0300, Dmitry Baryshkov wrote:
> > > > > On 28/08/2023 21:09, Stephen Boyd wrote:
> > > > > > Quoting Dmitry Baryshkov (2023-08-27 04:50:15)
> > > > > > > diff --git a/drivers/interconnect/icc-clk.c b/drivers/interconnect/icc-clk.c
> > > > > > > index d787f2ea36d9..45ffb068979d 100644
> > > > > > > --- a/drivers/interconnect/icc-clk.c
> > > > > > > +++ b/drivers/interconnect/icc-clk.c
> > > > > > > @@ -25,12 +28,16 @@ struct icc_clk_provider {
> > > > > > >   static int icc_clk_set(struct icc_node *src, struct icc_node *dst)
> > > > > > >   {
> > > > > > >          struct icc_clk_node *qn = src->data;
> > > > > > > +       unsigned long rate = icc_units_to_bps(src->peak_bw);
> > > > > > >          int ret;
> > > > > > >          if (!qn || !qn->clk)
> > > > > > >                  return 0;
> > > > > > > -       if (!src->peak_bw) {
> > > > > > > +       if (qn->opp)
> > > > > > > +               return dev_pm_opp_set_rate(qn->dev, rate);
> > > > > >
> > > > > > Just curious how does lockdep do with this? Doesn't OPP call into
> > > > > > interconnect code, so lockdep will complain about ABBA?
> > > > >
> > > > > Unfortunately it does. It seems, the icc-clk is not a proper way to go here.
> > > > > I will take a look at reusing set_required_opps for this case.
> > > > >
> > > >
> > > > Could you elaborate a bit which locks exactly cause trouble here?
> > > > I'm probably missing something here.
> > > >
> > > > From a quick look at the OPP code I don't see a global lock taken there
> > > > for the entire OPP switch sequence, so I'm not sure how this could cause
> > > > an ABBA deadlock.
> > >
> > > For example:
> > >
> > > [    7.680041] Chain exists of:
> > > [    7.680041]   icc_bw_lock --> regulator_ww_class_acquire --> fs_reclaim
> > > [    7.680041]
> > > [    7.687955]  Possible unsafe locking scenario:
> > > [    7.687955]
> > > [    7.699039]        CPU0                    CPU1
> > > [    7.704752]        ----                    ----
> > > [    7.709266]   lock(fs_reclaim);
> > > [    7.713779]                                lock(regulator_ww_class_acquire);
> > > [    7.716919]                                lock(fs_reclaim);
> > > [    7.724204]   lock(icc_bw_lock);
> > >
> >
> > Hm, I'm not entirely sure how to interpret this. Is there really a
> > connection between OPP and ICC here? It looks more like ICC <->
> > regulator and somehow related to memory allocations (the fs_reclaim).
> >
> > Was there more output around this?
> 
> [    7.362902] ======================================================
> [    7.363447] WARNING: possible circular locking dependency detected
> [    7.369434] 6.6.0-rc3-next-20230929-08870-g49503b4b9f55 #129 Not tainted
> [    7.375604] ------------------------------------------------------
> [    7.382453] swapper/0/1 is trying to acquire lock:
> [    7.388437] c143445c (icc_bw_lock){+.+.}-{3:3}, at: icc_init+0x0/0x104
> [    7.393225]
> [    7.393225] but task is already holding lock:
> [    7.399730] c1397444 (fs_reclaim){+.+.}-{0:0}, at: icc_init+0x18/0x104
> [    7.405547]
> [    7.405547] which lock already depends on the new lock.
> [    7.405547]
> [    7.412077]
> [    7.412077] the existing dependency chain (in reverse order) is:
> [    7.420310]
> [    7.420310] -> #2 (fs_reclaim){+.+.}-{0:0}:
> [    7.427767]        fs_reclaim_acquire+0x60/0x94
> [    7.433492]        __kmem_cache_alloc_node+0x30/0x2b4
> [    7.437742]        __kmalloc_node_track_caller+0x48/0x244
> [    7.442954]        kstrdup+0x30/0x58
> [    7.448325]        create_regulator+0x70/0x2b8
> [    7.451981]        regulator_resolve_supply+0x3bc/0x7f4
> [    7.456234]        regulator_register+0x9b0/0xb50
> [    7.461352]        devm_regulator_register+0x50/0x8c
> [    7.466216]        rpm_reg_probe+0xf4/0x1a0
> [    7.471244]        platform_probe+0x5c/0xb0
> [    7.475157]        really_probe+0xc8/0x2d8
> [    7.479318]        __driver_probe_device+0x88/0x1a0
> [    7.483488]        driver_probe_device+0x30/0x108
> [    7.488262]        __driver_attach_async_helper+0x4c/0xa0
> [    7.493133]        async_run_entry_fn+0x24/0xb0
> [    7.498504]        process_one_work+0x208/0x5e4
> [    7.502844]        worker_thread+0x1e0/0x4a4
> [    7.507356]        kthread+0x100/0x120
> [    7.511874]        ret_from_fork+0x14/0x28
> [    7.515428]
> [    7.515428] -> #1 (regulator_ww_class_acquire){+.+.}-{0:0}:
> [    7.519528]        lock_release+0x164/0x340
> [    7.526713]        __mutex_unlock_slowpath+0x3c/0x2fc
> [    7.530626]        regulator_lock_dependent+0x1a4/0x298
> [    7.535839]        regulator_set_voltage+0x30/0xfc
> [    7.540866]        krait_l2_set_one_supply+0x28/0x84
> [    7.545729]        krait_l2_config_regulators+0x90/0x1c4
> [    7.550851]        _set_opp+0xf0/0x438
> [    7.556144]        dev_pm_opp_set_rate+0x118/0x224
> [    7.559703]        icc_node_add+0xe8/0x15c
> [    7.564474]        icc_clk_register+0x150/0x208
> [    7.568556]        krait_l2_probe+0x100/0x114
> [    7.572988]        platform_probe+0x5c/0xb0
> [    7.577495]        really_probe+0xc8/0x2d8
> [    7.581487]        __driver_probe_device+0x88/0x1a0
> [    7.585658]        driver_probe_device+0x30/0x108
> [    7.590433]        __device_attach_driver+0x94/0x10c
> [    7.595300]        bus_for_each_drv+0x84/0xd8
> [    7.600326]        __device_attach+0xac/0x1a8
> [    7.604580]        bus_probe_device+0x8c/0x90
> [    7.608919]        device_add+0x5c8/0x7a4
> [    7.613265]        of_platform_device_create_pdata+0x90/0xbc
> [    7.617260]        qcom_cpufreq_init+0xa8/0x130
> [    7.622984]        do_one_initcall+0x68/0x2d8
> [    7.627235]        kernel_init_freeable+0x1ac/0x208
> [    7.631752]        kernel_init+0x18/0x12c
> [    7.636441]        ret_from_fork+0x14/0x28
> [    7.640602]
> [    7.640602] -> #0 (icc_bw_lock){+.+.}-{3:3}:
> [    7.644607]        __lock_acquire+0x1530/0x28fc
> [    7.650413]        lock_acquire+0x10c/0x33c
> [    7.654757]        icc_init+0x40/0x104
> [    7.658917]        do_one_initcall+0x68/0x2d8
> [    7.662740]        kernel_init_freeable+0x1ac/0x208
> [    7.667168]        kernel_init+0x18/0x12c
> [    7.671855]        ret_from_fork+0x14/0x28
> [    7.676018]
> [    7.676018] other info that might help us debug this:
> [    7.676018]
> [    7.680041] Chain exists of:
> [    7.680041]   icc_bw_lock --> regulator_ww_class_acquire --> fs_reclaim
> [    7.680041]
> [    7.687955]  Possible unsafe locking scenario:
> [    7.687955]
> [    7.699039]        CPU0                    CPU1
> [    7.704752]        ----                    ----
> [    7.709266]   lock(fs_reclaim);
> [    7.713779]                                lock(regulator_ww_class_acquire);
> [    7.716919]                                lock(fs_reclaim);
> [    7.724204]   lock(icc_bw_lock);
> [    7.729833]
> [    7.729833]  *** DEADLOCK ***
> [    7.729833]
> [    7.733071] 1 lock held by swapper/0/1:
> [    7.738691]  #0: c1397444 (fs_reclaim){+.+.}-{0:0}, at: icc_init+0x18/0x104
> [    7.742528]
> [    7.742528] stack backtrace:
> [    7.749463] CPU: 3 PID: 1 Comm: swapper/0 Not tainted
> 6.6.0-rc3-next-20230929-08870-g49503b4b9f55 #129
> [    7.754010] Hardware name: Generic DT based system
> [    7.763183]  unwind_backtrace from show_stack+0x10/0x14
> [    7.767957]  show_stack from dump_stack_lvl+0x60/0x90
> [    7.773082]  dump_stack_lvl from check_noncircular+0x174/0x18c
> [    7.778288]  check_noncircular from __lock_acquire+0x1530/0x28fc
> [    7.784018]  __lock_acquire from lock_acquire+0x10c/0x33c
> [    7.790178]  lock_acquire from icc_init+0x40/0x104
> [    7.795475]  icc_init from do_one_initcall+0x68/0x2d8
> [    7.800159]  do_one_initcall from kernel_init_freeable+0x1ac/0x208
> [    7.805286]  kernel_init_freeable from kernel_init+0x18/0x12c
> [    7.811361]  kernel_init from ret_from_fork+0x14/0x28
> [    7.817177] Exception stack(0xf081dfb0 to 0xf081dff8)
> 

Thanks! I assume you haven't noticed an actual deadlock (i.e. cpufreq
not progressing) without lockdep?

FWIW I'm not fully convinced that this deadlock can actually happen in
practice for your particular limited use case (i.e. that it is not a
false positive).

> > In general, I would expect that adjusting a regulator from an
> > interconnect driver should be made possible somehow. Just because the
> > RPM firmware or similar typically handles this internally on Qualcomm
> > platforms doesn't mean no one else will ever need to do this. If you
> > have a higher bandwidth requests, need to increase the clock, which
> > again depends on a higher voltage, then you need to be able to do the
> > regulator_set_voltage() from the ICC driver somehow.
> 
> This kind of dependency is handled by the consumer, not by the ICC
> driver. Usually we explicitly put them to the opp tables, matching
> rates and volage.
> 

Right, but I'm talking about the voltage requirements of the aggregated
bus clock. The rate for that is calculated from the different bandwidth
requirements from all the consumers and not just a single one. From the
kernel perspective, the interconnect driver is the consumer of that
clock. On Qualcomm SMD RPM platforms the voltage adjustment for those is
handled behind the scenes by the RPM firmware (no idea about RPMH and
RPM-old).

But one could easily think of a (non-Qualcomm) SoC without RPM, where
the interconnect driver calculates a particular bus rate for multiple
consumers and then needs to vote for a power domain or regulator to be
able to generate that clock safely.

> I think I'll still check the required-opps approach. It will require
> manually aggregating the L2 rate requirements, but then we will be
> safe from such kinds of interdependencies. And maybe then we can also
> rewrite msm8996 to use required-opps too.
> 

Sure, thanks! I think at the moment the required-opps implementation is
somewhat entangled with power domains/genpd but perhaps it's easy enough
to refactor that to be more general.

Thanks,
Stephan
Stephen Boyd Oct. 10, 2023, 4:14 a.m. UTC | #14
Quoting Dmitry Baryshkov (2023-10-03 08:31:27)
> On Tue, 3 Oct 2023 at 17:17, Stephan Gerhold <stephan@gerhold.net> wrote:
> >
> > On Tue, Oct 03, 2023 at 04:36:11PM +0300, Dmitry Baryshkov wrote:
> > > On Tue, 3 Oct 2023 at 16:02, Stephan Gerhold <stephan@gerhold.net> wrote:
> > > >
> > > > On Tue, Oct 03, 2023 at 11:30:28AM +0300, Dmitry Baryshkov wrote:
> > > > > On 28/08/2023 21:09, Stephen Boyd wrote:
> > > > > > Quoting Dmitry Baryshkov (2023-08-27 04:50:15)
> > > > > > > diff --git a/drivers/interconnect/icc-clk.c b/drivers/interconnect/icc-clk.c
> > > > > > > index d787f2ea36d9..45ffb068979d 100644
> > > > > > > --- a/drivers/interconnect/icc-clk.c
> > > > > > > +++ b/drivers/interconnect/icc-clk.c
> > > > > > > @@ -25,12 +28,16 @@ struct icc_clk_provider {
> > > > > > >   static int icc_clk_set(struct icc_node *src, struct icc_node *dst)
> > > > > > >   {
> > > > > > >          struct icc_clk_node *qn = src->data;
> > > > > > > +       unsigned long rate = icc_units_to_bps(src->peak_bw);
> > > > > > >          int ret;
> > > > > > >          if (!qn || !qn->clk)
> > > > > > >                  return 0;
> > > > > > > -       if (!src->peak_bw) {
> > > > > > > +       if (qn->opp)
> > > > > > > +               return dev_pm_opp_set_rate(qn->dev, rate);
> > > > > >
> > > > > > Just curious how does lockdep do with this? Doesn't OPP call into
> > > > > > interconnect code, so lockdep will complain about ABBA?
> > > > >
> > > > > Unfortunately it does. It seems, the icc-clk is not a proper way to go here.
> > > > > I will take a look at reusing set_required_opps for this case.
> > > > >
> > > >
> > > > Could you elaborate a bit which locks exactly cause trouble here?
> > > > I'm probably missing something here.
> > > >
> > > > From a quick look at the OPP code I don't see a global lock taken there
> > > > for the entire OPP switch sequence, so I'm not sure how this could cause
> > > > an ABBA deadlock.
> > >
> > > For example:
> > >
> > > [    7.680041] Chain exists of:
> > > [    7.680041]   icc_bw_lock --> regulator_ww_class_acquire --> fs_reclaim
> > > [    7.680041]
> > > [    7.687955]  Possible unsafe locking scenario:
> > > [    7.687955]
> > > [    7.699039]        CPU0                    CPU1
> > > [    7.704752]        ----                    ----
> > > [    7.709266]   lock(fs_reclaim);
> > > [    7.713779]                                lock(regulator_ww_class_acquire);
> > > [    7.716919]                                lock(fs_reclaim);
> > > [    7.724204]   lock(icc_bw_lock);
> > >
> >
> > Hm, I'm not entirely sure how to interpret this. Is there really a
> > connection between OPP and ICC here? It looks more like ICC <->
> > regulator and somehow related to memory allocations (the fs_reclaim).
> >
> > Was there more output around this?
> 
> [    7.362902] ======================================================
> [    7.363447] WARNING: possible circular locking dependency detected
> [    7.369434] 6.6.0-rc3-next-20230929-08870-g49503b4b9f55 #129 Not tainted
> [    7.375604] ------------------------------------------------------
> [    7.382453] swapper/0/1 is trying to acquire lock:
> [    7.388437] c143445c (icc_bw_lock){+.+.}-{3:3}, at: icc_init+0x0/0x104
> [    7.393225]
> [    7.393225] but task is already holding lock:
> [    7.399730] c1397444 (fs_reclaim){+.+.}-{0:0}, at: icc_init+0x18/0x104
> [    7.405547]
> [    7.405547] which lock already depends on the new lock.
> [    7.405547]
> [    7.412077]
> [    7.412077] the existing dependency chain (in reverse order) is:
> [    7.420310]
> [    7.420310] -> #2 (fs_reclaim){+.+.}-{0:0}:
> [    7.427767]        fs_reclaim_acquire+0x60/0x94
> [    7.433492]        __kmem_cache_alloc_node+0x30/0x2b4
> [    7.437742]        __kmalloc_node_track_caller+0x48/0x244
> [    7.442954]        kstrdup+0x30/0x58
> [    7.448325]        create_regulator+0x70/0x2b8

The regulator core should avoid holding the lock while calling into
kstrdup() and also debugfs file creation APIs.

-Stephen

> [    7.451981]        regulator_resolve_supply+0x3bc/0x7f4
> [    7.456234]        regulator_register+0x9b0/0xb50
> [    7.461352]        devm_regulator_register+0x50/0x8c
> [    7.466216]        rpm_reg_probe+0xf4/0x1a0
> [    7.471244]        platform_probe+0x5c/0xb0
> [    7.475157]        really_probe+0xc8/0x2d8
> [    7.479318]        __driver_probe_device+0x88/0x1a0
> [    7.483488]        driver_probe_device+0x30/0x108
> [    7.488262]        __driver_attach_async_helper+0x4c/0xa0
> [    7.493133]        async_run_entry_fn+0x24/0xb0
> [    7.498504]        process_one_work+0x208/0x5e4
> [    7.502844]        worker_thread+0x1e0/0x4a4
> [    7.507356]        kthread+0x100/0x120
> [    7.511874]        ret_from_fork+0x14/0x28
> [    7.515428]
> [    7.515428] -> #1 (regulator_ww_class_acquire){+.+.}-{0:0}:
> [    7.519528]        lock_release+0x164/0x340
> [    7.526713]        __mutex_unlock_slowpath+0x3c/0x2fc
> [    7.530626]        regulator_lock_dependent+0x1a4/0x298
> [    7.535839]        regulator_set_voltage+0x30/0xfc
> [    7.540866]        krait_l2_set_one_supply+0x28/0x84
> [    7.545729]        krait_l2_config_regulators+0x90/0x1c4
> [    7.550851]        _set_opp+0xf0/0x438
> [    7.556144]        dev_pm_opp_set_rate+0x118/0x224
> [    7.559703]        icc_node_add+0xe8/0x15c
> [    7.564474]        icc_clk_register+0x150/0x208
> [    7.568556]        krait_l2_probe+0x100/0x114
> [    7.572988]        platform_probe+0x5c/0xb0
> [    7.577495]        really_probe+0xc8/0x2d8
> [    7.581487]        __driver_probe_device+0x88/0x1a0
> [    7.585658]        driver_probe_device+0x30/0x108
> [    7.590433]        __device_attach_driver+0x94/0x10c
> [    7.595300]        bus_for_each_drv+0x84/0xd8
> [    7.600326]        __device_attach+0xac/0x1a8
> [    7.604580]        bus_probe_device+0x8c/0x90
> [    7.608919]        device_add+0x5c8/0x7a4
> [    7.613265]        of_platform_device_create_pdata+0x90/0xbc
> [    7.617260]        qcom_cpufreq_init+0xa8/0x130
> [    7.622984]        do_one_initcall+0x68/0x2d8
> [    7.627235]        kernel_init_freeable+0x1ac/0x208
> [    7.631752]        kernel_init+0x18/0x12c
> [    7.636441]        ret_from_fork+0x14/0x28
> [    7.640602]
> [    7.640602] -> #0 (icc_bw_lock){+.+.}-{3:3}:
> [    7.644607]        __lock_acquire+0x1530/0x28fc
> [    7.650413]        lock_acquire+0x10c/0x33c
> [    7.654757]        icc_init+0x40/0x104
> [    7.658917]        do_one_initcall+0x68/0x2d8
> [    7.662740]        kernel_init_freeable+0x1ac/0x208
> [    7.667168]        kernel_init+0x18/0x12c
> [    7.671855]        ret_from_fork+0x14/0x28
> [    7.676018]
> [    7.676018] other info that might help us debug this:
> [    7.676018]
> [    7.680041] Chain exists of:
> [    7.680041]   icc_bw_lock --> regulator_ww_class_acquire --> fs_reclaim
> [    7.680041]
> [    7.687955]  Possible unsafe locking scenario:
> [    7.687955]
> [    7.699039]        CPU0                    CPU1
> [    7.704752]        ----                    ----
> [    7.709266]   lock(fs_reclaim);
> [    7.713779]                                lock(regulator_ww_class_acquire);
> [    7.716919]                                lock(fs_reclaim);
> [    7.724204]   lock(icc_bw_lock);
> [    7.729833]
> [    7.729833]  *** DEADLOCK ***
> [    7.729833]
> [    7.733071] 1 lock held by swapper/0/1:
> [    7.738691]  #0: c1397444 (fs_reclaim){+.+.}-{0:0}, at: icc_init+0x18/0x104
> [    7.742528]
> [    7.742528] stack backtrace:
> [    7.749463] CPU: 3 PID: 1 Comm: swapper/0 Not tainted
> 6.6.0-rc3-next-20230929-08870-g49503b4b9f55 #129
> [    7.754010] Hardware name: Generic DT based system
> [    7.763183]  unwind_backtrace from show_stack+0x10/0x14
> [    7.767957]  show_stack from dump_stack_lvl+0x60/0x90
> [    7.773082]  dump_stack_lvl from check_noncircular+0x174/0x18c
> [    7.778288]  check_noncircular from __lock_acquire+0x1530/0x28fc
> [    7.784018]  __lock_acquire from lock_acquire+0x10c/0x33c
> [    7.790178]  lock_acquire from icc_init+0x40/0x104
> [    7.795475]  icc_init from do_one_initcall+0x68/0x2d8
> [    7.800159]  do_one_initcall from kernel_init_freeable+0x1ac/0x208
> [    7.805286]  kernel_init_freeable from kernel_init+0x18/0x12c
> [    7.811361]  kernel_init from ret_from_fork+0x14/0x28
> [    7.817177] Exception stack(0xf081dfb0 to 0xf081dff8)
>
Rob Herring (Arm) Oct. 11, 2023, 3:49 p.m. UTC | #15
On Sun, Aug 27, 2023 at 02:50:18PM +0300, Dmitry Baryshkov wrote:
> Add a simple driver that handles scaling of L2 frequency and voltages.
> 
> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> ---

[...]

> +static const struct of_device_id krait_l2_match_table[] = {
> +	{ .compatible = "qcom,krait-l2-cache" },
> +	{}
> +};
> +MODULE_DEVICE_TABLE(of, krait_l2_match_table);
> +
> +static struct platform_driver krait_l2_driver = {
> +	.probe = krait_l2_probe,
> +	.remove = krait_l2_remove,
> +	.driver = {
> +		.name = "qcom-krait-l2",
> +		.of_match_table = krait_l2_match_table,
> +		.sync_state = icc_sync_state,
> +	},
> +};

As I mentioned in the other thread, cache devices already have a struct 
device. Specifically, they have a struct device (no subclass) on the 
cpu_subsys bus type. So there should be no need for a platform device 
and second struct device.

See drivers/acpi/processor_driver.c for an example. Or grep any use of 
"cpu_subsys".

Rob
Dmitry Baryshkov Oct. 11, 2023, 6:19 p.m. UTC | #16
On Wed, 11 Oct 2023 at 18:49, Rob Herring <robh@kernel.org> wrote:
>
> On Sun, Aug 27, 2023 at 02:50:18PM +0300, Dmitry Baryshkov wrote:
> > Add a simple driver that handles scaling of L2 frequency and voltages.
> >
> > Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> > ---
>
> [...]
>
> > +static const struct of_device_id krait_l2_match_table[] = {
> > +     { .compatible = "qcom,krait-l2-cache" },
> > +     {}
> > +};
> > +MODULE_DEVICE_TABLE(of, krait_l2_match_table);
> > +
> > +static struct platform_driver krait_l2_driver = {
> > +     .probe = krait_l2_probe,
> > +     .remove = krait_l2_remove,
> > +     .driver = {
> > +             .name = "qcom-krait-l2",
> > +             .of_match_table = krait_l2_match_table,
> > +             .sync_state = icc_sync_state,
> > +     },
> > +};
>
> As I mentioned in the other thread, cache devices already have a struct
> device. Specifically, they have a struct device (no subclass) on the
> cpu_subsys bus type. So there should be no need for a platform device
> and second struct device.
>
> See drivers/acpi/processor_driver.c for an example. Or grep any use of
> "cpu_subsys".

Most likely you mean drivers/base/cacheinfo.c. I saw this code, I
don't think it makes a good fit here. The cacheinfo devices provide
information only, they are not tied to DT nodes in any way. cpu_subsys
doesn't provide a way to match drivers with subsys devices in the
non-ACPI case, etc. Moreover, the whole cacheinfo subsys is
non-existing on arm32, there is no cacheinfo implementation there,
thanks to the overall variety of architectures.

Thus said, I don't think cacheinfo makes a good fit for the case of
scaling L2 cache.
Rob Herring (Arm) Oct. 11, 2023, 6:44 p.m. UTC | #17
On Wed, Oct 11, 2023 at 1:20 PM Dmitry Baryshkov
<dmitry.baryshkov@linaro.org> wrote:
>
> On Wed, 11 Oct 2023 at 18:49, Rob Herring <robh@kernel.org> wrote:
> >
> > On Sun, Aug 27, 2023 at 02:50:18PM +0300, Dmitry Baryshkov wrote:
> > > Add a simple driver that handles scaling of L2 frequency and voltages.
> > >
> > > Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> > > ---
> >
> > [...]
> >
> > > +static const struct of_device_id krait_l2_match_table[] = {
> > > +     { .compatible = "qcom,krait-l2-cache" },
> > > +     {}
> > > +};
> > > +MODULE_DEVICE_TABLE(of, krait_l2_match_table);
> > > +
> > > +static struct platform_driver krait_l2_driver = {
> > > +     .probe = krait_l2_probe,
> > > +     .remove = krait_l2_remove,
> > > +     .driver = {
> > > +             .name = "qcom-krait-l2",
> > > +             .of_match_table = krait_l2_match_table,
> > > +             .sync_state = icc_sync_state,
> > > +     },
> > > +};
> >
> > As I mentioned in the other thread, cache devices already have a struct
> > device. Specifically, they have a struct device (no subclass) on the
> > cpu_subsys bus type. So there should be no need for a platform device
> > and second struct device.
> >
> > See drivers/acpi/processor_driver.c for an example. Or grep any use of
> > "cpu_subsys".
>
> Most likely you mean drivers/base/cacheinfo.c. I saw this code, I
> don't think it makes a good fit here. The cacheinfo devices provide
> information only, they are not tied to DT nodes in any way.

They are completely tied to DT nodes beyond L1.

>  cpu_subsys
> doesn't provide a way to match drivers with subsys devices in the
> non-ACPI case, etc.

That's a 2 line addition to add DT support.

> Moreover, the whole cacheinfo subsys is
> non-existing on arm32, there is no cacheinfo implementation there,
> thanks to the overall variety of architectures.

Humm, well I don't think it would be too hard to add, but I won't ask
you to do that. All the info comes from DT or can come from DT, so it
should be just a matter of arm32 calling the cacheinfo init.

> Thus said, I don't think cacheinfo makes a good fit for the case of
> scaling L2 cache.

I still disagree. It's not really cacheinfo. That is what creates the
devices, but it's the cpu_subsys bus type. Why do you care that it is
platform bus vs. cpu_subsys?

On a separate issue, I'd propose you move this to drivers/cache/
instead of the dumping ground that is drivers/soc/. It's nothing more
than a location to collect cache related drivers ATM because we seem
to be accumulating more of them.

Rob
Dmitry Baryshkov Jan. 4, 2024, 2:02 a.m. UTC | #18
HI Rob,

Resurrecting old thread, but I think it's better as it has context.

Added driver core maintainers, see discussion points below.

On Wed, 11 Oct 2023 at 21:44, Rob Herring <robh@kernel.org> wrote:
>
> On Wed, Oct 11, 2023 at 1:20 PM Dmitry Baryshkov
> <dmitry.baryshkov@linaro.org> wrote:
> >
> > On Wed, 11 Oct 2023 at 18:49, Rob Herring <robh@kernel.org> wrote:
> > >
> > > On Sun, Aug 27, 2023 at 02:50:18PM +0300, Dmitry Baryshkov wrote:
> > > > Add a simple driver that handles scaling of L2 frequency and voltages.
> > > >
> > > > Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> > > > ---
> > >
> > > [...]
> > >
> > > > +static const struct of_device_id krait_l2_match_table[] = {
> > > > +     { .compatible = "qcom,krait-l2-cache" },
> > > > +     {}
> > > > +};
> > > > +MODULE_DEVICE_TABLE(of, krait_l2_match_table);
> > > > +
> > > > +static struct platform_driver krait_l2_driver = {
> > > > +     .probe = krait_l2_probe,
> > > > +     .remove = krait_l2_remove,
> > > > +     .driver = {
> > > > +             .name = "qcom-krait-l2",
> > > > +             .of_match_table = krait_l2_match_table,
> > > > +             .sync_state = icc_sync_state,
> > > > +     },
> > > > +};
> > >
> > > As I mentioned in the other thread, cache devices already have a struct
> > > device. Specifically, they have a struct device (no subclass) on the
> > > cpu_subsys bus type. So there should be no need for a platform device
> > > and second struct device.
> > >
> > > See drivers/acpi/processor_driver.c for an example. Or grep any use of
> > > "cpu_subsys".
> >
> > Most likely you mean drivers/base/cacheinfo.c. I saw this code, I
> > don't think it makes a good fit here. The cacheinfo devices provide
> > information only, they are not tied to DT nodes in any way.
>
> They are completely tied to DT nodes beyond L1.
>
> >  cpu_subsys
> > doesn't provide a way to match drivers with subsys devices in the
> > non-ACPI case, etc.
>
> That's a 2 line addition to add DT support.
>
> > Moreover, the whole cacheinfo subsys is
> > non-existing on arm32, there is no cacheinfo implementation there,
> > thanks to the overall variety of architectures.
>
> Humm, well I don't think it would be too hard to add, but I won't ask
> you to do that. All the info comes from DT or can come from DT, so it
> should be just a matter of arm32 calling the cacheinfo init.
>
> > Thus said, I don't think cacheinfo makes a good fit for the case of
> > scaling L2 cache.
>
> I still disagree. It's not really cacheinfo. That is what creates the
> devices, but it's the cpu_subsys bus type. Why do you care that it is
> platform bus vs. cpu_subsys?

I finally found a timeslot to look at cacheinfo. I added support for
arm32 cacheinfo (which is fine) and tried using cacheinfo devices for
L2 driver mapping (the RFC has been posted at [1]).
But after I actually tried using it for the L2 cache driver.  I
stumbled upon several issues, which I'd like to discuss before rushing
to code.

First, you supposed that cacheinfo devices land onto the cpu_subsys
bus. However only actual CPU devices end up on cpu_subsys. CPU cache
devices are created using cpu_device_create(), but despite its name
they don't go to cpu_subsys.

Second and more important, these devices are created without any
attempt to share them. So on a 4-core system I have 4 distinct devices
for L2 cache even though it is shared between all cores.

root@qcom-armv7a:~# stat -c "%N %i" /sys/bus/cpu/devices/cpu*/cache/index2/level
/sys/bus/cpu/devices/cpu0/cache/index2/level 15537
/sys/bus/cpu/devices/cpu1/cache/index2/level 15560
/sys/bus/cpu/devices/cpu2/cache/index2/level 15583
/sys/bus/cpu/devices/cpu3/cache/index2/level 15606

I think it makes sense to rework cacheinfo to create actual CPU cache
devices (maybe having a separate cache bus).
In my case it should become something like:

cpu0-2-unified (shared between all 4 cores)
cpu0-1-icache
cpu0-1-dcache
cpu1-1-icache
cpu1-1-dcache
...

I'm not sure if it's worth supporting more than one instance of the
same kind per level (e.g. I think current cacheinfo has nothing
against having two I-cache or two D-cache devices)

The cpuN/cache/indexM should become symlinks to those cache devices.

What do you think?

[1] https://lore.kernel.org/linux-arm-msm/CAA8EJppCRzknaujKFyLa_i7x4UnX31YFSyjtux+zJ0harixrbA@mail.gmail.com

> On a separate issue, I'd propose you move this to drivers/cache/
> instead of the dumping ground that is drivers/soc/. It's nothing more
> than a location to collect cache related drivers ATM because we seem
> to be accumulating more of them.

I thought about reusing drivers/devfreq, it already has the Mediatek CCI driver.
Rob Herring (Arm) Jan. 15, 2024, 4:16 p.m. UTC | #19
On Wed, Jan 3, 2024 at 8:02 PM Dmitry Baryshkov
<dmitry.baryshkov@linaro.org> wrote:
>
> HI Rob,
>
> Resurrecting old thread, but I think it's better as it has context.
>
> Added driver core maintainers, see discussion points below.
>
> On Wed, 11 Oct 2023 at 21:44, Rob Herring <robh@kernel.org> wrote:
> >
> > On Wed, Oct 11, 2023 at 1:20 PM Dmitry Baryshkov
> > <dmitry.baryshkov@linaro.org> wrote:
> > >
> > > On Wed, 11 Oct 2023 at 18:49, Rob Herring <robh@kernel.org> wrote:
> > > >
> > > > On Sun, Aug 27, 2023 at 02:50:18PM +0300, Dmitry Baryshkov wrote:
> > > > > Add a simple driver that handles scaling of L2 frequency and voltages.
> > > > >
> > > > > Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> > > > > ---
> > > >
> > > > [...]
> > > >
> > > > > +static const struct of_device_id krait_l2_match_table[] = {
> > > > > +     { .compatible = "qcom,krait-l2-cache" },
> > > > > +     {}
> > > > > +};
> > > > > +MODULE_DEVICE_TABLE(of, krait_l2_match_table);
> > > > > +
> > > > > +static struct platform_driver krait_l2_driver = {
> > > > > +     .probe = krait_l2_probe,
> > > > > +     .remove = krait_l2_remove,
> > > > > +     .driver = {
> > > > > +             .name = "qcom-krait-l2",
> > > > > +             .of_match_table = krait_l2_match_table,
> > > > > +             .sync_state = icc_sync_state,
> > > > > +     },
> > > > > +};
> > > >
> > > > As I mentioned in the other thread, cache devices already have a struct
> > > > device. Specifically, they have a struct device (no subclass) on the
> > > > cpu_subsys bus type. So there should be no need for a platform device
> > > > and second struct device.
> > > >
> > > > See drivers/acpi/processor_driver.c for an example. Or grep any use of
> > > > "cpu_subsys".
> > >
> > > Most likely you mean drivers/base/cacheinfo.c. I saw this code, I
> > > don't think it makes a good fit here. The cacheinfo devices provide
> > > information only, they are not tied to DT nodes in any way.
> >
> > They are completely tied to DT nodes beyond L1.
> >
> > >  cpu_subsys
> > > doesn't provide a way to match drivers with subsys devices in the
> > > non-ACPI case, etc.
> >
> > That's a 2 line addition to add DT support.
> >
> > > Moreover, the whole cacheinfo subsys is
> > > non-existing on arm32, there is no cacheinfo implementation there,
> > > thanks to the overall variety of architectures.
> >
> > Humm, well I don't think it would be too hard to add, but I won't ask
> > you to do that. All the info comes from DT or can come from DT, so it
> > should be just a matter of arm32 calling the cacheinfo init.
> >
> > > Thus said, I don't think cacheinfo makes a good fit for the case of
> > > scaling L2 cache.
> >
> > I still disagree. It's not really cacheinfo. That is what creates the
> > devices, but it's the cpu_subsys bus type. Why do you care that it is
> > platform bus vs. cpu_subsys?
>
> I finally found a timeslot to look at cacheinfo. I added support for
> arm32 cacheinfo (which is fine) and tried using cacheinfo devices for
> L2 driver mapping (the RFC has been posted at [1]).
> But after I actually tried using it for the L2 cache driver.  I
> stumbled upon several issues, which I'd like to discuss before rushing
> to code.
>
> First, you supposed that cacheinfo devices land onto the cpu_subsys
> bus. However only actual CPU devices end up on cpu_subsys. CPU cache
> devices are created using cpu_device_create(), but despite its name
> they don't go to cpu_subsys.
>
> Second and more important, these devices are created without any
> attempt to share them. So on a 4-core system I have 4 distinct devices
> for L2 cache even though it is shared between all cores.

I wonder if that's because things are created in CPU hotplug callbacks
and there might be ordering problems if cache devices are created in
another code path.

Also, I think on some PowerPC systems, CPUs can move to different L2
(or L3?) caches when hot unplugged and then plugged. So hotplug
rescans everything. I don't think that would be a problem with this
and PowerPC does its own scanning anyways. Just wanted you to be aware
of the issue.

> root@qcom-armv7a:~# stat -c "%N %i" /sys/bus/cpu/devices/cpu*/cache/index2/level
> /sys/bus/cpu/devices/cpu0/cache/index2/level 15537
> /sys/bus/cpu/devices/cpu1/cache/index2/level 15560
> /sys/bus/cpu/devices/cpu2/cache/index2/level 15583
> /sys/bus/cpu/devices/cpu3/cache/index2/level 15606
>
> I think it makes sense to rework cacheinfo to create actual CPU cache
> devices (maybe having a separate cache bus).
> In my case it should become something like:
>
> cpu0-2-unified (shared between all 4 cores)
> cpu0-1-icache
> cpu0-1-dcache
> cpu1-1-icache
> cpu1-1-dcache
> ...
>
> I'm not sure if it's worth supporting more than one instance of the
> same kind per level (e.g. I think current cacheinfo has nothing
> against having two I-cache or two D-cache devices)

Probably a safe assumption. Though I think old XScale CPUs had a 1K
mini I-cache and the main L1 I-cache. I guess that's really an L0
cache though.

> The cpuN/cache/indexM should become symlinks to those cache devices.
>
> What do you think?

Seems like a good improvement to me if changing the current way
doesn't cause an ABI issue.


> [1] https://lore.kernel.org/linux-arm-msm/CAA8EJppCRzknaujKFyLa_i7x4UnX31YFSyjtux+zJ0harixrbA@mail.gmail.com
>
> > On a separate issue, I'd propose you move this to drivers/cache/
> > instead of the dumping ground that is drivers/soc/. It's nothing more
> > than a location to collect cache related drivers ATM because we seem
> > to be accumulating more of them.
>
> I thought about reusing drivers/devfreq, it already has the Mediatek CCI driver.

Anywhere except drivers/misc/ would be an improvement over
drivers/soc/. devfreq is more tied to interconnects than caches
though.

Rob