
[00/23] interconnect: fix racy provider registration

Message ID 20230201101559.15529-1-johan+linaro@kernel.org

Message

Johan Hovold Feb. 1, 2023, 10:15 a.m. UTC
The current interconnect provider interface is inherently racy as
providers are expected to be registered before being fully initialised.

This can specifically cause racing DT lookups to fail as I recently
noticed when the Qualcomm cpufreq driver failed to probe:

	of_icc_xlate_onecell: invalid index 0
        cpu cpu0: error -EINVAL: error finding src node
        cpu cpu0: dev_pm_opp_of_find_icc_paths: Unable to get path0: -22
        qcom-cpufreq-hw: probe of 18591000.cpufreq failed with error -22
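The "invalid index 0" message comes from the onecell translation helper rejecting the lookup, presumably because the provider's onecell data still reports zero nodes when the racing lookup comes in. A simplified sketch of that helper (close to, but not a verbatim copy of, the code in drivers/interconnect/core.c):

/*
 * Simplified sketch of the onecell xlate helper in drivers/interconnect/core.c.
 * If a consumer lookup races in after the provider has been registered but
 * before its nodes have been created, num_nodes is still zero and every
 * index is rejected, producing the "invalid index 0" error above.
 */
struct icc_node *of_icc_xlate_onecell(struct of_phandle_args *spec, void *data)
{
	struct icc_onecell_data *icc_data = data;
	unsigned int idx = spec->args[0];

	if (idx >= icc_data->num_nodes) {
		pr_err("%s: invalid index %u\n", __func__, idx);
		return ERR_PTR(-EINVAL);
	}

	return icc_data->nodes[idx];
}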

This only happens very rarely, but the bug is easily reproduced by
widening the race window, for example by adding an msleep() after
registering the osm-l3 interconnect provider.
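For illustration, here is a hedged sketch of the old, racy pattern with such an msleep() added. The names are simplified placeholders, not the actual osm-l3 code, and EXAMPLE_MASTER_ID is made up:

#include <linux/delay.h>
#include <linux/err.h>
#include <linux/interconnect-provider.h>
#include <linux/platform_device.h>

/*
 * Hedged sketch of the old, racy registration pattern (placeholder names,
 * not the actual osm-l3 code): the provider becomes visible to consumers
 * as soon as icc_provider_add() returns, but its nodes are only created
 * afterwards. The msleep() simply widens that window so the failure is
 * easy to hit.
 */
static int old_racy_probe(struct platform_device *pdev)
{
	struct icc_provider *provider;
	struct icc_node *node;
	int ret;

	provider = devm_kzalloc(&pdev->dev, sizeof(*provider), GFP_KERNEL);
	if (!provider)
		return -ENOMEM;

	provider->dev = &pdev->dev;
	/* ->set, ->xlate and ->data setup omitted */

	ret = icc_provider_add(provider);	/* provider is now visible... */
	if (ret)
		return ret;

	msleep(1000);				/* ...widen the race window for testing... */

	node = icc_node_create(EXAMPLE_MASTER_ID);	/* ...but the nodes come too late */
	if (IS_ERR(node)) {
		icc_provider_del(provider);
		return PTR_ERR(node);
	}
	icc_node_add(node, provider);

	return 0;
}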

Note that the Qualcomm cpufreq driver is especially susceptible to this
race as the interconnect path is looked up from the CPU nodes, so the
driver core does not guarantee the probe order even when device links
are enabled (which they are not always).

This series adds a new interconnect provider registration API, which is
then used to fix up the interconnect drivers before the old racy API is
removed.
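For reference, a hedged sketch of the new registration pattern. The registration functions are the ones introduced by this series, but the driver specifics and the EXAMPLE_MASTER_ID node id are made-up placeholders:

#include <linux/err.h>
#include <linux/interconnect-provider.h>
#include <linux/platform_device.h>

/*
 * Hedged sketch of the race-free pattern: icc_provider_init() sets the
 * provider up without exposing it, the nodes are created and added first,
 * and only icc_provider_register() makes the provider visible to consumers.
 */
static int new_safe_probe(struct platform_device *pdev)
{
	struct icc_provider *provider;
	struct icc_node *node;
	int ret;

	provider = devm_kzalloc(&pdev->dev, sizeof(*provider), GFP_KERNEL);
	if (!provider)
		return -ENOMEM;

	provider->dev = &pdev->dev;
	/* ->set, ->xlate and ->data setup omitted */

	icc_provider_init(provider);		/* initialised, but not yet visible */

	node = icc_node_create(EXAMPLE_MASTER_ID);
	if (IS_ERR(node))
		return PTR_ERR(node);
	icc_node_add(node, provider);

	ret = icc_provider_register(provider);	/* now safe for racing DT lookups */
	if (ret)
		icc_nodes_remove(provider);

	return ret;
}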

Also included are a number of fixes for other bugs found while preparing
the series.

Johan


Johan Hovold (23):
  interconnect: fix mem leak when freeing nodes
  interconnect: fix icc_provider_del() error handling
  interconnect: fix provider registration API
  interconnect: imx: fix registration race
  interconnect: qcom: osm-l3: fix registration race
  interconnect: qcom: rpm: fix probe child-node error handling
  interconnect: qcom: rpm: fix probe PM domain error handling
  interconnect: qcom: rpm: fix registration race
  interconnect: qcom: rpmh: fix probe child-node error handling
  interconnect: qcom: rpmh: fix registration race
  interconnect: qcom: msm8974: fix registration race
  interconnect: qcom: sm8450: fix registration race
  interconnect: qcom: sm8550: fix registration race
  interconnect: exynos: fix node leak in probe PM QoS error path
  interconnect: exynos: fix registration race
  interconnect: exynos: drop redundant link destroy
  memory: tegra: fix interconnect registration race
  memory: tegra124-emc: fix interconnect registration race
  memory: tegra20-emc: fix interconnect registration race
  memory: tegra30-emc: fix interconnect registration race
  interconnect: drop racy registration API
  interconnect: drop unused icc_get() interface
  interconnect: drop unused icc_link_destroy() interface

 drivers/interconnect/core.c           | 149 +++++---------------------
 drivers/interconnect/imx/imx.c        |  20 ++--
 drivers/interconnect/qcom/icc-rpm.c   |  33 +++---
 drivers/interconnect/qcom/icc-rpmh.c  |  30 ++++--
 drivers/interconnect/qcom/msm8974.c   |  20 ++--
 drivers/interconnect/qcom/osm-l3.c    |  14 ++-
 drivers/interconnect/qcom/sm8450.c    |  22 ++--
 drivers/interconnect/qcom/sm8550.c    |  22 ++--
 drivers/interconnect/samsung/exynos.c |  30 +++---
 drivers/memory/tegra/mc.c             |  16 ++-
 drivers/memory/tegra/tegra124-emc.c   |  12 +--
 drivers/memory/tegra/tegra20-emc.c    |  12 +--
 drivers/memory/tegra/tegra30-emc.c    |  12 +--
 include/linux/interconnect-provider.h |  19 ++--
 include/linux/interconnect.h          |   8 --
 15 files changed, 154 insertions(+), 265 deletions(-)

Comments

Krzysztof Kozlowski Feb. 2, 2023, 11:13 a.m. UTC | #1
On 01/02/2023 11:15, Johan Hovold wrote:
> The current interconnect provider interface is inherently racy as
> providers are expected to be registered before being fully initialised.
> 
> This can specifically cause racing DT lookups to fail as I recently
> noticed when the Qualcomm cpufreq driver failed to probe:
> 
> 	of_icc_xlate_onecell: invalid index 0
>         cpu cpu0: error -EINVAL: error finding src node
>         cpu cpu0: dev_pm_opp_of_find_icc_paths: Unable to get path0: -22
>         qcom-cpufreq-hw: probe of 18591000.cpufreq failed with error -22
> 
> This only happens very rarely, but the bug is easily reproduced by
> widening the race window, for example by adding an msleep() after
> registering the osm-l3 interconnect provider.
> 
> Note that the Qualcomm cpufreq driver is especially susceptible to this
> race as the interconnect path is looked up from the CPU nodes, so the
> driver core does not guarantee the probe order even when device links
> are enabled (which they are not always).
> 
> This series adds a new interconnect provider registration API, which is
> then used to fix up the interconnect drivers before the old racy API is
> removed.
> 

So is there a dependency or not? Can you make it clear that I shouldn't
take memory controller bits?

Best regards,
Krzysztof
Johan Hovold Feb. 2, 2023, 12:20 p.m. UTC | #2
On Thu, Feb 02, 2023 at 12:13:33PM +0100, Krzysztof Kozlowski wrote:
> On 01/02/2023 11:15, Johan Hovold wrote:
> > The current interconnect provider interface is inherently racy as
> > providers are expected to be registered before being fully initialised.
> > 
> > This can specifically cause racing DT lookups to fail as I recently
> > noticed when the Qualcomm cpufreq driver failed to probe:
> > 
> > 	of_icc_xlate_onecell: invalid index 0
> >         cpu cpu0: error -EINVAL: error finding src node
> >         cpu cpu0: dev_pm_opp_of_find_icc_paths: Unable to get path0: -22
> >         qcom-cpufreq-hw: probe of 18591000.cpufreq failed with error -22
> > 
> > This only happens very rarely, but the bug is easily reproduced by
> > widening the race window, for example by adding an msleep() after
> > registering the osm-l3 interconnect provider.
> > 
> > Note that the Qualcomm cpufreq driver is especially susceptible to this
> > race as the interconnect path is looked up from the CPU nodes, so the
> > driver core does not guarantee the probe order even when device links
> > are enabled (which they are not always).
> > 
> > This series adds a new interconnect provider registration API, which is
> > then used to fix up the interconnect drivers before the old racy API is
> > removed.
> > 
> 
> So is there a dependency or not? Can you make it clear that I shouldn't
> take memory controller bits?

As the fixes depend on the new API, it is best if these could all go
through Georgi's tree.

Johan