[v2] PCI: tegra194: Fix runtime PM imbalance on error

Message ID 20200521031355.7022-1-dinghao.liu@zju.edu.cn
State New
Series [v2] PCI: tegra194: Fix runtime PM imbalance on error

Commit Message

Dinghao Liu May 21, 2020, 3:13 a.m. UTC
pm_runtime_get_sync() increments the runtime PM usage counter even
when it returns an error code. Thus a pairing decrement is needed on
the error handling path to keep the counter balanced.

Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
---
 drivers/pci/controller/dwc/pcie-tegra194.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
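
To make the imbalance described in the commit message concrete, here is a small user-space model of the usage counter. It is only a sketch: struct mock_dev, mock_get_sync(), mock_put() and the two use_device_*() callers are hypothetical stand-ins, not kernel APIs. The one behaviour taken from the commit message is that the get call increments the counter even when it returns an error.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a device's runtime PM state (not a kernel type). */
struct mock_dev {
	int usage_count;	/* models the runtime PM usage counter */
	int resume_error;	/* 0 for success, negative errno to simulate failure */
};

/* Models pm_runtime_get_sync(): the counter is incremented even on failure. */
int mock_get_sync(struct mock_dev *dev)
{
	dev->usage_count++;
	return dev->resume_error;
}

/* Models only the decrement done by the pm_runtime_put*() family. */
void mock_put(struct mock_dev *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
}

/* Unbalanced caller: returns early on error and skips the decrement. */
int use_device_buggy(struct mock_dev *dev)
{
	int ret = mock_get_sync(dev);

	if (ret < 0)
		return ret;	/* counter stays incremented */

	/* ... use the device ... */
	mock_put(dev);
	return 0;
}

/* Balanced caller: every exit path after the get drops the reference. */
int use_device_balanced(struct mock_dev *dev)
{
	int ret = mock_get_sync(dev);

	if (ret < 0)
		goto out;

	/* ... use the device ... */
	ret = 0;
out:
	mock_put(dev);
	return ret;
}
```

With resume_error set to a negative value, the first caller leaves usage_count at 1 while the second returns it to 0.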

Comments

Bjorn Helgaas May 21, 2020, 3:16 p.m. UTC | #1
[+cc Rafael, linux-pm]

On Thu, May 21, 2020 at 11:13:49AM +0800, Dinghao Liu wrote:
> pm_runtime_get_sync() increments the runtime PM usage counter even
> when it returns an error code. Thus a pairing decrement is needed on
> the error handling path to keep the counter balanced.

I didn't realize there were so many drivers with the exact same issue.
Can we just squash these all into a single patch so we can see them
all together?

Hmm.  There are over 1300 callers of pm_runtime_get_sync(), and it
looks like many of them have similar issues, i.e., they have a pattern
like this

  ret = pm_runtime_get_sync(dev);
  if (ret < 0)
    return;

  pm_runtime_put(dev);

where there is not a pm_runtime_put() to match every
pm_runtime_get_sync().  Random sample:

  nds32_pmu_reserve_hardware
  sata_rcar_probe
  exynos_trng_probe
  ks_sa_rng_probe
  omap_aes_probe
  sun8i_ss_probe
  omap_aes_probe
  zynq_gpio_probe
  amdgpu_hwmon_show_power_avg
  mtk_crtc_ddp_hw_init
  ...

Surely I'm missing something and these aren't all broken, right?

Maybe we could put together a coccinelle script to scan the tree for
this issue?

> Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
> ---
>  drivers/pci/controller/dwc/pcie-tegra194.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/controller/dwc/pcie-tegra194.c
> index ae30a2fd3716..2c0d2ce16b47 100644
> --- a/drivers/pci/controller/dwc/pcie-tegra194.c
> +++ b/drivers/pci/controller/dwc/pcie-tegra194.c
> @@ -1623,7 +1623,7 @@ static int tegra_pcie_config_rp(struct tegra_pcie_dw *pcie)
>  	ret = pinctrl_pm_select_default_state(dev);
>  	if (ret < 0) {
>  		dev_err(dev, "Failed to configure sideband pins: %d\n", ret);
> -		goto fail_pinctrl;
> +		goto fail_pm_get_sync;
>  	}
>  
>  	tegra_pcie_init_controller(pcie);
> @@ -1650,9 +1650,8 @@ static int tegra_pcie_config_rp(struct tegra_pcie_dw *pcie)
>  
>  fail_host_init:
>  	tegra_pcie_deinit_controller(pcie);
> -fail_pinctrl:
> -	pm_runtime_put_sync(dev);
>  fail_pm_get_sync:
> +	pm_runtime_put_sync(dev);
>  	pm_runtime_disable(dev);
>  	return ret;
>  }
> -- 
> 2.17.1
>
Rafael J. Wysocki May 21, 2020, 3:25 p.m. UTC | #2
On Thu, May 21, 2020 at 5:16 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Rafael, linux-pm]
>
> On Thu, May 21, 2020 at 11:13:49AM +0800, Dinghao Liu wrote:
> > pm_runtime_get_sync() increments the runtime PM usage counter even
> > when it returns an error code. Thus a pairing decrement is needed on
> > the error handling path to keep the counter balanced.
>
> I didn't realize there were so many drivers with the exact same issue.
> Can we just squash these all into a single patch so we can see them
> all together?
>
> Hmm.  There are over 1300 callers of pm_runtime_get_sync(), and it
> looks like many of them have similar issues, i.e., they have a pattern
> like this
>
>   ret = pm_runtime_get_sync(dev);
>   if (ret < 0)
>     return;
>
>   pm_runtime_put(dev);
>
> where there is not a pm_runtime_put() to match every
> pm_runtime_get_sync().  Random sample:
>
>   nds32_pmu_reserve_hardware
>   sata_rcar_probe
>   exynos_trng_probe
>   ks_sa_rng_probe
>   omap_aes_probe
>   sun8i_ss_probe
>   omap_aes_probe
>   zynq_gpio_probe
>   amdgpu_hwmon_show_power_avg
>   mtk_crtc_ddp_hw_init
>   ...
>
> Surely I'm missing something and these aren't all broken, right?

If they do what you've said, they are all broken I'm afraid.

They should all be doing something like

    ret = pm_runtime_get_sync(dev);
    if (ret < 0)
        goto out;

    ...

out:
    pm_runtime_put(dev);

> Maybe we could put together a coccinelle script to scan the tree for
> this issue?
>
> > Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
> > ---
> >  drivers/pci/controller/dwc/pcie-tegra194.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/controller/dwc/pcie-tegra194.c
> > index ae30a2fd3716..2c0d2ce16b47 100644
> > --- a/drivers/pci/controller/dwc/pcie-tegra194.c
> > +++ b/drivers/pci/controller/dwc/pcie-tegra194.c
> > @@ -1623,7 +1623,7 @@ static int tegra_pcie_config_rp(struct tegra_pcie_dw *pcie)
> >       ret = pinctrl_pm_select_default_state(dev);
> >       if (ret < 0) {
> >               dev_err(dev, "Failed to configure sideband pins: %d\n", ret);
> > -             goto fail_pinctrl;
> > +             goto fail_pm_get_sync;
> >       }
> >
> >       tegra_pcie_init_controller(pcie);
> > @@ -1650,9 +1650,8 @@ static int tegra_pcie_config_rp(struct tegra_pcie_dw *pcie)
> >
> >  fail_host_init:
> >       tegra_pcie_deinit_controller(pcie);
> > -fail_pinctrl:
> > -     pm_runtime_put_sync(dev);
> >  fail_pm_get_sync:
> > +     pm_runtime_put_sync(dev);

Why not pm_runtime_put()?

> >       pm_runtime_disable(dev);
> >       return ret;
> >  }
> > --
> > 2.17.1
> >
Dinghao Liu May 22, 2020, 4:36 a.m. UTC | #3
Hi Bjorn,

In fact, most uses of pm_runtime_get_sync() are correct. I made
a static analysis tool to check for this imbalance in the kernel and
found about 80 bugs in drivers. Some of my patches have been
accepted and I'm trying to patch the rest as soon as possible.

Regards,
Dinghao 

Dinghao Liu May 22, 2020, 6:06 a.m. UTC | #4
> On Thu, May 21, 2020 at 5:16 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > [+cc Rafael, linux-pm]
> >
> > On Thu, May 21, 2020 at 11:13:49AM +0800, Dinghao Liu wrote:
> > > pm_runtime_get_sync() increments the runtime PM usage counter even
> > > when it returns an error code. Thus a pairing decrement is needed on
> > > the error handling path to keep the counter balanced.
> >
> > I didn't realize there were so many drivers with the exact same issue.
> > Can we just squash these all into a single patch so we can see them
> > all together?
> >
> > Hmm.  There are over 1300 callers of pm_runtime_get_sync(), and it
> > looks like many of them have similar issues, i.e., they have a pattern
> > like this
> >
> >   ret = pm_runtime_get_sync(dev);
> >   if (ret < 0)
> >     return;
> >
> >   pm_runtime_put(dev);
> >
> > where there is not a pm_runtime_put() to match every
> > pm_runtime_get_sync().  Random sample:
> >
> >   nds32_pmu_reserve_hardware
> >   sata_rcar_probe
> >   exynos_trng_probe
> >   ks_sa_rng_probe
> >   omap_aes_probe
> >   sun8i_ss_probe
> >   omap_aes_probe
> >   zynq_gpio_probe
> >   amdgpu_hwmon_show_power_avg
> >   mtk_crtc_ddp_hw_init
> >   ...
> >
> > Surely I'm missing something and these aren't all broken, right?
> 
> If they do what you've said, they are all broken I'm afraid.
> 
> They should all be doing something like
> 
>     ret = pm_runtime_get_sync(dev);
>     if (ret < 0)
>         goto out;
> 
>     ...
> 
> out:
>     pm_runtime_put(dev);
> 
> > Maybe we could put together a coccinelle script to scan the tree for
> > this issue?
> >
> > > Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
> > > ---
> > >  drivers/pci/controller/dwc/pcie-tegra194.c | 5 ++---
> > >  1 file changed, 2 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/controller/dwc/pcie-tegra194.c
> > > index ae30a2fd3716..2c0d2ce16b47 100644
> > > --- a/drivers/pci/controller/dwc/pcie-tegra194.c
> > > +++ b/drivers/pci/controller/dwc/pcie-tegra194.c
> > > @@ -1623,7 +1623,7 @@ static int tegra_pcie_config_rp(struct tegra_pcie_dw *pcie)
> > >       ret = pinctrl_pm_select_default_state(dev);
> > >       if (ret < 0) {
> > >               dev_err(dev, "Failed to configure sideband pins: %d\n", ret);
> > > -             goto fail_pinctrl;
> > > +             goto fail_pm_get_sync;
> > >       }
> > >
> > >       tegra_pcie_init_controller(pcie);
> > > @@ -1650,9 +1650,8 @@ static int tegra_pcie_config_rp(struct tegra_pcie_dw *pcie)
> > >
> > >  fail_host_init:
> > >       tegra_pcie_deinit_controller(pcie);
> > > -fail_pinctrl:
> > > -     pm_runtime_put_sync(dev);
> > >  fail_pm_get_sync:
> > > +     pm_runtime_put_sync(dev);
> 
> Why not pm_runtime_put()?

Good question. When a function already uses a runtime PM decrement
API somewhere, I reuse that same API on the error path. If this API
is not suitable here, please tell me.
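
For context on the two calls being compared: both pm_runtime_put() and pm_runtime_put_sync() drop the usage counter, and they differ in whether the follow-up idle check is queued or run before the call returns. The user-space model below sketches that difference only. struct mock_dev, its fields and the mock_*() helpers are hypothetical, and the real runtime PM core does much more (locking, driver callbacks, autosuspend).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a device's runtime PM state (not a kernel type). */
struct mock_dev {
	int usage_count;	/* models the runtime PM usage counter */
	bool idle_queued;	/* an idle check is pending and will run later */
	bool suspended;		/* the device has been runtime-suspended */
};

/* Models the idle check: suspend only if nobody holds a reference. */
void mock_idle(struct mock_dev *dev)
{
	dev->idle_queued = false;
	if (dev->usage_count == 0)
		dev->suspended = true;
}

/* Models pm_runtime_put(): decrement, then queue the idle check. */
void mock_put(struct mock_dev *dev)
{
	assert(dev->usage_count > 0);
	if (--dev->usage_count == 0)
		dev->idle_queued = true;
}

/* Models pm_runtime_put_sync(): decrement, then run the idle check now. */
void mock_put_sync(struct mock_dev *dev)
{
	assert(dev->usage_count > 0);
	if (--dev->usage_count == 0)
		mock_idle(dev);
}
```

In this model both variants restore the counter, so either one fixes the imbalance; they only differ in when the device may actually be suspended.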

> 
> > >       pm_runtime_disable(dev);
> > >       return ret;
> > >  }
> > > --
> > > 2.17.1
> > >
Patch

diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/controller/dwc/pcie-tegra194.c
index ae30a2fd3716..2c0d2ce16b47 100644
--- a/drivers/pci/controller/dwc/pcie-tegra194.c
+++ b/drivers/pci/controller/dwc/pcie-tegra194.c
@@ -1623,7 +1623,7 @@ static int tegra_pcie_config_rp(struct tegra_pcie_dw *pcie)
 	ret = pinctrl_pm_select_default_state(dev);
 	if (ret < 0) {
 		dev_err(dev, "Failed to configure sideband pins: %d\n", ret);
-		goto fail_pinctrl;
+		goto fail_pm_get_sync;
 	}
 
 	tegra_pcie_init_controller(pcie);
@@ -1650,9 +1650,8 @@ static int tegra_pcie_config_rp(struct tegra_pcie_dw *pcie)
 
 fail_host_init:
 	tegra_pcie_deinit_controller(pcie);
-fail_pinctrl:
-	pm_runtime_put_sync(dev);
 fail_pm_get_sync:
+	pm_runtime_put_sync(dev);
 	pm_runtime_disable(dev);
 	return ret;
 }
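
The cleanup ordering that the patch establishes can be traced with a user-space sketch. The label names mirror the diff, but struct mock_pcie, the fail_at knob and the state fields are hypothetical scaffolding. The pm_runtime_enable() step is inferred from the pm_runtime_disable() call in the cleanup path, and keeping the reference held on success is an assumption about code outside the hunks shown here.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical knob selecting which step of the model fails. */
enum fail_point { FAIL_NONE, FAIL_GET_SYNC, FAIL_PINCTRL, FAIL_HOST_INIT };

/* Hypothetical model state (not the real struct tegra_pcie_dw). */
struct mock_pcie {
	int usage_count;	/* models the runtime PM usage counter */
	int pm_enabled;		/* models pm_runtime_enable()/disable() */
	int controller_up;	/* models init/deinit of the controller */
	enum fail_point fail_at;
};

/* Models the error unwinding of tegra_pcie_config_rp() after the patch. */
int mock_config_rp(struct mock_pcie *p)
{
	int ret;

	p->pm_enabled = 1;	/* assumed pm_runtime_enable() */

	p->usage_count++;	/* pm_runtime_get_sync() increments first */
	if (p->fail_at == FAIL_GET_SYNC) {
		ret = -EIO;
		goto fail_pm_get_sync;
	}

	if (p->fail_at == FAIL_PINCTRL) {
		ret = -EINVAL;
		goto fail_pm_get_sync;	/* was fail_pinctrl before the patch */
	}

	p->controller_up = 1;	/* tegra_pcie_init_controller() */

	if (p->fail_at == FAIL_HOST_INIT) {
		ret = -ENODEV;
		goto fail_host_init;
	}

	return 0;	/* assumption: reference stays held on success */

fail_host_init:
	p->controller_up = 0;	/* tegra_pcie_deinit_controller() */
fail_pm_get_sync:
	p->usage_count--;	/* pm_runtime_put_sync(), now on every error path */
	p->pm_enabled = 0;	/* pm_runtime_disable() */
	return ret;
}
```

Because the decrement now sits under fail_pm_get_sync, all three failure points in the model leave usage_count at 0, including a failure of the get call itself.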