diff mbox series

[v13,13/35] drm/tegra: gr2d: Support generic power domain and runtime PM

Message ID 20210926224058.1252-14-digetx@gmail.com
State Superseded
Headers show
Series NVIDIA Tegra power management patches for 5.16 | expand

Commit Message

Dmitry Osipenko Sept. 26, 2021, 10:40 p.m. UTC
Add runtime power management and support generic power domains.

Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
Tested-by: Paul Fertser <fercerpav@gmail.com> # PAZ00 T20
Tested-by: Nicolas Chauvet <kwizart@gmail.com> # PAZ00 T20 and TK1 T124
Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/gpu/drm/tegra/gr2d.c | 155 +++++++++++++++++++++++++++++++++--
 1 file changed, 147 insertions(+), 8 deletions(-)

Comments

Ulf Hansson Oct. 1, 2021, 1:39 p.m. UTC | #1
On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> Add runtime power management and support generic power domains.
>
> Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
> Tested-by: Paul Fertser <fercerpav@gmail.com> # PAZ00 T20
> Tested-by: Nicolas Chauvet <kwizart@gmail.com> # PAZ00 T20 and TK1 T124
> Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  drivers/gpu/drm/tegra/gr2d.c | 155 +++++++++++++++++++++++++++++++++--

[...]

>  static int gr2d_remove(struct platform_device *pdev)
> @@ -259,15 +312,101 @@ static int gr2d_remove(struct platform_device *pdev)
>                 return err;
>         }
>
> +       pm_runtime_dont_use_autosuspend(&pdev->dev);
> +       pm_runtime_disable(&pdev->dev);

There is no guarantee that the ->runtime_suspend() has been invoked
here, which means that clock may be left prepared/enabled beyond this
point.

I suggest you call pm_runtime_force_suspend(), instead of
pm_runtime_disable(), to make sure that gets done.

[...]

Kind regards
Uffe
Dmitry Osipenko Oct. 1, 2021, 2:29 p.m. UTC | #2
01.10.2021 16:39, Ulf Hansson пишет:
> On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
>>
>> Add runtime power management and support generic power domains.
>>
>> Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
>> Tested-by: Paul Fertser <fercerpav@gmail.com> # PAZ00 T20
>> Tested-by: Nicolas Chauvet <kwizart@gmail.com> # PAZ00 T20 and TK1 T124
>> Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>> ---
>>  drivers/gpu/drm/tegra/gr2d.c | 155 +++++++++++++++++++++++++++++++++--
> 
> [...]
> 
>>  static int gr2d_remove(struct platform_device *pdev)
>> @@ -259,15 +312,101 @@ static int gr2d_remove(struct platform_device *pdev)
>>                 return err;
>>         }
>>
>> +       pm_runtime_dont_use_autosuspend(&pdev->dev);
>> +       pm_runtime_disable(&pdev->dev);
> 
> There is no guarantee that the ->runtime_suspend() has been invoked
> here, which means that clock may be left prepared/enabled beyond this
> point.
> 
> I suggest you call pm_runtime_force_suspend(), instead of
> pm_runtime_disable(), to make sure that gets done.

The pm_runtime_disable() performs the final synchronization, please see [1].

[1]
https://elixir.bootlin.com/linux/v5.15-rc3/source/drivers/base/power/runtime.c#L1412

Calling pm_runtime_force_suspend() isn't correct because each 'enable'
must have the corresponding 'disable'. Hence there is no problem here.
Ulf Hansson Oct. 1, 2021, 2:55 p.m. UTC | #3
On Fri, 1 Oct 2021 at 16:29, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> 01.10.2021 16:39, Ulf Hansson пишет:
> > On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
> >>
> >> Add runtime power management and support generic power domains.
> >>
> >> Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
> >> Tested-by: Paul Fertser <fercerpav@gmail.com> # PAZ00 T20
> >> Tested-by: Nicolas Chauvet <kwizart@gmail.com> # PAZ00 T20 and TK1 T124
> >> Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
> >> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> >> ---
> >>  drivers/gpu/drm/tegra/gr2d.c | 155 +++++++++++++++++++++++++++++++++--
> >
> > [...]
> >
> >>  static int gr2d_remove(struct platform_device *pdev)
> >> @@ -259,15 +312,101 @@ static int gr2d_remove(struct platform_device *pdev)
> >>                 return err;
> >>         }
> >>
> >> +       pm_runtime_dont_use_autosuspend(&pdev->dev);
> >> +       pm_runtime_disable(&pdev->dev);
> >
> > There is no guarantee that the ->runtime_suspend() has been invoked
> > here, which means that clock may be left prepared/enabled beyond this
> > point.
> >
> > I suggest you call pm_runtime_force_suspend(), instead of
> > pm_runtime_disable(), to make sure that gets done.
>
> The pm_runtime_disable() performs the final synchronization, please see [1].
>
> [1]
> https://elixir.bootlin.com/linux/v5.15-rc3/source/drivers/base/power/runtime.c#L1412

pm_runtime_disable() end up calling _pm_runtime_barrier(), which calls
cancel_work_sync() if dev->power.request_pending has been set.

If the work that was punted to the pm_wq in rpm_idle() has not been
started yet, we end up just canceling it. In other words, there are no
guarantees it runs to completion.

Moreover, use space may have bumped the usage count via sysfs for the
device (pm_runtime_forbid()) to keep the device runtime resumed.

>
> Calling pm_runtime_force_suspend() isn't correct because each 'enable'
> must have the corresponding 'disable'. Hence there is no problem here.

pm_runtime_force_suspend() calls pm_runtime_disable(), so I think that
should be fine. No?

Kind regards
Uffe
Dmitry Osipenko Oct. 1, 2021, 7 p.m. UTC | #4
01.10.2021 17:55, Ulf Hansson пишет:
> On Fri, 1 Oct 2021 at 16:29, Dmitry Osipenko <digetx@gmail.com> wrote:
>>
>> 01.10.2021 16:39, Ulf Hansson пишет:
>>> On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
>>>>
>>>> Add runtime power management and support generic power domains.
>>>>
>>>> Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
>>>> Tested-by: Paul Fertser <fercerpav@gmail.com> # PAZ00 T20
>>>> Tested-by: Nicolas Chauvet <kwizart@gmail.com> # PAZ00 T20 and TK1 T124
>>>> Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
>>>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>>>> ---
>>>>  drivers/gpu/drm/tegra/gr2d.c | 155 +++++++++++++++++++++++++++++++++--
>>>
>>> [...]
>>>
>>>>  static int gr2d_remove(struct platform_device *pdev)
>>>> @@ -259,15 +312,101 @@ static int gr2d_remove(struct platform_device *pdev)
>>>>                 return err;
>>>>         }
>>>>
>>>> +       pm_runtime_dont_use_autosuspend(&pdev->dev);
>>>> +       pm_runtime_disable(&pdev->dev);
>>>
>>> There is no guarantee that the ->runtime_suspend() has been invoked
>>> here, which means that clock may be left prepared/enabled beyond this
>>> point.
>>>
>>> I suggest you call pm_runtime_force_suspend(), instead of
>>> pm_runtime_disable(), to make sure that gets done.
>>
>> The pm_runtime_disable() performs the final synchronization, please see [1].
>>
>> [1]
>> https://elixir.bootlin.com/linux/v5.15-rc3/source/drivers/base/power/runtime.c#L1412
> 
> pm_runtime_disable() end up calling _pm_runtime_barrier(), which calls
> cancel_work_sync() if dev->power.request_pending has been set.
> 
> If the work that was punted to the pm_wq in rpm_idle() has not been
> started yet, we end up just canceling it. In other words, there are no
> guarantees it runs to completion.

You're right. Although, in a case of this particular patch, the syncing
is actually implicitly done by pm_runtime_dont_use_autosuspend().

But for drivers which don't use auto-suspend, there is no sync. This
looks like a disaster, it's a very common pattern for drivers to
'put+disable'.

> Moreover, use space may have bumped the usage count via sysfs for the
> device (pm_runtime_forbid()) to keep the device runtime resumed.

Right, this is also a disaster in a case of driver removal.

>> Calling pm_runtime_force_suspend() isn't correct because each 'enable'
>> must have the corresponding 'disable'. Hence there is no problem here.
> 
> pm_runtime_force_suspend() calls pm_runtime_disable(), so I think that
> should be fine. No?

[adding Rafael]

Rafael, could you please explain how drivers are supposed to properly
suspend and disable RPM to cut off power and reset state that was
altered by the driver's resume callback? What we're missing? Is Ulf's
suggestion acceptable?

The RPM state of a device is getting reset on driver's removal, hence
all refcounts that were bumped by the rpm-resume callback of the device
driver will be screwed up if device is kept resumed after removal. I
just verified that it's true in practice.
Ulf Hansson Oct. 4, 2021, 11:01 a.m. UTC | #5
On Fri, 1 Oct 2021 at 21:00, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> 01.10.2021 17:55, Ulf Hansson пишет:
> > On Fri, 1 Oct 2021 at 16:29, Dmitry Osipenko <digetx@gmail.com> wrote:
> >>
> >> 01.10.2021 16:39, Ulf Hansson пишет:
> >>> On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
> >>>>
> >>>> Add runtime power management and support generic power domains.
> >>>>
> >>>> Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
> >>>> Tested-by: Paul Fertser <fercerpav@gmail.com> # PAZ00 T20
> >>>> Tested-by: Nicolas Chauvet <kwizart@gmail.com> # PAZ00 T20 and TK1 T124
> >>>> Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
> >>>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> >>>> ---
> >>>>  drivers/gpu/drm/tegra/gr2d.c | 155 +++++++++++++++++++++++++++++++++--
> >>>
> >>> [...]
> >>>
> >>>>  static int gr2d_remove(struct platform_device *pdev)
> >>>> @@ -259,15 +312,101 @@ static int gr2d_remove(struct platform_device *pdev)
> >>>>                 return err;
> >>>>         }
> >>>>
> >>>> +       pm_runtime_dont_use_autosuspend(&pdev->dev);
> >>>> +       pm_runtime_disable(&pdev->dev);
> >>>
> >>> There is no guarantee that the ->runtime_suspend() has been invoked
> >>> here, which means that clock may be left prepared/enabled beyond this
> >>> point.
> >>>
> >>> I suggest you call pm_runtime_force_suspend(), instead of
> >>> pm_runtime_disable(), to make sure that gets done.
> >>
> >> The pm_runtime_disable() performs the final synchronization, please see [1].
> >>
> >> [1]
> >> https://elixir.bootlin.com/linux/v5.15-rc3/source/drivers/base/power/runtime.c#L1412
> >
> > pm_runtime_disable() end up calling _pm_runtime_barrier(), which calls
> > cancel_work_sync() if dev->power.request_pending has been set.
> >
> > If the work that was punted to the pm_wq in rpm_idle() has not been
> > started yet, we end up just canceling it. In other words, there are no
> > guarantees it runs to completion.
>
> You're right. Although, in a case of this particular patch, the syncing
> is actually implicitly done by pm_runtime_dont_use_autosuspend().
>
> But for drivers which don't use auto-suspend, there is no sync. This
> looks like a disaster, it's a very common pattern for drivers to
> 'put+disable'.
>
> > Moreover, use space may have bumped the usage count via sysfs for the
> > device (pm_runtime_forbid()) to keep the device runtime resumed.
>
> Right, this is also a disaster in a case of driver removal.
>
> >> Calling pm_runtime_force_suspend() isn't correct because each 'enable'
> >> must have the corresponding 'disable'. Hence there is no problem here.
> >
> > pm_runtime_force_suspend() calls pm_runtime_disable(), so I think that
> > should be fine. No?
>
> [adding Rafael]
>
> Rafael, could you please explain how drivers are supposed to properly
> suspend and disable RPM to cut off power and reset state that was
> altered by the driver's resume callback? What we're missing? Is Ulf's
> suggestion acceptable?
>
> The RPM state of a device is getting reset on driver's removal, hence
> all refcounts that were bumped by the rpm-resume callback of the device
> driver will be screwed up if device is kept resumed after removal. I
> just verified that it's true in practice.

Note that, what makes the Tegra drivers a bit special is that they are
always built with CONFIG_PM being set (selected from the "SoC"
Kconfig).

Therefore, pm_runtime_force_suspend() can work for some of these
cases. Using this, would potentially avoid the driver from having to
runtime resume the device in ->remove(), according to the below
generic sequence, which is used in many drivers.

pm_runtime_get_sync()
clk_disable_unprepare() (+ additional things to turn off the device)
pm_runtime_disable()
pm_runtime_put_noidle()

Kind regards
Uffe
Dmitry Osipenko Oct. 4, 2021, 3:57 p.m. UTC | #6
04.10.2021 14:01, Ulf Hansson пишет:
> On Fri, 1 Oct 2021 at 21:00, Dmitry Osipenko <digetx@gmail.com> wrote:
>>
>> 01.10.2021 17:55, Ulf Hansson пишет:
>>> On Fri, 1 Oct 2021 at 16:29, Dmitry Osipenko <digetx@gmail.com> wrote:
>>>>
>>>> 01.10.2021 16:39, Ulf Hansson пишет:
>>>>> On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
>>>>>>
>>>>>> Add runtime power management and support generic power domains.
>>>>>>
>>>>>> Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
>>>>>> Tested-by: Paul Fertser <fercerpav@gmail.com> # PAZ00 T20
>>>>>> Tested-by: Nicolas Chauvet <kwizart@gmail.com> # PAZ00 T20 and TK1 T124
>>>>>> Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
>>>>>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>>>>>> ---
>>>>>>  drivers/gpu/drm/tegra/gr2d.c | 155 +++++++++++++++++++++++++++++++++--
>>>>>
>>>>> [...]
>>>>>
>>>>>>  static int gr2d_remove(struct platform_device *pdev)
>>>>>> @@ -259,15 +312,101 @@ static int gr2d_remove(struct platform_device *pdev)
>>>>>>                 return err;
>>>>>>         }
>>>>>>
>>>>>> +       pm_runtime_dont_use_autosuspend(&pdev->dev);
>>>>>> +       pm_runtime_disable(&pdev->dev);
>>>>>
>>>>> There is no guarantee that the ->runtime_suspend() has been invoked
>>>>> here, which means that clock may be left prepared/enabled beyond this
>>>>> point.
>>>>>
>>>>> I suggest you call pm_runtime_force_suspend(), instead of
>>>>> pm_runtime_disable(), to make sure that gets done.
>>>>
>>>> The pm_runtime_disable() performs the final synchronization, please see [1].
>>>>
>>>> [1]
>>>> https://elixir.bootlin.com/linux/v5.15-rc3/source/drivers/base/power/runtime.c#L1412
>>>
>>> pm_runtime_disable() end up calling _pm_runtime_barrier(), which calls
>>> cancel_work_sync() if dev->power.request_pending has been set.
>>>
>>> If the work that was punted to the pm_wq in rpm_idle() has not been
>>> started yet, we end up just canceling it. In other words, there are no
>>> guarantees it runs to completion.
>>
>> You're right. Although, in a case of this particular patch, the syncing
>> is actually implicitly done by pm_runtime_dont_use_autosuspend().
>>
>> But for drivers which don't use auto-suspend, there is no sync. This
>> looks like a disaster, it's a very common pattern for drivers to
>> 'put+disable'.
>>
>>> Moreover, use space may have bumped the usage count via sysfs for the
>>> device (pm_runtime_forbid()) to keep the device runtime resumed.
>>
>> Right, this is also a disaster in a case of driver removal.
>>
>>>> Calling pm_runtime_force_suspend() isn't correct because each 'enable'
>>>> must have the corresponding 'disable'. Hence there is no problem here.
>>>
>>> pm_runtime_force_suspend() calls pm_runtime_disable(), so I think that
>>> should be fine. No?
>>
>> [adding Rafael]
>>
>> Rafael, could you please explain how drivers are supposed to properly
>> suspend and disable RPM to cut off power and reset state that was
>> altered by the driver's resume callback? What we're missing? Is Ulf's
>> suggestion acceptable?
>>
>> The RPM state of a device is getting reset on driver's removal, hence
>> all refcounts that were bumped by the rpm-resume callback of the device
>> driver will be screwed up if device is kept resumed after removal. I
>> just verified that it's true in practice.
> 
> Note that, what makes the Tegra drivers a bit special is that they are
> always built with CONFIG_PM being set (selected from the "SoC"
> Kconfig).
> 
> Therefore, pm_runtime_force_suspend() can work for some of these
> cases. Using this, would potentially avoid the driver from having to
> runtime resume the device in ->remove(), according to the below
> generic sequence, which is used in many drivers.
> 
> pm_runtime_get_sync()
> clk_disable_unprepare() (+ additional things to turn off the device)
> pm_runtime_disable()
> pm_runtime_put_noidle()

It's not a problem to change this patchset. The problem is that if
you'll grep mainline for 'pm_runtime_disable', you will find that there
are a lot of drivers in a potential trouble.

I'm proposing that we should change pm_runtime_disable() to perform the
syncing with this oneliner:

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index ec94049442b9..5c9f28165824 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -1380,6 +1380,8 @@ EXPORT_SYMBOL_GPL(pm_runtime_barrier);
  */
 void __pm_runtime_disable(struct device *dev, bool check_resume)
 {
+	flush_work(&dev->power.work);
+
 	spin_lock_irq(&dev->power.lock);

 	if (dev->power.disable_depth > 0) {

Objections?

The sysfs rpm-forbid is a separate problem and it's less troublesome
since it requires root privileges. It's also not something that
userspace touches casually. For now I don't know what could be done
about it.
Ulf Hansson Oct. 5, 2021, 8:45 a.m. UTC | #7
On Mon, 4 Oct 2021 at 17:57, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> 04.10.2021 14:01, Ulf Hansson пишет:
> > On Fri, 1 Oct 2021 at 21:00, Dmitry Osipenko <digetx@gmail.com> wrote:
> >>
> >> 01.10.2021 17:55, Ulf Hansson пишет:
> >>> On Fri, 1 Oct 2021 at 16:29, Dmitry Osipenko <digetx@gmail.com> wrote:
> >>>>
> >>>> 01.10.2021 16:39, Ulf Hansson пишет:
> >>>>> On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
> >>>>>>
> >>>>>> Add runtime power management and support generic power domains.
> >>>>>>
> >>>>>> Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
> >>>>>> Tested-by: Paul Fertser <fercerpav@gmail.com> # PAZ00 T20
> >>>>>> Tested-by: Nicolas Chauvet <kwizart@gmail.com> # PAZ00 T20 and TK1 T124
> >>>>>> Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
> >>>>>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> >>>>>> ---
> >>>>>>  drivers/gpu/drm/tegra/gr2d.c | 155 +++++++++++++++++++++++++++++++++--
> >>>>>
> >>>>> [...]
> >>>>>
> >>>>>>  static int gr2d_remove(struct platform_device *pdev)
> >>>>>> @@ -259,15 +312,101 @@ static int gr2d_remove(struct platform_device *pdev)
> >>>>>>                 return err;
> >>>>>>         }
> >>>>>>
> >>>>>> +       pm_runtime_dont_use_autosuspend(&pdev->dev);
> >>>>>> +       pm_runtime_disable(&pdev->dev);
> >>>>>
> >>>>> There is no guarantee that the ->runtime_suspend() has been invoked
> >>>>> here, which means that clock may be left prepared/enabled beyond this
> >>>>> point.
> >>>>>
> >>>>> I suggest you call pm_runtime_force_suspend(), instead of
> >>>>> pm_runtime_disable(), to make sure that gets done.
> >>>>
> >>>> The pm_runtime_disable() performs the final synchronization, please see [1].
> >>>>
> >>>> [1]
> >>>> https://elixir.bootlin.com/linux/v5.15-rc3/source/drivers/base/power/runtime.c#L1412
> >>>
> >>> pm_runtime_disable() end up calling _pm_runtime_barrier(), which calls
> >>> cancel_work_sync() if dev->power.request_pending has been set.
> >>>
> >>> If the work that was punted to the pm_wq in rpm_idle() has not been
> >>> started yet, we end up just canceling it. In other words, there are no
> >>> guarantees it runs to completion.
> >>
> >> You're right. Although, in a case of this particular patch, the syncing
> >> is actually implicitly done by pm_runtime_dont_use_autosuspend().
> >>
> >> But for drivers which don't use auto-suspend, there is no sync. This
> >> looks like a disaster, it's a very common pattern for drivers to
> >> 'put+disable'.
> >>
> >>> Moreover, use space may have bumped the usage count via sysfs for the
> >>> device (pm_runtime_forbid()) to keep the device runtime resumed.
> >>
> >> Right, this is also a disaster in a case of driver removal.
> >>
> >>>> Calling pm_runtime_force_suspend() isn't correct because each 'enable'
> >>>> must have the corresponding 'disable'. Hence there is no problem here.
> >>>
> >>> pm_runtime_force_suspend() calls pm_runtime_disable(), so I think that
> >>> should be fine. No?
> >>
> >> [adding Rafael]
> >>
> >> Rafael, could you please explain how drivers are supposed to properly
> >> suspend and disable RPM to cut off power and reset state that was
> >> altered by the driver's resume callback? What we're missing? Is Ulf's
> >> suggestion acceptable?
> >>
> >> The RPM state of a device is getting reset on driver's removal, hence
> >> all refcounts that were bumped by the rpm-resume callback of the device
> >> driver will be screwed up if device is kept resumed after removal. I
> >> just verified that it's true in practice.
> >
> > Note that, what makes the Tegra drivers a bit special is that they are
> > always built with CONFIG_PM being set (selected from the "SoC"
> > Kconfig).
> >
> > Therefore, pm_runtime_force_suspend() can work for some of these
> > cases. Using this, would potentially avoid the driver from having to
> > runtime resume the device in ->remove(), according to the below
> > generic sequence, which is used in many drivers.
> >
> > pm_runtime_get_sync()
> > clk_disable_unprepare() (+ additional things to turn off the device)
> > pm_runtime_disable()
> > pm_runtime_put_noidle()
>
> It's not a problem to change this patchset. The problem is that if
> you'll grep mainline for 'pm_runtime_disable', you will find that there
> are a lot of drivers in a potential trouble.

Let's start by fixing this patchset, please - then we can consider
what to do with the other cases separately.

>
> I'm proposing that we should change pm_runtime_disable() to perform the
> syncing with this oneliner:
>
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index ec94049442b9..5c9f28165824 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -1380,6 +1380,8 @@ EXPORT_SYMBOL_GPL(pm_runtime_barrier);
>   */
>  void __pm_runtime_disable(struct device *dev, bool check_resume)
>  {
> +       flush_work(&dev->power.work);
> +

What about the latency this may introduce? I am not sure that is
acceptable here!?

>         spin_lock_irq(&dev->power.lock);
>
>         if (dev->power.disable_depth > 0) {
>
> Objections?
>
> The sysfs rpm-forbid is a separate problem and it's less troublesome
> since it requires root privileges. It's also not something that
> userspace touches casually. For now I don't know what could be done
> about it.

As I said, the common method to address this problem is to run the
following sequence:

pm_runtime_get_sync()
"power off the device"
pm_runtime_disable()
pm_runtime_put_noidle()

This works even if user space, via sysfs, has triggered a call to
pm_runtime_forbid(). Or doesn't it?

If you don't like it, pm_runtime_force_suspend() should work too, at
least for your cases, I believe.

Kind regards
Uffe
Dmitry Osipenko Oct. 5, 2021, 5:16 p.m. UTC | #8
...
>> It's not a problem to change this patchset. The problem is that if
>> you'll grep mainline for 'pm_runtime_disable', you will find that there
>> are a lot of drivers in a potential trouble.
> 
> Let's start by fixing this patchset, please - then we can consider
> what to do with the other cases separately.

Yeah, should be better to discuss it separately.

...
>>  void __pm_runtime_disable(struct device *dev, bool check_resume)
>>  {
>> +       flush_work(&dev->power.work);
>> +
> 
> What about the latency this may introduce? I am not sure that is
> acceptable here!?

I'm not aware about any code which relies on the original 'cancelling'
behaviour, perhaps Rafael should have more insight.

...
>> The sysfs rpm-forbid is a separate problem and it's less troublesome
>> since it requires root privileges. It's also not something that
>> userspace touches casually. For now I don't know what could be done
>> about it.
> 
> As I said, the common method to address this problem is to run the
> following sequence:
> 
> pm_runtime_get_sync()
> "power off the device"
> pm_runtime_disable()
> pm_runtime_put_noidle()
> 
> This works even if user space, via sysfs, has triggered a call to
> pm_runtime_forbid(). Or doesn't it?
> 
> If you don't like it, pm_runtime_force_suspend() should work too, at
> least for your cases, I believe.

I'll update the patches, thank you.
diff mbox series

Patch

diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
index de288cba3905..13df8f118f75 100644
--- a/drivers/gpu/drm/tegra/gr2d.c
+++ b/drivers/gpu/drm/tegra/gr2d.c
@@ -7,11 +7,21 @@ 
 #include <linux/iommu.h>
 #include <linux/module.h>
 #include <linux/of_device.h>
+#include <linux/pm_runtime.h>
+#include <linux/reset.h>
+
+#include <soc/tegra/common.h>
 
 #include "drm.h"
 #include "gem.h"
 #include "gr2d.h"
 
+enum {
+	RST_MC,
+	RST_GR2D,
+	RST_GR2D_MAX,
+};
+
 struct gr2d_soc {
 	unsigned int version;
 };
@@ -21,6 +31,9 @@  struct gr2d {
 	struct host1x_channel *channel;
 	struct clk *clk;
 
+	struct reset_control_bulk_data resets[RST_GR2D_MAX];
+	unsigned int nresets;
+
 	const struct gr2d_soc *soc;
 
 	DECLARE_BITMAP(addr_regs, GR2D_NUM_REGS);
@@ -101,16 +114,24 @@  static int gr2d_open_channel(struct tegra_drm_client *client,
 			     struct tegra_drm_context *context)
 {
 	struct gr2d *gr2d = to_gr2d(client);
+	int err;
 
 	context->channel = host1x_channel_get(gr2d->channel);
 	if (!context->channel)
 		return -ENOMEM;
 
+	err = pm_runtime_resume_and_get(client->base.dev);
+	if (err) {
+		host1x_channel_put(context->channel);
+		return err;
+	}
+
 	return 0;
 }
 
 static void gr2d_close_channel(struct tegra_drm_context *context)
 {
+	pm_runtime_put_sync(context->client->base.dev);
 	host1x_channel_put(context->channel);
 }
 
@@ -190,6 +211,27 @@  static const u32 gr2d_addr_regs[] = {
 	GR2D_VA_BASE_ADDR_SB,
 };
 
+static int gr2d_get_resets(struct device *dev, struct gr2d *gr2d)
+{
+	int err;
+
+	gr2d->resets[RST_MC].id = "mc";
+	gr2d->resets[RST_GR2D].id = "2d";
+	gr2d->nresets = RST_GR2D_MAX;
+
+	err = devm_reset_control_bulk_get_optional_exclusive_released(
+				dev, gr2d->nresets, gr2d->resets);
+	if (err) {
+		dev_err(dev, "failed to get reset: %d\n", err);
+		return err;
+	}
+
+	if (WARN_ON(!gr2d->resets[RST_GR2D].rstc))
+		return -ENOENT;
+
+	return 0;
+}
+
 static int gr2d_probe(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
@@ -202,6 +244,8 @@  static int gr2d_probe(struct platform_device *pdev)
 	if (!gr2d)
 		return -ENOMEM;
 
+	platform_set_drvdata(pdev, gr2d);
+
 	gr2d->soc = of_device_get_match_data(dev);
 
 	syncpts = devm_kzalloc(dev, sizeof(*syncpts), GFP_KERNEL);
@@ -214,11 +258,9 @@  static int gr2d_probe(struct platform_device *pdev)
 		return PTR_ERR(gr2d->clk);
 	}
 
-	err = clk_prepare_enable(gr2d->clk);
-	if (err) {
-		dev_err(dev, "cannot turn on clock\n");
+	err = gr2d_get_resets(dev, gr2d);
+	if (err)
 		return err;
-	}
 
 	INIT_LIST_HEAD(&gr2d->client.base.list);
 	gr2d->client.base.ops = &gr2d_client_ops;
@@ -231,20 +273,31 @@  static int gr2d_probe(struct platform_device *pdev)
 	gr2d->client.version = gr2d->soc->version;
 	gr2d->client.ops = &gr2d_ops;
 
+	pm_runtime_enable(dev);
+	pm_runtime_use_autosuspend(dev);
+	pm_runtime_set_autosuspend_delay(dev, 200);
+
+	err = devm_tegra_core_dev_init_opp_table_common(dev);
+	if (err)
+		goto disable_rpm;
+
 	err = host1x_client_register(&gr2d->client.base);
 	if (err < 0) {
 		dev_err(dev, "failed to register host1x client: %d\n", err);
-		clk_disable_unprepare(gr2d->clk);
-		return err;
+		goto disable_rpm;
 	}
 
 	/* initialize address register map */
 	for (i = 0; i < ARRAY_SIZE(gr2d_addr_regs); i++)
 		set_bit(gr2d_addr_regs[i], gr2d->addr_regs);
 
-	platform_set_drvdata(pdev, gr2d);
-
 	return 0;
+
+disable_rpm:
+	pm_runtime_dont_use_autosuspend(dev);
+	pm_runtime_disable(dev);
+
+	return err;
 }
 
 static int gr2d_remove(struct platform_device *pdev)
@@ -259,15 +312,101 @@  static int gr2d_remove(struct platform_device *pdev)
 		return err;
 	}
 
+	pm_runtime_dont_use_autosuspend(&pdev->dev);
+	pm_runtime_disable(&pdev->dev);
+
+	return 0;
+}
+
+static int __maybe_unused gr2d_runtime_suspend(struct device *dev)
+{
+	struct gr2d *gr2d = dev_get_drvdata(dev);
+	int err;
+
+	host1x_channel_stop(gr2d->channel);
+	reset_control_bulk_release(gr2d->nresets, gr2d->resets);
+
+	/*
+	 * GR2D module shouldn't be reset while hardware is idling, otherwise
+	 * host1x's cmdproc will stuck on trying to access any G2 register
+	 * after reset. GR2D module could be either hot-reset or reset after
+	 * power-gating of the HEG partition. Hence we will put in reset only
+	 * the memory client part of the module, the HEG GENPD will take care
+	 * of resetting GR2D module across power-gating.
+	 *
+	 * On Tegra20 there is no HEG partition, but it's okay to have
+	 * undetermined h/w state since userspace is expected to reprogram
+	 * the state on each job submission anyways.
+	 */
+	err = reset_control_acquire(gr2d->resets[RST_MC].rstc);
+	if (err) {
+		dev_err(dev, "failed to acquire MC reset: %d\n", err);
+		goto acquire_reset;
+	}
+
+	err = reset_control_assert(gr2d->resets[RST_MC].rstc);
+	reset_control_release(gr2d->resets[RST_MC].rstc);
+	if (err) {
+		dev_err(dev, "failed to assert MC reset: %d\n", err);
+		goto acquire_reset;
+	}
+
 	clk_disable_unprepare(gr2d->clk);
 
 	return 0;
+
+acquire_reset:
+	reset_control_bulk_acquire(gr2d->nresets, gr2d->resets);
+	reset_control_bulk_deassert(gr2d->nresets, gr2d->resets);
+
+	return err;
 }
 
+static int __maybe_unused gr2d_runtime_resume(struct device *dev)
+{
+	struct gr2d *gr2d = dev_get_drvdata(dev);
+	int err;
+
+	err = reset_control_bulk_acquire(gr2d->nresets, gr2d->resets);
+	if (err) {
+		dev_err(dev, "failed to acquire reset: %d\n", err);
+		return err;
+	}
+
+	err = clk_prepare_enable(gr2d->clk);
+	if (err) {
+		dev_err(dev, "failed to enable clock: %d\n", err);
+		goto release_reset;
+	}
+
+	/* this is a reset array which deasserts both 2D MC and 2D itself */
+	err = reset_control_bulk_deassert(gr2d->nresets, gr2d->resets);
+	if (err) {
+		dev_err(dev, "failed to deassert reset: %d\n", err);
+		goto disable_clk;
+	}
+
+	return 0;
+
+disable_clk:
+	clk_disable_unprepare(gr2d->clk);
+release_reset:
+	reset_control_bulk_release(gr2d->nresets, gr2d->resets);
+
+	return err;
+}
+
+static const struct dev_pm_ops tegra_gr2d_pm = {
+	SET_RUNTIME_PM_OPS(gr2d_runtime_suspend, gr2d_runtime_resume, NULL)
+	SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
+				pm_runtime_force_resume)
+};
+
 struct platform_driver tegra_gr2d_driver = {
 	.driver = {
 		.name = "tegra-gr2d",
 		.of_match_table = gr2d_match,
+		.pm = &tegra_gr2d_pm,
 	},
 	.probe = gr2d_probe,
 	.remove = gr2d_remove,