diff mbox series

[5/7] dt-bindings: remoteproc: qcom,pas: Add hwlocks

Message ID 20240516-hwspinlock-bust-v1-5-47a90a859238@quicinc.com
State Changes Requested
Headers show
Series Add support for hwspinlock bust | expand

Checks

Context Check Description
robh/checkpatch success
robh/patch-applied success
robh/dtbs-check warning build log
robh/dt-meta-schema success

Commit Message

Chris Lew May 16, 2024, 10:58 p.m. UTC
Add hwlocks property to describe the hwspinlock that remoteproc can try
to bust on behalf of the remoteproc's smem.

Signed-off-by: Chris Lew <quic_clew@quicinc.com>
---
 Documentation/devicetree/bindings/remoteproc/qcom,pas-common.yaml | 3 +++
 1 file changed, 3 insertions(+)

Comments

Krzysztof Kozlowski May 19, 2024, 5:36 p.m. UTC | #1
On 17/05/2024 00:58, Chris Lew wrote:
> Add hwlocks property to describe the hwspinlock that remoteproc can try
> to bust on behalf of the remoteproc's smem.

Sorry, as you wrote, the lock is part of smem, not here. Drivers do not
crash, so if your crashes as you imply in the cover letter, then first
fix the driver.

Best regards,
Krzysztof
Chris Lew May 21, 2024, 4:08 a.m. UTC | #2
On 5/19/2024 10:36 AM, Krzysztof Kozlowski wrote:
> On 17/05/2024 00:58, Chris Lew wrote:
>> Add hwlocks property to describe the hwspinlock that remoteproc can try
>> to bust on behalf of the remoteproc's smem.
> 
> Sorry, as you wrote, the lock is part of smem, not here. Drivers do not
> crash, so if your crashes as you imply in the cover letter, then first
> fix the driver.
>

Hi Krzysztof,

Sorry for the confusion, I dont think I meant that the smem driver will 
ever crash. The referred to crash in the cover letter is a crash in the 
firmware running on the remoteproc. The remoteproc could crash for any 
unexpected reason, related or unrelated to smem, while holding the tcsr 
mutex. I want to ensure that all resources that a remoteproc might be 
using are released as part of remoteproc stop.

The SMEM driver manages the lock/unlock operations on the tcsr mutex 
from the Linux CPU's perspective. This case is for cleaning up from the 
remote side's perspective.

In this case it's the hwspinlock used to synchronize SMEM, but it's 
conceivable that firmware running on the remoteproc has additional locks 
that need to be busted in order for the system to continue executing 
until the firmware is reinitialized.

We did consider tying this to the SMEM instance, but the entitiy 
relating to firmware is the remoteproc instance.

> Best regards,
> Krzysztof
>
Krzysztof Kozlowski May 21, 2024, 7:36 a.m. UTC | #3
On 21/05/2024 06:08, Chris Lew wrote:
> 
> 
> On 5/19/2024 10:36 AM, Krzysztof Kozlowski wrote:
>> On 17/05/2024 00:58, Chris Lew wrote:
>>> Add hwlocks property to describe the hwspinlock that remoteproc can try
>>> to bust on behalf of the remoteproc's smem.
>>
>> Sorry, as you wrote, the lock is part of smem, not here. Drivers do not
>> crash, so if your crashes as you imply in the cover letter, then first
>> fix the driver.
>>
> 
> Hi Krzysztof,
> 
> Sorry for the confusion, I dont think I meant that the smem driver will 
> ever crash. The referred to crash in the cover letter is a crash in the 
> firmware running on the remoteproc. The remoteproc could crash for any 
> unexpected reason, related or unrelated to smem, while holding the tcsr 
> mutex. I want to ensure that all resources that a remoteproc might be 
> using are released as part of remoteproc stop.
> 
> The SMEM driver manages the lock/unlock operations on the tcsr mutex 
> from the Linux CPU's perspective. This case is for cleaning up from the 
> remote side's perspective.
> 
> In this case it's the hwspinlock used to synchronize SMEM, but it's 
> conceivable that firmware running on the remoteproc has additional locks 
> that need to be busted in order for the system to continue executing 
> until the firmware is reinitialized.
> 
> We did consider tying this to the SMEM instance, but the entitiy 
> relating to firmware is the remoteproc instance.

I still do not understand why you have to add hwlock to remoteproc, even
though it is not directly used. Your driver problem looks like lack of
proper driver architecture - you want to control the locks not from the
layer took the lock, but one layer up. Sorry, no, fix the driver
architecture.

Best regards,
Krzysztof
Bjorn Andersson May 21, 2024, 7:17 p.m. UTC | #4
On Tue, May 21, 2024 at 09:36:03AM +0200, Krzysztof Kozlowski wrote:
> On 21/05/2024 06:08, Chris Lew wrote:
> > 
> > 
> > On 5/19/2024 10:36 AM, Krzysztof Kozlowski wrote:
> >> On 17/05/2024 00:58, Chris Lew wrote:
> >>> Add hwlocks property to describe the hwspinlock that remoteproc can try
> >>> to bust on behalf of the remoteproc's smem.
> >>
> >> Sorry, as you wrote, the lock is part of smem, not here. Drivers do not
> >> crash, so if your crashes as you imply in the cover letter, then first
> >> fix the driver.
> >>
> > 
> > Hi Krzysztof,
> > 
> > Sorry for the confusion, I dont think I meant that the smem driver will 
> > ever crash. The referred to crash in the cover letter is a crash in the 
> > firmware running on the remoteproc. The remoteproc could crash for any 
> > unexpected reason, related or unrelated to smem, while holding the tcsr 
> > mutex. I want to ensure that all resources that a remoteproc might be 
> > using are released as part of remoteproc stop.
> > 
> > The SMEM driver manages the lock/unlock operations on the tcsr mutex 
> > from the Linux CPU's perspective. This case is for cleaning up from the 
> > remote side's perspective.
> > 
> > In this case it's the hwspinlock used to synchronize SMEM, but it's 
> > conceivable that firmware running on the remoteproc has additional locks 
> > that need to be busted in order for the system to continue executing 
> > until the firmware is reinitialized.
> > 
> > We did consider tying this to the SMEM instance, but the entitiy 
> > relating to firmware is the remoteproc instance.
> 
> I still do not understand why you have to add hwlock to remoteproc, even
> though it is not directly used. Your driver problem looks like lack of
> proper driver architecture - you want to control the locks not from the
> layer took the lock, but one layer up. Sorry, no, fix the driver
> architecture.
> 

No, it is the firmware's reference to the lock that is represented in
the remoteproc node, while SMEM deals with Linux's reference to the lock.

This reference would be used to release the lock - on behalf of the
firmware - in the event that the firmware held it when it
stopped/crashed.

Regards,
Bjorn
Krzysztof Kozlowski May 22, 2024, 7:26 a.m. UTC | #5
On 21/05/2024 21:17, Bjorn Andersson wrote:
>>>
>>> Hi Krzysztof,
>>>
>>> Sorry for the confusion, I dont think I meant that the smem driver will 
>>> ever crash. The referred to crash in the cover letter is a crash in the 
>>> firmware running on the remoteproc. The remoteproc could crash for any 
>>> unexpected reason, related or unrelated to smem, while holding the tcsr 
>>> mutex. I want to ensure that all resources that a remoteproc might be 
>>> using are released as part of remoteproc stop.
>>>
>>> The SMEM driver manages the lock/unlock operations on the tcsr mutex 
>>> from the Linux CPU's perspective. This case is for cleaning up from the 
>>> remote side's perspective.
>>>
>>> In this case it's the hwspinlock used to synchronize SMEM, but it's 
>>> conceivable that firmware running on the remoteproc has additional locks 
>>> that need to be busted in order for the system to continue executing 
>>> until the firmware is reinitialized.
>>>
>>> We did consider tying this to the SMEM instance, but the entitiy 
>>> relating to firmware is the remoteproc instance.
>>
>> I still do not understand why you have to add hwlock to remoteproc, even
>> though it is not directly used. Your driver problem looks like lack of
>> proper driver architecture - you want to control the locks not from the
>> layer took the lock, but one layer up. Sorry, no, fix the driver
>> architecture.
>>
> 
> No, it is the firmware's reference to the lock that is represented in
> the remoteproc node, while SMEM deals with Linux's reference to the lock.
> 
> This reference would be used to release the lock - on behalf of the
> firmware - in the event that the firmware held it when it
> stopped/crashed.

I understood, but the remoteproc driver did not acquire the hardware
lock. It was taken by smem, if I got it correctly, so you should poke
smem to bust the spinlock.

The hwlock is not a property of remote proc, because remote proc does
not care, right? Other device cares... and now for every smem user you
will add new binding property?

No, you are adding a binding based on your driver solution.

Best regards,
Krzysztof
Bjorn Andersson May 22, 2024, 5:50 p.m. UTC | #6
On Wed, May 22, 2024 at 09:26:00AM +0200, Krzysztof Kozlowski wrote:
> On 21/05/2024 21:17, Bjorn Andersson wrote:
> >>>
> >>> Hi Krzysztof,
> >>>
> >>> Sorry for the confusion, I dont think I meant that the smem driver will 
> >>> ever crash. The referred to crash in the cover letter is a crash in the 
> >>> firmware running on the remoteproc. The remoteproc could crash for any 
> >>> unexpected reason, related or unrelated to smem, while holding the tcsr 
> >>> mutex. I want to ensure that all resources that a remoteproc might be 
> >>> using are released as part of remoteproc stop.
> >>>
> >>> The SMEM driver manages the lock/unlock operations on the tcsr mutex 
> >>> from the Linux CPU's perspective. This case is for cleaning up from the 
> >>> remote side's perspective.
> >>>
> >>> In this case it's the hwspinlock used to synchronize SMEM, but it's 
> >>> conceivable that firmware running on the remoteproc has additional locks 
> >>> that need to be busted in order for the system to continue executing 
> >>> until the firmware is reinitialized.
> >>>
> >>> We did consider tying this to the SMEM instance, but the entitiy 
> >>> relating to firmware is the remoteproc instance.
> >>
> >> I still do not understand why you have to add hwlock to remoteproc, even
> >> though it is not directly used. Your driver problem looks like lack of
> >> proper driver architecture - you want to control the locks not from the
> >> layer took the lock, but one layer up. Sorry, no, fix the driver
> >> architecture.
> >>
> > 
> > No, it is the firmware's reference to the lock that is represented in
> > the remoteproc node, while SMEM deals with Linux's reference to the lock.
> > 
> > This reference would be used to release the lock - on behalf of the
> > firmware - in the event that the firmware held it when it
> > stopped/crashed.
> 
> I understood, but the remoteproc driver did not acquire the hardware
> lock. It was taken by smem, if I got it correctly, so you should poke
> smem to bust the spinlock.
> 

The remoteproc instance is the closest representation of the entity that
took the lock (i.e. the firmware). SMEM here is just another consumer of
the same lock.

> The hwlock is not a property of remote proc, because remote proc does
> not care, right? Other device cares... and now for every smem user you
> will add new binding property?
> 

Right, the issue seen relates to SMEM, because the remote processor (not
the remoteproc driver) took the lock.

> No, you are adding a binding based on your driver solution.

Similar to how hwspinlocks are used in other platforms (e.g. TI) the
firmware could take multiple locks, e.g. to synchronize access to other
shared memory mechanism (i.e. not SMEM). While I am not aware of such
use case today, my expectation was that in such case we just list all
the hwlocks related to the firmware and bust those from the remoteproc
instance.

Having to export APIs from each one of such drivers and make the
remoteproc identify the relevant instances and call those APIs does
indeed seem inconvenient.
SMEM is special here because it's singleton, but this would not
necessarily be true for other cases.

Regards,
Bjorn
Krzysztof Kozlowski May 23, 2024, 6:15 a.m. UTC | #7
On 22/05/2024 19:50, Bjorn Andersson wrote:
>>>>>
>>>>> We did consider tying this to the SMEM instance, but the entitiy 
>>>>> relating to firmware is the remoteproc instance.
>>>>
>>>> I still do not understand why you have to add hwlock to remoteproc, even
>>>> though it is not directly used. Your driver problem looks like lack of
>>>> proper driver architecture - you want to control the locks not from the
>>>> layer took the lock, but one layer up. Sorry, no, fix the driver
>>>> architecture.
>>>>
>>>
>>> No, it is the firmware's reference to the lock that is represented in
>>> the remoteproc node, while SMEM deals with Linux's reference to the lock.
>>>
>>> This reference would be used to release the lock - on behalf of the
>>> firmware - in the event that the firmware held it when it
>>> stopped/crashed.
>>
>> I understood, but the remoteproc driver did not acquire the hardware
>> lock. It was taken by smem, if I got it correctly, so you should poke
>> smem to bust the spinlock.
>>
> 
> The remoteproc instance is the closest representation of the entity that
> took the lock (i.e. the firmware). SMEM here is just another consumer of
> the same lock.
> 
>> The hwlock is not a property of remote proc, because remote proc does
>> not care, right? Other device cares... and now for every smem user you
>> will add new binding property?
>>
> 
> Right, the issue seen relates to SMEM, because the remote processor (not
> the remoteproc driver) took the lock.
> 
>> No, you are adding a binding based on your driver solution.
> 
> Similar to how hwspinlocks are used in other platforms (e.g. TI) the
> firmware could take multiple locks, e.g. to synchronize access to other
> shared memory mechanism (i.e. not SMEM). While I am not aware of such
> use case today, my expectation was that in such case we just list all
> the hwlocks related to the firmware and bust those from the remoteproc
> instance.
> 
> Having to export APIs from each one of such drivers and make the
> remoteproc identify the relevant instances and call those APIs does
> indeed seem inconvenient.
> SMEM is special here because it's singleton, but this would not
> necessarily be true for other cases.

I don't think that exporting such API is unreasonable, but quite
opposite - expected. The remote processor crashed, so the remoteproc
driver is supposed to call some sort of smem_cleanup() or
smem_cleanup_on_crash() and call would bust/release the lock. That way
lock handling is encapsulated entirely in one driver which already takes
and releases the lock.

Just like freeing any memory. remoteproc driver does not free other
driver's memory only because processor crashed.


Best regards,
Krzysztof
Bjorn Andersson May 24, 2024, 7:23 p.m. UTC | #8
On Thu, May 23, 2024 at 08:15:54AM +0200, Krzysztof Kozlowski wrote:
> On 22/05/2024 19:50, Bjorn Andersson wrote:
> >>>>>
> >>>>> We did consider tying this to the SMEM instance, but the entitiy 
> >>>>> relating to firmware is the remoteproc instance.
> >>>>
> >>>> I still do not understand why you have to add hwlock to remoteproc, even
> >>>> though it is not directly used. Your driver problem looks like lack of
> >>>> proper driver architecture - you want to control the locks not from the
> >>>> layer took the lock, but one layer up. Sorry, no, fix the driver
> >>>> architecture.
> >>>>
> >>>
> >>> No, it is the firmware's reference to the lock that is represented in
> >>> the remoteproc node, while SMEM deals with Linux's reference to the lock.
> >>>
> >>> This reference would be used to release the lock - on behalf of the
> >>> firmware - in the event that the firmware held it when it
> >>> stopped/crashed.
> >>
> >> I understood, but the remoteproc driver did not acquire the hardware
> >> lock. It was taken by smem, if I got it correctly, so you should poke
> >> smem to bust the spinlock.
> >>
> > 
> > The remoteproc instance is the closest representation of the entity that
> > took the lock (i.e. the firmware). SMEM here is just another consumer of
> > the same lock.
> > 
> >> The hwlock is not a property of remote proc, because remote proc does
> >> not care, right? Other device cares... and now for every smem user you
> >> will add new binding property?
> >>
> > 
> > Right, the issue seen relates to SMEM, because the remote processor (not
> > the remoteproc driver) took the lock.
> > 
> >> No, you are adding a binding based on your driver solution.
> > 
> > Similar to how hwspinlocks are used in other platforms (e.g. TI) the
> > firmware could take multiple locks, e.g. to synchronize access to other
> > shared memory mechanism (i.e. not SMEM). While I am not aware of such
> > use case today, my expectation was that in such case we just list all
> > the hwlocks related to the firmware and bust those from the remoteproc
> > instance.
> > 
> > Having to export APIs from each one of such drivers and make the
> > remoteproc identify the relevant instances and call those APIs does
> > indeed seem inconvenient.
> > SMEM is special here because it's singleton, but this would not
> > necessarily be true for other cases.
> 
> I don't think that exporting such API is unreasonable, but quite
> opposite - expected. The remote processor crashed, so the remoteproc
> driver is supposed to call some sort of smem_cleanup() or
> smem_cleanup_on_crash() and call would bust/release the lock. That way
> lock handling is encapsulated entirely in one driver which already takes
> and releases the lock.
> 

I don't agree.

SMEM does indeed acquire and release the same, shared, lock. But the
SMEM driver instance on our side is not involved in taking the lock for
the firmware.

There exist an equivalent SMEM driver instance in the firmware that
crashed and that's the thing that needs to be released.


We're also not tearing down, or cleaning up anything in our SMEM
instance. It is simply "when remoteproc id N died, check if N is holding
the lock and if so force a release of the lock - so that others can grab
it".

> Just like freeing any memory. remoteproc driver does not free other
> driver's memory only because processor crashed.
> 

That's a good comparison. Because when the firmware running on the
remote processor crashes, it is indeed the job of the remoteproc driver
to clean up the memory allocated to run the remote processor.

Regards,
Bjorn
Krzysztof Kozlowski May 25, 2024, 4:45 p.m. UTC | #9
On 24/05/2024 21:23, Bjorn Andersson wrote:
> On Thu, May 23, 2024 at 08:15:54AM +0200, Krzysztof Kozlowski wrote:
>> On 22/05/2024 19:50, Bjorn Andersson wrote:
>>>>>>>
>>>>>>> We did consider tying this to the SMEM instance, but the entitiy 
>>>>>>> relating to firmware is the remoteproc instance.
>>>>>>
>>>>>> I still do not understand why you have to add hwlock to remoteproc, even
>>>>>> though it is not directly used. Your driver problem looks like lack of
>>>>>> proper driver architecture - you want to control the locks not from the
>>>>>> layer took the lock, but one layer up. Sorry, no, fix the driver
>>>>>> architecture.
>>>>>>
>>>>>
>>>>> No, it is the firmware's reference to the lock that is represented in
>>>>> the remoteproc node, while SMEM deals with Linux's reference to the lock.
>>>>>
>>>>> This reference would be used to release the lock - on behalf of the
>>>>> firmware - in the event that the firmware held it when it
>>>>> stopped/crashed.
>>>>
>>>> I understood, but the remoteproc driver did not acquire the hardware
>>>> lock. It was taken by smem, if I got it correctly, so you should poke
>>>> smem to bust the spinlock.
>>>>
>>>
>>> The remoteproc instance is the closest representation of the entity that
>>> took the lock (i.e. the firmware). SMEM here is just another consumer of
>>> the same lock.
>>>
>>>> The hwlock is not a property of remote proc, because remote proc does
>>>> not care, right? Other device cares... and now for every smem user you
>>>> will add new binding property?
>>>>
>>>
>>> Right, the issue seen relates to SMEM, because the remote processor (not
>>> the remoteproc driver) took the lock.
>>>
>>>> No, you are adding a binding based on your driver solution.
>>>
>>> Similar to how hwspinlocks are used in other platforms (e.g. TI) the
>>> firmware could take multiple locks, e.g. to synchronize access to other
>>> shared memory mechanism (i.e. not SMEM). While I am not aware of such
>>> use case today, my expectation was that in such case we just list all
>>> the hwlocks related to the firmware and bust those from the remoteproc
>>> instance.
>>>
>>> Having to export APIs from each one of such drivers and make the
>>> remoteproc identify the relevant instances and call those APIs does
>>> indeed seem inconvenient.
>>> SMEM is special here because it's singleton, but this would not
>>> necessarily be true for other cases.
>>
>> I don't think that exporting such API is unreasonable, but quite
>> opposite - expected. The remote processor crashed, so the remoteproc
>> driver is supposed to call some sort of smem_cleanup() or
>> smem_cleanup_on_crash() and call would bust/release the lock. That way
>> lock handling is encapsulated entirely in one driver which already takes
>> and releases the lock.
>>
> 
> I don't agree.
> 
> SMEM does indeed acquire and release the same, shared, lock. But the
> SMEM driver instance on our side is not involved in taking the lock for
> the firmware.
> 
> There exist an equivalent SMEM driver instance in the firmware that
> crashed and that's the thing that needs to be released.

Then please include relevant explanation in the commit msg.

> 
> 
> We're also not tearing down, or cleaning up anything in our SMEM
> instance. It is simply "when remoteproc id N died, check if N is holding
> the lock and if so force a release of the lock - so that others can grab
> it".
> 
>> Just like freeing any memory. remoteproc driver does not free other
>> driver's memory only because processor crashed.
>>
> 
> That's a good comparison. Because when the firmware running on the
> remote processor crashes, it is indeed the job of the remoteproc driver
> to clean up the memory allocated to run the remote processor.


Best regards,
Krzysztof
diff mbox series

Patch

diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,pas-common.yaml b/Documentation/devicetree/bindings/remoteproc/qcom,pas-common.yaml
index 63a82e7a8bf8..483a8ff5f47e 100644
--- a/Documentation/devicetree/bindings/remoteproc/qcom,pas-common.yaml
+++ b/Documentation/devicetree/bindings/remoteproc/qcom,pas-common.yaml
@@ -77,6 +77,9 @@  properties:
       and devices related to the ADSP.
     unevaluatedProperties: false
 
+  hwlocks:
+    maxItems: 1
+
 required:
   - clocks
   - clock-names